# sphinx-llms-txt Documentation
## Table of Contents
* [Main Document](#index)
* [Changelog](#changelog)
* [Contributing](#contributing)
* [Getting Started](#getting-started)
* [Configuration Values](#configuration-values)
* [Advanced Configuration](#advanced-configuration)
## Main Document
# Sphinx llms.txt Generator
A [Sphinx](http://sphinx-doc.org/) extension that generates a summary `llms.txt` file, written in Markdown, and a single combined documentation `llms-full.txt` file, written in reStructuredText.
## Demo
You can see this Sphinx project’s [llms.txt](https://sphinx-llms-txt.readthedocs.io/en/latest/llms.txt) and [llms-full.txt](https://sphinx-llms-txt.readthedocs.io/en/latest/llms-full.txt) files as a simple example.
## Highlights
1. **Content Collection**: Quickly gathers content from `_sources`, without needing a separate build
2. **Directive Processing**: Resolves `include` directives by automatically incorporating their content
3. **Path Resolution**: Transforms relative paths in directives to full paths
4. **Output Generation**: Creates two optional files:
   - `llms.txt`: A concise summary of your documentation, in Markdown
   - `llms-full.txt`: A comprehensive version with all documentation content, in reStructuredText
5. **Content Filtering**: Allows you to exclude specific pages or sections
6. **Source Code**: Allows you to include specific source code files
## Changelog
# Changelog
## 0.7.0
- Add `llms_txt_uri_template` configuration option to control the link behavior in `llms_txt_filename`.
[#48](https://github.com/jdillard/sphinx-llms-txt/pull/48)
## 0.6.0
- Improve `_sources` directory handling
[#47](https://github.com/jdillard/sphinx-llms-txt/pull/47)
## 0.5.3
- Make Sphinx a required dependency, since the extension imports from Sphinx
[#44](https://github.com/jdillard/sphinx-llms-txt/pull/44)
## 0.5.2
- Remove support for singlehtml
[#40](https://github.com/jdillard/sphinx-llms-txt/pull/40)
## 0.5.1
- Only allow builders that have a `_sources` directory
[#38](https://github.com/jdillard/sphinx-llms-txt/pull/38)
## 0.5.0
- Add block_level_ignore and page_level_ignore
[#33](https://github.com/jdillard/sphinx-llms-txt/pull/33)
- Add `llms_txt_full_size_policy` configuration option to control behavior when `llms_txt_full_max_size` is exceeded.
[#35](https://github.com/jdillard/sphinx-llms-txt/pull/35)
## 0.4.1
- Fix include paths and spacing
[#31](https://github.com/jdillard/sphinx-llms-txt/pull/31)
## 0.4.0
- Add support for including source code files with `llms_txt_code_files` and `llms_txt_code_base_path` configuration options
[#24](https://github.com/jdillard/sphinx-llms-txt/pull/24)
## 0.3.2
- Fix image paths to deployed images
[#30](https://github.com/jdillard/sphinx-llms-txt/pull/30)
## 0.3.1
- Fix issue when `source_suffix` equals `source_link_suffix`
[#29](https://github.com/jdillard/sphinx-llms-txt/pull/29)
## 0.3.0
- Use first paragraph as default for `llms_txt_summary`
[#22](https://github.com/jdillard/sphinx-llms-txt/pull/22)
## 0.2.4
- Support source file suffix detection
[#21](https://github.com/jdillard/sphinx-llms-txt/pull/21)
## 0.2.3
- Remove `get_and_resolve_toctree` method
[#19](https://github.com/jdillard/sphinx-llms-txt/pull/19)
- Simplify `_sources` lookup
[#18](https://github.com/jdillard/sphinx-llms-txt/pull/18)
- Add sphinx docs
[#16](https://github.com/jdillard/sphinx-llms-txt/pull/16)
## 0.2.2
- Refactor `LLMSFullManager` with a clearer class structure
- Add `html_baseurl` to **llms.txt** docs links
- Make glob pattern recursive
## 0.2.1
- Add ability to exclude pages with `llms_txt_exclude`
## 0.2.0
- Add `llms_txt_full_max_size` configuration option to limit llms-full.txt file size
- Automatically add content from **include** directives in **llms-full.txt**
- Add path resolution for a given set of directives in **llms-full.txt**
- Add **llms.txt** file option, with `llms_txt_title` and `llms_txt_summary` config values
## 0.1.0
- Initial release
## Contributing
# Contributing
You will need to set up a development environment to make and test your changes before submitting them.
## Local development
1. Clone the [sphinx-llms-txt repository](https://github.com/jdillard/sphinx-llms-txt).
2. Create and activate a virtual environment:
```console
python3 -m venv .venv
source .venv/bin/activate
```
3. Install development dependencies:
```console
pip install -e . --group dev
```
4. Install pre-commit Git hook scripts:
```console
pre-commit install
```
### Testing changes
Run `pytest` before committing changes.
## Current contributors
Thanks to all who have contributed!
People who have improved the code:

* [jdillard](https://github.com/jdillard)
## Getting Started
# Getting Started
## Installation
Install from PyPI with pip:
```bash
pip install sphinx-llms-txt
```
Or install from conda-forge:
```bash
conda install -c conda-forge sphinx-llms-txt
```
## Usage
Add the extension to your Sphinx configuration (`conf.py`):
```python
extensions = [
'sphinx_llms_txt',
]
```
After the HTML build finishes, **sphinx-llms-txt** prints the location of the output files:
```text
sphinx-llms-txt: Created /path/to/_build/html/llms-full.txt with 45 sources and 6879 lines
sphinx-llms-txt: created /path/to/_build/html/llms.txt
```
See [Advanced Configuration](#advanced-configuration) for more information about how to use **sphinx-llms-txt**.
## Configuration Values
# Project Configuration Values
### llms_txt_full_file
- **Type**: boolean
- **Default**: `True`
- **Description**: Whether to write the single output file.
See *Disabling File Generation* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.1.0.
### llms_txt_full_filename
- **Type**: string
- **Default**: `'llms-full.txt'`
- **Description**: Name of the single output file.
See *Changing Filenames* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.1.0.
### llms_txt_full_max_size
- **Type**: integer or `None`
- **Default**: `None` (no limit)
- **Description**: Sets a maximum line count for `llms_txt_full_filename`.
Behavior when exceeded is controlled by `llms_txt_full_size_policy`.
See *Handling Large Documentation* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.2.0.
### llms_txt_full_size_policy
- **Type**: string
- **Default**: `'warn_skip'`
- **Description**: Controls what happens when `llms_txt_full_max_size` is exceeded.
Format is `<log_level>_<action>`, for example `warn_skip`. Log levels: `warn`, `info`.
Actions: `skip`, `keep`, `note`.
See *Handling Large Documentation* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.5.0.
### llms_txt_file
- **Type**: boolean
- **Default**: `True`
- **Description**: Whether to write the summary information file.
See *Disabling File Generation* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.2.0.
### llms_txt_filename
- **Type**: string
- **Default**: `'llms.txt'`
- **Description**: Name of the summary information file.
See *Changing Filenames* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.2.0.
### llms_txt_uri_template
- **Type**: string or `None`
- **Default**: `None`
- **Description**: Template string for generating URIs in `llms.txt`.
See *Customizing URI Links in llms.txt* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.7.0.
### llms_txt_directives
- **Type**: list of strings
- **Default**: `[]` (empty list)
- **Description**: List of custom directive names to process for path resolution.
See *Path Resolution* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.1.0.
### llms_txt_title
- **Type**: string or `None`
- **Default**: `None`
- **Description**: Overrides the Sphinx project name as the heading in `llms.txt`.
See *Custom Title* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.2.0.
### llms_txt_summary
- **Type**: string
- **Default**: The first paragraph in the root document, else an empty string
- **Description**: Optional, but recommended, summary description for `llms.txt`.
See *Adding a Custom Summary* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.2.0.
### llms_txt_exclude
- **Type**: list of strings
- **Default**: `[]`
- **Description**: A list of glob patterns matching pages to exclude.
See *Excluding Content* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.2.1.
### llms_txt_code_files
- **Type**: list of strings
- **Default**: `[]`
- **Description**: A list of glob patterns selecting source code files to append to `llms_txt_full_filename`.
See *Including Source Code Files* in [Advanced Configuration](#advanced-configuration).
#### Versionadded
Added in version 0.4.0.
### llms_txt_code_base_path
- **Type**: string or `None`
- **Default**: `None` (auto-detect from git root)
- **Description**: Base path to strip from code file paths when displaying titles.
When `None`, automatically detects the relative path from the Sphinx source
directory to the git root and strips that prefix from file paths.
#### Versionadded
Added in version 0.4.0.
## Advanced Configuration
# Advanced Configuration
This page covers advanced configuration options for the sphinx-llms-txt extension.
## Customizing the LLMs Files
By default, the extension generates two files:
1. `llms.txt` - A summary file in Markdown format
2. `llms-full.txt` - A complete documentation file in reStructuredText format
You can customize these files in several ways:
### Changing Filenames
You can change the default filenames by setting these values in your `conf.py`:
```python
llms_txt_filename = "custom-summary.txt"
llms_txt_full_filename = "custom-docs.txt"
```
### Disabling File Generation
If you only want one of the files, you can disable generation of the other:
```python
# Disable the summary file (llms.txt)
llms_txt_file = False

# Or, instead, disable the full documentation file (llms-full.txt)
# llms_txt_full_file = False
```
### Adding a Custom Summary
The summary file can include a custom description of your project:
```python
llms_txt_summary = """
This documentation explains how to use MyProject to build amazing
applications. The project provides a comprehensive API for handling
data processing and visualization.
"""
```
#### NOTE
The summary can span multiple lines and will be properly formatted in the output file.
### Custom Title
By default, the project name from Sphinx is used as the title in `llms.txt`. You can override this:
```python
llms_txt_title = "My Custom Project Documentation"
```
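To picture how `llms_txt_title` and `llms_txt_summary` combine, here is a minimal sketch that assembles the top of a summary file. It assumes the layout of the public llms.txt proposal (an H1 title followed by a blockquote summary); `render_llms_header` is a hypothetical helper, not part of sphinx-llms-txt, so check your generated `llms.txt` for the exact output.

```python
import textwrap


def render_llms_header(project, title=None, summary=""):
    """Hypothetical sketch: build the title and summary block of an llms.txt file.

    Assumes the llms.txt proposal layout: "# Title", then a "> summary" blockquote.
    """
    # llms_txt_title overrides the Sphinx project name when it is set
    heading = title if title else project
    lines = [f"# {heading}", ""]
    # Triple-quoted summaries usually carry surrounding newlines and indentation
    cleaned = textwrap.dedent(summary).strip()
    if cleaned:
        # Prefix every line so a multi-line summary stays a single blockquote
        lines.extend(f"> {line}" if line else ">" for line in cleaned.splitlines())
        lines.append("")
    return "\n".join(lines)


summary = """
This documentation explains how to use MyProject to build amazing
applications.
"""
print(render_llms_header("MyProject", summary=summary))
```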
### Handling Large Documentation
For very large documentation sets, the full documentation file can grow too large to be practical.
You can set a maximum line count and control what happens when that limit is exceeded:
```python
llms_txt_full_max_size = 10000 # Maximum 10,000 lines
llms_txt_full_size_policy = "warn_skip" # Default behavior
```
The `llms_txt_full_size_policy` setting controls both the log level and the action taken when the size limit is exceeded.
It uses the format `"<log_level>_<action>"`:
**Log levels:**
- `warn`: Log as a warning (default)
- `info`: Log as informational message
**Actions:**
- `skip`: Don’t create the file (default)
- `keep`: Create the file anyway, ignoring the size limit
- `note`: Create a placeholder file explaining why the full file wasn’t generated
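The policy string is simply a log level and an action joined by an underscore. The sketch below shows one way to split, validate, and act on it; `parse_size_policy`, `decide_output`, and the assumption that a file is "over the limit" only when its line count is strictly greater than `llms_txt_full_max_size` are illustrative, not the extension's actual code. Logging is omitted.

```python
from dataclasses import dataclass

# Values listed in the documentation for llms_txt_full_size_policy
LOG_LEVELS = {"warn", "info"}
ACTIONS = {"skip", "keep", "note"}


@dataclass(frozen=True)
class SizePolicy:
    log_level: str
    action: str


def parse_size_policy(value="warn_skip"):
    """Hypothetical helper: split "<log_level>_<action>" and validate both parts."""
    log_level, sep, action = value.partition("_")
    if not sep or log_level not in LOG_LEVELS or action not in ACTIONS:
        raise ValueError(f"invalid size policy: {value!r}")
    return SizePolicy(log_level, action)


def decide_output(line_count, max_size, policy):
    """Hypothetical helper: decide what to write for the full documentation file.

    Returns "full" (write the content), "placeholder" (write an explanatory
    note instead), or "none" (write nothing).
    """
    if max_size is None or line_count <= max_size:
        return "full"  # no limit configured, or within the limit
    return {"keep": "full", "note": "placeholder", "skip": "none"}[policy.action]


policy = parse_size_policy("info_note")
print(policy, decide_output(12_000, 10_000, policy))
```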
## Custom Directive Handling
### Path Resolution
By default, the extension resolves paths in the `image` and `figure` directives.
You can add custom directives to this list:
```python
llms_txt_directives = [
"my-custom-image-directive",
"another-directive-with-paths",
]
```
This ensures that paths in your custom directives are properly resolved in the generated files.
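As a rough illustration of what path resolution involves, the sketch below rewrites the path argument of `image`, `figure`, and any extra directive names so it is relative to the source root, optionally prefixed with a base URL such as `html_baseurl`. The function `resolve_directive_paths` and its rules are assumptions for illustration; the extension's real resolution (for example, pointing at deployed images) may differ.

```python
import posixpath
import re

DEFAULT_DIRECTIVES = ["image", "figure"]


def resolve_directive_paths(text, docname, extra_directives=(), base_url=""):
    """Hypothetical sketch: rewrite directive paths relative to the source root.

    ``docname`` is the document path without suffix, e.g. "guide/intro".
    """
    names = DEFAULT_DIRECTIVES + list(extra_directives)
    pattern = re.compile(
        r"^(?P<prefix>[ \t]*\.\.[ \t]+(?:"
        + "|".join(re.escape(name) for name in names)
        + r")::[ \t]+)(?P<path>\S+)",
        re.MULTILINE,
    )
    doc_dir = posixpath.dirname(docname)

    def _resolve(match):
        path = match.group("path")
        if "://" in path:
            return match.group(0)  # already an absolute URL; leave it alone
        if path.startswith("/"):
            # In Sphinx, a leading slash means "relative to the source directory"
            full = path.lstrip("/")
        else:
            full = posixpath.normpath(posixpath.join(doc_dir, path))
        return match.group("prefix") + base_url + full

    return pattern.sub(_resolve, text)


rst = ".. image:: ../images/logo.png\n   :alt: Logo\n"
print(resolve_directive_paths(rst, "guide/intro", base_url="https://docs.example.com/"))
```

Only the directive's argument is touched; option lines such as `:alt:` pass through unchanged because they do not start with `..`.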
## Excluding Content
There are several ways to exclude content from the generated `llms-full.txt` file:
### Global Page Exclusion
You can exclude specific pages from being included in the generated files:
```python
llms_txt_exclude = [
"search", # Exclude the search page
"genindex", # Exclude the index page
"private_*", # Exclude all pages starting with 'private_'
]
```
This is useful for excluding auto-generated pages, indexes, or content that isn’t relevant for LLM consumption.
It can also be used to reduce the size of llms-full.txt.
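A minimal sketch of how glob-style exclusion could be checked against Sphinx docnames, assuming shell-style matching with Python's `fnmatch`. The helper `is_excluded` is hypothetical, and whether the extension treats nested docnames exactly this way is an assumption; verify against your own output.

```python
from fnmatch import fnmatchcase


def is_excluded(docname, patterns):
    """Hypothetical sketch: True if a docname matches any llms_txt_exclude pattern.

    fnmatch patterns must match the whole docname, and "*" also matches "/".
    """
    return any(fnmatchcase(docname, pattern) for pattern in patterns)


llms_txt_exclude = ["search", "genindex", "private_*"]
for docname in ["index", "search", "private_notes", "guide/private_api"]:
    print(docname, is_excluded(docname, llms_txt_exclude))
```

Under this matching rule, `private_*` only catches top-level pages; a pattern such as `*private_*` would also catch `guide/private_api`.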
### Page-Level Ignore Metadata
You can exclude individual pages by adding metadata at the top of any reStructuredText file:
```restructuredtext
:llms-txt-ignore: true

Page Title
==========

This entire page will be excluded from llms-full.txt.
```
When this metadata is present, the entire page is skipped during processing.
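For illustration, a tool could detect this metadata by scanning the field list at the top of the file and stopping at the first other content. This is a minimal sketch under that assumption; `has_page_ignore` is hypothetical, and treating the value case-insensitively is also an assumption.

```python
def has_page_ignore(source):
    """Hypothetical sketch: detect ":llms-txt-ignore: true" in a page's leading field list."""
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines around the metadata block
        if not stripped.startswith(":"):
            break  # metadata fields must come before any other content
        name, _, value = stripped[1:].partition(":")
        if name == "llms-txt-ignore" and value.strip().lower() == "true":
            return True
    return False


page = ":llms-txt-ignore: true\n\nPage Title\n==========\n"
print(has_page_ignore(page))
```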
### Block-Level Ignore Directives
You can exclude specific sections within a page using ignore directives:
```restructuredtext
Page Title
==========

This content will be included in llms-full.txt.

.. llms-txt-ignore-start

This content will be excluded from llms-full.txt.

Section To Ignore
-----------------

This entire section and any nested content will be ignored.

.. code-block:: python

   # This code block will also be ignored
   def ignored_function():
       pass

.. llms-txt-ignore-end

This content will be included again.
```
Block-level ignores can be useful for:
- Removing internal notes or TODOs
- Hiding implementation details while keeping user-facing documentation
#### NOTE
- Multiple ignore blocks can be used within the same file
- Ignore directives work with any indentation level
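The marker handling can be pictured as a small line filter: drop everything from a start marker through the next end marker, allowing leading whitespace on the markers. This is a sketch, not the extension's implementation; `strip_ignored_blocks` and the exact regular expressions are hypothetical. One consequence of this simple design is that a start marker without a matching end marker drops the rest of the file.

```python
import re

# Hypothetical matching rule: markers may be indented to any level
START = re.compile(r"^\s*\.\.\s+llms-txt-ignore-start\s*$")
END = re.compile(r"^\s*\.\.\s+llms-txt-ignore-end\s*$")


def strip_ignored_blocks(source):
    """Hypothetical sketch: remove every line between ignore-start and ignore-end markers."""
    kept = []
    ignoring = False
    for line in source.splitlines(keepends=True):
        if not ignoring and START.match(line):
            ignoring = True  # drop the start marker and begin skipping
        elif ignoring and END.match(line):
            ignoring = False  # drop the end marker and resume keeping lines
        elif not ignoring:
            kept.append(line)
    return "".join(kept)


src = (
    "Keep 1\n.. llms-txt-ignore-start\nDrop 1\n.. llms-txt-ignore-end\n"
    "Keep 2\n   .. llms-txt-ignore-start\n   Drop 2\n   .. llms-txt-ignore-end\n"
    "Keep 3\n"
)
print(strip_ignored_blocks(src))
```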
## Including Source Code Files
You can include source code files from your project at the end of `llms_txt_full_filename`.
Use include/exclude syntax to precisely control which files are included:
```python
llms_txt_code_files = [
"+:src/**/*.py", # Include all Python files in src
"-:src/**/__pycache__/**", # Exclude Python cache files
]
```
Pattern syntax:
- **+:pattern**: Include files matching the pattern. Processed first to collect matching files.
- **-:pattern**: Exclude files matching the pattern. Applied to filter out unwanted files.
Code files are processed as follows:
- **Glob patterns**: Use standard glob patterns (`*`, `**`, `?`) to match files
- **Relative paths**: Patterns are resolved relative to your Sphinx source directory
- **Formatting**: Each file is presented with a title and syntax-highlighted code block
### Customizing Code File Paths
By default, the extension automatically detects the relative path from your Sphinx source directory to the git root and strips that prefix from displayed file paths. You can customize this behavior:
```python
# Manually specify the base path to strip
llms_txt_code_base_path = "../../"

# Or, instead, disable path stripping entirely
# llms_txt_code_base_path = ""
```
This helps create cleaner, more readable file paths in the generated documentation.
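A minimal sketch of the stripping step, assuming the base path is treated as a plain prefix of paths written relative to the Sphinx source directory. Both helpers are hypothetical; in particular, `base_path_from_git_root` takes the git root as an argument, whereas the extension detects it automatically.

```python
import posixpath


def base_path_from_git_root(srcdir, git_root):
    """Hypothetical: relative path from the Sphinx source dir up to the git root."""
    return posixpath.relpath(git_root, srcdir)


def display_path(path, base_path):
    """Hypothetical sketch: strip ``base_path`` from a source-dir-relative file path."""
    path = posixpath.normpath(path)
    if not base_path:
        return path  # empty string: show the path unchanged
    prefix = posixpath.normpath(base_path) + "/"
    return path[len(prefix):] if path.startswith(prefix) else path


base = base_path_from_git_root("/repo/docs/source", "/repo")
print(base)  # ../..
print(display_path("../../src/pkg/mod.py", base))  # src/pkg/mod.py
print(display_path("../../src/pkg/mod.py", ""))  # ../../src/pkg/mod.py
```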
## Using HTML Base URL
If you want to include absolute URLs for resources in your documentation, you can use Sphinx’s built-in `html_baseurl` configuration:
```python
html_baseurl = "https://example.com/docs/"
```
When this option is set, all resolved paths in directives will be prefixed with this URL, creating absolute paths in the generated files.
## Customizing URI Links in llms.txt
By default, the `llms.txt` file links to source files in the `_sources` directory when available, falling back to HTML pages when sources aren’t available.
You can customize this behavior using URI templates with `llms_txt_uri_template`:
```python
# Same as the default when _sources exists: link to source files
llms_txt_uri_template = "{base_url}_sources/{docname}{suffix}{sourcelink_suffix}"

# Same as the default fallback when _sources doesn't exist: link to HTML pages
# llms_txt_uri_template = "{base_url}{docname}.html"

# Custom: link to a separate Markdown build
# llms_txt_uri_template = "{base_url}{docname}.md"
```
### Available Template Variables
Your URI template can use the following variables:
- `{base_url}` - The base URL from `html_baseurl` configuration (includes trailing slash)
- `{docname}` - The document name (e.g., `index`, `guide/intro`)
- `{suffix}` - The source file suffix (e.g., `.rst`, `.md`) - may be empty if no source file exists
- `{sourcelink_suffix}` - The suffix from `html_sourcelink_suffix` configuration (e.g., `.txt`)
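To preview which links a template produces, you can expand it yourself. This sketch assumes the placeholders behave like Python `str.format` named fields; `build_uri` is a hypothetical helper, not part of the extension.

```python
def build_uri(template, base_url, docname, suffix="", sourcelink_suffix=""):
    """Hypothetical sketch: expand an llms_txt_uri_template with str.format.

    Placeholders the template does not use are simply ignored.
    """
    return template.format(
        base_url=base_url,
        docname=docname,
        suffix=suffix,
        sourcelink_suffix=sourcelink_suffix,
    )


base = "https://example.com/docs/"
print(build_uri("{base_url}_sources/{docname}{suffix}{sourcelink_suffix}", base, "guide/intro", ".rst", ".txt"))
# https://example.com/docs/_sources/guide/intro.rst.txt
print(build_uri("{base_url}{docname}.html", base, "guide/intro"))
# https://example.com/docs/guide/intro.html
```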
## Integration Examples
### Complete Configuration Example
Here’s a complete example that combines several configuration values:
```python
# File names and generation options
llms_txt_filename = "ai-summary.txt"
llms_txt_full_filename = "ai-full-docs.txt"
llms_txt_full_max_size = 50000
llms_txt_full_size_policy = "warn_note"
# Content customization
llms_txt_title = "Project Documentation for AI Assistants"
llms_txt_summary = """
This is a comprehensive documentation set for our project.
It includes API references, usage examples, and tutorials.
"""
llms_txt_uri_template = "{base_url}{docname}.md"
# Path handling
html_baseurl = "https://docs.example.com/"
llms_txt_directives = ["custom-image", "custom-include"]
# Content filtering
llms_txt_exclude = ["search", "genindex", "404", "private_*"]
# Source code inclusion with include/exclude patterns
llms_txt_code_files = [
"+:../../src/**/*.py", # Include Python files
"+:../../config/*.yaml", # Include config files
"-:../../src/**/__pycache__/**", # Exclude cache files
]
llms_txt_code_base_path = "../../"
```