# Token Intelligence System (Atlas)

> Local-first code knowledge graph for semantic code intelligence, reducing token usage and improving agent context

Token Intelligence, powered by **Atlas**, is a local-first code knowledge graph system that runs entirely on your machine. It extracts semantic information from your codebase once, then serves that structured data to agents in place of raw file reads, which reduces token usage and improves response quality.

## What is Atlas?

Atlas is a semantic code intelligence engine that:

* Builds a **local knowledge graph** from your entire codebase
* Extracts **code structure** (functions, classes, variables, imports, calls, types)
* Models **relationships** between symbols (calls, references, extensions, implementations)
* Stores everything in an **SQLite database** at `.atlas/atlas.db`
* Runs **100% locally** with no cloud processing or data transmission
* Exposes a **Model Context Protocol (MCP) server** so agents get surgical context

Unlike file-based context, which requires sending entire files to LLMs, Atlas provides:

* Structured node/edge metadata (start/end line, visibility, type info)
* Semantic search results ranked by relevance
* Call graphs, type hierarchies, and data flow paths
* Per-symbol code snippets instead of whole files

## How Indexing Works

### Initial Indexing

When you enable Token Intelligence for a project, Tempest:

1. Spawns `node .../atlas/dist/mcp/server-entry.js --init --path <project>`
2. Initializes the `.atlas/` directory structure
3. Creates an empty SQLite database at `.atlas/atlas.db`
4. Begins scanning the project root

The scan phase:

* Discovers all source files matching supported extensions
* Applies `.gitignore` rules and default ignore patterns
* Respects custom exclusions in `atlas.json` (project root)
* Skips vendor directories, build output, caches, test resources, Android resource directories, and other generated content by default

**What files are indexed:**

* All recognized source code files (`.ts`, `.js`, `.py`, `.go`, `.rs`, `.java`, `.cpp`, `.cs`, `.php`, `.rb`, `.swift`, `.kt`, `.dart`, `.lua`, `.vue`, `.svelte`, `.astro`, and 20+ more)
* Configuration files (`.json`, `.yaml`, `.toml`, `.xml`) that define routes or dependencies
* Framework files (routes, modules, decorators, resolvers)

**Files excluded by default:**

* `node_modules/`, `.venv/`, `venv/`, `target/`, `vendor/`, `dist/`, `build/`, and \~30 other dependency/build directories
* Android resource directories (`res/layout`, `res/values`, `res/drawable`, etc.)
* Test/spec files (unless explicitly included in the query)
* Generated code (detected by heuristics: `@generated`, `autogenerated`, `do not edit` markers)
* Files larger than 1 MB (minified bundles, compiled assets)
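
The default filtering rules above can be sketched as a single predicate. This is a minimal illustration, not Atlas's actual scanner: `FileCandidate`, `shouldIndex`, and the shortened directory and extension lists are hypothetical names chosen for the example, and `.gitignore` and `atlas.json` handling is left out.

```typescript
// Hypothetical sketch of the default scan filter; not the real Atlas implementation.
interface FileCandidate {
  path: string;      // path relative to the project root, using "/" separators
  sizeBytes: number; // file size on disk
  head: string;      // first few lines of the file, checked for generated-code markers
}

// Small subsets of the default lists described above.
const IGNORED_DIRS = new Set(["node_modules", ".venv", "venv", "target", "vendor", "dist", "build"]);
const SOURCE_EXTENSIONS = [".ts", ".js", ".py", ".go", ".rs", ".java"];
const GENERATED_MARKERS = ["@generated", "autogenerated", "do not edit"];
const MAX_SIZE_BYTES = 1024 * 1024; // the 1 MB limit

function shouldIndex(file: FileCandidate): boolean {
  // Reject anything that lives under an ignored directory.
  const dirs = file.path.split("/").slice(0, -1);
  if (dirs.some((dir) => IGNORED_DIRS.has(dir))) return false;

  // Keep only recognized source extensions.
  if (!SOURCE_EXTENSIONS.some((ext) => file.path.endsWith(ext))) return false;

  // Skip oversized files such as minified bundles.
  if (file.sizeBytes > MAX_SIZE_BYTES) return false;

  // Skip files that carry a generated-code marker near the top.
  const head = file.head.toLowerCase();
  return !GENERATED_MARKERS.some((marker) => head.includes(marker));
}
```

Ordering the cheap path checks before the content check keeps the filter from reading file contents for paths that would be rejected anyway.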

### Parsing & Extraction

Atlas uses **tree-sitter** (with WebAssembly grammars) to parse source files in parallel:

* A pool of worker threads parses multiple files concurrently

* Each parser extracts **nodes** (symbols) and **edges** (relationships)

* Extracted data includes:
  * Node kind (function, class, variable, import, route, component, etc.)
  * Name and fully qualified name (e.g., `src/utils.ts::MathHelper.calculateTotal`)
  * Location (file, start/end line, start/end column)
  * Metadata (visibility, type parameters, return type, docstring, signature)
  * Decorators and modifiers (async, static, abstract, exported)

* Edge kinds tracked:
  * `contains` (file contains class, class contains method)
  * `calls` (function calls another)
  * `imports` (file imports from another)
  * `exports` (symbol exported from file)
  * `extends` / `implements` (inheritance)
  * `references` (generic reference)
  * `type_of` (variable has type)
  * `returns` (function return type)
  * `instantiates`, `overrides`, `decorates`
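
To make the node and edge vocabulary concrete, here is one way the extracted data could be typed. The interfaces are illustrative assumptions (`GraphNode`, `GraphEdge`, and `childrenOf` are hypothetical names, and the `"file"` and `"method"` kinds are inferred from the `contains` description), not Atlas's exported types.

```typescript
// Illustrative types only; the field names are assumptions based on the lists above.
type NodeKind =
  | "file" | "function" | "method" | "class" | "variable"
  | "import" | "route" | "component";

type EdgeKind =
  | "contains" | "calls" | "imports" | "exports" | "extends" | "implements"
  | "references" | "type_of" | "returns" | "instantiates" | "overrides" | "decorates";

interface GraphNode {
  kind: NodeKind;
  name: string;
  qualifiedName: string;
  filePath: string;
  startLine: number;
  endLine: number;
  signature?: string;
  isExported?: boolean;
}

interface GraphEdge {
  source: string; // qualified name (or ID) of the origin node
  target: string; // qualified name (or ID) of the destination node
  kind: EdgeKind;
}

// The example symbol from the text, expressed as a node.
const calculateTotal: GraphNode = {
  kind: "method",
  name: "calculateTotal",
  qualifiedName: "src/utils.ts::MathHelper.calculateTotal",
  filePath: "src/utils.ts",
  startLine: 10,
  endLine: 24,
};

// "file contains class, class contains method"
const containment: GraphEdge[] = [
  { source: "src/utils.ts", target: "src/utils.ts::MathHelper", kind: "contains" },
  { source: "src/utils.ts::MathHelper", target: calculateTotal.qualifiedName, kind: "contains" },
];

// Direct children of a node, following only `contains` edges.
function childrenOf(parent: string, edges: GraphEdge[]): string[] {
  return edges
    .filter((edge) => edge.kind === "contains" && edge.source === parent)
    .map((edge) => edge.target);
}
```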

**Extraction performance:**

* Typical scan: 100-1000 files/second (I/O bound)
* Typical parse: 1000-10000 files/second (CPU bound, parallelized)
* At those rates, a 10k-file project typically finishes its first index in roughly 10-100 seconds
* Database size: \~50-500 MB depending on complexity (heavily indexed repos on the high end)

**Progress updates:**

Indexing reports progress via a callback with:

* `phase`: 'scanning' | 'parsing' | 'storing' | 'resolving'
* `current`: Files processed so far
* `total`: Total files to process
* `currentFile`: (optional) Current file being parsed
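
A progress callback could render these fields as a one-line status. The sketch below assumes the callback receives a plain object with exactly the four fields listed; `IndexProgress` and `formatProgress` are hypothetical names.

```typescript
// Assumed shape of the progress payload, based on the fields listed above.
type IndexPhase = "scanning" | "parsing" | "storing" | "resolving";

interface IndexProgress {
  phase: IndexPhase;
  current: number;      // files processed so far
  total: number;        // total files to process
  currentFile?: string; // file currently being parsed, when available
}

// Render a progress event as a short status line.
function formatProgress(progress: IndexProgress): string {
  // Guard against division by zero before the total is known.
  const percent = progress.total > 0
    ? Math.floor((progress.current / progress.total) * 100)
    : 0;
  const file = progress.currentFile ? ` (${progress.currentFile})` : "";
  return `${progress.phase}: ${progress.current}/${progress.total} [${percent}%]${file}`;
}
```

For example, an event with `phase: "parsing"`, `current: 250`, `total: 1000`, and `currentFile: "src/auth.ts"` formats as `parsing: 250/1000 [25%] (src/auth.ts)`.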

### Incremental Sync

After indexing, Atlas can sync with file changes:

* `atlas.sync()` checks disk for added/modified/removed files
* Only re-parses changed files (fast path)
* Re-resolves only the references affected by those changes
* Typical sync: \< 1 second on small changes

The Tauri backend installs a file watcher that automatically syncs on changes. The watcher tracks pending files and reports staleness to agents via the `atlas_explore` tool.
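
The added/modified/removed check can be pictured as a comparison between the content hashes stored in the index and the hashes of files currently on disk. This is a sketch under that assumption; `planSync` and `SyncPlan` are hypothetical names, and the real `atlas.sync()` may use timestamps or other signals as well.

```typescript
// Hypothetical sync planner: compares stored content hashes with current ones.
interface SyncPlan {
  added: string[];
  modified: string[];
  removed: string[];
}

function planSync(indexed: Map<string, string>, onDisk: Map<string, string>): SyncPlan {
  const plan: SyncPlan = { added: [], modified: [], removed: [] };

  // Files on disk that are new or whose hash differs from the indexed one.
  onDisk.forEach((hash, path) => {
    const known = indexed.get(path);
    if (known === undefined) {
      plan.added.push(path);
    } else if (known !== hash) {
      plan.modified.push(path);
    }
  });

  // Files the index still knows about that no longer exist on disk.
  indexed.forEach((_hash, path) => {
    if (!onDisk.has(path)) plan.removed.push(path);
  });

  return plan;
}
```

Only the paths in `added` and `modified` would need re-parsing, which is what keeps a small change on the fast path.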

### Reference Resolution

After extraction, Atlas performs multi-pass reference resolution:

1. **Import-based resolution**: Follows `import X from './file'` to map names to definitions
2. **Framework-specific resolution**: Handles React Router routes, Express handlers, NestJS controllers, Laravel middleware, etc.
3. **Name-based matching**: Falls back to symbol-name lookup in the same package
4. **Type hierarchy traversal**: Finds inherited members through extends/implements chains
5. **Chained calls via conformance**: Resolves method calls on protocol/interface implementations

This creates the actual edges in the graph. Once resolved, agents can traverse:

* "Who calls this function?"
* "What does this class extend?"
* "Which routes are handled by this controller?"
* "What symbols are exported from this module?"
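
The multi-pass idea can be sketched as an ordered list of resolvers, where each pass runs only if the previous ones found nothing. The example below covers just two of the passes (import-based and name-based); every name in it (`Definition`, `UnresolvedRef`, `resolveReference`, and so on) is hypothetical, and treating "same package" as "same directory" is a simplifying assumption.

```typescript
// Hypothetical two-pass resolver: import-based first, then name-based fallback.
interface Definition {
  name: string;
  qualifiedName: string;
  filePath: string;
}

interface UnresolvedRef {
  name: string;                    // symbol name as written at the call site
  fromFile: string;                // file containing the reference
  imports: Record<string, string>; // local name -> file it was imported from
}

type Resolver = (ref: UnresolvedRef, defs: Definition[]) => Definition | undefined;

// Pass 1: follow the import statement to the defining file.
const byImport: Resolver = (ref, defs) => {
  const importedFrom = ref.imports[ref.name];
  if (importedFrom === undefined) return undefined;
  return defs.find((d) => d.filePath === importedFrom && d.name === ref.name);
};

// Simplification: a "package" is the directory that holds the file.
const packageOf = (filePath: string): string => filePath.split("/").slice(0, -1).join("/");

// Pass 2: fall back to a same-package lookup by symbol name.
const byNameInPackage: Resolver = (ref, defs) =>
  defs.find((d) => d.name === ref.name && packageOf(d.filePath) === packageOf(ref.fromFile));

// Try each pass in order and stop at the first hit.
function resolveReference(
  ref: UnresolvedRef,
  defs: Definition[],
  passes: Resolver[] = [byImport, byNameInPackage],
): Definition | undefined {
  for (const pass of passes) {
    const hit = pass(ref, defs);
    if (hit) return hit;
  }
  return undefined; // would remain in the unresolved set
}
```

Putting the most precise pass first means a name collision elsewhere in the project cannot override an explicit import.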

## The AtlasIndexToast

The **AtlasIndexToast** is a React component shown in Tempest's UI during initial indexing:

```tsx theme={null}
<AtlasIndexToast projectPath={...} projectName={...} onDismiss={...} />
```

**Behavior:**

* Polls every 2 seconds for the existence of `.atlas/atlas.db` in the project
* Shows a spinner and "Indexing project" message while indexing runs
* Once the database file appears, displays "Index ready" and auto-dismisses after 2.5 seconds (the file is created early in the run, so parsing may still be in progress)
* Users can manually dismiss at any time

**Why this works:**

The Tauri background thread spawns the Node.js indexing process (`server-entry.js --init`) detached from the Tempest process. The toast doesn't wait for the process to exit; it just watches for the database file to materialize, which happens early in the indexing run. The Tauri backend also streams stdout/stderr from the indexing process as `atlas:log` events so users see real-time progress in the logs panel.
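
The toast behavior described above can be modeled as a small state machine that is advanced on every poll. This is a sketch, not the component's source: `ToastState` and `nextToastState` are hypothetical names, and the file-existence check is passed in as a boolean so the logic stays independent of any filesystem API.

```typescript
// Hypothetical model of the toast lifecycle: indexing -> ready -> dismissed.
type ToastState =
  | { status: "indexing" }
  | { status: "ready"; readySinceMs: number }
  | { status: "dismissed" };

const AUTO_DISMISS_MS = 2500; // "Index ready" stays visible for 2.5 seconds

// Advance the state given the latest poll result and the current time.
function nextToastState(state: ToastState, dbExists: boolean, nowMs: number): ToastState {
  switch (state.status) {
    case "indexing":
      // Flip to "ready" the first time the database file is seen.
      return dbExists ? { status: "ready", readySinceMs: nowMs } : state;
    case "ready":
      // Auto-dismiss once the ready message has been shown long enough.
      return nowMs - state.readySinceMs >= AUTO_DISMISS_MS ? { status: "dismissed" } : state;
    case "dismissed":
      return state;
  }
}
```

In a component, a timer would call this every 2 seconds with the result of checking for `.atlas/atlas.db`, and a manual dismiss would set the state to `dismissed` directly.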

## How Atlas Reduces Token Usage

Raw file-based context sends entire files to the LLM:

* "Show me src/handlers.ts" → 500+ lines → 2000+ tokens per file
* "Show me 10 related files" → 20k tokens before any actual reasoning
* Agents must parse file structure themselves, extract only relevant pieces
* Duplicated context when multiple symbols from the same file are relevant

With Atlas, agents use structured tools:

* `atlas_explore "How does authentication work?"` → Returns a focused subgraph with:
  * Only relevant files (5-8 instead of 20+)
  * Per-symbol code snippets (20-50 lines) instead of whole files
  * Symbol names, signatures, and locations
  * Relationships between symbols (who calls who, what implements what)
  * \~300-800 tokens instead of 2000+

* `atlas_node "Symbol/QualifiedName"` → Returns just that symbol's definition + immediate context

* `atlas_graph "find_callers UserService.authenticate"` → Returns call chain as a traversable graph

* `atlas_search "authentication"` → Returns FTS-ranked search results, top N only

**Impact:**

* Token savings: 60-80% reduction in context tokens for typical agent queries
* Faster responses: Smaller context means faster inference
* Better accuracy: Agents reason about structure, not raw text parsing
* Cross-file awareness: Agents see relationships without reading every file
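
As a rough check on these figures, the comparison quoted earlier (about 300-800 tokens of structured context against 2000+ tokens of raw file content) can be turned into a percentage. The numbers below are the document's illustrative estimates, not measurements, and `savingsPercent` is a hypothetical helper.

```typescript
// Percentage of context tokens saved, rounded to the nearest whole percent.
function savingsPercent(rawTokens: number, atlasTokens: number): number {
  return Math.round((100 * (rawTokens - atlasTokens)) / rawTokens);
}

const RAW_TOKENS = 2000; // raw file context, per the estimate above

const conservative = savingsPercent(RAW_TOKENS, 800); // 60
const optimistic = savingsPercent(RAW_TOKENS, 300);   // 85
```

The resulting 60-85% range is in line with the 60-80% reduction quoted above.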

## Data Storage

All Atlas data lives in the project at `.atlas/`:

```
.atlas/
  atlas.db          # Main SQLite database
  atlas.db-wal      # WAL (write-ahead log) for concurrent access
  atlas.db-shm      # Shared memory for WAL synchronization
  daemon.log        # (Optional) MCP daemon logs
  daemon.lock       # (Optional) Daemon process lock
  daemon.sock       # (Optional) Unix socket for daemon communication
```

### Database Schema

The SQLite schema (in `schema.sql`) defines:

**Nodes table:**

* `id`: Primary key (hash of file path + qualified name)
* `kind`: Node type (function, class, import, route, etc.)
* `name`: Simple name (e.g., "calculateTotal")
* `qualified_name`: Full path (e.g., "MathHelper.calculateTotal")
* `file_path`: Relative to project root
* `language`: Detected language
* `start_line`, `end_line`, `start_column`, `end_column`: Location
* `docstring`, `signature`: Documentation and type info
* `visibility`, `is_exported`, `is_async`, `is_static`: Modifiers
* `decorators`, `type_parameters`, `return_type`: Extra metadata
* `updated_at`: Last modified timestamp

**Edges table:**

* `source`, `target`: Node IDs
* `kind`: Edge type (calls, imports, extends, etc.)
* `metadata`: JSON with context (line, column, parameter info)
* Unique index on (source, target, kind, line, col) to prevent duplicates

**Files table:**

* `path`: File path (primary key)
* `content_hash`: SHA-256 of file contents
* `language`, `size`: File metadata
* `modified_at`, `indexed_at`: Timestamps
* `node_count`: Count of extracted symbols
* `errors`: JSON array of parse errors

**Unresolved References table:**

* Tracks references waiting for resolution
* Clears after successful resolution pass

**Full-text search:**

* `nodes_fts` virtual table indexes name, qualified\_name, docstring, signature
* Enables ranked full-text search across the graph

**Performance indexes:**

* Indexes on `kind`, `name`, `qualified_name`, `file_path`, `language`
* Composite indexes on `(file_path, start_line)`, `(source, kind)`, `(target, kind)` for fast traversal
* UNIQUE index on edge identity to prevent duplicates

### Database Configuration

Atlas uses SQLite in **WAL (Write-Ahead Log) mode**:

* Readers don't block writers
* Writers don't block readers
* Multiple processes can connect simultaneously (MCP daemon + git hooks)

**Pragmas set:**

* `journal_mode = WAL`: Write-ahead logging
* `synchronous = NORMAL`: Safe with WAL
* `busy_timeout = 5s`: Wait up to 5 seconds if database is locked
* `cache_size = 64 MB`: Large page cache for fast queries
* `mmap_size = 256 MB`: Memory-mapped I/O for sequential scans
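
The list above uses human-readable units, while SQLite itself expects `busy_timeout` in milliseconds, a negative `cache_size` to mean a size in KiB, and `mmap_size` in bytes. The sketch below shows how those settings translate into pragma statements; the exact values Atlas passes are an assumption derived from the list (reading "MB" as MiB), and `connectionPragmas` is a hypothetical name.

```typescript
// Pragma statements equivalent to the settings listed above (values assumed).
const MIB = 1024 * 1024;

const connectionPragmas: string[] = [
  "PRAGMA journal_mode = WAL;",
  "PRAGMA synchronous = NORMAL;",
  // busy_timeout is given in milliseconds: 5 s = 5000 ms.
  "PRAGMA busy_timeout = 5000;",
  // A negative cache_size is a size in KiB: 64 MiB = 65536 KiB.
  `PRAGMA cache_size = ${-(64 * MIB) / 1024};`,
  // mmap_size is given in bytes: 256 MiB = 268435456 bytes.
  `PRAGMA mmap_size = ${256 * MIB};`,
];
```

A SQLite client would run these once per connection, right after opening `.atlas/atlas.db`.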

**Database size estimation:**

* Small project (\< 1k files): 10-50 MB
* Medium project (1k-10k files): 50-200 MB
* Large project (10k-100k files): 200-500 MB
* Very large (100k+ files): 500 MB-2+ GB

Size scales roughly with total symbol count, not file count.

## Per-Project Indexing

Each project has its own `.atlas/` directory and database:

* Switching to a new project workspace automatically points Atlas to its `.atlas/`
* Multiple projects can be indexed simultaneously in separate processes
* The MCP daemon (when running) maintains one connection per project
* No shared index across projects

If you work in a monorepo or multi-workspace setup:

* Each workspace root needs its own index
* Use `atlas.json` at the workspace root to configure exclusions/extensions

## Re-indexing

Indexing happens in three scenarios:

### Automatic (on first enable)

When you enable Token Intelligence in Tempest:

1. Tempest detects no `.atlas/` directory
2. Spawns `atlas init --path <project>`
3. The AtlasIndexToast polls for `.atlas/atlas.db`
4. Background indexing runs, user is notified via toast

### Automatic (on file changes)

The file watcher installed by the Tauri backend syncs automatically on detected changes:

* Debounced with a 500 ms delay
* Only re-parses changed files
* Happens in background without blocking the UI

You can pause/resume watching programmatically via the Rust backend.
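
Debouncing here means collecting changed paths and syncing once the changes stop arriving. The sketch below assumes a trailing-edge debounce with a 500 ms quiet period; `ChangeBatcher` is a hypothetical class, and time is passed in explicitly so the logic does not depend on real timers.

```typescript
// Hypothetical trailing-edge debounce for file-change events.
class ChangeBatcher {
  private readonly quietMs: number;
  private readonly pending = new Set<string>();
  private lastChangeMs = 0;

  constructor(quietMs = 500) {
    this.quietMs = quietMs;
  }

  // Record a changed path; repeated changes to one file collapse into one entry.
  record(path: string, nowMs: number): void {
    this.pending.add(path);
    this.lastChangeMs = nowMs;
  }

  // Return the batch once no change has arrived for `quietMs`, otherwise an empty list.
  flushIfQuiet(nowMs: number): string[] {
    if (this.pending.size === 0 || nowMs - this.lastChangeMs < this.quietMs) {
      return [];
    }
    const batch = Array.from(this.pending).sort();
    this.pending.clear();
    return batch;
  }
}
```

A burst of saves to the same file therefore triggers a single sync for that file, once the burst ends.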

### Manual (via CLI or API)

```bash theme={null}
# Full re-index (discards old database, starts fresh)
atlas index --path <project>

# Sync only (incremental update)
atlas sync --path <project>

# Initialize if not already indexed
atlas init --path <project>
```

From TypeScript:

```ts theme={null}
import Atlas from '@tempest/atlas';

const atlas = await Atlas.open('/path/to/project');

// Full re-index (clears database, reindexes everything)
await atlas.indexAll({ 
  onProgress: (progress) => console.log(progress),
  verbose: true 
});

// Incremental sync (only changed files)
await atlas.sync();

// Check if index is stale
if (atlas.isIndexStale()) {
  console.log('Index was built with an older extraction engine');
}
```

### When to Re-index

Re-index when:

* Enabling Token Intelligence for the first time
* After major framework/dependency updates (npm install, pip install)
* After local branch changes that rewrote history
* Atlas recommends it (run `atlas status --json` to check)
* You see persistent "database is locked" errors even when no other Atlas process is running (may indicate corruption)

You don't need to re-index for:

* Normal code edits (auto-sync handles these)
* Switching branches with similar structure
* Temporary file changes

## Supported Languages

Atlas extracts structure from 30+ languages:

**Web & scripting:**

* TypeScript, JavaScript, TSX, JSX
* Vue.js, Svelte, Astro
* Python, Ruby, PHP, Lua, Luau

**Systems & compiled:**

* Go, Rust, C, C++, C#
* Java, Kotlin, Scala
* Swift, Objective-C
* Dart
* Pascal

**Configuration & markup:**

* YAML, XML, JSON, Properties files
* Liquid (Jekyll templates)
* Twig (Symfony templates)

**File type detection:**

Atlas detects language by file extension. The built-in map covers \~100 extensions across all languages. Use `atlas.json` to add custom mappings:

```json theme={null}
{
  "extensions": {
    ".dota_lua": "lua",
    ".tpl": "php",
    ".h": "cpp"
  }
}
```
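
Extension-based detection with custom mappings can be sketched as a lookup over a merged table, with the custom entries winning on conflict. The built-in table below is a four-entry stand-in (including the assumed default of `.h` mapping to `c`), and `detectLanguage` is a hypothetical name.

```typescript
// Tiny stand-in for the built-in extension map (the real one covers ~100 extensions).
const BUILT_IN_EXTENSIONS: Record<string, string> = {
  ".ts": "typescript",
  ".py": "python",
  ".lua": "lua",
  ".h": "c", // assumed default, used here to show an override
};

// Resolve a file's language; custom mappings take precedence over built-ins.
function detectLanguage(
  filePath: string,
  custom: Record<string, string> = {},
): string | undefined {
  const merged: Record<string, string> = { ...BUILT_IN_EXTENSIONS, ...custom };
  const fileName = filePath.split("/").pop() ?? "";

  // Prefer the longest matching extension so specific suffixes beat generic ones.
  const match = Object.keys(merged)
    .filter((ext) => fileName.endsWith(ext))
    .sort((a, b) => b.length - a.length)[0];

  return match === undefined ? undefined : merged[match];
}
```

With the `atlas.json` mappings shown above, `include/x.h` would resolve to `cpp` and `scripts/hero.dota_lua` to `lua`.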

## Configuration

Create `atlas.json` at your project root to configure indexing:

```json theme={null}
{
  "extensions": {
    ".custom": "typescript"
  },
  "includeIgnored": [
    "vendor/important-vendored-lib/"
  ],
  "exclude": [
    "static/themes/**",
    "assets/vendor/**"
  ]
}
```

**Extensions:**

Map custom file extensions to languages. Custom mappings override built-ins on conflict.

**includeIgnored:**

Gitignore-style patterns for directories that ARE in `.gitignore` but should be indexed anyway. Useful for vendored libraries you want in the graph.

**exclude:**

Gitignore-style patterns for files to skip, even if tracked in git. Useful for checked-in themes or SDKs that bloat the graph.

Both fields accept gitignore patterns: `vendor/`, `**/*.min.js`, `src/generated/**`, etc.

## Graph Operations

Once indexed, agents can query the graph via MCP tools:

**Search:**

```
atlas_search "authentication middleware"
```

Returns FTS-ranked results: matching symbols, sorted by relevance.

**Explore:**

```
atlas_explore "How does user login work?" --maxFiles 6
```

Returns relevant subgraph: entry points + related code + relationships.

**Node details:**

```
atlas_node "src/auth.ts::authenticate" --includeCode
```

Returns symbol metadata + code snippet + callers/callees.

**Graph traversal:**

```
atlas_graph "find_callers authenticate"
atlas_graph "find_usages UserService"
atlas_graph "type_hierarchy BaseController"
```

Returns call graphs, usages, type hierarchies.
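
Conceptually, a `find_callers` query is a breadth-first walk backwards along `calls` edges. The sketch below runs that walk over an in-memory edge list; `CallEdge`, `findCallers`, and the depth limit are illustrative assumptions, and the real tool queries the SQLite edges table instead.

```typescript
// Hypothetical in-memory version of a reverse call-graph traversal.
interface CallEdge {
  source: string; // caller
  target: string; // callee
  kind: "calls";
}

// Collect direct and transitive callers of `target`, nearest first.
function findCallers(target: string, edges: CallEdge[], maxDepth = 3): string[] {
  const seen = new Set<string>([target]); // guards against cycles
  const callers: string[] = [];
  let frontier: string[] = [target];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const edge of edges) {
      if (frontier.indexOf(edge.target) !== -1 && !seen.has(edge.source)) {
        seen.add(edge.source);
        callers.push(edge.source);
        next.push(edge.source);
      }
    }
    frontier = next;
  }
  return callers;
}

const callEdges: CallEdge[] = [
  { source: "loginRoute", target: "authenticate", kind: "calls" },
  { source: "adminRoute", target: "authenticate", kind: "calls" },
  { source: "authenticate", target: "verifyPassword", kind: "calls" },
  { source: "main", target: "loginRoute", kind: "calls" },
];

// ["authenticate", "loginRoute", "adminRoute", "main"]
const passwordCallers = findCallers("verifyPassword", callEdges);
```

The `seen` set is what keeps recursive or mutually recursive functions from looping the traversal.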

**Project structure:**

```
atlas_status
```

Returns graph statistics, index freshness, database size.

## Performance Considerations

**Indexing speed:**

* Typical: 100-1000 files/second
* Parallelized across worker threads
* Bottleneck: tree-sitter parsing, not I/O or database writes

**Database queries:**

* Symbol lookup: \< 1ms (indexed by name)
* Full-text search: 10-100ms depending on query selectivity
* Graph traversal: 10-500ms depending on depth
* File dependencies: \< 100ms

**Memory usage:**

* Parsing workers: \~100 MB each (WASM heap grows during parsing, shrinks after)
* Database: 50-200 MB resident (SQLite page cache)
* CLI tool: 200-500 MB during indexing

**Disk space:**

* Index database: 50 MB - 2 GB depending on project size
* WAL files: 0-500 MB (temporary during concurrent writes, cleaned after sync)

## Troubleshooting

**"Database is locked" errors:**

* Check if another process is indexing (look for `atlas` processes)
* Ensure journal mode is WAL: `sqlite3 .atlas/atlas.db "PRAGMA journal_mode;"`
* If corrupted, remove `.atlas/` and re-index

**Indexing hangs or takes too long:**

* Check for very large files (> 1 MB) that slow parsing
* Verify `.gitignore` is working (should exclude vendor dirs)
* Look for symbolic links to huge directories
* Use `verbose: true` to see per-file progress

**Index is stale:**

* Run `atlas index --path <project>` to rebuild with latest extraction engine

**Certain symbols aren't indexed:**

* Check file language detection (`atlas status --json` shows language distribution)
* Verify file is not in excluded patterns
* Ensure file is under project root

**MCP daemon won't start:**

* Set `ATLAS_NO_DAEMON=1` to run in direct mode
* Check `.atlas/daemon.log` for errors
* Ensure no stale daemon lock at `.atlas/daemon.lock`

## What's Next: Nexus

Tempest includes a **Nexus page** (code graph visualization interface) that is currently a placeholder for future functionality. Planned features include:

* Interactive visualization of the code graph
* Node/edge filtering and search
* Call graph exploration UI
* Data flow visualization
* Type hierarchy browser

This will provide a visual complement to the agent-facing MCP tools, letting developers understand their codebase structure at a glance.
