Token Intelligence (Atlas)

Token Intelligence, powered by Atlas, is a local-first code knowledge graph system that runs entirely on your machine. It extracts semantic information from your codebase once, then serves that structured data to agents instead of requiring raw file reads, which reduces token usage and improves response quality.

What is Atlas?

Atlas is a semantic code intelligence engine that:

Builds a local knowledge graph from your entire codebase
Extracts code structure (functions, classes, variables, imports, calls, types)
Models relationships between symbols (calls, references, extensions, implementations)
Stores everything in an SQLite database at .atlas/atlas.db
Runs 100% locally with no cloud processing or data transmission
Exposes a Model Context Protocol (MCP) server so agents get surgical context

Unlike file-based context, which requires sending entire files to LLMs, Atlas provides:

Structured node/edge metadata (start/end line, visibility, type info)
Semantic search results ranked by relevance
Call graphs, type hierarchies, and data flow paths
Per-symbol code snippets instead of whole files

How Indexing Works

Initial Indexing

When you enable Token Intelligence for a project, Tempest:

Spawns node .../atlas/dist/mcp/server-entry.js --init --path <project>
Initializes the .atlas/ directory structure
Creates an empty SQLite database at .atlas/atlas.db
Begins scanning the project root

The scan phase:

Discovers all source files matching supported extensions
Applies .gitignore rules and default ignore patterns
Respects custom exclusions in atlas.json (project root)
Skips vendor directories, build output, caches, test resources, Android resource directories, and other generated content by default

What files are indexed:

All recognized source code files (.ts, .js, .py, .go, .rs, .java, .cpp, .cs, .php, .rb, .swift, .kt, .dart, .lua, .vue, .svelte, .astro, and 20+ more)
Configuration files (.json, .yaml, .toml, .xml) that define routes or dependencies
Framework files (routes, modules, decorators, resolvers)

Files excluded by default:

node_modules/, .venv/, venv/, target/, vendor/, dist/, build/, and ~30 other dependency/build directories
Android resource directories (res/layout, res/values, res/drawable, etc.)
Test/spec files (unless explicitly included in the query)
Generated code (detected by heuristics: @generated, autogenerated, do not edit markers)
Files larger than 1 MB (minified bundles, compiled assets)
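
The default exclusions above can be sketched as a single predicate. This is only an illustration under stated assumptions, not Atlas's actual implementation: shouldIndex, EXCLUDED_DIRS, and GENERATED_MARKERS are hypothetical names, the directory set is a subset of the directories listed above, and "1 MB" is taken to mean 1,048,576 bytes.

```typescript
// Hypothetical sketch of the default exclusion rules; not Atlas's real code.
const EXCLUDED_DIRS = new Set(["node_modules", ".venv", "venv", "target", "vendor", "dist", "build"]);
const GENERATED_MARKERS = ["@generated", "autogenerated", "do not edit"];
const MAX_FILE_BYTES = 1024 * 1024; // assumption: "1 MB" means 1 MiB

function shouldIndex(relativePath: string, sizeBytes: number, head: string): boolean {
  // Skip anything that sits under a dependency or build directory.
  const segments = relativePath.split("/");
  if (segments.slice(0, -1).some((s) => EXCLUDED_DIRS.has(s))) return false;
  // Skip oversized files such as minified bundles.
  if (sizeBytes > MAX_FILE_BYTES) return false;
  // Skip files whose leading text carries a generated-code marker.
  const lowered = head.toLowerCase();
  return !GENERATED_MARKERS.some((m) => lowered.includes(m));
}
```

Here head stands for the first few lines of the file, so the marker check does not require reading the whole file.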

Parsing & Extraction

Atlas uses tree-sitter (with WebAssembly grammars) to parse source files in parallel:

A pool of worker threads parses multiple files concurrently
Each parser extracts nodes (symbols) and edges (relationships)
Extracted data includes:
- Node kind (function, class, variable, import, route, component, etc.)
- Name and fully qualified name (e.g., src/utils.ts::MathHelper.calculateTotal)
- Location (file, start/end line, start/end column)
- Metadata (visibility, type parameters, return type, docstring, signature)
- Decorators and modifiers (async, static, abstract, exported)
Edge kinds tracked:
- contains (file contains class, class contains method)
- calls (function calls another)
- imports (file imports from another)
- exports (symbol exported from file)
- extends / implements (inheritance)
- references (generic reference)
- type_of (variable has type)
- returns (function return type)
- instantiates, overrides, decorates
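
To make the extracted data concrete, the node and edge fields listed above could be modeled roughly as follows. The type and property names (GraphNode, GraphEdge, qualifiedName, and so on) are hypothetical and not Atlas's published types; only the edge kinds and the example qualified name come from this page, and the line numbers are made up for illustration.

```typescript
// Hypothetical shapes mirroring the fields listed above; not Atlas's published types.
type EdgeKind =
  | "contains" | "calls" | "imports" | "exports" | "extends" | "implements"
  | "references" | "type_of" | "returns" | "instantiates" | "overrides" | "decorates";

interface GraphNode {
  kind: string;          // e.g. "function", "class", "import"
  name: string;
  qualifiedName: string;
  filePath: string;
  startLine: number;
  endLine: number;
  signature?: string;
  isStatic?: boolean;
  isExported?: boolean;
}

interface GraphEdge {
  source: string; // this sketch identifies nodes by qualified name
  target: string;
  kind: EdgeKind;
}

const method: GraphNode = {
  kind: "method",
  name: "calculateTotal",
  qualifiedName: "src/utils.ts::MathHelper.calculateTotal",
  filePath: "src/utils.ts",
  startLine: 12, // illustrative values
  endLine: 20,
  isStatic: true,
};

const edges: GraphEdge[] = [
  { source: "src/utils.ts::MathHelper", target: method.qualifiedName, kind: "contains" },
  { source: "src/cart.ts::checkout", target: method.qualifiedName, kind: "calls" },
];
```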

Extraction performance:

Typical scan: 100-1000 files/second (I/O bound)
Typical parse: 1000-10000 files/second (CPU bound, parallelized)
A 10k-file project typically indexes in 5-15 minutes on first run
Database size: ~50-500 MB depending on complexity (heavily indexed repos on the high end)

Progress updates: Indexing reports progress via a callback with:

phase: 'scanning' | 'parsing' | 'storing' | 'resolving'
current: Files processed so far
total: Total files to process
currentFile: (optional) Current file being parsed
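
A progress handler built on these fields might look like the sketch below. IndexProgress and formatProgress are hypothetical names introduced for illustration; the field names and phase values are the ones listed above.

```typescript
// Hypothetical type mirroring the progress fields listed above.
type IndexPhase = "scanning" | "parsing" | "storing" | "resolving";

interface IndexProgress {
  phase: IndexPhase;
  current: number;
  total: number;
  currentFile?: string;
}

function formatProgress(p: IndexProgress): string {
  // Guard against total === 0, e.g. before the scan has counted any files.
  const percent = p.total > 0 ? Math.floor((p.current / p.total) * 100) : 0;
  const suffix = p.currentFile ? ` (${p.currentFile})` : "";
  return `[${p.phase}] ${p.current}/${p.total} ${percent}%${suffix}`;
}
```

A function like this could be passed as the onProgress callback, for example (p) => console.log(formatProgress(p)).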

Incremental Sync

After indexing, Atlas can sync with file changes:

atlas.sync() checks disk for added/modified/removed files
Only re-parses changed files (fast path)
Re-indexes only affected references
Typical sync: < 1 second on small changes
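
The change-detection step can be pictured as a diff between what the index recorded and what is on disk. The sketch below assumes the comparison is done on content hashes, which is plausible given the content_hash column but is not confirmed by this page; planSync and SyncPlan are hypothetical names.

```typescript
// Hypothetical change detection: compare stored content hashes with hashes computed from disk.
interface SyncPlan {
  added: string[];
  modified: string[];
  removed: string[];
}

function planSync(indexed: Map<string, string>, onDisk: Map<string, string>): SyncPlan {
  const plan: SyncPlan = { added: [], modified: [], removed: [] };
  onDisk.forEach((hash, path) => {
    const previous = indexed.get(path);
    if (previous === undefined) plan.added.push(path);     // new file
    else if (previous !== hash) plan.modified.push(path);  // contents changed
  });
  indexed.forEach((_hash, path) => {
    if (!onDisk.has(path)) plan.removed.push(path);        // file deleted
  });
  return plan;
}
```

Only the paths in added and modified would need to be re-parsed; unchanged files are skipped entirely.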

The Tauri backend installs a file watcher that automatically syncs on changes. The watcher tracks pending files and reports staleness to agents via the atlas_explore tool.

Reference Resolution

After extraction, Atlas performs multi-pass reference resolution:

Import-based resolution: Follows import X from './file' to map names to definitions
Framework-specific: React Routes, Express handlers, NestJS controllers, Laravel middleware, etc.
Name-based matching: Falls back to symbol-name lookup in the same package
Type hierarchy traversal: Finds inherited members through extends/implements chains
Chained calls via conformance: Resolves method calls on protocol/interface implementations

This creates the actual edges in the graph. Once resolved, agents can traverse:

“Who calls this function?”
“What does this class extend?”
“Which routes are handled by this controller?”
“What symbols are exported from this module?”
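
The first and third passes, plus a caller query, can be sketched in a few lines. This is a simplified illustration and not Atlas's resolver: resolveRefs, findCallers, and the interfaces are hypothetical, and the name-based fallback here accepts any globally unique name, whereas the pass described above restricts the lookup to the same package.

```typescript
// Hypothetical two-pass resolver: import-based lookup first, then a name-based fallback.
interface SymbolDef { id: string; name: string; filePath: string; }
interface UnresolvedRef { fromId: string; fromFile: string; name: string; }
interface CallEdge { source: string; target: string; kind: "calls"; }

function resolveRefs(
  refs: UnresolvedRef[],
  symbols: SymbolDef[],
  imports: Map<string, Map<string, string>>, // file -> (imported name -> defining file)
): CallEdge[] {
  const edges: CallEdge[] = [];
  for (const ref of refs) {
    // Pass 1: follow the import that brought this name into the referencing file.
    const definingFile = imports.get(ref.fromFile)?.get(ref.name);
    let target = symbols.find((s) => s.name === ref.name && s.filePath === definingFile);
    // Pass 2: fall back to the name alone, but only when it is unambiguous.
    if (!target) {
      const sameName = symbols.filter((s) => s.name === ref.name);
      if (sameName.length === 1) target = sameName[0];
    }
    if (target) edges.push({ source: ref.fromId, target: target.id, kind: "calls" });
  }
  return edges;
}

// "Who calls this function?" becomes a filter over the resolved edges.
function findCallers(edges: CallEdge[], targetId: string): string[] {
  return edges.filter((e) => e.target === targetId).map((e) => e.source);
}
```

References that neither pass can resolve produce no edge in this sketch, matching the idea of an unresolved-references backlog.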

The AtlasIndexToast

The AtlasIndexToast is a React component shown in Tempest’s UI during initial indexing:

<AtlasIndexToast projectPath={...} projectName={...} onDismiss={...} />

Behavior:

Polls every 2 seconds for the existence of .atlas/atlas.db in the project
Shows a spinner and “Indexing project” message while indexing runs
Once the database file appears, displays “Index ready” and auto-dismisses after 2.5 seconds (the file is created early in the run, so indexing may still be in progress at that point)
Users can manually dismiss at any time

Why this works: The Tauri background thread spawns the Node.js indexing process (server-entry.js --init) detached from the Tempest process. The toast doesn’t wait for the process to exit; it just watches for the database file to materialize, which happens early in the indexing run. The Tauri backend also streams stdout/stderr from the indexing process as atlas:log events so users see real-time progress in the logs panel.
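
The polling behavior can be reduced to a small loop. The sketch below is not the component's source; waitForFile is a hypothetical helper, and the existence check is injected so the timing logic stays independent of React and the filesystem.

```typescript
// Hypothetical polling loop showing only the timing logic of the toast.
// Resolves true as soon as exists() reports the file, false after maxAttempts polls.
async function waitForFile(
  exists: () => boolean | Promise<boolean>,
  intervalMs: number,
  maxAttempts: number,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await exists()) return true;
    // Sleep for one polling interval before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

With Node's fs module, the documented behavior would correspond to something like waitForFile(() => fs.existsSync(dbPath), 2000, attempts), where dbPath points at .atlas/atlas.db and attempts is whatever upper bound the caller chooses.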

How Atlas Reduces Token Usage

Raw file-based context sends entire files to the LLM:

“Show me src/handlers.ts” → 500+ lines → 2000+ tokens per file
“Show me 10 related files” → 20k tokens before any actual reasoning
Agents must parse file structure themselves, extract only relevant pieces
Duplicated context when multiple symbols from the same file are relevant

With Atlas, agents use structured tools:

atlas_explore "How does authentication work?" → Returns a focused subgraph with:
- Only relevant files (5-8 instead of 20+)
- Per-symbol code snippets (20-50 lines) instead of whole files
- Symbol names, signatures, and locations
- Relationships between symbols (who calls who, what implements what)
- ~300-800 tokens instead of 2000+
atlas_node "Symbol/QualifiedName" → Returns just that symbol’s definition + immediate context
atlas_graph "find_callers UserService.authenticate" → Returns call chain as a traversable graph
atlas_search "authentication" → Returns FTS-ranked search results, top N only

Impact:

Token savings: 60-80% reduction in context tokens for typical agent queries
Faster responses: Smaller context means faster inference
Better accuracy: Agents reason about structure, not raw text parsing
Cross-file awareness: Agents see relationships without reading every file

Data Storage

All Atlas data lives in the project at .atlas/:

.atlas/
  atlas.db          # Main SQLite database
  atlas.db-wal      # WAL (write-ahead log) for concurrent access
  atlas.db-shm      # Shared memory for WAL synchronization
  daemon.log        # (Optional) MCP daemon logs
  daemon.lock       # (Optional) Daemon process lock
  daemon.sock       # (Optional) Unix socket for daemon communication

Database Schema

The SQLite schema (in schema.sql) defines:

Nodes table:

id: Primary key (hash of file path + qualified name)
kind: Node type (function, class, import, route, etc.)
name: Simple name (e.g., “calculateTotal”)
qualified_name: Full path (e.g., “MathHelper.calculateTotal”)
file_path: Relative to project root
language: Detected language
start_line, end_line, start_column, end_column: Location
docstring, signature: Documentation and type info
visibility, is_exported, is_async, is_static: Modifiers
decorators, type_parameters, return_type: Extra metadata
updated_at: Last modified timestamp

Edges table:

source, target: Node IDs
kind: Edge type (calls, imports, extends, etc.)
metadata: JSON with context (line, column, parameter info)
Unique index on (source, target, kind, line, col) to prevent duplicates

Files table:

path: File path (primary key)
content_hash: SHA-256 hash of file contents
language, size: File metadata
modified_at, indexed_at: Timestamps
node_count: Count of extracted symbols
errors: JSON array of parse errors

Unresolved References table:

Tracks references waiting for resolution
Clears after successful resolution pass

Full-text search:

nodes_fts virtual table indexes name, qualified_name, docstring, signature
Enables ranked full-text search over symbol names, docstrings, and signatures

Performance indexes:

Indexes on kind, name, qualified_name, file_path, language
Composite indexes on (file_path, start_line), (source, kind), (target, kind) for fast traversal
UNIQUE index on edge identity to prevent duplicates
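
A node id derived from the file path and qualified name stays the same across re-indexing as long as the symbol does not move or get renamed, which is what lets edges keep pointing at the right rows. The sketch below illustrates one way to derive such an id. The schema description says only "hash of file path + qualified name", so the use of SHA-256, the "::" separator, and the 16-character truncation are all assumptions, and nodeId is a hypothetical helper.

```typescript
import { createHash } from "node:crypto";

// Hypothetical id derivation. SHA-256, the "::" separator, and the
// 16-character truncation are assumptions, not documented Atlas behavior.
function nodeId(filePath: string, qualifiedName: string): string {
  return createHash("sha256")
    .update(`${filePath}::${qualifiedName}`)
    .digest("hex")
    .slice(0, 16);
}
```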

Database Configuration

Atlas uses SQLite in WAL (Write-Ahead Log) mode:

Readers don’t block writers
Writers don’t block readers
Multiple processes can connect simultaneously (MCP daemon + git hooks)

Pragmas set:

journal_mode = WAL: Write-ahead logging
synchronous = NORMAL: Safe with WAL
busy_timeout = 5000: Wait up to 5 seconds if the database is locked (SQLite takes this value in milliseconds)
cache_size = 64 MB: Large page cache for fast queries
mmap_size = 256 MB: Memory-mapped I/O for sequential scans
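
SQLite expects these settings in specific units, so the values above translate into statements like the ones built below. This is a sketch rather than Atlas's startup code: buildPragmas is a hypothetical helper, and the sizes are interpreted as MiB, which is an assumption.

```typescript
// Sketch of the pragma statements using SQLite's units.
// Assumption: the 64 MB and 256 MB figures are treated as MiB.
function buildPragmas(): string[] {
  const cacheKiB = 64 * 1024;          // a negative cache_size is a size in KiB
  const mmapBytes = 256 * 1024 * 1024; // mmap_size is a byte count
  return [
    "PRAGMA journal_mode = WAL;",
    "PRAGMA synchronous = NORMAL;",
    "PRAGMA busy_timeout = 5000;",     // milliseconds
    `PRAGMA cache_size = -${cacheKiB};`,
    `PRAGMA mmap_size = ${mmapBytes};`,
  ];
}
```

Each string can be executed as-is by any SQLite client when the connection is opened.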

Database size estimation:

Small project (< 1k files): 10-50 MB
Medium project (1k-10k files): 50-200 MB
Large project (10k-100k files): 200-500 MB
Very large (100k+ files): 500 MB-2+ GB

Size scales roughly with total symbol count, not file count.

Per-Project Indexing

Each project has its own .atlas/ directory and database:

Switching to a new project workspace automatically points Atlas to its .atlas/
Multiple projects can be indexed simultaneously in separate processes
The MCP daemon (when running) maintains one connection per project
No shared index across projects

If you work in a monorepo or multi-workspace setup:

Each workspace root needs its own index
Use atlas.json at the workspace root to configure exclusions/extensions

Re-indexing

Indexing happens in three scenarios:

Automatic (on first enable)

When you enable Token Intelligence in Tempest:

Tempest detects no .atlas/ directory
Spawns atlas init --path <project>
The AtlasIndexToast polls for .atlas/atlas.db
Background indexing runs, user is notified via toast

Automatic (on file changes)

The file watcher installed by the Tauri backend syncs automatically on detected changes:

Changes are debounced with a 500 ms window
Only re-parses changed files
Happens in background without blocking the UI

You can pause/resume watching programmatically via the Rust backend.
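
Debouncing means a burst of file events collapses into one sync once the burst goes quiet. The sketch below shows the idea in TypeScript for illustration only; the real watcher is installed by the Tauri backend, and ChangeBatcher is a hypothetical class.

```typescript
// Hypothetical debounced change batcher; illustrates the 500 ms quiet period.
class ChangeBatcher {
  private pending = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly flush: (paths: string[]) => void;
  private readonly delayMs: number;

  constructor(flush: (paths: string[]) => void, delayMs = 500) {
    this.flush = flush;
    this.delayMs = delayMs;
  }

  // Record a changed path and restart the quiet-period timer.
  notify(path: string): void {
    this.pending.add(path);
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flushNow(), this.delayMs);
  }

  // Deliver the batch immediately; also runs when the timer fires.
  flushNow(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.pending.size === 0) return;
    const paths = Array.from(this.pending).sort();
    this.pending.clear();
    this.flush(paths);
  }
}
```

Saving the same file three times within the window yields a single flush containing that path once, so only one incremental sync runs.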

Manual (via CLI or API)

# Full re-index (discards old database, starts fresh)
atlas index --path <project>

# Sync only (incremental update)
atlas sync --path <project>

# Initialize if not already indexed
atlas init --path <project>

From TypeScript:

import Atlas from '@tempest/atlas';

const atlas = await Atlas.open('/path/to/project');

// Full re-index (clears database, reindexes everything)
await atlas.indexAll({ 
  onProgress: (progress) => console.log(progress),
  verbose: true 
});

// Incremental sync (only changed files)
await atlas.sync();

// Check if index is stale
if (atlas.isIndexStale()) {
  console.log('Index was built with an older extraction engine');
}

When to Re-index

Re-index when:

Enabling Token Intelligence for the first time
After major framework/dependency updates (npm install, pip install)
After local branch changes that rewrote history
Atlas recommends it (run atlas status --json to check)
You see “database is locked” errors that persist when no other Atlas process is indexing (a lock on its own usually means contention, not corruption)

You don’t need to re-index for:

Normal code edits (auto-sync handles these)
Switching branches with similar structure
Temporary file changes

Supported Languages

Atlas extracts structure from 30+ languages:

Web & scripting:

TypeScript, JavaScript, TSX, JSX
Vue.js, Svelte, Astro
Python, Ruby, PHP, Lua, Luau

Systems & compiled:

Go, Rust, C, C++, C#
Java, Kotlin, Scala
Swift, Objective-C
Dart, Pascal

Configuration & markup:

YAML, XML, JSON, Properties files
Liquid (Jekyll templates)
Twig (Symfony templates)

File type detection: Atlas detects language by file extension. The built-in map covers ~100 extensions across all languages. Use atlas.json to add custom mappings:

{
  "extensions": {
    ".dota_lua": "lua",
    ".tpl": "php",
    ".h": "cpp"
  }
}
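
Extension-based detection with custom overrides amounts to a two-level lookup. In the sketch below, detectLanguage and BUILT_IN are hypothetical, the built-in table is a tiny stand-in for the real map, and the default of ".h" to "c" is an assumption made only so the override has something to replace.

```typescript
// Hypothetical extension lookup; BUILT_IN is a small stand-in for the real map.
// The ".h" -> "c" default is an assumption for illustration.
const BUILT_IN: Record<string, string> = {
  ".ts": "typescript",
  ".py": "python",
  ".lua": "lua",
  ".h": "c",
};

function detectLanguage(filePath: string, custom: Record<string, string> = {}): string | undefined {
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  const ext = fileName.slice(dot).toLowerCase();
  // Custom mappings take precedence over built-in ones.
  return custom[ext] ?? BUILT_IN[ext];
}
```

With the custom map from the example above, include/x.h would be detected as cpp and hero.dota_lua as lua.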

Configuration

Create atlas.json at your project root to configure indexing:

{
  "extensions": {
    ".custom": "typescript"
  },
  "includeIgnored": [
    "vendor/important-vendored-lib/"
  ],
  "exclude": [
    "static/themes/**",
    "assets/vendor/**"
  ]
}

extensions: Map custom file extensions to languages. Overwrites built-ins on conflict.

includeIgnored: Gitignore-style patterns for directories that are in .gitignore but should be indexed anyway. Useful for vendored libraries you want in the graph.

exclude: Gitignore-style patterns for files to skip, even if tracked in git. Useful for checked-in themes or SDKs that bloat the graph.

Both includeIgnored and exclude accept gitignore patterns: vendor/, **/*.min.js, src/generated/**, etc.
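
For tooling that reads atlas.json, the three fields can be given a typed shape. The sketch below assumes every field is optional and defaults to empty, which this page does not state; AtlasConfig and parseAtlasConfig are hypothetical names.

```typescript
// Hypothetical shape of atlas.json, inferred from the three documented fields.
interface AtlasConfig {
  extensions: Record<string, string>;
  includeIgnored: string[];
  exclude: string[];
}

function parseAtlasConfig(jsonText: string): AtlasConfig {
  const raw = JSON.parse(jsonText) as Partial<AtlasConfig>;
  // Assumption: any field may be omitted, in which case it defaults to empty.
  return {
    extensions: raw.extensions ?? {},
    includeIgnored: raw.includeIgnored ?? [],
    exclude: raw.exclude ?? [],
  };
}
```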

Graph Operations

Once indexed, agents can query the graph via MCP tools:

Search:

atlas_search "authentication middleware"

Returns FTS-ranked results: matching symbols, sorted by relevance.

Explore:

atlas_explore "How does user login work?" --maxFiles 6

Returns relevant subgraph: entry points + related code + relationships.

Node details:

atlas_node "src/auth.ts::authenticate" --includeCode

Returns symbol metadata + code snippet + callers/callees.

Graph traversal:

atlas_graph "find_callers authenticate"
atlas_graph "find_usages UserService"
atlas_graph "type_hierarchy BaseController"

Returns call graphs, usages, type hierarchies.

Project structure:

atlas_status

Returns graph statistics, index freshness, database size.
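
On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request for atlas_explore. The envelope follows the Model Context Protocol; the argument name query is an assumption (maxFiles appears in the example above), so check the tool's published input schema before relying on it. toolCall and ToolCallRequest are hypothetical helpers.

```typescript
// Builds a JSON-RPC 2.0 "tools/call" request as used by the Model Context Protocol.
// Assumption: the tool accepts arguments named "query" and "maxFiles".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

const request = toolCall("atlas_explore", { query: "How does user login work?", maxFiles: 6 });
console.log(JSON.stringify(request, null, 2));
```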

Performance Considerations

Indexing speed:

Typical: 100-1000 files/second
Parallelized across worker threads
Bottleneck: tree-sitter parsing, not I/O or database writes

Database queries:

Symbol lookup: < 1ms (indexed by name)
Full-text search: 10-100ms depending on query selectivity
Graph traversal: 10-500ms depending on depth
File dependencies: < 100ms

Memory usage:

Parsing workers: ~100 MB each (WASM heap grows during parsing, shrinks after)
Database: 50-200 MB resident (SQLite page cache)
CLI tool: 200-500 MB during indexing

Disk space:

Index database: 50 MB - 2 GB depending on project size
WAL files: 0-500 MB (temporary during concurrent writes, cleaned after sync)

Troubleshooting

“Database is locked” errors:

Check if another process is indexing (look for atlas processes)
Ensure journal mode is WAL: sqlite3 .atlas/atlas.db "PRAGMA journal_mode;"
If corrupted, remove .atlas/ and re-index

Indexing hangs or takes too long:

Check for very large files (> 1 MB) that slow parsing
Verify .gitignore is working (should exclude vendor dirs)
Look for symbolic links to huge directories
Use verbose: true to see per-file progress

Index is stale:

Run atlas index --path <project> to rebuild with latest extraction engine

Certain symbols aren’t indexed:

Check file language detection (atlas status --json shows language distribution)
Verify file is not in excluded patterns
Ensure file is under project root

MCP daemon won’t start:

Set ATLAS_NO_DAEMON=1 to run in direct mode
Check .atlas/daemon.log for errors
Ensure no stale daemon lock at .atlas/daemon.lock

What’s Next: Nexus

Tempest includes a Nexus page (code graph visualization interface) that is currently a placeholder for future functionality. Planned features include:

Interactive visualization of the code graph
Node/edge filtering and search
Call graph exploration UI
Data flow visualization
Type hierarchy browser

This will provide a visual complement to the agent-facing MCP tools, letting developers understand their codebase structure at a glance.
