How Parsing Works
What Serpentine builds from your code, how edges are determined, and what makes the graph useful.
Serpentine’s goal is to answer one question: when entity A mentions entity B by name, which specific definition does that reference resolve to? The result is a graph where every node is a named entity in your code and every edge is a statically resolved reference: not a file import, not a string match, but a traced pointer from one definition to another.
The graph structure
Every node represents a named entity: a module, class, function, method, variable assignment, or (in TypeScript) interface. Nodes are identified by a dotted path that reflects their location in the codebase (src.auth.models.User.get). Each node has one of the following types:
| Type | What it represents |
|---|---|
| module | A source file or package |
| class | A class definition |
| function | A function, method, or lambda |
| assignment | A variable or constant assignment |
| interface | A TypeScript interface |
Edges are directional and typed. There are five kinds:
| Edge type | What it represents | Example |
|---|---|---|
| calls | A function or method invocation | parse(data) → parse |
| has-a | A constructor assignment or attribute-form decorator | loader = CSVLoader(path) → CSVLoader |
| is-a | Class inheritance | class Stats(Base) → Base |
| references | A name used as a value, or a plain decorator | default = MISSING → MISSING |
| imports | A module-level import statement | from auth import User → auth.User |
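The two tables above map naturally onto a pair of small record types. The following is a minimal sketch of that data model; the names Node and Edge and their fields are hypothetical and are not taken from Serpentine's source.

```python
from dataclasses import dataclass

# Hypothetical data model for illustration; not Serpentine's actual types.
NODE_TYPES = {"module", "class", "function", "assignment", "interface"}
EDGE_TYPES = {"calls", "has-a", "is-a", "references", "imports"}


@dataclass(frozen=True)
class Node:
    path: str  # dotted path, e.g. "src.auth.models.User.get"
    type: str  # one of NODE_TYPES

    def __post_init__(self):
        if self.type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.type}")


@dataclass(frozen=True)
class Edge:
    source: str  # dotted path of the referring entity
    target: str  # dotted path of the definition it resolves to
    type: str    # one of EDGE_TYPES

    def __post_init__(self):
        if self.type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.type}")


# The has-a example from the edge table, placed inside a function named main.
nodes = [
    Node("main", "function"),
    Node("main.loader", "assignment"),
    Node("CSVLoader", "class"),
]
edges = [Edge("main.loader", "CSVLoader", "has-a")]
```

Keeping edges as plain (source, target, type) records makes it easy to filter the graph by edge kind, for example to look only at is-a edges when tracing an inheritance chain.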
Variable-type inference
One of Serpentine’s more useful properties is that it resolves method calls through local variables without needing type annotations. Consider this Python:
def main():
    car = Car(engine)
    car.drive()
Most tools would need a type annotation (car: Car) to know that car.drive() resolves to Car.drive. Serpentine infers it from the assignment:
- car = Car(engine) → ASSIGNED pass records main.car --has-a--> Car
- car.drive() → CALLS pass sees car.drive, resolves car via LEGB to main.car, follows the has-a edge to find that main.car is a Car, then looks up drive on Car
- Result: main --calls--> Car.drive
This works for chained attribute access too. If self.engine = Engine() is set in __init__, then self.engine.start() elsewhere in the class resolves to Engine.start. The same inference applies to factory functions: if build_car() has a single return type, variables assigned from it are typed accordingly.
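The inference for the car.drive() example can be modeled in a few lines. This is a toy sketch: the has_a and methods dictionaries and the resolve_method_call function are hypothetical stand-ins for graph state, not Serpentine's internals, and the variable lookup is simplified to the local scope only.

```python
# has-a edges as the ASSIGNED pass might record them: variable node -> class.
has_a = {"main.car": "Car"}

# Methods known on each class: class -> {method name: node path}.
methods = {"Car": {"drive": "Car.drive"}}


def resolve_method_call(scope, expr):
    """Resolve 'obj.method' used inside `scope` to a definition path."""
    obj, method = expr.split(".", 1)
    variable = f"{scope}.{obj}"   # simplified: local scope only, no full LEGB walk
    cls = has_a.get(variable)     # follow the has-a edge to the variable's type
    if cls is None:
        return None               # type unknown, so no edge can be produced
    return methods.get(cls, {}).get(method)


target = resolve_method_call("main", "car.drive")
print(f"main --calls--> {target}")
```

No annotation is consulted anywhere in this sketch; the only type information comes from the recorded assignment.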
How the graph is built
The parser is built on tree-sitter and processes each file through a subscriber pipeline. As the AST is walked, events are emitted — EnterScope, ExitScope, DefineName, UseName, ImportStatement, and so on — and subscribers collect them into structured data. A GraphBuilder then runs a sequence of passes over that data to produce the final graph.
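The event flow described above could look roughly like this. The sketch uses Python's built-in ast module as a stand-in for tree-sitter, and the Collector and Walker classes and the (kind, name) tuple format are assumptions made for illustration. Only the event names come from the text.

```python
import ast


class Collector:
    """A subscriber that records every event it receives."""

    def __init__(self):
        self.events = []

    def on_event(self, kind, name):
        self.events.append((kind, name))


class Walker(ast.NodeVisitor):
    """Walks the tree and emits events to all subscribers."""

    def __init__(self, subscribers):
        self.subscribers = subscribers

    def emit(self, kind, name):
        for sub in self.subscribers:
            sub.on_event(kind, name)

    def visit_FunctionDef(self, node):
        self.emit("DefineName", node.name)
        self.emit("EnterScope", node.name)
        self.generic_visit(node)          # visit the function body
        self.emit("ExitScope", node.name)

    def visit_Name(self, node):
        # A name being stored to is a definition; anything else is a use.
        kind = "DefineName" if isinstance(node.ctx, ast.Store) else "UseName"
        self.emit(kind, node.id)


source = "def main():\n    car = Car(engine)\n    car.drive()\n"
collector = Collector()
Walker([collector]).visit(ast.parse(source))
```

Because the walker only emits events and the subscribers decide what to keep, new kinds of structured data can be collected by adding a subscriber rather than changing the walk.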
The passes, in order:
- Scope tree: builds the module/class/function containment hierarchy
- Definitions: registers variable and assignment nodes
- Re-export map: scans __init__.py, index.ts, and mod.rs/lib.rs files to transparently resolve re-exported names back to their definition site
- Import bindings: builds the name resolution table for the “G” (global/module) level; follows import chains up to depth 20
- ASSIGNED pass: creates has-a edges for constructor assignments (x = Foo())
- RETURNS pass: infers function return types from return statements
- RETYPE pass: upgrades factory-function has-a edges to point at the returned class instead of the function
- CONSTRUCTOR-ARG pass: propagates argument types at call sites into __init__ parameter nodes
- PARAM-TYPE pass: propagates __init__ parameter types through self.x = param assignments
- CALLS pass: resolves all remaining function and method call targets
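The ordering matters because later passes read what earlier ones wrote. The following toy sketch shows three of the passes (ASSIGNED, RETURNS, RETYPE) running in sequence over shared state. The input dictionaries and function names are hypothetical, chosen only to show the dependency between passes.

```python
# Hypothetical inputs, as an earlier stage might have collected them.
assignments = {"main.car": "build_car", "main.loader": "CSVLoader"}  # variable -> callee
returns = {"build_car": ["Car"]}  # function -> classes constructed in its return statements


def assigned_pass(state):
    # x = Foo() creates a has-a edge from x to whatever Foo resolves to.
    state["has_a"] = dict(assignments)


def returns_pass(state):
    # A function gets a return type only if all its returns agree on one class.
    state["return_type"] = {
        fn: types[0] for fn, types in returns.items() if len(set(types)) == 1
    }


def retype_pass(state):
    # Point factory-function has-a edges at the returned class instead.
    for var, target in state["has_a"].items():
        if target in state["return_type"]:
            state["has_a"][var] = state["return_type"][target]


state = {}
for run_pass in (assigned_pass, returns_pass, retype_pass):  # order matters
    run_pass(state)

print(state["has_a"])
```

After the three passes, main.car points at Car rather than at build_car, which is what lets a later call-resolution step look up methods on the right class.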
Name resolution
Name resolution uses LEGB rules (Local → Enclosing → Global → Built-in), the same scoping model Python uses. Given a name and the scope it appears in, the resolver walks up the scope hierarchy checking:
- Local definitions in the current scope
- Enclosing scope definitions
- Module-level import bindings (after following the import chain to the original definition)
- Language builtins
Submodule names are intentionally excluded from hierarchy lookup — a submodule foo.bar existing does not mean bar is in foo’s namespace. Only an explicit import binding makes it so.
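A minimal sketch of the lookup order, under the assumption that scopes are keyed by dotted path. The definitions and import_bindings tables and the resolve function are hypothetical; module-level definitions are treated here as the outermost step of the scope walk.

```python
import builtins

# Hypothetical scope table: scope path -> names defined directly in that scope.
definitions = {
    "app": {"helper"},
    "app.outer": {"x"},
    "app.outer.inner": {"y"},
}
# Module-level import bindings, already followed to the original definition.
import_bindings = {"app": {"User": "auth.models.User"}}


def resolve(name, scope, module):
    """Resolve `name` as seen from `scope`, which lives inside `module`."""
    # Local, then Enclosing: walk from the current scope up to the module.
    current = scope
    while True:
        if name in definitions.get(current, ()):
            return f"{current}.{name}"
        if current == module or "." not in current:
            break
        current = current.rsplit(".", 1)[0]
    # Global: only explicit import bindings count, never submodule names.
    if name in import_bindings.get(module, {}):
        return import_bindings[module][name]
    # Built-in.
    if hasattr(builtins, name):
        return f"builtins.{name}"
    return None


print(resolve("x", "app.outer.inner", "app"))
print(resolve("User", "app.outer.inner", "app"))
```

Note that nothing in the walk consults the package layout, which is how a submodule stays out of its parent's namespace unless an import binding puts it there.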
For dotted names like self.method() or obj.method(), the resolver:
- Identifies the enclosing class for self access
- For self.attr.method(), follows has-a edges to find the attribute’s type and then looks up the method on that type
- For obj.method(), resolves obj via LEGB and then looks up the method on the resolved type
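The self.attr.method() case can be sketched as a walk over has-a edges. The has_a and members tables and the resolve_self_chain function below are hypothetical, built for a class Car whose __init__ sets self.engine = Engine() and whose go method calls self.engine.start().

```python
# Hypothetical graph state for the Car/Engine example described above.
has_a = {"Car.engine": "Engine"}  # attribute node -> class it holds
members = {"Car": {"go", "engine"}, "Engine": {"start"}}


def resolve_self_chain(enclosing_class, expr):
    """Resolve 'self.a.b...' by following attribute types via has-a edges."""
    parts = expr.split(".")
    if parts[0] != "self" or len(parts) < 2:
        raise ValueError("expected an expression of the form self.<name>...")
    cls = enclosing_class
    for attr in parts[1:-1]:
        cls = has_a.get(f"{cls}.{attr}")  # type of the intermediate attribute
        if cls is None:
            return None                   # attribute type unknown
    last = parts[-1]
    return f"{cls}.{last}" if last in members.get(cls, ()) else None


print(resolve_self_chain("Car", "self.engine.start"))
```

If any attribute along the chain has no recorded has-a edge, the sketch gives up and returns None rather than guessing a target.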
Import resolution
The import resolver handles the common absolute import forms:
| Import form | How it’s resolved |
|---|---|
| import foo | Binding: current_module.foo → foo |
| import foo.bar | Binding: current_module.foo → foo (top-level) |
| from foo import bar | Binding: current_module.bar → foo.bar |
| from foo import bar as b | Binding: current_module.b → foo.bar |
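The four rows of the table can be reproduced with a short function. This sketch uses Python's ast module rather than tree-sitter, and the function name import_bindings is hypothetical. Forms the table does not cover are skipped instead of guessed at.

```python
import ast


def import_bindings(source, current_module):
    """Map each name bound by a module-level import to the path it refers to."""
    bindings = {}
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    continue  # 'import foo as f' is not in the table; skipped here
                top = alias.name.split(".")[0]
                bindings[f"{current_module}.{top}"] = top
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                continue  # relative import: not resolved
            for alias in node.names:
                if alias.name == "*":
                    continue  # star import binds no individual names
                local = alias.asname or alias.name
                bindings[f"{current_module}.{local}"] = f"{node.module}.{alias.name}"
    return bindings


example = (
    "import foo\n"
    "import foo.bar\n"
    "from foo import bar\n"
    "from foo import bar as b\n"
)
print(import_bindings(example, "current_module"))
```

Both import foo and import foo.bar bind the same top-level name, so they collapse into a single entry in the result.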
Re-exports are transparent. If server/__init__.py does from serpentine.server.app import create_app, an import of from serpentine.server import create_app creates an edge pointing at serpentine.server.app.create_app — the actual definition — not at the __init__.py re-export.
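Following a re-export back to its definition site amounts to chasing a chain of aliases with a bounded depth. In this sketch the reexports table and the definition_site function are hypothetical. The limit of 20 is the chain depth given for the import-bindings pass, and returning None when the limit is hit is an assumption, since the behavior in that case is not documented here.

```python
MAX_DEPTH = 20  # chain depth limit given for the import-bindings pass

# Hypothetical re-export map: exported path -> path it was imported from.
reexports = {
    "serpentine.server.create_app": "serpentine.server.app.create_app",
}


def definition_site(path):
    """Follow re-exports until reaching a path that is not itself a re-export."""
    for _ in range(MAX_DEPTH):
        if path not in reexports:
            return path
        path = reexports[path]
    # Either the chain ended exactly at the limit, or it is too long or cyclic.
    return path if path not in reexports else None


print(definition_site("serpentine.server.create_app"))
```

The depth bound also guards against cyclic re-exports, which would otherwise loop forever.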
Current limitations on imports:
- Relative imports (from . import foo, from ..utils import x) are not yet resolved and produce no edges
- from foo import * creates an edge to the module but not to individual names
Decorators
Decorators are handled as a special case of name reference:
- Attribute-form decorator (@router.get("/"), @app.route(...)) where the object (router, app) resolves to a local definition → a has-a edge from the object to the decorated function. This captures registration patterns like Flask routes or FastAPI endpoints.
- Plain decorator (@dataclass, @staticmethod) → a references edge from the decorated function to the decorator.
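The two decorator rules can be expressed as a small classifier. This is a sketch using Python's ast module, not Serpentine's tree-sitter pipeline. The decorator_edges function and the local_names parameter are hypothetical, and decorators matching neither rule produce nothing here because the text does not say how they are handled.

```python
import ast


def decorator_edges(source, local_names):
    """Return (source, edge type, target) tuples for decorated definitions."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        for dec in node.decorator_list:
            # Unwrap a call such as @router.get("/") to the thing being called.
            target = dec.func if isinstance(dec, ast.Call) else dec
            if isinstance(target, ast.Attribute) and isinstance(target.value, ast.Name):
                obj = target.value.id
                if obj in local_names:  # object resolves to a local definition
                    edges.append((obj, "has-a", node.name))
            elif isinstance(target, ast.Name):
                edges.append((node.name, "references", target.id))
    return edges


example = (
    "@router.get('/')\n"
    "def index(): ...\n"
    "\n"
    "@staticmethod\n"
    "def helper(): ...\n"
)
print(decorator_edges(example, local_names={"router"}))
```

The direction differs between the two rules: the registering object points at the function it registers, while a plainly decorated function points at its decorator.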
What Serpentine does not track
Serpentine is a static analyzer. It reads source files — it does not run your code. This means:
- Dynamic references are not tracked. getattr(obj, name), importlib.import_module(path), and similar runtime lookups produce no edges.
- Relative imports are not yet resolved. If your codebase relies heavily on from . import x, those edges will be absent.
- from foo import * is partially handled. An imports edge to the module is created, but individual name references through the star import won’t resolve to their definitions.
- Generated code is not analyzed unless the generated files are present on disk at analysis time.
If a reference you expect to see is missing, it is most likely one of these cases.
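To make the dynamic-reference case concrete, here is a small illustrative file. The class and variable names are made up, and the comments describe what the list above implies about each line rather than verified tool output.

```python
import importlib


class Car:
    def drive(self):
        return "driving"


car = Car()

# Statically resolvable: the method name appears in the source on a variable
# whose type can be inferred from the assignment above.
direct = car.drive()

# Dynamic: the method name is only a string at analysis time, so per the list
# above no edge to Car.drive would be expected from this line.
method_name = "drive"
dynamic = getattr(car, method_name)()

# Dynamic: the module path is a runtime value, so no imports edge is expected.
json_module = importlib.import_module("json")
```

Both calls do the same thing at runtime, but only the first leaves a name in the source that a static analyzer can trace.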
Supported languages
| Language | File extensions |
|---|---|
| Python | .py |
| JavaScript | .js, .jsx |
| TypeScript | .ts, .tsx |
| Rust | .rs |
For a deeper look at the architecture decisions behind the parser — why tree-sitter, how the subscriber pipeline came together, and the tradeoffs of doing this in Rust — see How We Built a Polyglot Code Graph with Rust and Tree-Sitter.
Reporting a bug
If you see a missing edge, a phantom edge, or a node that should exist but doesn’t, please open an issue on GitHub. The most useful reports include:
- The language and a minimal source snippet that reproduces the problem
- The serpentine analyze command you ran and the output you got
- What you expected to see instead