Extraction Pipeline
Soul extraction transforms a corpus of tweets into a structured soul.md document through a two-pass LLM pipeline.
Input
- Standard path — up to 3,200 tweets via the X API (most recent, sorted by engagement)
- Archive path — full Twitter data export for deeper extraction (demo/premium)
Retweets are filtered out. Remaining tweets are sorted by engagement (likes + retweets) so the most representative content is prioritized. Replies are kept but flagged — they reveal relationship patterns and debate positions.
Pass 1: Categorize (Haiku)
Tweets are batched into groups of ~500 and processed in parallel (up to 5 concurrent). Each batch goes through Claude Haiku with a categorization prompt that extracts:
- Themes — specific recurring topics (not “crypto” but “MEV resistance”, “DAO governance failures”)
- Values — what they defend, promote, care about consistently
- Positions — strong opinions with the actual stance, not just the topic
- Communication patterns — blunt vs diplomatic, questions vs declarations
- Relationships — communities, allies, tribes, arguments
- Boundaries — what they reject, block, refuse to engage with
- Decision signals — priorities, tradeoffs, sacrifice patterns
Haiku is chosen for categorization because it's fast, cheap, and good enough for pattern extraction. The synthesis step (which requires judgment and prose quality) uses a more capable model.
Pass 2: Synthesize (Sonnet)
All batch analyses are merged and fed to Claude Sonnet with a synthesis prompt. The prompt instructs Sonnet to write in second person (“you”), be radically specific, and produce a document where every statement is falsifiable.
The output must fit within 10KB (the onchain storage limit). If the first pass exceeds this, Sonnet is asked to condense while preserving structure. As a last resort, the document is truncated at the nearest clean line break.
Cost
| Component | Model | Typical Cost |
|---|---|---|
| Categorization (all batches) | Haiku 3.5 | ~$0.08 |
| Synthesis | Sonnet 4 | ~$0.15 |
| Condensation (if needed) | Sonnet 4 | ~$0.10 |
| Total | ~$0.23 |
Quality Loop
The generated soul.md is presented in an editable markdown editor before minting. Users can refine, cut, or rewrite any section. The same editor is available for post-mint updates via tba.execute().
Quality validation: feed the soul.md to any LLM as a system prompt, ask questions the person hasn't publicly answered, and check if the responses pass the vibe check — not just factually plausible but tonally and ethically correct. A good soul.md produces consistent identity regardless of which model reads it.