ADR-002: Markdown Model

Status

Accepted

Context

A note-taking app must decide how to represent notes internally. The key question:

What is the source of truth for a note's content?

Options:

  1. Markdown text - Store raw markdown, parse on demand
  2. AST (Abstract Syntax Tree) - Store parsed structure
  3. Hybrid - Store both, keep them in sync

This decision affects:

  • Data portability
  • Export fidelity
  • Parse error handling
  • Feature implementation complexity

Decision

Markdown text is the canonical source of truth.

Preservation Invariants

| Invariant | Rule |
| --- | --- |
| Raw text is sacred | User-typed markdown is NEVER auto-modified |
| AST is ephemeral | Exists only for parsing, never persisted as authority |
| Parse errors don't block | If parse fails, save raw markdown anyway |
| No normalization | Never reformat whitespace, headings, lists |
| No serialization from AST | Never reconstruct markdown from parsed AST |
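
The invariants above can be sketched as a save path. This is a minimal illustration, not the project's implementation: `StoredNote`, `ParseFn`, `saveNote`, and the `parseOk` flag are hypothetical names, and the store is assumed to be a plain in-memory `Map`.

```typescript
// Hypothetical sketch: StoredNote, ParseFn, saveNote and parseOk are
// illustrative names, not taken from the codebase.
type ParseFn = (markdown: string) => unknown; // returns an ephemeral AST, may throw

interface StoredNote {
  id: string;
  content: string;  // raw markdown, stored exactly as typed
  parseOk: boolean; // whether the parser accepted the content on this save
}

function saveNote(
  store: Map<string, StoredNote>,
  id: string,
  rawMarkdown: string,
  parse: ParseFn,
): StoredNote {
  let parseOk = true;
  try {
    // The AST is only inspected here and then discarded; it is never stored.
    parse(rawMarkdown);
  } catch {
    // Parse errors don't block: the raw text is still saved below.
    parseOk = false;
  }
  // No trimming, no normalization, no re-serialization from the AST.
  const note: StoredNote = { id, content: rawMarkdown, parseOk };
  store.set(id, note);
  return note;
}
```

The only value ever written to `content` is the string the caller passed in, so a failing or buggy parser cannot change what is stored.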

Implementation

```typescript
// NoteId, Timestamp, and Tag are domain types not shown in this ADR.
interface Note {
  id: NoteId;
  content: string; // RAW MARKDOWN - canonical source
  metadata: NoteMetadata; // Derived from content, computed on save
}

interface NoteMetadata {
  title: string; // Extracted from first H1 or filename
  createdAt: Timestamp;
  updatedAt: Timestamp;
  tags: readonly Tag[]; // Extracted from #tags in content
  wordCount: number;
}
```

Derived Data

These values are computed from the markdown on save and stored so they can be queried without re-parsing:

  • Title (first H1 or "Untitled")
  • Tags (from #hashtags in content)
  • Word count
  • Backlinks (from [[wikilinks]])
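
A rough sketch of how these values might be derived. The regular expressions are simplified assumptions rather than the app's parser, `DerivedData` and `deriveData` are hypothetical names, and tags are modelled as plain strings instead of the `Tag` type. The function reads the content and never writes to it.

```typescript
// Hypothetical sketch: simplified regexes stand in for a real markdown parser.
interface DerivedData {
  title: string;
  tags: string[];
  wordCount: number;
  links: string[]; // outgoing [[wikilink]] targets, the input for backlinks
}

// Collects capture group 1 of every match; `re` must carry the "g" flag.
function allMatches(re: RegExp, text: string): string[] {
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(m[1]);
  return out;
}

// Removes duplicates while keeping first-seen order.
function unique(xs: string[]): string[] {
  return xs.filter((x, i) => xs.indexOf(x) === i);
}

function deriveData(content: string): DerivedData {
  // Title: text of the first "# Heading" line, else "Untitled".
  const h1 = /^# +(.+?)\s*$/m.exec(content);
  const title = h1 ? h1[1] : "Untitled";

  // Tags: "#word" at the start of the text or after whitespace.
  // "# Heading" does not match because a letter must follow the "#".
  const tags = unique(allMatches(/(?:^|\s)#([A-Za-z][\w-]*)/g, content));

  // Outgoing wikilinks: the text between [[ and ]].
  const links = unique(
    allMatches(/\[\[([^\[\]]+)\]\]/g, content).map((t) => t.trim()),
  );

  // Word count: naive whitespace split, so markup tokens count as words.
  const wordCount = content.split(/\s+/).filter((w) => w.length > 0).length;

  return { title, tags, wordCount, links };
}
```

Backlinks for a note would then be found by collecting every other note whose `links` contain that note's title, a step that depends on how notes are indexed and is left out here.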

Golden Rule

"If the user typed it, we keep it."

In practice, this means:

  • No "prettify markdown" feature
  • No auto-fix for broken links
  • No whitespace normalization
  • Export = exact copy of stored markdown

Consequences

Positive

  • Portable: Export is always standalone markdown, identical to what the user wrote
  • Predictable: User knows exactly what's stored
  • Recoverable: Parse errors can't corrupt data
  • Simple: One source of truth, no sync issues
  • Compatible: Works with any markdown ecosystem

Negative

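  • Performance: Any feature that needs structure must parse the markdown on read
  • Consistency: Derived data (title, tags, word count, backlinks) can drift from the content if it is not recomputed on every save
  • Features limited: AST-based transformations that write back to the note are ruled out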

Risks

  • Performance with large notes (>100KB). Mitigation: cache parsed results and parse lazily
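
One possible shape for that mitigation is a per-note cache that parses only when a feature asks for the AST and only when the content has changed. `ParseCache` and its `parse` callback are hypothetical names, and this is a sketch under those assumptions rather than a tuned implementation.

```typescript
// Hypothetical sketch: ParseCache is an illustrative name, not an existing class.
class ParseCache<Ast> {
  private entries = new Map<string, { content: string; ast: Ast }>();
  private parse: (markdown: string) => Ast;

  constructor(parse: (markdown: string) => Ast) {
    this.parse = parse;
  }

  // Lazy: nothing is parsed until a feature asks for the AST.
  // Cached: the parser runs again only if the raw content changed.
  get(noteId: string, content: string): Ast {
    const hit = this.entries.get(noteId);
    if (hit !== undefined && hit.content === content) {
      return hit.ast;
    }
    const ast = this.parse(content);
    this.entries.set(noteId, { content, ast });
    return ast;
  }

  // Drops the cached AST, e.g. when a note is deleted.
  invalidate(noteId: string): void {
    this.entries.delete(noteId);
  }
}
```

The cached AST stays ephemeral: it lives in memory only and can be thrown away at any time, because it can always be rebuilt from the stored markdown. Comparing the full content string keeps the sketch simple; a content hash or the note's `updatedAt` could serve as a cheaper key.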

Alternatives Considered

1. AST as Source of Truth

Store the parsed AST, serialize to markdown on export.

Rejected because:

  • Information loss during parsing (whitespace, formatting choices)
  • Different markdown flavors produce different ASTs
  • Reconstruction never produces identical output
  • Lock-in to specific parser implementation
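
The information-loss point can be shown with a toy example. The "parser" below is deliberately tiny and handles bullet lists only; it is not a real markdown parser, and `parseList` and `serializeList` are hypothetical names. It still shows the general problem: several spellings collapse into one AST shape, so the serializer has to pick one.

```typescript
// Toy illustration only: a deliberately tiny bullet-list "parser".
type ListItem = { type: "listItem"; text: string };

function parseList(markdown: string): ListItem[] {
  return markdown
    .split("\n")
    .map((line) => /^\s*[-*+]\s+(.*)$/.exec(line))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => ({ type: "listItem" as const, text: m[1].trim() }));
}

function serializeList(items: ListItem[]): string {
  // The AST no longer knows which marker or spacing the user chose,
  // so the serializer must impose one style.
  return items.map((item) => `- ${item.text}`).join("\n");
}

const typed = "* apples\n*   pears  \n+ plums";
const roundTripped = serializeList(parseList(typed));
console.log(roundTripped === typed); // false: markers and spacing were lost
```

Here the user's `*` and `+` markers and extra spaces all come back as `- `, which is exactly the kind of silent rewrite the preservation invariants forbid.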

2. Hybrid Storage

Store both raw markdown and AST, keep in sync.

Rejected because:

  • Complexity of keeping them synchronized
  • No clear rule for which copy wins when they conflict
  • Double storage space
  • More code to maintain

3. Block-Based (Notion-style)

Use structured blocks as the model.

Rejected because:

  • Not markdown-first
  • Export requires serialization
  • Different product identity
  • Higher complexity