idconvert/CLAUDE.md

# CLAUDE.md — IDconvert Project Instructions

This file is the persistent context for Claude Code sessions on the IDconvert project.
Read this fully before writing any code or making any architectural decisions.
This file supersedes all previous versions.

---

## What We Are Building

Two products that work together as a complete system:

### 1. IDexport (Free InDesign ExtendScript)
A free `.jsx` script the designer runs inside InDesign before converting.
It reads the live InDesign document through Adobe's DOM — the fully resolved
document state — and exports a structured JSON file containing everything the
web converter needs to produce an accurate DOCX.

This is the core architectural decision: the script runs where the data is clean.
InDesign has already resolved coordinates, applied master pages, calculated font
metrics, and resolved colour swatches. We capture that resolved state as JSON
rather than trying to reconstruct it from IDML.

**What IDexport does:**
- Reads every page, frame, story, table, and image through InDesign's DOM
- Exports resolved coordinates — no offset errors, no spread geometry math
- Exports text with exact character-level style application per run
- Exports font names as InDesign has resolved them
- Exports colour values resolved from swatches to hex
- Exports thread order as InDesign knows it
- Exports images with actual placed dimensions
- Exports table structure with exact column widths
- Exports paragraph and character style definitions
- Writes `idconvert_export.json` to the same folder as the INDD file

**What IDexport does NOT do:**
- No licensing, no phone home, no activation
- Does not require internet connection
- Does not modify the InDesign document
- Distributed as a plain `.jsx` file from the website, completely free

### 2. IDconvert (Paid Web SaaS)
A web tool that accepts the `idconvert_export.json` produced by IDexport
and outputs a clean, brand-consistent, fully editable DOCX.

Because the input JSON contains clean resolved data from InDesign's own
layout engine, the converter can produce accurate output without coordinate
reconstruction, IDML parsing complexity, or guessing.

**The value delivered:**
- Paragraph styles mapped to real Word styles — not manually formatted text
- Character styles (bold, italic, colour) as Word character styles
- Tables as real editable Word tables with correct column widths
- Images embedded at correct proportions in the right position in the flow
- Fonts applied correctly per run — brand-consistent in Word
- Hyperlinks active and clickable
- Document hierarchy preserved — headings, body, captions, tables of contents work
- Page numbers as native Word footer fields matched to InDesign styling
- Pre-conversion scan report with font warnings before credits are spent

**What IDconvert explicitly does NOT promise:**
- Pixel-perfect visual layout replication — this is not possible in Word
- Column layouts side by side — these flow sequentially for editability
- Master page decorative elements

---

## The User Flow

```
1. Designer opens document in InDesign
2. Designer runs IDexport script (Scripts panel → IDexport.jsx)
3. Script exports idconvert_export.json to same folder as INDD
4. Designer uploads idconvert_export.json to IDconvert web tool
5. IDconvert scans the JSON — shows font report and warnings
6. Designer confirms → conversion runs
7. Designer downloads DOCX
8. Designer sends DOCX to client (Richard)
9. Richard opens in Word — edits text, updates figures, re-exports PDF
```

---

## IDexport Script — Complete Specification

### Output Format: idconvert_export.json

```json
{
  "version": "1.0",
  "generator": "IDexport",
  "document": {
    "name": "Annual Report 2025",
    "page_width_pt": 595.28,
    "page_height_pt": 841.89,
    "page_count": 12,
    "facing_pages": true
  },
  "styles": {
    "paragraph": [
      {
        "name": "Body Text",
        "font": "Freight Text Pro",
        "size_pt": 10.5,
        "leading_pt": 14,
        "space_before_pt": 0,
        "space_after_pt": 6,
        "alignment": "left",
        "color_hex": "1A1A1A",
        "bold": false,
        "italic": false
      }
    ],
    "character": [
      {
        "name": "Bold Run",
        "font": "Freight Text Pro",
        "bold": true,
        "color_hex": null
      }
    ]
  },
  "colors": {
    "Brand Blue": "1a56db",
    "Black": "1A1A1A",
    "White": "FFFFFF"
  },
  "fonts_used": [
    { "name": "Freight Text Pro", "postscript": "FreightTextPro-Book" },
    { "name": "Proxima Nova", "postscript": "ProximaNova-Regular" }
  ],
  "pages": [
    {
      "page_number": 1,
      "width_pt": 595.28,
      "height_pt": 841.89,
      "items": [
        {
          "type": "text_frame",
          "id": "frame_001",
          "thread_id": "story_A",
          "thread_position": 1,
          "x_pt": 56.69,
          "y_pt": 70.87,
          "width_pt": 481.89,
          "height_pt": 200,
          "column_count": 1,
          "paragraphs": [
            {
              "style": "Heading 1",
              "text": "Annual Report 2025",
              "runs": [
                {
                  "text": "Annual Report 2025",
                  "style": null,
                  "bold": false,
                  "italic": false,
                  "color_hex": null,
                  "font": null,
                  "size_pt": null
                }
              ]
            },
            {
              "style": "Body Text",
              "text": "This report covers the financial year ending December 2025.",
              "runs": [
                {
                  "text": "This report covers the ",
                  "style": null,
                  "bold": false,
                  "italic": false,
                  "color_hex": null,
                  "font": null,
                  "size_pt": null
                },
                {
                  "text": "financial year",
                  "style": "Bold Run",
                  "bold": true,
                  "italic": false,
                  "color_hex": null,
                  "font": null,
                  "size_pt": null
                },
                {
                  "text": " ending December 2025.",
                  "style": null,
                  "bold": false,
                  "italic": false,
                  "color_hex": null,
                  "font": null,
                  "size_pt": null
                }
              ]
            }
          ]
        },
        {
          "type": "image_frame",
          "id": "frame_002",
          "x_pt": 56.69,
          "y_pt": 280,
          "width_pt": 481.89,
          "height_pt": 300,
          "image_name": "hero.jpg",
          "image_data_b64": "...",
          "fit_mode": "fill_proportionally"
        },
        {
          "type": "table",
          "id": "frame_003",
          "x_pt": 56.69,
          "y_pt": 600,
          "width_pt": 481.89,
          "column_widths_pt": [120, 120, 120, 121.89],
          "rows": [
            {
              "is_header": true,
              "cells": [
                {
                  "text": "Region",
                  "style": "Table Header",
                  "col_span": 1,
                  "row_span": 1,
                  "background_hex": "1a56db",
                  "text_color_hex": "FFFFFF"
                }
              ]
            }
          ]
        },
        {
          "type": "page_number",
          "id": "frame_004",
          "x_pt": 56.69,
          "y_pt": 800,
          "font": "Proxima Nova",
          "size_pt": 9,
          "color_hex": "666666",
          "alignment": "center"
        }
      ]
    }
  ]
}
```

### IDexport Script Implementation

```javascript
// IDexport.jsx — runs inside Adobe InDesign

#target indesign

function exportDocument() {
    if (app.documents.length === 0) {
        alert('No document is open. Please open an InDesign document first.');
        return;
    }

    var doc = app.activeDocument;

    if (!doc.saved) {
        alert('Please save your document before exporting.');
        return;
    }

    var output = {
        version: '1.0',
        generator: 'IDexport',
        document: extractDocumentInfo(doc),
        styles: extractStyles(doc),
        colors: extractColors(doc),
        fonts_used: extractFonts(doc),
        pages: extractPages(doc)
    };

    var exportPath = doc.filePath + '/idconvert_export.json';
    var file = new File(exportPath);
    file.encoding = 'UTF-8';
    file.open('w');
    file.write(JSON.stringify(output, null, 2));
    file.close();

    alert('IDexport complete.\nFile saved to:\n' + exportPath +
          '\n\nUpload idconvert_export.json to IDconvert to generate your Word document.');
}

function extractDocumentInfo(doc) {
    var page = doc.pages[0];
    return {
        name: doc.name.replace('.indd', ''),
        page_width_pt: page.bounds[3] - page.bounds[1],
        page_height_pt: page.bounds[2] - page.bounds[0],
        page_count: doc.pages.length,
        facing_pages: doc.documentPreferences.facingPages
    };
}

function extractStyles(doc) {
    var paragraphStyles = [];
    for (var i = 0; i < doc.paragraphStyles.length; i++) {
        var s = doc.paragraphStyles[i];
        if (s.name === '[No Paragraph Style]') continue;
        paragraphStyles.push({
            name: s.name,
            font: safeGet(s, 'appliedFont', null) ?
                  s.appliedFont.name : null,
            size_pt: safeGet(s, 'pointSize', null),
            leading_pt: safeGet(s, 'leading', null),
            space_before_pt: safeGet(s, 'spaceBefore', 0),
            space_after_pt: safeGet(s, 'spaceAfter', 0),
            alignment: alignmentToString(safeGet(s, 'justification', null)),
            color_hex: resolveColor(safeGet(s, 'fillColor', null)),
            bold: safeGet(s, 'fontStyle', '').toLowerCase().indexOf('bold') >= 0,
            italic: safeGet(s, 'fontStyle', '').toLowerCase().indexOf('italic') >= 0
        });
    }

    var characterStyles = [];
    for (var j = 0; j < doc.characterStyles.length; j++) {
        var cs = doc.characterStyles[j];
        if (cs.name === '[No Character Style]') continue;
        characterStyles.push({
            name: cs.name,
            font: safeGet(cs, 'appliedFont', null) ?
                  cs.appliedFont.name : null,
            bold: safeGet(cs, 'fontStyle', '').toLowerCase().indexOf('bold') >= 0,
            italic: safeGet(cs, 'fontStyle', '').toLowerCase().indexOf('italic') >= 0,
            color_hex: resolveColor(safeGet(cs, 'fillColor', null))
        });
    }

    return { paragraph: paragraphStyles, character: characterStyles };
}

function extractColors(doc) {
    var colors = {};
    for (var i = 0; i < doc.colors.length; i++) {
        var c = doc.colors[i];
        try {
            colors[c.name] = colorToHex(c);
        } catch(e) {}
    }
    return colors;
}

function extractFonts(doc) {
    var fonts = [];
    var seen = {};
    for (var i = 0; i < doc.fonts.length; i++) {
        var f = doc.fonts[i];
        if (!seen[f.name]) {
            seen[f.name] = true;
            fonts.push({
                name: f.name,
                postscript: safeGet(f, 'postscriptName', f.name)
            });
        }
    }
    return fonts;
}

function extractPages(doc) {
    var pages = [];
    for (var i = 0; i < doc.pages.length; i++) {
        var page = doc.pages[i];
        pages.push({
            page_number: page.name,
            width_pt: page.bounds[3] - page.bounds[1],
            height_pt: page.bounds[2] - page.bounds[0],
            items: extractPageItems(page, doc)
        });
    }
    return pages;
}

function extractPageItems(page, doc) {
    var items = [];
    var allItems = page.allPageItems;

    for (var i = 0; i < allItems.length; i++) {
        var item = allItems[i];
        try {
            var extracted = extractItem(item, page, doc);
            if (extracted) items.push(extracted);
        } catch(e) {
            // skip items that can't be extracted
        }
    }

    // Sort by Y then X — reading order top-to-bottom, left-to-right
    items.sort(function(a, b) {
        if (Math.abs(a.y_pt - b.y_pt) < 5) return a.x_pt - b.x_pt;
        return a.y_pt - b.y_pt;
    });

    return items;
}

function extractItem(item, page, doc) {
    var bounds = item.geometricBounds;
    // bounds = [top, left, bottom, right] relative to page
    var pageBounds = page.bounds;
    var x = bounds[1] - pageBounds[1];
    var y = bounds[0] - pageBounds[0];
    var width = bounds[3] - bounds[1];
    var height = bounds[2] - bounds[0];

    // Text frame
    if (item.constructor.name === 'TextFrame') {
        return extractTextFrame(item, x, y, width, height);
    }

    // Image / graphic frame
    if (item.constructor.name === 'Rectangle' ||
        item.constructor.name === 'Oval' ||
        item.constructor.name === 'GraphicFrame') {
        if (item.graphics.length > 0) {
            return extractImageFrame(item, x, y, width, height);
        }
        // Rectangle with no image — extract fill colour if significant
        return extractShapeFrame(item, x, y, width, height);
    }

    return null;
}

function extractTextFrame(frame, x, y, width, height) {
    // Detect page number marker
    if (hasAutoPageNumber(frame)) {
        return extractPageNumber(frame, x, y);
    }

    var paragraphs = [];
    try {
        var story = frame.parentStory;
        for (var i = 0; i < story.paragraphs.length; i++) {
            var para = story.paragraphs[i];
            paragraphs.push(extractParagraph(para));
        }
    } catch(e) {}

    // Thread info
    var threadId = frame.parentStory.id;
    var threadPos = 1;
    try {
        var prev = frame.previousTextFrame;
        while (prev !== null) {
            threadPos++;
            prev = prev.previousTextFrame;
        }
    } catch(e) {}

    return {
        type: 'text_frame',
        id: frame.id.toString(),
        thread_id: threadId.toString(),
        thread_position: threadPos,
        x_pt: x,
        y_pt: y,
        width_pt: width,
        height_pt: height,
        column_count: frame.textFramePreferences.textColumnCount || 1,
        paragraphs: paragraphs
    };
}

function extractParagraph(para) {
    var styleName = '[No Paragraph Style]';
    try { styleName = para.appliedParagraphStyle.name; } catch(e) {}

    var runs = [];
    for (var i = 0; i < para.characters.length; i++) {
        var char = para.characters[i];
        // Group into runs by style change
        var run = buildRun(char);
        if (runs.length > 0 && runsMatch(runs[runs.length-1], run)) {
            runs[runs.length-1].text += run.text;
        } else {
            runs.push(run);
        }
    }

    return {
        style: styleName,
        text: para.contents,
        runs: runs
    };
}

function buildRun(char) {
    var styleName = null;
    try {
        if (char.appliedCharacterStyle.name !== '[No Character Style]') {
            styleName = char.appliedCharacterStyle.name;
        }
    } catch(e) {}

    return {
        text: char.contents,
        style: styleName,
        bold: safeGet(char, 'fontStyle', '').toLowerCase().indexOf('bold') >= 0,
        italic: safeGet(char, 'fontStyle', '').toLowerCase().indexOf('italic') >= 0,
        color_hex: resolveColor(safeGet(char, 'fillColor', null)),
        font: safeGet(char, 'appliedFont', null) ? char.appliedFont.name : null,
        size_pt: safeGet(char, 'pointSize', null)
    };
}

function runsMatch(a, b) {
    return a.style === b.style &&
           a.bold === b.bold &&
           a.italic === b.italic &&
           a.color_hex === b.color_hex &&
           a.font === b.font &&
           a.size_pt === b.size_pt;
}

function extractImageFrame(frame, x, y, width, height) {
    var imageData = null;
    var imageName = 'image';
    try {
        var graphic = frame.graphics[0];
        imageName = graphic.itemLink.name;
        // Export image as base64 JPEG at screen resolution
        var tempFile = new File(Folder.temp + '/idconvert_temp.jpg');
        frame.exportFile(ExportFormat.JPG, tempFile, false);
        imageData = fileToBase64(tempFile);
        tempFile.remove();
    } catch(e) {}

    return {
        type: 'image_frame',
        id: frame.id.toString(),
        x_pt: x,
        y_pt: y,
        width_pt: width,
        height_pt: height,
        image_name: imageName,
        image_data_b64: imageData,
        fit_mode: 'fill_proportionally'
    };
}

function extractShapeFrame(frame, x, y, width, height) {
    // Only export shapes that have a visible fill — skip empty rectangles
    var fillColor = null;
    try {
        if (frame.fillColor.name !== 'None' &&
            frame.fillColor.name !== '[None]') {
            fillColor = colorToHex(frame.fillColor);
        }
    } catch(e) {}

    if (!fillColor) return null;

    return {
        type: 'shape',
        id: frame.id.toString(),
        x_pt: x,
        y_pt: y,
        width_pt: width,
        height_pt: height,
        fill_hex: fillColor
    };
}

function extractPageNumber(frame, x, y) {
    var run = null;
    try {
        run = frame.parentStory.characters[0];
    } catch(e) {}

    return {
        type: 'page_number',
        id: frame.id.toString(),
        x_pt: x,
        y_pt: y,
        font: run ? (run.appliedFont ? run.appliedFont.name : null) : null,
        size_pt: run ? safeGet(run, 'pointSize', 9) : 9,
        color_hex: run ? resolveColor(safeGet(run, 'fillColor', null)) : '000000',
        alignment: run ?
            alignmentToString(safeGet(run, 'justification', null)) : 'center'
    };
}

function hasAutoPageNumber(frame) {
    try {
        for (var i = 0; i < frame.parentStory.texts[0].characters.length; i++) {
            var char = frame.parentStory.texts[0].characters[i];
            if (char.contents === SpecialCharacters.AUTO_PAGE_NUMBER) {
                return true;
            }
        }
    } catch(e) {}
    return false;
}

// --- Utility functions ---

function resolveColor(colorObj) {
    if (!colorObj) return null;
    try {
        if (colorObj.name === 'None' || colorObj.name === '[None]') return null;
        return colorToHex(colorObj);
    } catch(e) { return null; }
}

function colorToHex(colorObj) {
    try {
        var vals = colorObj.colorValue;
        if (colorObj.space === ColorSpace.CMYK) {
            // Convert CMYK to RGB
            var c = vals[0]/100, m = vals[1]/100,
                y = vals[2]/100, k = vals[3]/100;
            var r = Math.round(255 * (1-c) * (1-k));
            var g = Math.round(255 * (1-m) * (1-k));
            var b = Math.round(255 * (1-y) * (1-k));
            return toHex(r) + toHex(g) + toHex(b);
        }
        if (colorObj.space === ColorSpace.RGB) {
            return toHex(vals[0]) + toHex(vals[1]) + toHex(vals[2]);
        }
    } catch(e) {}
    return '000000';
}

function toHex(n) {
    var h = Math.max(0, Math.min(255, Math.round(n))).toString(16);
    return h.length === 1 ? '0' + h : h;
}

function alignmentToString(justification) {
    if (!justification) return 'left';
    var map = {
        1: 'left', 2: 'center', 3: 'right', 4: 'justify',
        1514227313: 'left', 1514731619: 'center',
        1514731618: 'right', 1514599026: 'justify'
    };
    return map[justification] || 'left';
}

function fileToBase64(file) {
    file.open('r');
    file.encoding = 'BINARY';
    var content = file.read();
    file.close();
    // InDesign ExtendScript doesn't have btoa — use manual base64
    return btoa(content);
}

function safeGet(obj, prop, fallback) {
    try { return obj[prop]; } catch(e) { return fallback; }
}

// Run
exportDocument();
```

---

## Web Tool — JSON to DOCX Conversion Pipeline

### Input Validation Gate

The upload endpoint validates `idconvert_export.json` before processing:

```python
# idml/validator.py — renamed to export/validator.py
def validate_export_json(content: bytes) -> dict:
    # 1. Size check — JSON should not exceed 100MB
    if len(content) > 100 * 1024 * 1024:
        raise FileValidationError('File too large', 'FILE_TOO_LARGE')

    # 2. Valid JSON
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        raise FileValidationError(
            'This does not appear to be a valid IDexport file. '
            'Please run IDexport.jsx in InDesign and upload the '
            'idconvert_export.json file it produces.',
            'INVALID_JSON'
        )

    # 3. IDexport signature check
    if data.get('generator') != 'IDexport':
        raise FileValidationError(
            'This JSON file was not produced by IDexport. '
            'Please run IDexport.jsx in InDesign to generate '
            'a compatible export file.',
            'WRONG_GENERATOR'
        )

    # 4. Version check
    if data.get('version') not in ['1.0']:
        raise FileValidationError(
            'This IDexport file was created with an incompatible version. '
            'Please download the latest IDexport.jsx from IDconvert.',
            'VERSION_MISMATCH'
        )

    # 5. Required fields
    required = ['document', 'styles', 'pages', 'fonts_used']
    for field in required:
        if field not in data:
            raise FileValidationError(
                f'IDexport file is missing required field: {field}. '
                'Please re-run IDexport.jsx and try again.',
                'MISSING_FIELD'
            )

    return data
```

### Conversion Pipeline

```python
# services/conversion_service.py

class ConversionService:
    def convert(self, export_data: dict) -> bytes:

        # 1. Build document model from JSON
        document = IDExportParser.parse(export_data)

        # 2. Create Word document
        doc = Document()
        self._apply_page_setup(doc, document)
        self._register_styles(doc, document.styles)

        # 3. Process pages in order
        for page in document.pages:
            self._process_page(doc, page, document)

        # 4. Add footer with page numbers
        if document.page_number_style:
            FooterFactory.apply(doc, document.page_number_style)

        # 5. Return DOCX bytes
        buffer = BytesIO()
        doc.save(buffer)
        return buffer.getvalue()

    def _process_page(self, doc, page, document):
        # Items are already sorted in reading order by IDexport
        for item in page.items:
            if item.type == 'text_frame':
                self._add_text_frame(doc, item, document)
            elif item.type == 'image_frame':
                self._add_image(doc, item)
            elif item.type == 'table':
                self._add_table(doc, item, document)
            elif item.type == 'page_number':
                pass  # handled by footer
            elif item.type == 'shape':
                pass  # shapes not converted at MVP

        # Page break between pages (except last)
        doc.add_page_break()

    def _add_text_frame(self, doc, item, document):
        for para_data in item.paragraphs:
            p = doc.add_paragraph()
            style = StyleFactory.get_word_style(
                para_data.style, document.styles
            )
            if style:
                p.style = style

            for run_data in para_data.runs:
                run = p.add_run(run_data.text)
                StyleFactory.apply_run_formatting(run, run_data)
```

### Style Mapping — The Core Value

```python
# docx/factories.py

HEADING_STYLE_MAP = {
    # InDesign style name patterns → Word built-in style
    'heading 1': 'Heading 1',
    'h1': 'Heading 1',
    'title': 'Heading 1',
    'heading 2': 'Heading 2',
    'h2': 'Heading 2',
    'subheading': 'Heading 2',
    'heading 3': 'Heading 3',
    'h3': 'Heading 3',
    'caption': 'Caption',
    'table header': 'Table Grid',
}

class StyleFactory:
    @staticmethod
    def register_styles(doc: Document, styles: IDExportStyles) -> None:
        """Register all InDesign paragraph styles as Word custom styles"""
        for style_data in styles.paragraph:
            word_style_name = StyleFactory._map_to_builtin(style_data.name)
            if word_style_name:
                # Map to Word built-in — modify it to match InDesign values
                style = doc.styles[word_style_name]
            else:
                # Create custom style
                try:
                    style = doc.styles.add_style(
                        style_data.name, WD_STYLE_TYPE.PARAGRAPH
                    )
                except:
                    style = doc.styles[style_data.name]

            # Apply InDesign style properties
            if style_data.font:
                style.font.name = style_data.font
            if style_data.size_pt:
                style.font.size = Pt(style_data.size_pt)
            if style_data.color_hex:
                style.font.color.rgb = RGBColor.from_string(style_data.color_hex)
            if style_data.bold is not None:
                style.font.bold = style_data.bold
            if style_data.italic is not None:
                style.font.italic = style_data.italic
            if style_data.space_before_pt:
                style.paragraph_format.space_before = Pt(style_data.space_before_pt)
            if style_data.space_after_pt:
                style.paragraph_format.space_after = Pt(style_data.space_after_pt)
            if style_data.leading_pt:
                style.paragraph_format.line_spacing = Pt(style_data.leading_pt)
            if style_data.alignment:
                style.paragraph_format.alignment = ALIGNMENT_MAP[style_data.alignment]

    @staticmethod
    def apply_run_formatting(run, run_data) -> None:
        """Apply character-level formatting to a Word run"""
        if run_data.bold:
            run.bold = True
        if run_data.italic:
            run.italic = True
        if run_data.font:
            run.font.name = run_data.font
        if run_data.size_pt:
            run.font.size = Pt(run_data.size_pt)
        if run_data.color_hex:
            run.font.color.rgb = RGBColor.from_string(run_data.color_hex)

    @staticmethod
    def _map_to_builtin(style_name: str) -> str | None:
        return HEADING_STYLE_MAP.get(style_name.lower().strip())
```

---

## Architecture Patterns

Three patterns throughout. No exceptions without discussion.

```
FACTORIES     → build objects from raw data
SERVICES      → orchestrate business logic
REPOSITORIES  → own all data access
ROUTERS       → handle HTTP only
```

### Routers — HTTP only
```python
@router.post('/scan', response_model=ScanReport)
async def scan(
    file: UploadFile,
    user=Depends(get_current_user),
    service: ScanService = Depends(get_scan_service)
):
    return await service.scan(file, user.id)
```

### Services — business logic only
```python
class ScanService:
    async def scan(self, file: UploadFile, user_id: str) -> ScanReport:
        content = await file.read()
        data = self.validator.validate(content)
        report = self.scanner.scan(data)
        await self.repo.cache_scan(user_id, report)
        return report
```

### Repositories — data access only
```python
class ConversionRepository:
    async def cache_scan(self, user_id: str, report: dict) -> str:
        record = await self.pb.collection('scan_cache').create({
            'user': user_id, 'report': report
        })
        return record.id
```

### Factories — object construction only
```python
class StyleFactory:
    @staticmethod
    def from_export_style(data: dict) -> IDExportStyle:
        return IDExportStyle(
            name=data['name'],
            font=data.get('font'),
            size_pt=data.get('size_pt'),
            # ...
        )
```

---

## Repository Structure

```
idconvert/
├── CLAUDE.md
├── BRIEF.md
│
├── IDexport.jsx                   ← InDesign script — single file, distributed free
│
├── pb_hooks/
│   └── credits.pb.js              ← atomic credit logic
│
├── backend/
│   ├── main.py                    ← FastAPI app init, middleware, router registration
│   ├── config.py                  ← pydantic-settings, env vars only
│   ├── dependencies.py            ← FastAPI DI wiring
│   │
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── upload.py              ← POST /scan, POST /convert
│   │   ├── payments.py            ← POST /payments/create-intent, /webhook
│   │   └── users.py               ← GET /users/me
│   │
│   ├── services/
│   │   ├── __init__.py
│   │   ├── scan_service.py        ← scan JSON, build report
│   │   ├── conversion_service.py  ← parse JSON, build DOCX
│   │   ├── payment_service.py     ← Stripe handling
│   │   └── user_service.py        ← user reads
│   │
│   ├── repositories/
│   │   ├── __init__.py
│   │   ├── user_repository.py
│   │   ├── conversion_repository.py
│   │   └── transaction_repository.py
│   │
│   ├── export/                    ← IDexport JSON domain
│   │   ├── __init__.py
│   │   ├── validator.py           ← JSON validation gate
│   │   ├── scanner.py             ← lightweight scan for report
│   │   ├── parser.py              ← full parse into domain models
│   │   ├── factories.py           ← IDExportDocumentFactory, IDExportPageFactory
│   │   ├── fonts.py               ← font classification and substitution map
│   │   └── models.py              ← IDExportDocument, IDExportPage,
│   │                                 IDExportFrame, IDExportParagraph,
│   │                                 IDExportRun, IDExportStyle
│   │
│   ├── docx/                      ← DOCX output domain
│   │   ├── __init__.py
│   │   ├── builder.py             ← orchestrates DOCX construction
│   │   ├── factories.py           ← StyleFactory, TableFactory,
│   │   │                             ImageFactory, FooterFactory
│   │   ├── units.py               ← Pt(), Inches() helpers
│   │   └── models.py              ← internal DOCX domain models
│   │
│   ├── models/                    ← Pydantic request/response schemas
│   │   ├── __init__.py
│   │   ├── scan.py                ← ScanReport, FontEntry, Warning
│   │   ├── conversion.py          ← ConversionRequest, ConversionResult
│   │   ├── payment.py
│   │   └── user.py
│   │
│   └── core/
│       ├── __init__.py
│       ├── exceptions.py
│       ├── logging.py
│       └── storage.py
│
├── frontend/
│   ├── src/
│   │   ├── main.js
│   │   ├── App.vue
│   │   ├── router/index.js
│   │   ├── stores/
│   │   │   ├── auth.js
│   │   │   ├── upload.js
│   │   │   └── ui.js
│   │   ├── services/
│   │   │   ├── api.js
│   │   │   ├── uploadService.js
│   │   │   ├── paymentService.js
│   │   │   ├── userService.js
│   │   │   └── factories/
│   │   │       ├── scanFactory.js
│   │   │       └── userFactory.js
│   │   ├── components/
│   │   │   ├── common/
│   │   │   │   ├── BaseButton.vue
│   │   │   │   ├── BaseCard.vue
│   │   │   │   ├── BaseBadge.vue
│   │   │   │   ├── BaseSpinner.vue
│   │   │   │   └── BaseAlert.vue
│   │   │   ├── upload/
│   │   │   │   ├── UploadZone.vue
│   │   │   │   └── FilePreview.vue
│   │   │   ├── scan/
│   │   │   │   ├── ScanReport.vue
│   │   │   │   ├── StatCards.vue
│   │   │   │   ├── FontReport.vue
│   │   │   │   └── WarningList.vue
│   │   │   ├── conversion/
│   │   │   │   ├── ProcessingState.vue
│   │   │   │   └── DownloadState.vue
│   │   │   ├── pricing/
│   │   │   │   └── PricingCard.vue
│   │   │   └── layout/
│   │   │       ├── AppNav.vue
│   │   │       └── AppFooter.vue
│   │   └── views/
│   │       ├── HomeView.vue
│   │       ├── PricingView.vue
│   │       ├── DashboardView.vue
│   │       └── LoginView.vue
│   ├── vite.config.js
│   ├── tailwind.config.js
│   └── package.json
│
├── tests/
│   ├── backend/
│   │   ├── export/
│   │   │   ├── test_validator.py
│   │   │   ├── test_factories.py
│   │   │   └── test_scanner.py
│   │   ├── docx/
│   │   │   ├── test_factories.py
│   │   │   └── test_builder.py
│   │   └── services/
│   │       ├── test_scan_service.py
│   │       └── test_conversion_service.py
│   └── fixtures/
│       └── sample_export.json     ← real IDexport output for integration tests
│
├── docker-compose.yml
└── .env.example
```

---

## Tech Stack

| Layer | Technology |
|---|---|
| InDesign script | ExtendScript (.jsx) — IDexport.jsx |
| Web frontend | Vue 3 + Vite + Tailwind CSS |
| Backend API | FastAPI (Python) |
| Export JSON parsing | Python `json` + Pydantic |
| DOCX generation | `python-docx` |
| Font registry | Custom Python classification dict |
| Auth + credits | PocketBase + JS hooks |
| File storage | S3-compatible object storage |
| Payments | Stripe |
| Deployment | Dokploy |

Note: `zipfile` and `lxml` are no longer needed — IDML parsing is replaced
by JSON parsing. Remove these from requirements.txt.

---

## Pre-Conversion Scan Report

Same scan report UX as before — runs on the uploaded JSON before credits are spent.
The scanner reads the JSON and collects:

- Page count, frame count, image count, table count
- Font classification (safe / professional / unknown) with substitute mapping
- Warnings for any items that cannot be converted (shapes, unsupported elements)
- IDexport version compatibility check

Font classification and substitution map — unchanged from previous version.

---

## Credit Pack Pricing

**Every registered user gets 1 free conversion automatically.**
Tracked via `free_used: bool` on the user record. Not a tier.

| Pack | Price | Credits | Per Conversion |
|---|---|---|---|
| Starter | $19 | 5 credits | $3.80 |
| Studio | $59 | 20 credits | $2.95 |
| Agency | $149 | 60 credits | $2.48 |

---

## Credits — PocketBase Hooks

All credit logic in `pb_hooks/credits.pb.js` using SQLite transactions.
Unchanged from previous version — hook fires on conversions collection create,
deducts atomically, refunds on failed status update.

---

## File Validation

Upload accepts only `idconvert_export.json` files produced by IDexport.jsx.
Validation checks: size limit, valid JSON, generator signature, version
compatibility, required fields present.
Reject with specific helpful message if generator is not IDexport.

---

## Anti-Abuse

- Email verification with disposable domain blocking
- Browser fingerprint via FingerprintJS
- File hash — same JSON file cannot be converted free twice
- IP rate limiting via slowapi (Redis-backed for multi-instance)

File hash is still effective: each `idconvert_export.json` is unique per
InDesign document and per export run (includes timestamp).

---

## UI Design

Exact ilovepdf.com replication — all specs unchanged from previous version.
Upload zone accepts `.json` files only.
Update file type label: "Drop idconvert_export.json here"
Update trust line: "Run IDexport.jsx in InDesign to generate your export file"

---

## User-Facing Feature List

- **Text and styles preserved** — every paragraph style from InDesign is recreated as a real Word style your team can edit globally
- **Character formatting carried over** — bold, italic, colour and font applied at the character level, not lost in conversion
- **Images included** — all placed images are embedded at their correct proportions
- **Tables converted** — tables come across as real Word tables with correct column widths, ready to edit
- **Hyperlinks active** — all links survive the conversion and remain clickable
- **Page numbers matched** — font, size and colour of page numbers are matched as live Word page numbers
- **Font report included** — before converting, you are told exactly which fonts need to be installed on your client's machine
- **Honest warnings upfront** — anything that cannot be converted is flagged before you spend a credit

---

## Accepted Limitations (Communicate Clearly in UI)

- Column layouts flow sequentially — side-by-side columns become top-to-bottom flow
- Decorative shapes and rules are not converted
- Master page decorative elements are excluded
- Pixel-perfect visual replication is not the goal — accurate, editable content is

---

## Key Changes From Previous Architecture

| Previous | Now |
|---|---|
| Parses IDML ZIP + XML | Parses IDexport JSON |
| Reconstructs coordinates from IDML | Uses InDesign-resolved coordinates from JSON |
| IDML threading detection | Thread order resolved by InDesign, exported in JSON |
| Anchored text boxes for layout | Flowing document — sequential paragraphs |
| `zipfile` + `lxml` dependencies | `json` + Pydantic only |
| IDtag (Phase 2 companion script) | IDexport (Phase 1 core script) |
| Upload .idml file | Upload idconvert_export.json |
| IDML domain (`idml/`) | Export domain (`export/`) |

---

## Competitive Positioning

| | ID2Office | IDconvert |
|---|---|---|
| Requires InDesign on designer machine | Yes | Yes (to run IDexport) |
| Requires InDesign on client machine | No | No |
| Price | $229/year | From $19 |
| Pre-conversion warnings | No | Yes |
| Font report | No | Yes |
| Style fidelity | High (native plugin) | High (resolved JSON) |
| Layout fidelity | High | Low (intentionally flows) |
| Best for | Designers needing visual layout | Designers needing editable content handoff |

---

## Notes for Claude Code

- Always check this file before making architectural decisions
- The input is `idconvert_export.json` — not IDML, not .indd
- The output is a flowing Word document — not positioned text boxes
- `python-docx` is the DOCX library — use `doc.add_paragraph()`, `doc.add_picture()`, `doc.add_table()`
- Style registration happens once at document creation — all styles registered before any content is added
- All PocketBase calls go through repository classes only
- All frontend API calls go through service files only
- Use dependency injection for all services and repositories
- All exceptions use the custom hierarchy in `core/exceptions.py`
- Use structlog for all logging — never print()
- Ask before adding dependencies not listed in the tech stack
- Phase 2 items stubbed with TODO comments only — not implemented
- Write docstrings on all public methods in services and repositories
- Every new module gets a corresponding stub test file in tests/