# IDconvert — Project Brief
**Last updated:** April 2026

**Status:** Pre-development — Phase 1 MVP ready to build

---
## The Problem
Designers build documents in Adobe InDesign. Their clients (NGOs, professional associations, corporates) need editable Word versions so they can update content themselves. Rebuilding a document from scratch in Word is unbillable, time-consuming, and a poor use of a professional designer's time.

The only serious existing solution is ID2Office by Recosoft, a native InDesign plugin at $229/year that requires InDesign to be installed. There is no standalone web tool that accepts an exported IDML file and returns a clean DOCX without InDesign.

---
## The Solution
**IDconvert** — a web-based IDML-to-DOCX conversion tool. No InDesign required. Upload an IDML file and receive an editable Word document that preserves layout, typography, images, tables and reading flow.

**IDtag** — a free InDesign ExtendScript that prepares complex multi-column documents for more accurate conversion. Phase 2 only.

---
## Products
### IDtag (Phase 2 — Free)
- Plain `.jsx` ExtendScript file
- Designer runs it inside InDesign before exporting IDML
- Tags threaded text frames with metadata for accurate multi-column reconstruction
- No licensing, no activation, distributed freely from the website
- Serves as a top-of-funnel entry point into IDconvert
### IDconvert (Phase 1 — Paid)
- Web SaaS, no software installation required
- Upload IDML → scan → confirm → download DOCX
- Credit-based pricing
- Pre-conversion scan report with font warnings and layout notices
---
## How It Works (User Flow)
```
1. Designer runs IDtag in InDesign (Phase 2; optional for single-column documents)
2. Designer exports IDML from InDesign normally
3. Designer or client uploads IDML to IDconvert
4. IDconvert scans the file — shows font report and warnings
5. User reviews notices, confirms conversion
6. DOCX downloads — ready to edit in Word
```
---
## What the DOCX Contains
- All text, fully editable, with paragraph and character styles mapped to Word equivalents
- Layout preserved via anchored linked text boxes — single and multi-column
- Images embedded and positioned to match original layout
- Tables with structure and formatting intact
- Hyperlinks active and clickable
- Page numbers as native Word footer fields, matched font/size/colour
- Clear warnings for anything that could not be perfectly replicated
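
In WordprocessingML, a native page-number field is a complex field. It is a sequence of runs in which `w:fldChar` begin/separate/end markers wrap a `PAGE` instruction, and each run carries the original font, colour and size. The sketch below builds that paragraph with the standard library's ElementTree so that it runs without dependencies. `page_number_paragraph` and its parameters are hypothetical names. In the planned python-docx/lxml stack, the same elements would be created with `lxml.etree` instead.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def _styled_run(paragraph: ET.Element, font: str, half_points: int, colour: str) -> ET.Element:
    """Append a run formatted like the original page number (rPr children in schema order)."""
    run = ET.SubElement(paragraph, w("r"))
    rpr = ET.SubElement(run, w("rPr"))
    ET.SubElement(rpr, w("rFonts"), {w("ascii"): font, w("hAnsi"): font})
    ET.SubElement(rpr, w("color"), {w("val"): colour})  # hex RRGGBB, no leading '#'
    ET.SubElement(rpr, w("sz"), {w("val"): str(half_points)})
    return run


def page_number_paragraph(font: str, size_pt: float, colour: str) -> ET.Element:
    """Build a footer paragraph holding a PAGE complex field (hypothetical helper)."""
    half_points = round(size_pt * 2)  # w:sz is measured in half-points
    p = ET.Element(w("p"))

    def run() -> ET.Element:
        return _styled_run(p, font, half_points, colour)

    # Field structure: begin, instruction, separate, cached result, end
    ET.SubElement(run(), w("fldChar"), {w("fldCharType"): "begin"})
    instr = ET.SubElement(run(), w("instrText"), {f"{{{XML_NS}}}space": "preserve"})
    instr.text = " PAGE "
    ET.SubElement(run(), w("fldChar"), {w("fldCharType"): "separate"})
    ET.SubElement(run(), w("t")).text = "1"  # placeholder result; Word recomputes it
    ET.SubElement(run(), w("fldChar"), {w("fldCharType"): "end"})
    return p
```

For example, `ET.tostring(page_number_paragraph("Proxima Nova", 9, "58595B"), encoding="unicode")` yields a `w:p` with five runs, each sized `w:sz="18"`.
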
---
## What It Does Not Do
- Pixel-perfect PDF replication (not possible in Word)
- Master page headers and footers (Phase 1 exclusion)
- Complex contour text wrap (simplified to square wrap with warning)
- Keynote export
- Require InDesign to be installed
---
## Target Users
### Maya — Graphic Designer
Freelance, six years' experience. Designs annual reports, brand guides, cookbooks and association publications in InDesign. Needs to hand off editable Word documents to clients as a standard project deliverable. Builds IDconvert into her production workflow and bills clients for the conversion as part of her production fee.

**Pain:** The client always asks for a Word version after the PDF is approved. Rebuilding it is unpaid time.
### Richard — Operations Manager
Regional engineering firm. Receives designed documents from an external studio. Needs to update figures, swap names, and edit body copy before board submissions — in Word, because that is what his team uses.

**Pain:** Designer sends a PDF. Richard cannot edit it. He needs a Word version that still looks professional.

---
## Competitive Positioning
| | ID2Office | IDconvert |
|---|---|---|
| Requires InDesign | Yes | No |
| Price | $229/year | From $19 (5-credit pack) |
| Delivery model | Native plugin | Web tool |
| Multi-column support | Yes (years of iteration) | Phase 2 |
| Pre-conversion warnings | No | Yes |
| Font report | No | Yes |
| Caribbean market focus | No | Yes |
---
## Pricing
### Credit Packs
| Pack | Price | Per Conversion |
|---|---|---|
| Starter — 5 credits | $19 | $3.80 |
| Studio — 20 credits | $59 | $2.95 |
| Agency — 60 credits | $149 | $2.48 |

Free tier: 1 conversion, gated by email verification, browser fingerprint, and file hash.

No ads. Premium positioning: ID2Office charges $229/year, offers no scan report or font intelligence, and requires InDesign to be installed.

No subscription at MVP; introduce one after observing real usage frequency.

---
## Technical Architecture
### IDML Structure
IDML is a ZIP archive of XML files:
- `designmap.xml` — master manifest
- `Spreads/*.xml` — page geometry, frame positions
- `Stories/*.xml` — text content with formatting
- `Resources/Fonts.xml` — all fonts used
- `Resources/Styles.xml` — paragraph and character styles

Content (Stories) and layout (Spreads) are separate data sources joined by a `ParentStory` attribute on each text frame.
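
Below is a minimal sketch of that join. It assumes that stories are keyed by the `Self` ID of the `<Story>` element in `Stories/*.xml`. It also assumes that frame threading follows the `PreviousTextFrame`/`NextTextFrame` attributes on each `TextFrame`, with `"n"` meaning no link. The function names are hypothetical. The standard-library parser stands in for lxml only to keep the example dependency-free.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_stories(idml: zipfile.ZipFile) -> dict[str, ET.Element]:
    """Map each Story's Self ID to its <Story> element from Stories/*.xml."""
    stories = {}
    for name in idml.namelist():
        if name.startswith("Stories/") and name.endswith(".xml"):
            # The root is a namespaced idPkg:Story wrapper; iter("Story") finds the inner element
            for story in ET.fromstring(idml.read(name)).iter("Story"):
                stories[story.get("Self")] = story
    return stories


def read_threads(idml: zipfile.ZipFile) -> dict[str, list[str]]:
    """Map ParentStory ID -> TextFrame IDs in thread order, across all spreads."""
    frames = {}
    for name in idml.namelist():
        if name.startswith("Spreads/") and name.endswith(".xml"):
            # iter() also reaches frames nested inside Group elements
            for tf in ET.fromstring(idml.read(name)).iter("TextFrame"):
                frames[tf.get("Self")] = tf

    threads: dict[str, list[str]] = defaultdict(list)
    for tf in frames.values():
        if tf.get("PreviousTextFrame", "n") in frames:
            continue  # not the head of a thread; it is reached by following links below
        seen = set()  # guards against malformed circular threads
        current = tf
        while current is not None and current.get("Self") not in seen:
            seen.add(current.get("Self"))
            threads[current.get("ParentStory")].append(current.get("Self"))
            current = frames.get(current.get("NextTextFrame", "n"))
    return dict(threads)
```

Each thread's frame list gives the order of the linked text box chain. The matching `<Story>` element supplies the text that flows through it.
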
### Conversion Pipeline
```
Upload IDML
SCAN: lightweight XML parse
    - Count pages, stories, images, tables
    - Classify fonts (safe / professional / unknown)
    - Detect text wrap types
    - Detect threading complexity
    - Collect warnings array
Display scan report to user
User confirms → CONVERT
    - Parse Spreads for frame geometry
    - Parse Stories for text content
    - Join on ParentStory ID
    - Build DOCX:
        Every frame → anchored text box
        Threaded frames → linked text box chain
        Images → anchored DrawingML
        Page numbers → native Word footer field
        Styles → mapped Word paragraph styles
Download DOCX
```
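
The SCAN stage can stay a read-only pass over the XML. The sketch below shows one way to produce the counts and the warnings array, under these assumptions:

- pages are `Page` elements in `Spreads/`;
- placed images are `Image` elements;
- tables are `Table` elements in stories;
- text wrap is read from the `TextWrapMode` attribute of `TextWrapPreference`, with `"None"` meaning no wrap.

Mapping a warning to a page number, as the UI shows, needs frame geometry and is left out of this sketch.

```python
import zipfile
import xml.etree.ElementTree as ET


def scan_idml(idml: zipfile.ZipFile) -> dict:
    """Lightweight pre-conversion scan: counts plus a warnings array, no DOCX work."""
    report = {"pages": 0, "stories": 0, "images": 0, "tables": 0,
              "threaded_frames": 0, "warnings": []}
    for name in sorted(idml.namelist()):
        if not name.endswith(".xml"):
            continue
        if name.startswith("Spreads/"):
            root = ET.fromstring(idml.read(name))
            report["pages"] += sum(1 for _ in root.iter("Page"))
            report["images"] += sum(1 for _ in root.iter("Image"))
            # A frame with a NextTextFrame link is part of a threaded story
            report["threaded_frames"] += sum(
                1 for tf in root.iter("TextFrame") if tf.get("NextTextFrame", "n") != "n")
            # Any wrap other than "None" will be downgraded to Word's square wrap
            modes = {pref.get("TextWrapMode") for pref in root.iter("TextWrapPreference")}
            for mode in sorted(m for m in modes if m and m != "None"):
                report["warnings"].append({
                    "kind": "text_wrap", "mode": mode, "source": name,
                    "message": "Text wrap simplified to square wrap"})
        elif name.startswith("Stories/"):
            report["stories"] += 1
            report["tables"] += sum(1 for _ in ET.fromstring(idml.read(name)).iter("Table"))
    return report
```
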
### DOCX Layout Model
All content uses anchored text boxes — unified model, no strategy switching:
- Single frame → anchored text box, no linking
- Threaded story → linked text box chain via `wps:linkedTxbx` (later boxes reference the first box's `wps:txbx` id, with an increasing `seq`)
- Images → anchored DrawingML at matching coordinates
- Text wrap → `wrapSquare` for all detected wraps (downgraded, user warned)
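
The sketch below assumes the Word 2010+ shape markup (the `wps` namespace) behaves as it does in Word-saved files. Under that assumption, the first shape in a chain holds the story in `wps:txbx` with an `id`. Every later shape carries `wps:linkedTxbx` with the same `id` and an increasing `seq`. This is worth confirming against a Word-saved sample before relying on it. The helper name is hypothetical. The sketch also omits three things: the `wp:anchor` wrapper (position and `wp:wrapSquare`), the shape geometry, and the `mc:AlternateContent` VML fallback that Word usually writes.

```python
import xml.etree.ElementTree as ET

WPS_NS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("wps", WPS_NS)
ET.register_namespace("w", W_NS)


def wps(tag: str) -> str:
    return f"{{{WPS_NS}}}{tag}"


def w(tag: str) -> str:
    return f"{{{W_NS}}}{tag}"


def linked_textbox_shapes(chain_id: int, frame_count: int, paragraphs: list[str]) -> list[ET.Element]:
    """Build wps:wsp shapes for one threaded story (hypothetical helper).

    Only the first shape holds the text (wps:txbx); each later shape points
    back to it via wps:linkedTxbx with an increasing seq, so Word flows
    overflow text through the chain in order.
    """
    shapes = []
    for seq in range(frame_count):
        wsp = ET.Element(wps("wsp"))
        ET.SubElement(wsp, wps("cNvSpPr"), {"txBox": "1"})
        ET.SubElement(wsp, wps("spPr"))  # a:xfrm / a:prstGeom geometry would go here
        if seq == 0:
            txbx = ET.SubElement(wsp, wps("txbx"), {"id": str(chain_id)})
            content = ET.SubElement(txbx, w("txbxContent"))
            for text in paragraphs or [""]:  # txbxContent needs at least one paragraph
                run = ET.SubElement(ET.SubElement(content, w("p")), w("r"))
                ET.SubElement(run, w("t")).text = text
        else:
            ET.SubElement(wsp, wps("linkedTxbx"), {"id": str(chain_id), "seq": str(seq)})
        ET.SubElement(wsp, wps("bodyPr"))  # bodyPr follows the text-box element
        shapes.append(wsp)
    return shapes
```
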
### Unit Conversion
- IDML: points
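- DOCX positions and sizes: EMUs (1 inch = 914400 EMUs, so 1 pt = 12700 EMUs)
- DOCX font sizes: half-points (`w:sz`); page size, margins and indents: twentieths of a point (DXA, also called twips)
- Formula: `emu = points * 914400 / 72` (that is, `points * 12700`)
- IDML frame positions are in spread coordinates, whose origin is the centre of the spread. Convert them to page-relative positions by subtracting each page's origin, taken from the page's `ItemTransform` and `GeometricBounds`. On a left-hand page this offsets x by the page width, and y is offset by half the page height.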
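
Below are the conversions above as small helpers, plus the spread-to-page offset. The sketch rests on three assumptions:

- `ItemTransform` is read as an affine matrix `a b c d tx ty`.
- `GeometricBounds` is read as `top left bottom right`.
- The page is neither rotated nor scaled.

A frame's own `ItemTransform`, and that of any enclosing group, would first have to be applied to its path points to reach spread coordinates.

```python
EMU_PER_POINT = 914_400 // 72  # 12,700 EMUs per point


def pt_to_emu(points: float) -> int:
    """Positions and sizes for DrawingML (e.g. wp:posOffset, wp:extent)."""
    return round(points * EMU_PER_POINT)


def pt_to_half_points(points: float) -> int:
    """Font sizes for w:sz / w:szCs."""
    return round(points * 2)


def pt_to_dxa(points: float) -> int:
    """Page size, margins and indents (twentieths of a point)."""
    return round(points * 20)


def apply_transform(transform: str, x: float, y: float) -> tuple[float, float]:
    """Apply an IDML ItemTransform "a b c d tx ty" (affine matrix) to a point."""
    a, b, c, d, tx, ty = map(float, transform.split())
    return a * x + c * y + tx, b * x + d * y + ty


def spread_to_page(page_transform: str, page_bounds: str, x: float, y: float) -> tuple[float, float]:
    """Turn a spread-space point into points measured from the page's top-left corner.

    Assumes an unrotated, unscaled page whose GeometricBounds are "top left bottom right".
    """
    top, left, _bottom, _right = map(float, page_bounds.split())
    origin_x, origin_y = apply_transform(page_transform, left, top)
    return x - origin_x, y - origin_y
```

In a US Letter facing-pages spread, for example, a left page with `ItemTransform="1 0 0 1 -612 -396"` puts the spread point (-600, -380) at (12, 16) on the page. That works out to `pt_to_emu(12) == 152400` EMUs from the left edge.
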
### Two API Endpoints
```
POST /scan costs 0 credits — returns scan report JSON
POST /convert costs 1 credit — returns DOCX file
```
The scan result is cached per session, and `/convert` reuses the cached parse rather than re-reading the upload.

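
The sketch below shows the cache and credit rules behind the two endpoints. It is kept framework-free so that it runs on its own; in the FastAPI app, the `/scan` and `/convert` handlers would call into something like this. `ScanCache`, `convert` and the plain-dict credit store are hypothetical stand-ins. Credits actually live in PocketBase, where the decrement would need to be atomic.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CachedScan:
    file_hash: str
    report: dict
    parsed: Any  # whatever /scan parsed; /convert reuses it instead of re-reading the ZIP
    created: float = field(default_factory=time.monotonic)


class ScanCache:
    """In-memory, single-process scan cache keyed by session ID (hypothetical)."""

    def __init__(self, ttl_seconds: float = 3600):  # 1 hour, matching the file-deletion promise
        self.ttl = ttl_seconds
        self._entries: dict[str, CachedScan] = {}

    def put(self, session_id: str, data: bytes, report: dict, parsed: Any) -> str:
        file_hash = hashlib.sha256(data).hexdigest()
        self._entries[session_id] = CachedScan(file_hash, report, parsed)
        return file_hash

    def get(self, session_id: str) -> Optional[CachedScan]:
        entry = self._entries.get(session_id)
        if entry is not None and time.monotonic() - entry.created > self.ttl:
            del self._entries[session_id]  # expired: the user must scan again
            return None
        return entry


def convert(cache: ScanCache, session_id: str, credits: dict[str, int], user: str,
            build: Callable[[Any], bytes]) -> bytes:
    """Core of POST /convert: requires a cached scan and 1 credit (POST /scan is free)."""
    entry = cache.get(session_id)
    if entry is None:
        raise LookupError("no scan cached for this session")
    if credits.get(user, 0) < 1:
        raise PermissionError("not enough credits")
    docx = build(entry.parsed)  # reuse the cached parse
    credits[user] -= 1          # charge only after the DOCX was built successfully
    return docx
```

Charging after the build means a failed conversion never costs the user a credit.
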
---
## Tech Stack
| Layer | Technology |
|---|---|
| InDesign script (Phase 2) | ExtendScript .jsx |
| Web frontend | Vue 3 + Vite + Tailwind CSS |
| Backend API | FastAPI (Python) |
| IDML parsing | Python zipfile + lxml |
| DOCX generation | python-docx, plus raw OOXML via lxml for text boxes and anchored images (python-docx has no API for either) |
| Font registry | Custom Python classification dict |
| Auth + credits | PocketBase |
| File storage | S3-compatible object storage |
| Payments | Stripe |
| Deployment | Dokploy |
| Phase 2 pipeline | n8n |
---
## Pre-Conversion Font Report
Every font in the document is classified and reported before the user spends a credit:
```
FONTS
─────────────────────────────────────────
✓  Arial               Available in Word — no action needed
✓  Georgia             Available in Word — no action needed

⚠  Freight Text Pro    Not a Word system font
                       Used for:   Body text (pages 1–8)
                       Action:     Install on client machine
                       If missing: Word substitutes Georgia — minor reflow possible

⚠  Proxima Nova        Not a Word system font
                       Used for:   Headings, captions (all pages)
                       Action:     Install on client machine
                       If missing: Word substitutes Calibri — spacing may differ

◎  DM Mono             Unknown font
                       Used for:   Pull quotes (pages 2, 8)
                       Action:     Verify availability or supply to client
                       If missing: Word substitutes Courier New
─────────────────────────────────────────
💡 Supply a /Fonts folder alongside the DOCX
   and ask your client to install before opening.
```
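
The classification behind this report can start as a plain dictionary lookup, as the tech stack suggests. The sketch assumes that `Resources/Fonts.xml` lists `FontFamily` elements with a `Name` attribute. The registry contents and the name-based fallback guess are illustrative only; they are not Word's actual substitution logic. The "Used for" context would come from the `AppliedFont` values in Stories and Styles, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Illustrative registry entries only; the substitutes mirror the sample report above.
WORD_SAFE = {"Arial", "Calibri", "Cambria", "Courier New", "Georgia", "Times New Roman", "Verdana"}
PROFESSIONAL = {"Freight Text Pro": "Georgia", "Proxima Nova": "Calibri"}


@dataclass
class FontEntry:
    family: str
    status: str                # "safe" | "professional" | "unknown"
    substitute: Optional[str]  # fallback shown in the report if the font is missing


def families_from_fonts_xml(xml_bytes: bytes) -> list[str]:
    """Read family names from Resources/Fonts.xml (FontFamily elements, Name attribute)."""
    root = ET.fromstring(xml_bytes)
    return sorted({ff.get("Name") for ff in root.iter("FontFamily") if ff.get("Name")})


def classify_font(family: str) -> FontEntry:
    if family in WORD_SAFE:
        return FontEntry(family, "safe", None)
    if family in PROFESSIONAL:
        return FontEntry(family, "professional", PROFESSIONAL[family])
    # Unknown family: guess a generic fallback from the name (a heuristic, not Word's logic)
    return FontEntry(family, "unknown", "Courier New" if "mono" in family.lower() else "Calibri")
```
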
---
## File Validation and Security
Every uploaded file passes a strict validation gate before any processing occurs. This protects against malicious uploads, renamed files, raw .indd files, and ZIP bombs.
### Validation sequence (in order):
1. File size check — reject above 50 MB before touching the file
2. Magic byte check — read the actual file signature, not the extension. Must start with the ZIP local file header signature `PK\x03\x04` (hex `50 4B 03 04`), i.e. detect as `application/zip`
3. Open as ZIP — reject corrupted or fake archives
4. ZIP bomb protection — sum uncompressed sizes before extracting anything. Reject above 200 MB uncompressed
5. Entry count limit — reject archives with more than 500 entries
6. Path traversal check — reject any entry whose path is absolute, contains a backslash or drive letter, or has a `..` segment
7. IDML structure check — must contain `designmap.xml`, `Spreads/`, `Stories/`, `Resources/`
8. XML validity check — parse `designmap.xml` with entity resolution and network access disabled, to confirm it is real XML, not injected content
9. Executable content check — reject if any entry has an executable extension (.exe, .sh, .py, .js etc.)
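
Below is a sketch of the nine checks in order, using only the standard library. The limits come from the list above. The exact extension blocklist and the `RejectedUpload` exception are assumptions. The stdlib parser keeps the example dependency-free; with lxml, a parser built as `etree.XMLParser(resolve_entities=False, no_network=True)` provides comparable hardening for step 8.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAX_UPLOAD_BYTES = 50 * 1024 * 1024         # step 1
ZIP_SIGNATURE = b"PK\x03\x04"               # step 2: ZIP local file header
MAX_UNCOMPRESSED_BYTES = 200 * 1024 * 1024  # step 4
MAX_ENTRIES = 500                           # step 5
REQUIRED_DIRS = ("Spreads/", "Stories/", "Resources/")
EXECUTABLE_EXTS = {".exe", ".sh", ".py", ".js", ".bat", ".cmd", ".ps1", ".dll", ".so", ".jar"}


class RejectedUpload(Exception):
    """Internal reason for logs only; users get a generic message."""


def validate_idml(data: bytes) -> zipfile.ZipFile:
    if len(data) > MAX_UPLOAD_BYTES:                                    # 1. size
        raise RejectedUpload("upload too large")
    if not data.startswith(ZIP_SIGNATURE):                              # 2. magic bytes
        raise RejectedUpload("not a ZIP signature")
    try:                                                                # 3. open as ZIP
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        raise RejectedUpload("corrupt archive") from None
    infos = archive.infolist()
    if sum(info.file_size for info in infos) > MAX_UNCOMPRESSED_BYTES:  # 4. ZIP bomb
        raise RejectedUpload("uncompressed size over limit")
    if len(infos) > MAX_ENTRIES:                                        # 5. entry count
        raise RejectedUpload("too many entries")
    names = [info.filename for info in infos]
    for name in names:                                                  # 6. path traversal
        if name.startswith("/") or "\\" in name or ":" in name or ".." in name.split("/"):
            raise RejectedUpload(f"unsafe path {name!r}")
    has_dirs = all(any(n.startswith(d) for n in names) for d in REQUIRED_DIRS)
    if "designmap.xml" not in names or not has_dirs:                    # 7. IDML structure
        raise RejectedUpload("missing IDML structure")
    try:                                                                # 8. XML validity
        ET.fromstring(archive.read("designmap.xml"))
    except ET.ParseError:
        raise RejectedUpload("designmap.xml is not well-formed XML") from None
    for name in names:                                                  # 9. executables
        if posixpath.splitext(name)[1].lower() in EXECUTABLE_EXTS:
            raise RejectedUpload(f"executable entry {name!r}")
    return archive
```
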
### Common innocent mistake — .indd instead of .idml
The most frequent user error is uploading a raw InDesign `.indd` file instead of an IDML export.
This gets a specific, helpful error message:
> "This appears to be an InDesign document file. Please export it as IDML first: File → Export → InDesign Markup (IDML)"

All other rejections get a clear but non-specific error: never tell a malicious actor which check they failed.

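
An `.indd` upload can be told apart from everything else by its extension or its first bytes. The sketch assumes the 16-byte GUID that file-signature references list for InDesign documents (`06 06 ED F5 D8 1D 46 E5 BD 31 EF E7 FE 74 B7 1D`); confirm it against real files before relying on it. `GENERIC_MESSAGE` is placeholder wording.

```python
INDD_SIGNATURE = bytes.fromhex("0606EDF5D81D46E5BD31EFE7FE74B71D")  # assumed .indd header GUID

INDD_MESSAGE = ("This appears to be an InDesign document file. Please export it as IDML first: "
                "File → Export → InDesign Markup (IDML)")
# Hypothetical wording for every other rejection; it never names the failed check.
GENERIC_MESSAGE = "This file could not be converted. Please upload an IDML file exported from InDesign."


def rejection_message(filename: str, head: bytes) -> str:
    """Choose the user-facing error for an upload that failed validation."""
    if head.startswith(INDD_SIGNATURE) or filename.lower().endswith(".indd"):
        return INDD_MESSAGE
    return GENERIC_MESSAGE
```
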
---
## Free Tier Abuse Prevention

- Layer 1 — Email verification with disposable-domain blocking (Abstract API or Kickbox)
- Layer 2 — Browser fingerprinting via FingerprintJS (free tier), which persists across sessions
- Layer 3 — File hashing, so the same IDML file cannot be converted free across multiple accounts
- Layer 4 — IP rate limiting via slowapi, 1 free conversion per IP per day

Start with layers 1, 2, and 3 at MVP. Layer 3 is especially effective for this product because IDML files are project-specific assets.

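
Layers 1 to 3 reduce to one question: has this email, browser or file been seen before? Below is a minimal in-memory sketch. It assumes a SHA-256 of the uploaded bytes as the file hash and the visitor ID that FingerprintJS returns client-side. `FreeTierLedger` and the tiny disposable-domain set are hypothetical placeholders for the real store and the Abstract API or Kickbox check.

```python
import hashlib
from dataclasses import dataclass, field

# Placeholder for the disposable-domain check (Abstract API or Kickbox in the brief)
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com"}


def is_disposable(email: str) -> bool:
    return email.rsplit("@", 1)[-1].lower() in DISPOSABLE_DOMAINS


@dataclass
class FreeTierLedger:
    """In-memory stand-in for wherever free-conversion usage is recorded (hypothetical)."""
    emails: set[str] = field(default_factory=set)        # layer 1
    visitor_ids: set[str] = field(default_factory=set)   # layer 2 (FingerprintJS visitor ID)
    file_hashes: set[str] = field(default_factory=set)   # layer 3

    @staticmethod
    def _digest(idml_bytes: bytes) -> str:
        return hashlib.sha256(idml_bytes).hexdigest()

    def allows(self, email: str, visitor_id: str, idml_bytes: bytes) -> bool:
        if is_disposable(email):
            return False
        # A match on any layer means this person, browser or file already used the free conversion
        return not (email.lower() in self.emails
                    or visitor_id in self.visitor_ids
                    or self._digest(idml_bytes) in self.file_hashes)

    def record(self, email: str, visitor_id: str, idml_bytes: bytes) -> None:
        self.emails.add(email.lower())
        self.visitor_ids.add(visitor_id)
        self.file_hashes.add(self._digest(idml_bytes))
```

An exact hash only matches byte-identical uploads. A re-exported IDML of the same project may therefore slip past layer 3 on its own, which is why the layers are combined.
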
---
## UI Design Direction
Reference: ilovepdf.com and tools.pdf24.org

Core principle: the upload zone is the entire hero. No marketing content above the fold on the tool page.
```
┌─────────────────────────────────────────┐
│ IDconvert logo             Login/Signup │
├─────────────────────────────────────────┤
│ Convert InDesign to Word.               │
│ Upload your IDML file to begin.         │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │      Drop IDML file here or       │  │
│  │         [ Browse files ]          │  │
│  └───────────────────────────────────┘  │
│ 🔒 Files deleted after 1 hour           │
├─────────────────────────────────────────┤
│ SCAN REPORT (appears after upload)      │
│ ┌──────────┐ ┌──────────┐               │
│ │ 12 pages │ │ 8 stories│ ...           │
│ └──────────┘ └──────────┘               │
│ FONTS                                   │
│ ✓ Arial               Safe              │
│ ⚠ Freight Text Pro    Install needed    │
│ NOTICES                                 │
│ ⚠ Text wrap simplified — pages 4, 7     │
│ 1 credit · 4 remaining                  │
│               [ Cancel ]  [ Convert → ] │
├─────────────────────────────────────────┤
│ Footer — minimal, links only            │
└─────────────────────────────────────────┘
```
Palette:
- Background: #F8F9FA
- Primary CTA: #1a56db (Convert button only)
- Warning: #F59E0B
- Success: #10B981
- UI type: Inter or DM Sans
- Technical/filenames: DM Mono
---
## Development Timeline
### Phase 1 — IDconvert MVP (Weeks 1–5)
**Week 1 — IDML Parser**
- Unzip and read IDML structure
- Extract tagged text frames in correct order
- Extract paragraph styles and map to Word styles
- Extract inline images
- Font extraction and classification

**Week 2 — DOCX Builder**
- Anchored text box generation from frame geometry
- Linked text box chains for threaded stories
- Paragraph and character style mapping
- Image embedding as anchored DrawingML
- Page number footer generation

**Week 3 — Scan Endpoint + Warning System**
- Lightweight pre-conversion parse
- Font report with substitute mapping and usage context
- Wrap detection and downgrade warnings
- Warnings array with page-level detail
- Session-based scan cache

**Week 4 — Web Tool UI + Credits**
- Vue 3 frontend — upload zone, scan report, font report, warning list
- PocketBase auth and credit system
- Stripe credit pack checkout
- Convert endpoint wiring
- File deletion after 1 hour

**Week 5 — Testing and Launch**
- End-to-end test with real annual report, brand guide, and cookbook files
- Edge case cleanup
- Soft launch
### Phase 2 — IDtag + Multi-Column (Weeks 6–9)
- **Week 6** — IDtag ExtendScript (frame tagging, thread metadata, panel UI)
- **Week 7** — Parser update (read IDtag metadata from IDML labels)
- **Week 8** — DOCX builder update (enhanced multi-column reconstruction)
- **Week 9** — Testing with magazine layouts and multi-column reports; release

---
## MVP Feature List (User-Facing Language)
- **Layout preserved** — your document opens in Word with the same page structure, columns and content positioning as the original design
- **Text is fully editable** — all text can be selected, edited and reformatted directly in Word without any special software
- **Styles carried over** — headings, body text, captions and other text styles are mapped to equivalent Word styles so formatting stays consistent when you edit
- **Images included** — all images from the original design are embedded in the Word document and positioned to match the layout
- **Tables converted** — tables come across with their structure, content and basic formatting intact and ready to edit
- **Clickable links preserved** — any hyperlinks in the original document remain active and clickable in the Word version
- **Page numbers matched** — page numbers are reproduced in Word using the same font, size and colour as the original design
- **Font report included** — before converting, you are told exactly which fonts need to be installed on your computer for the document to display correctly
- **Honest warnings upfront** — any design features that cannot be perfectly replicated in Word are clearly explained before you convert, so there are no surprises
---
## Accepted Limitations (Document Clearly in UI)
- Text wrap inside anchored text boxes is unreliable — simplified to square wrap, user warned
- Font reflow when client lacks designer's fonts — user warned with install instructions
- Heavy text editing may cause text box overflow in Word — include a how-to-edit guide with the download
- Master page headers and footers excluded at MVP
- Pixel-perfect PDF match is not the goal — structurally identical and editable is the goal