# IDconvert — Project Brief

**Last updated:** April 2026
**Status:** Pre-development — Phase 1 MVP ready to build

---

## The Problem

Designers build documents in Adobe InDesign. Their clients — NGOs, professional associations, corporates — need editable Word versions to update content themselves. Rebuilding a document from scratch in Word is unbillable, time-consuming, and a poor use of a professional designer's time.

The only serious existing solution is ID2Office by Recosoft — a native InDesign plugin at $229/year that requires InDesign to be installed. There is no standalone web tool that accepts an exported IDML file and returns a clean DOCX without InDesign.

---

## The Solution

**IDconvert** — a web-based IDML to DOCX conversion tool. No InDesign required. Upload the IDML file, receive an editable Word document that preserves layout, typography, images, tables and reading flow.

**IDtag** — a free InDesign ExtendScript that prepares complex multi-column documents for more accurate conversion. Phase 2 only.

---

## Products

### IDtag (Phase 2 — Free)

- Plain `.jsx` ExtendScript file
- Designer runs it inside InDesign before exporting IDML
- Tags threaded text frames with metadata for accurate multi-column reconstruction
- No licensing, no activation, distributed freely from the website
- Serves as a top-of-funnel entry point into IDconvert

### IDconvert (Phase 1 — Paid)

- Web SaaS, no software installation required
- Upload IDML → scan → confirm → download DOCX
- Credit-based pricing
- Pre-conversion scan report with font warnings and layout notices

---

## How It Works (User Flow)

```
1. Designer runs IDtag in InDesign (Phase 2, optional for single column)
2. Designer exports IDML from InDesign normally
3. Designer or client uploads IDML to IDconvert
4. IDconvert scans the file — shows font report and warnings
5. User reviews notices, confirms conversion
6. DOCX downloads — ready to edit in Word
```

---

## What the DOCX Contains

- All text, fully editable, with paragraph and character styles mapped to Word equivalents
- Layout preserved via anchored linked text boxes — single and multi-column
- Images embedded and positioned to match original layout
- Tables with structure and formatting intact
- Hyperlinks active and clickable
- Page numbers as native Word footer fields, matched font/size/colour
- Clear warnings for anything that could not be perfectly replicated

---

## What It Does Not Do

- Pixel-perfect PDF replication (not possible in Word)
- Master page headers and footers, apart from page numbers (Phase 1 exclusion)
- Complex contour text wrap (simplified to square wrap with warning)
- Keynote export
- Require InDesign to be installed

---

## Target Users

### Maya — Graphic Designer

Freelance, 6 years' experience. Designs annual reports, brand guides, cookbooks and association publications in InDesign. Needs to hand off editable Word documents to clients as a standard project deliverable. Builds IDconvert into her production workflow and bills clients for the conversion as part of her production fee.

**Pain:** The client always asks for a Word version after the PDF is approved. Rebuilding it is unpaid time.

### Richard — Operations Manager

Works at a regional engineering firm. Receives designed documents from an external studio. Needs to update figures, swap names, and edit body copy before board submissions, and to do so in Word, because that is what his team uses.

**Pain:** The designer sends a PDF. Richard cannot edit it. He needs a Word version that still looks professional.

---

## Competitive Positioning

| | ID2Office | IDconvert |
|---|---|---|
| Requires InDesign | Yes | No |
| Price | $229/year | From $19 (Starter credit pack) |
| Delivery model | Native plugin | Web tool |
| Multi-column support | Yes (years of iteration) | Phase 2 |
| Pre-conversion warnings | No | Yes |
| Font report | No | Yes |
| Caribbean market focus | No | Yes |

---

## Pricing

### Credit Packs

| Pack | Price | Per Conversion |
|---|---|---|
| Starter — 5 credits | $19 | $3.80 |
| Studio — 20 credits | $59 | $2.95 |
| Agency — 60 credits | $149 | $2.48 |

Free tier: 1 conversion. Gated by email verification, browser fingerprint, and file hash.

No ads. Premium positioning: ID2Office charges $229/year, offers no scan report or font intelligence, and requires InDesign to be installed.

No subscription at MVP; introduce one after observing real usage frequency.

---

## Technical Architecture

### IDML Structure

IDML is a ZIP archive of XML files:

- `designmap.xml` — master manifest
- `Spreads/*.xml` — page geometry, frame positions
- `Stories/*.xml` — text content with formatting
- `Resources/Fonts.xml` — all fonts used
- `Resources/Styles.xml` — paragraph and character styles

Content (Stories) and layout (Spreads) are separate data sources. They are joined by the `ParentStory` attribute on each text frame, which references the `Self` ID of the story it displays.

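
The join described above can be sketched in a few lines of Python. This is a minimal illustration rather than the planned parser: it uses the standard-library `xml.etree.ElementTree` instead of lxml so that it runs without dependencies, and the function names, the sample IDs (`tf1`, `u10`) and the tiny in-memory archive are all hypothetical. It assumes text frames appear as `TextFrame` elements in the spread files and story text sits in `Content` elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI used by the idPkg wrapper elements in IDML package files.
PKG_NS = "http://ns.adobe.com/AdobeInDesign/idml/1.0/packaging"


def read_stories(idml: zipfile.ZipFile) -> dict:
    """Map each Story's Self ID to its concatenated <Content> text."""
    stories = {}
    for name in idml.namelist():
        if name.startswith("Stories/") and name.endswith(".xml"):
            root = ET.fromstring(idml.read(name))
            # The un-namespaced <Story> sits inside the <idPkg:Story> wrapper.
            for story in root.iter("Story"):
                text = "".join(c.text or "" for c in story.iter("Content"))
                stories[story.get("Self")] = text
    return stories


def join_frames_to_stories(idml_bytes: bytes) -> list:
    """Return one record per text frame, joined to its story via ParentStory."""
    records = []
    with zipfile.ZipFile(io.BytesIO(idml_bytes)) as idml:
        stories = read_stories(idml)
        for name in sorted(idml.namelist()):
            if name.startswith("Spreads/") and name.endswith(".xml"):
                root = ET.fromstring(idml.read(name))
                for frame in root.iter("TextFrame"):
                    story_id = frame.get("ParentStory")
                    records.append({
                        "frame": frame.get("Self"),
                        "story": story_id,
                        "text": stories.get(story_id, ""),
                    })
    return records


def build_sample_idml() -> bytes:
    """Build a tiny in-memory archive (hypothetical IDs) for demonstration."""
    spread = (
        f'<idPkg:Spread xmlns:idPkg="{PKG_NS}"><Spread Self="s1">'
        '<TextFrame Self="tf1" ParentStory="u10"/></Spread></idPkg:Spread>'
    )
    story = (
        f'<idPkg:Story xmlns:idPkg="{PKG_NS}"><Story Self="u10">'
        "<ParagraphStyleRange><CharacterStyleRange>"
        "<Content>Quarterly results</Content>"
        "</CharacterStyleRange></ParagraphStyleRange></Story></idPkg:Story>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Spreads/Spread_s1.xml", spread)
        archive.writestr("Stories/Story_u10.xml", story)
    return buffer.getvalue()


print(join_frames_to_stories(build_sample_idml()))
```

Threaded stories would show up here as several frames sharing the same `story` value, which is the signal the builder needs to create a linked text box chain.
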
### Conversion Pipeline

```
Upload IDML
    ↓
SCAN: lightweight XML parse
  - Count pages, stories, images, tables
  - Classify fonts (safe / professional / unknown)
  - Detect text wrap types
  - Detect threading complexity
  - Collect warnings array
    ↓
Display scan report to user
    ↓
User confirms → CONVERT
  - Parse Spreads for frame geometry
  - Parse Stories for text content
  - Join on ParentStory ID
  - Build DOCX:
      Every frame → anchored text box
      Threaded frames → linked text box chain
      Images → anchored DrawingML
      Page numbers → native Word footer field
      Styles → mapped Word paragraph styles
    ↓
Download DOCX
```

### DOCX Layout Model

All content uses anchored text boxes — unified model, no strategy switching:

- Single frame → anchored text box, no linking
- Threaded story → linked text box chain via `wps:linkedTxbx` (the first box in the chain carries the content in `wps:txbx`)
- Images → anchored DrawingML at matching coordinates
- Text wrap → `wrapSquare` for all detected wraps (downgraded, user warned)

### Unit Conversion

- IDML: points
- DOCX positions: EMUs (1 inch = 914400 EMUs)
- DOCX font sizes: half-points (`w:sz`). DXA is a different unit (twentieths of a point), used for page size, margins and indents
- Formula: `emu = points * 914400 / 72` (12700 EMUs per point)
- IDML frame positions are relative to the spread center, so they must be offset (by page width) into page coordinates

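
The conversions above reduce to a few one-line helpers. The sketch below is illustrative only: the function names are hypothetical, and `spread_to_page` assumes the caller already knows where the page's top-left corner sits in spread coordinates (the example values are for a left-hand US Letter page and are an assumption, not read from a real file).

```python
EMU_PER_INCH = 914_400
POINTS_PER_INCH = 72


def points_to_emu(points: float) -> int:
    """Convert IDML points to DOCX EMUs (12700 EMUs per point)."""
    return round(points * EMU_PER_INCH / POINTS_PER_INCH)


def points_to_half_points(points: float) -> int:
    """Convert a font size in points to the half-point value used by w:sz."""
    return round(points * 2)


def spread_to_page(x: float, y: float, page_origin_x: float, page_origin_y: float):
    """Shift a spread-relative position so it is relative to the page corner."""
    return (x - page_origin_x, y - page_origin_y)


# Assumed example: a frame at (-576, -360) in spread coordinates, on a page
# whose top-left corner is at (-612, -396) in the same coordinate space.
page_x, page_y = spread_to_page(-576, -360, -612, -396)
print(page_x, page_y)            # position on the page, in points
print(points_to_emu(page_x))     # the same horizontal offset in EMUs
print(points_to_half_points(10.5))
```

Keeping all geometry in points until the last step, then converting once, avoids accumulating rounding error across chained frames.
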
### Two API Endpoints

```
POST /scan       costs 0 credits — returns scan report JSON
POST /convert    costs 1 credit  — returns DOCX file
```

Scan result cached by session. Convert reuses cached parse.

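
The scan-then-convert contract can be illustrated without any web framework. The sketch below is a plain-Python stand-in for the two endpoints, not the FastAPI implementation: `Session`, `parse_idml` and `build_docx` are hypothetical placeholders, the cache key (a SHA-256 of the upload) is an assumption, and charging the credit only after a successful build is a design choice the brief does not specify. In the usage lines at the end, one scan followed by one convert leaves the session with one of its two credits.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Session:
    credits: int
    parsed: dict = field(default_factory=dict)  # scan_id -> cached parse


def parse_idml(data: bytes) -> dict:
    """Placeholder for the real parser."""
    return {"size_bytes": len(data)}


def build_docx(parsed: dict) -> bytes:
    """Placeholder for the real DOCX builder."""
    return b"DOCX for %d bytes" % parsed["size_bytes"]


def scan(session: Session, data: bytes) -> dict:
    """POST /scan: costs 0 credits, caches the parse for this session."""
    scan_id = hashlib.sha256(data).hexdigest()
    if scan_id not in session.parsed:
        session.parsed[scan_id] = parse_idml(data)
    return {"scan_id": scan_id, "report": session.parsed[scan_id]}


def convert(session: Session, scan_id: str) -> bytes:
    """POST /convert: costs 1 credit, reuses the cached parse."""
    if scan_id not in session.parsed:
        raise KeyError("scan the file before converting")
    if session.credits < 1:
        raise PermissionError("no credits remaining")
    docx = build_docx(session.parsed[scan_id])
    session.credits -= 1  # assumption: charge only after a successful build
    return docx


session = Session(credits=2)
result = scan(session, b"example idml bytes")
docx = convert(session, result["scan_id"])
print(session.credits)
```
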

---

## Tech Stack

| Layer | Technology |
|---|---|
| InDesign script (Phase 2) | ExtendScript .jsx |
| Web frontend | Vue 3 + Vite + Tailwind CSS |
| Backend API | FastAPI (Python) |
| IDML parsing | Python zipfile + lxml |
| DOCX generation | python-docx |
| Font registry | Custom Python classification dict |
| Auth + credits | PocketBase |
| File storage | S3-compatible object storage |
| Payments | Stripe |
| Deployment | Dokploy |
| Phase 2 pipeline | n8n |

---

## Pre-Conversion Font Report

Every font in the document is classified and reported before the user spends a credit:

```
FONTS
─────────────────────────────────────────
✓ Arial               Available in Word — no action needed
✓ Georgia             Available in Word — no action needed

⚠ Freight Text Pro    Not a Word system font
                      Used for: Body text (pages 1–8)
                      Action: Install on client machine
                      If missing: Word substitutes Georgia — minor reflow possible

⚠ Proxima Nova        Not a Word system font
                      Used for: Headings, captions (all pages)
                      Action: Install on client machine
                      If missing: Word substitutes Calibri — spacing may differ

◎ DM Mono             Unknown font
                      Used for: Pull quotes (pages 2, 8)
                      Action: Verify availability or supply to client
                      If missing: Word substitutes Courier New
─────────────────────────────────────────
💡 Supply a /Fonts folder alongside the DOCX
   and ask your client to install before opening.
```

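
The "custom Python classification dict" named in the tech stack could look roughly like the sketch below. The registry contents are assumptions drawn only from the sample report above (a real registry would be far larger), and `classify_font` and `font_report` are hypothetical names. Fonts missing from both tables fall through to `unknown`, matching the three classes in the scan step.

```python
# Fonts assumed to ship with Word (assumption: taken from the sample report).
SAFE_FONTS = {"Arial", "Georgia", "Calibri", "Courier New"}

# Known professional fonts mapped to the substitute Word is expected to use.
PROFESSIONAL_FONTS = {
    "Freight Text Pro": "Georgia",
    "Proxima Nova": "Calibri",
}


def classify_font(name: str) -> dict:
    """Classify one font as safe, professional, or unknown."""
    if name in SAFE_FONTS:
        return {"font": name, "status": "safe", "substitute": None}
    if name in PROFESSIONAL_FONTS:
        return {
            "font": name,
            "status": "professional",
            "substitute": PROFESSIONAL_FONTS[name],
        }
    return {"font": name, "status": "unknown", "substitute": None}


def font_report(names: list) -> list:
    """Classify each distinct font once, in alphabetical order."""
    return [classify_font(name) for name in sorted(set(names))]


for entry in font_report(["Proxima Nova", "Arial", "DM Mono", "Arial"]):
    print(entry)
```
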

---

## File Validation and Security

Every uploaded file passes a strict validation gate before any processing occurs. This protects against malicious uploads, renamed files, raw .indd files, and ZIP bombs.

### Validation sequence (in order):

1. File size check — reject above 50 MB before touching the file
2. Magic byte check — read the actual file signature, not the extension. It must be the ZIP signature (`PK\x03\x04`), because IDML is a ZIP container
3. Open as ZIP — reject corrupted or fake archives
4. ZIP bomb protection — sum uncompressed sizes before extracting anything. Reject above 200 MB uncompressed
5. Entry count limit — reject archives with more than 500 entries
6. Path traversal check — reject any entry with `../` in the path
7. IDML structure check — must contain `designmap.xml`, `Spreads/`, `Stories/`, `Resources/`
8. XML validity check — parse `designmap.xml` to confirm it is real XML, not injected content
9. Executable content check — reject if any entry has an executable extension (.exe, .sh, .py, .js etc.)

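
The nine checks above map fairly directly onto the standard-library `zipfile` module. The sketch below is one possible shape, not a hardened implementation: `validate_idml`, the reason codes and the `make_zip` helper are hypothetical, the size check runs on bytes already in memory (a real upload handler would enforce it on the stream), and production code would likely want a hardened XML parser configuration. The reason codes are meant for internal logs only, since the user-facing message stays generic.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAX_UPLOAD_BYTES = 50 * 1024 * 1024
MAX_UNCOMPRESSED_BYTES = 200 * 1024 * 1024
MAX_ENTRIES = 500
ZIP_SIGNATURE = b"PK\x03\x04"
REQUIRED_FOLDERS = ("Spreads/", "Stories/", "Resources/")
BLOCKED_EXTENSIONS = (".exe", ".sh", ".py", ".js")


def validate_idml(data: bytes):
    """Return None if the upload passes, else an internal reason code."""
    if len(data) > MAX_UPLOAD_BYTES:
        return "too_large"
    if not data.startswith(ZIP_SIGNATURE):
        return "bad_signature"
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return "bad_zip"
    with archive:
        infos = archive.infolist()
        # Declared sizes come from the archive's own directory, so a real
        # extractor should also cap the bytes it actually reads.
        if sum(info.file_size for info in infos) > MAX_UNCOMPRESSED_BYTES:
            return "zip_bomb"
        if len(infos) > MAX_ENTRIES:
            return "too_many_entries"
        names = [info.filename for info in infos]
        if any(".." in name.split("/") for name in names):
            return "path_traversal"
        has_folders = all(
            any(name.startswith(folder) for name in names)
            for folder in REQUIRED_FOLDERS
        )
        if "designmap.xml" not in names or not has_folders:
            return "not_idml"
        try:
            ET.fromstring(archive.read("designmap.xml"))
        except ET.ParseError:
            return "bad_xml"
        if any(name.lower().endswith(BLOCKED_EXTENSIONS) for name in names):
            return "executable_entry"
    return None


def make_zip(entries: dict) -> bytes:
    """Helper for the demo: build an in-memory ZIP from name -> content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in entries.items():
            archive.writestr(name, content)
    return buffer.getvalue()


GOOD = {
    "designmap.xml": "<Document/>",
    "Spreads/Spread_1.xml": "<Spread/>",
    "Stories/Story_1.xml": "<Story/>",
    "Resources/Fonts.xml": "<Fonts/>",
}
print(validate_idml(make_zip(GOOD)))
print(validate_idml(b"not a zip"))
```

Running the checks in this order keeps the cheap rejections first, so nothing is decompressed or parsed until the archive's size, entry count and paths have been vetted.
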
### Common innocent mistake — .indd instead of .idml

The most frequent user error is uploading a raw InDesign `.indd` file instead of an IDML export.
This gets a specific, helpful error message:

> "This appears to be an InDesign document file. Please export it as IDML first: File → Export → InDesign Markup (IDML)"

All other rejections get a clear but non-specific error — never tell a malicious actor which check they failed.

---

## Free Tier Abuse Prevention

- Layer 1 — Email verification with disposable domain blocking (Abstract API or Kickbox)
- Layer 2 — Browser fingerprinting via FingerprintJS (free tier) — persists across sessions
- Layer 3 — File hashing — same IDML file cannot be converted free across multiple accounts
- Layer 4 — IP rate limiting via slowapi — 1 free conversion per IP per day

Start with layers 1, 2, and 3 at MVP. Layer 3 is especially effective for this product since IDML files are project-specific assets.

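
A minimal sketch of how layers 1 to 3 might combine into a single free-tier gate follows. It is an assumption-laden illustration: `FreeTierGate` is a hypothetical name, in-memory sets stand in for whatever PocketBase collections would hold this state, the fingerprint is treated as an opaque string supplied by the client, and email verification itself is out of scope. In the usage lines, the second claim is refused because it reuses the same file under a different account.

```python
import hashlib


class FreeTierGate:
    """Grant one free conversion per email, browser fingerprint, and file."""

    def __init__(self):
        self.emails = set()
        self.fingerprints = set()
        self.file_hashes = set()

    def try_claim(self, email: str, fingerprint: str, file_bytes: bytes) -> bool:
        """Return True and record the claim if no layer has seen it before."""
        email_key = email.strip().lower()
        file_hash = hashlib.sha256(file_bytes).hexdigest()
        already_used = (
            email_key in self.emails
            or fingerprint in self.fingerprints
            or file_hash in self.file_hashes
        )
        if already_used:
            return False
        self.emails.add(email_key)
        self.fingerprints.add(fingerprint)
        self.file_hashes.add(file_hash)
        return True


gate = FreeTierGate()
first = gate.try_claim("maya@example.org", "fp-1", b"report.idml bytes")
same_file = gate.try_claim("other@example.org", "fp-2", b"report.idml bytes")
new_user = gate.try_claim("richard@example.org", "fp-3", b"another file")
print(first, same_file, new_user)
```
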

---

## UI Design Direction

Reference: ilovepdf.com and tools.pdf24.org

Core principle: upload zone is the entire hero. No marketing content above the fold on the tool page.

```
┌───────────────────────────────────────┐
│  IDconvert logo           Login/Signup│
├───────────────────────────────────────┤
│  Convert InDesign to Word.            │
│  Upload your IDML file to begin.      │
│                                       │
│  ┌─────────────────────────────────┐  │
│  │  Drop IDML file here or         │  │
│  │  [ Browse files ]               │  │
│  └─────────────────────────────────┘  │
│  🔒 Files deleted after 1 hour        │
├───────────────────────────────────────┤
│  SCAN REPORT (appears after upload)   │
│  ┌──────────┐  ┌──────────┐           │
│  │ 12 pages │  │ 8 stories│ ...       │
│  └──────────┘  └──────────┘           │
│  FONTS                                │
│  ✓ Arial             Safe             │
│  ⚠ Freight Text Pro  Install needed   │
│  NOTICES                              │
│  ⚠ Text wrap simplified — pages 4, 7  │
│  1 credit · 4 remaining               │
│  [ Cancel ]            [ Convert → ]  │
├───────────────────────────────────────┤
│  Footer — minimal, links only         │
└───────────────────────────────────────┘
```

Palette:

- Background: #F8F9FA
- Primary CTA: #1a56db (Convert button only)
- Warning: #F59E0B
- Success: #10B981
- UI type: Inter or DM Sans
- Technical/filenames: DM Mono

---

## Development Timeline

### Phase 1 — IDconvert MVP (Weeks 1–5)

**Week 1 — IDML Parser**

- Unzip and read IDML structure
- Extract tagged text frames in correct order
- Extract paragraph styles and map to Word styles
- Extract inline images
- Font extraction and classification

**Week 2 — DOCX Builder**

- Anchored text box generation from frame geometry
- Linked text box chains for threaded stories
- Paragraph and character style mapping
- Image embedding as anchored DrawingML
- Page number footer generation

**Week 3 — Scan Endpoint + Warning System**

- Lightweight pre-conversion parse
- Font report with substitute mapping and usage context
- Wrap detection and downgrade warnings
- Warnings array with page-level detail
- Session-based scan cache

**Week 4 — Web Tool UI + Credits**

- Vue 3 frontend — upload zone, scan report, font report, warning list
- PocketBase auth and credit system
- Stripe credit pack checkout
- Convert endpoint wiring
- File deletion after 1 hour

**Week 5 — Testing and Launch**

- End-to-end test with real annual report, brand guide, and cookbook files
- Edge case cleanup
- Soft launch

### Phase 2 — IDtag + Multi-Column (Weeks 6–9)

- **Week 6** — IDtag ExtendScript (frame tagging, thread metadata, panel UI)
- **Week 7** — Parser update (read IDtag metadata from IDML labels)
- **Week 8** — DOCX builder update (enhanced multi-column reconstruction)
- **Week 9** — Testing with magazine layouts, multi-column reports, release

---

## MVP Feature List (User-Facing Language)

- **Layout preserved** — your document opens in Word with the same page structure, columns and content positioning as the original design
- **Text is fully editable** — all text can be selected, edited and reformatted directly in Word without any special software
- **Styles carried over** — headings, body text, captions and other text styles are mapped to equivalent Word styles so formatting stays consistent when you edit
- **Images included** — all images from the original design are embedded in the Word document and positioned to match the layout
- **Tables converted** — tables come across with their structure, content and basic formatting intact and ready to edit
- **Clickable links preserved** — any hyperlinks in the original document remain active and clickable in the Word version
- **Page numbers matched** — page numbers are reproduced in Word using the same font, size and colour as the original design
- **Font report included** — before converting, you are told exactly which fonts need to be installed on your computer for the document to display correctly
- **Honest warnings upfront** — any design features that cannot be perfectly replicated in Word are clearly explained before you convert, so there are no surprises

---

## Accepted Limitations (Document Clearly in UI)

- Text wrap inside anchored text boxes is unreliable — simplified to square wrap, user warned
- Font reflow when client lacks designer's fonts — user warned with install instructions
- Heavy text editing may cause text box overflow in Word — include how-to-edit guide with download
- Master page headers and footers, apart from page numbers, excluded at MVP
- Pixel-perfect PDF match is not the goal — structurally identical and editable is the goal