A Weekend: From Idea to 1,000 Memorable Flashcards — AI in Design, UI Mocking, Brainstorming, Coding, and Mistakes

Sometimes technology genuinely makes life better. That’s why I love building things.

1. Why I Built This

😩 The problem:

My daughter was struggling with French vocabulary
Shared Anki decks didn’t match her textbook or her level
Most decks are dull — text-only, no images, no audio, no analysis
Making good flashcards by hand? 20–30 minutes per word. For a 40-word unit, that’s 13–20 hours: a part-time job

🛠️ The solution:

One weekend, 126 commits, over 800 personalized cards
🎨 Cartoon illustrations, 🔊 pronunciation audio, 🧬 color-coded etymology, 💡 kid-friendly mnemonics
She opened Anki on her iPad and grinned:

“Papa, this one has a funny picture.”

The gap is not in the study app — Anki is battle-tested. The gap is in personalized content generation.

2. The Approach: Feed the Wheel, Don’t Replace It

ankids doesn’t compete with Anki. It feeds it.

Drop in any input, get back a ready-to-import .apkg file. Anki handles everything after that.

Supported inputs:

📝 Word lists
📊 Excel files
📄 PDF textbooks
📸 iPhone photos of homework sheets

Five-step AI pipeline, three core capabilities:

👁️ Vision models — OCR a photo of any textbook page, extract words, translations, and example sentences in one pass
🧠 Language models — Generate etymology, morpheme breakdowns, IPA pronunciation, cognates, and kid-friendly mnemonics at scale
🎨 Image & audio models — Create a cartoon illustration and pronunciation clip for every word, so a child hears and sees before they read

Output:

📦 A single .apkg file, ready to import into Anki
🔄 Sync, scheduling, mobile, progress tracking — all handled by Anki

3. Before and After

😐 Shared deck — what most people use:

Standard shared Anki card — plain text, no image, no audio, no etymology

Text-only, no images, no audio — functional but forgettable

🤩 ankids — what we generate:

AI-generated ankids card — cartoon image, color-coded etymology, IPA, example sentences

🖼️ AI cartoon illustration
🗣️ IPA pronunciation + TTS audio
🎨 Color-coded morpheme breakdown
🧬 Origin chain back to Proto-Indo-European
🌍 Cognates across four languages
💡 Memory hook + bilingual example sentence

A card that shows why a word looks the way it does is far easier to retain than one that just restates the definition.

4. How It Works

4.1 The 5-Step Pipeline

📄 Each step reads/writes a shared cards.json
🔁 Every step is independently restartable — hit a rate limit at card 847? Re-run and it resumes
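
A minimal sketch of what "independently restartable" can look like, assuming cards.json holds a list of card objects with a status field. The names run_step and RateLimited are illustrative and not taken from the repo; the real pipeline may checkpoint differently.

```python
import json
from pathlib import Path
from typing import Callable


class RateLimited(Exception):
    """Hypothetical signal that a provider answered with HTTP 429."""


def run_step(cards_path: Path, done_status: str, process: Callable[[dict], dict]) -> int:
    """Process every card not yet at `done_status`; return how many remain."""
    cards = json.loads(cards_path.read_text(encoding="utf-8"))
    for i, card in enumerate(cards):
        if card.get("status") == done_status:
            continue  # finished in an earlier run, skip it
        try:
            cards[i] = {**process(card), "status": done_status}
        except RateLimited:
            break  # stop cleanly; everything saved so far stays on disk
        # Save after each card so an interruption loses at most one card of work.
        cards_path.write_text(
            json.dumps(cards, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return sum(1 for c in cards if c.get("status") != done_status)
```

Running the same step again skips the cards that already carry the finished status, so a rate limit partway through a batch only costs a re-run.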

4.2 Input Sources

📥 Six source types: --words (CSV), Excel/CSV, PDFs, single images, image folders, Google Drive URLs
📱 HEIC support (iPhone photos) built in: pillow-heif decodes the HEIC file, then Gemini Vision OCR returns structured JSON

4.3 External Services

🤖 MiniMax: enrichment LLM + primary image generator
🔍 Google Gemini: OCR + image generation fallback
🔄 Auto-fallback on rate limits, progress saved between runs
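
The auto-fallback idea can be sketched roughly as below. This is an assumption about the shape of the logic, not the project's actual code: RateLimitError and generate_with_fallback are hypothetical names, and each provider is modelled as a plain function from prompt to image bytes.

```python
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical error raised by a provider wrapper on HTTP 429."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], bytes]]],
) -> tuple[str, bytes]:
    """Try providers in order; move to the next one only on a rate limit."""
    last_error: Exception | None = None
    for name, generate in providers:
        try:
            return name, generate(prompt)
        except RateLimitError as exc:
            last_error = exc  # remember why this provider was skipped
    raise RuntimeError("all providers are rate limited") from last_error
```

Returning the provider name along with the bytes makes it easy to log which service produced each image.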

4.4 Card Lifecycle

4.5 Workspace Layout

			
workspace/
└── a1b2c3d4/
    ├── cards.json               ← single source of truth
    ├── media/
    │   ├── {uuid}_audio.mp3     ← word TTS
    │   ├── {uuid}_example_audio.mp3
    │   └── {uuid}_image.png     ← AI cartoon
    └── MyDeck.apkg              ← ready to import

		

🔒 UUID-scoped isolation — multiple decks run in parallel, no collisions
💾 Workspace persists between sessions — critical for large batches
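
For illustration, the layout above could be produced by helpers like these. The function names are made up for this sketch, and the 8-character workspace id is an assumption read off the a1b2c3d4 example; only the file name patterns come from the tree itself.

```python
import uuid
from pathlib import Path


def new_workspace(root: Path) -> Path:
    """Create root/<8 hex chars>/media and return the workspace directory."""
    workspace = root / uuid.uuid4().hex[:8]  # assumed id format, e.g. "a1b2c3d4"
    (workspace / "media").mkdir(parents=True, exist_ok=False)
    return workspace


def media_paths(workspace: Path, card_id: str) -> dict[str, Path]:
    """Map each media kind to the file name pattern shown in the layout."""
    media = workspace / "media"
    return {
        "audio": media / f"{card_id}_audio.mp3",
        "example_audio": media / f"{card_id}_example_audio.mp3",
        "image": media / f"{card_id}_image.png",
    }
```

Because every file name starts with the card's id and every deck lives under its own directory, two runs never write to the same path.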

5. What Makes a Card Memorable

5.1 The Etymology Engine

Etymology Engine — Component Decomposition

🧠 Most Western European words share Latin or Greek roots
🔍 A child who knows in- = “not” and croy = “believe” can decode incroyable without memorizing it
💡 That’s linguistic intuition, not rote memorization

Four fields per card, each color-coded:

Field	Color scheme	Purpose
`target_mnemonic`	Blue prefix · Coral root · Green suffix	Morpheme breakdown
`target_origin`	Gold tones	PIE → Latin → Old French chain
`target_cognates`	Purple tones	EN / DE / FR / Latin cognate family
`target_memory_hook`	Morpheme colors	One-line hook using the breakdown

Example — incroyable:

			
Mnemonic:    <span color="#5b9bd5">in-</span>   <span color="#e07b7b">croy</span>   <span color="#6dba6d">-able</span>
Origin:      <span color="#8B6914">PIE *ḱred-dʰeh₁-</span> →
             <span color="#B8860B">Latin credere</span> →
             <span color="#D4A854">Old French incroyable</span>
Cognates:    <span color="#8e7cc3">EN</span> incredible,
             <span color="#a78bfa">DE</span> unglaublich,
             <span color="#7c3aed">LA</span> incredibilis
Memory hook: <span color="#5b9bd5">in-</span> = not  
             <span color="#e07b7b">croy</span> = believe  
             <span color="#6dba6d">-able</span> = can be → "not believable!"

		

🎯 HTML renders directly inside Anki’s card template
⏭️ Atomic words that don’t benefit from breakdown skip the block
🎨 Gender coloring: masculine = blue, feminine = coral, neuter = grey — consistent everywhere
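
A rough sketch of how a morpheme breakdown might be turned into color-coded HTML. The colors are the ones from the incroyable example; render_mnemonic and MORPHEME_COLORS are illustrative names, and the input shape (a list of kind/text pairs) is an assumption. The example above writes the color as a bare attribute; this sketch uses the standard style attribute, which is what HTML renderers honor.

```python
from html import escape

# Colors copied from the incroyable example: prefix, root, suffix.
MORPHEME_COLORS = {"prefix": "#5b9bd5", "root": "#e07b7b", "suffix": "#6dba6d"}


def render_mnemonic(parts: list[tuple[str, str]]) -> str:
    """Turn [(kind, text), ...] into color-coded spans; empty input gives ''."""
    spans = [
        f'<span style="color: {MORPHEME_COLORS[kind]}">{escape(text)}</span>'
        for kind, text in parts
    ]
    return " ".join(spans)
```

An atomic word would simply pass an empty list and get an empty string back, which matches the "skip the block" behavior described above.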

5.2 The Card Template

Front: target word, IPA, part-of-speech, AI cartoon, audio button
Back: etymology stack, example sentences with audio, source word, gender badge
⌨️ “Type-in” variant — shows source word, prompts learner to type the answer

5.3 The Data Model

The Card Pydantic model is the contract across every pipeline stage:

			
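import uuid

from pydantic import BaseModel, Field

STATUS_EXTRACTED = "extracted"  # assumed value; the post only names the constant


class Card(BaseModel):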
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    unit: str | None = None          # e.g. "Unité 3"
    status: str = STATUS_EXTRACTED   # extracted → enriched → complete
    source_word: str
    source_language: str = "de"
    source_gender: str | None = None
    target_word: str | None = None
    target_language: str
    target_pronunciation: str | None = None   # IPA or Pinyin
    target_part_of_speech: str | None = None
    target_example_sentence: str | None = None
    target_mnemonic: str | None = None
    target_origin: str | None = None
    target_cognates: str | None = None
    target_memory_hook: str | None = None
    typing: bool = False

		

🔑 Merge key: (source_word, target_language)
🛡️ Re-ingesting never overwrites existing enrichment or media — only adds new words
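
The merge rule can be sketched in a few lines. This is a simplified stand-in that works on plain dicts rather than the Card model, and merge_cards is a hypothetical name; it only shows the "add new words, never overwrite" behavior described above.

```python
def merge_cards(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Append incoming cards whose (source_word, target_language) key is new."""

    def key(card: dict) -> tuple[str, str]:
        return (card["source_word"], card["target_language"])

    seen = {key(card) for card in existing}
    merged = list(existing)  # existing cards are kept untouched
    for card in incoming:
        if key(card) not in seen:
            seen.add(key(card))
            merged.append(card)
    return merged
```

Since existing entries are copied over as-is, any enrichment or media already attached to a word survives a second ingest of the same page.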

6. Building It: 126 Commits Over a Weekend

📐 Monday — architecture: schema drafts, prompt specs, system design. No runtime code yet
🚀 Thursday 21:30 — first feature live (image OCR via Gemini)
🌙 Friday 00:30–03:19 — 14 commits in flow state: shared constants, API key validation, error handling
🔧 Saturday — refinement: docs, workspace isolation, Gemini enrichment, incremental saves
✅ Sunday morning — MVP in the hands of the first user

📊 Stats: 126 commits, heaviest work Thursday night → Saturday evening. Longest session: 00:30 to 03:19 while the house slept.

🤖 Working with 10 concurrent AI agents shifts the bottleneck. Agents produce code in seconds. The constraint becomes decisions — which design to accept, which edge case to prioritize, which abstraction to commit to. A different kind of exhaustion. It doesn’t live in your hands. It lives behind your eyes.

🧠 Planning with AI before coding — I used superpowers brainstorming heavily for architecture and design decisions before writing any code. Its new visual companion feature is insane — generates UI mockups and diagrams right in the terminal, so you can see what you’re building before you build it.

7. What I Learned the Hard Way

7.1 Your Laptop Is Not a GPU Server

Mac Mini M4 — great for coding, not for local image generation

💻 Mac Mini M4 (16 GB) — ran 10 parallel AI coding agents fine
🚫 Stable Diffusion XL, Flux, Gemma 4 — all need >16 GB VRAM
🐢 On shared unified memory: 1 image every 3–4 minutes. For 1,000 words, that is 50–67 hours of nonstop generation
☁️ Lesson: use cloud APIs. This is a hardware constraint, not a software one

7.2 The $84 Bill

Google Gemini API cost breakdown after generating ~1,000 card images

💸 Gemini image generation + aggressive async concurrency = $84 in one hour
😅 The pipeline worked exactly as designed — both impressive and alarming
💰 Fix: switched the primary provider to MiniMax image-01, which gives lower quality at a fraction of the cost

			
PROVIDERS = {
    "minimax": _generate_minimax,
    "gemini":  _generate_gemini,
}

🛑 Rate-limit (429) halts the batch gracefully and saves progress — nothing lost
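
One common way to keep async fan-out from running up a bill is a semaphore that caps the number of calls in flight. The sketch below assumes an async generate function per word and a small concurrency value like the MEDIA_CONCURRENCY=2 setting in the cost controls section; generate_all is an illustrative name, and how ankids applies that setting internally is not shown in this post.

```python
import asyncio
from typing import Awaitable, Callable


async def generate_all(
    words: list[str],
    generate: Callable[[str], Awaitable[bytes]],
    concurrency: int = 2,
) -> list[bytes]:
    """Run `generate` for every word with at most `concurrency` calls in flight."""
    gate = asyncio.Semaphore(concurrency)

    async def one(word: str) -> bytes:
        async with gate:  # wait here while `concurrency` calls are running
            return await generate(word)

    # gather keeps results in the same order as `words`.
    return await asyncio.gather(*(one(w) for w in words))
```

A lower cap makes a batch slower, but it also bounds how much can be spent before you notice something is wrong.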

7.3 AI Takes Words Literally

AI-generated illustration for ‘Handschuhe’: a hand and a shoe, literally

😂 Handschuhe = “hand shoes” (gloves). The model drew a hand and a shoe. Smiling.
🔧 Fix: pass the full language name + tell the model to illustrate the concept, not decode the word

			
def _build_image_prompt(word: str, target_language: str) -> str:
    lang = _lang_full_name(target_language)  # "de" → "German"
    return (
        f"A single cute, kid-friendly cartoon illustration representing "
        f"the {lang} word '{word}'. "
        f"Pure illustration only — no text, letters, labels, or speech bubbles."
    )

		

✅ “German” instead of “de” dramatically reduced literal interpretations
✅ “No text” constraint stopped the model from overlaying words on images

7.4 Deleted Lines Cost Money Too

Metric	Count
Total commits	126
Lines added	15,558
Lines deleted	9,760
Net codebase	5,798 lines
Refactor / fix commits	42 out of 126 (33%)

🗑️ 9,760 deleted lines = the cost of moving fast with AI agents
🔄 Agents generate code without checking if a module already exists — you end up unifying duplicates

Three biggest cleanups:

"Clean the docs" — 3,505 lines deleted, 143 added
"Create MVP in first day" — 3,224 lines deleted, 24 added
"refactor: eliminate duplication" — 95 deleted, 230 added

💡 Lesson: run code review before adding features, not after. “Does this already exist?” saves hours.

7.5 The New Bottleneck Is You

⏱️ Every 10 minutes brings a design question that used to take a day of thinking
🧠 Batch or per-card? Absolute or relative paths? Merge by string or UUID?
😴 Decisions compound — bad call at midnight = refactoring at 2am
🏁 Finished the MVP Sunday morning, slept most of the afternoon

The cognitive cost of AI-accelerated development — 10 agents, 1 engineer, every decision is yours

8. Try It


git clone https://github.com/wuqunfei/anki-card-ai-builder
cd anki-card-ai-builder
# Install (Python 3.12+)
uv sync
cp .env.example .env

Minimal .env — grab your API keys first:

🔑 Google API Key — get one at Google AI Studio
🔑 MiniMax API Key — sign up at MiniMax Platform

			
MINIMAX_API_KEY=your_minimax_key
GOOGLE_API_KEY=your_google_key
LEARNER_PROFILE="ages 9-12, kid-friendly with emojis"

8.1 One command

			
# From a word list
ankids run --words "chat,maison,incroyable,bonjour" --lang-target fr --lang-source de
# From iPhone photos of textbook pages (HEIC supported)
ankids run --input ./input/french/ --lang-target fr --lang-source de --deck "Unité 3"
# From a PDF textbook
ankids run --input textbook.pdf --lang-target fr --lang-source de
# Incremental — add words to an existing workspace
ankids run --words "école,professeur" --lang-target fr --output workspace/a1b2c3d4

		

8.2 Step by step

			
ankids ingest --input ./input/french/ --lang-target fr --lang-source de
ankids enrich --output workspace/a1b2c3d4
ankids media  --output workspace/a1b2c3d4
ankids review --output workspace/a1b2c3d4
ankids export --output workspace/a1b2c3d4 --deck "French Unit 3"

		

8.3 Cost controls

			
ankids run --words "chat,chien" --lang-target fr --no-images   # text only
ankids run --words "chat,chien" --lang-target fr --no-audio    # skip TTS
ankids run --words "chat,chien" --lang-target fr --typing      # type-in cards
IMAGE_PROVIDER=minimax MEDIA_CONCURRENCY=2 ankids media --output workspace/a1b2c3d4

		

8.4 CLI reference

Command	What it does
`run`	Full pipeline: ingest → enrich → media → export
`ingest`	Extract words from file, folder, or `--words`
`enrich`	Fill all card fields with AI
`media`	Generate TTS audio and AI images
`review`	Display cards and media status
`export`	Bundle into `.apkg` for Anki
`clean`	Delete workspace and start fresh

9. What Comes Next

The MVP works. One real user loves it. Here’s where everything stands — what’s tested, what’s not, and what’s still on the roadmap.

🔮 Planned Features — Roadmap

Feature	Status	Description
🌍 More languages	📋 Planned	Spanish, Italian, Mandarin — prompt adjustments for Pinyin and IPA
🎨 Smarter image prompts	📋 Planned	Use `target_memory_hook` as image prompt to fix compound word failures (Handschuhe problem)
🔄 Deck diffing	📋 Planned	Diff-and-patch workflow with stable UUID tracking across exports
📚 Curriculum planner	💭 Vision	Generate a full semester of decks from a syllabus PDF, pre-scheduled to Anki

10. Stack

Layer	Technology	Link
🖥️ CLI	Typer + Python 3.12	typer.tiangolo.com
📐 Schema / validation	Pydantic v2	docs.pydantic.dev
👁️ Image OCR	Google Gemini 2.0 Flash	ai.google.dev
🧠 AI enrichment	MiniMax M2.5 (via Anthropic SDK)	minimax.io
🎨 Image generation	MiniMax image-01 / Gemini (fallback)	minimax.io
🔊 Audio	gTTS	pypi.org/project/gTTS
🃏 Anki packaging	genanki	github.com/kerrickstaley/genanki
📄 PDF extraction	PyMuPDF	pymupdf.readthedocs.io
📱 HEIC support	pillow-heif	github.com/bigcat88/pillow_heif
🌐 Async HTTP	httpx + asyncio	www.python-httpx.org
🧹 Linting / types	ruff + mypy	docs.astral.sh/ruff
📦 Package management	uv + hatchling	docs.astral.sh/uv

Open source. Pull requests welcome — especially from parents who know the feeling.

GitHub: https://github.com/wuqunfei/anki-card-ai-builder
