A Weekend: From Idea to 1,000 Memorable Flashcards — AI in Design, UI Mocking, Brainstorming, Coding, and Mistakes

Sometimes technology genuinely makes life better. That’s why I love building things.


1. Why I Built This

😩 The problem:

  • My daughter was struggling with French vocabulary
  • Shared Anki decks didn’t match her textbook or her level
  • Most decks are dull — text-only, no images, no audio, no analysis
  • Making good flashcards by hand? 20–30 minutes per word. For a 40-word unit, that’s a part-time job

🛠️ The solution:

  • One weekend, 126 commits, over 800 personalized cards
  • 🎨 Cartoon illustrations, 🔊 pronunciation audio, 🧬 color-coded etymology, 💡 kid-friendly mnemonics
  • She opened Anki on her iPad and grinned:

“Papa, this one has a funny picture.”

The gap is not in the study app — Anki is battle-tested. The gap is in personalized content generation.


2. The Approach: Feed the Wheel, Don’t Replace It

ankids doesn’t compete with Anki. It feeds it.

Drop in any input, get back a ready-to-import .apkg file. Anki handles everything after that.


Supported inputs:

  • 📝 Word lists
  • 📊 Excel files
  • 📄 PDF textbooks
  • 📸 iPhone photos of homework sheets

Five-step AI pipeline, three core capabilities:

  • 👁️ Vision models — OCR a photo of any textbook page, extract words, translations, and example sentences in one pass
  • 🧠 Language models — Generate etymology, morpheme breakdowns, IPA pronunciation, cognates, and kid-friendly mnemonics at scale
  • 🎨 Image & audio models — Create a cartoon illustration and pronunciation clip for every word, so a child hears and sees before they read

Output:

  • 📦 A single .apkg file, ready to import into Anki
  • 🔄 Sync, scheduling, mobile, progress tracking — all handled by Anki

3. Before and After

😐 Shared deck — what most people use:

Standard shared Anki card — plain text, no image, no audio, no etymology
Standard shared Anki card — plain text, no image, no audio, no etymology
  • Text-only, no images, no audio — functional but forgettable

🤩 ankids — what we generate:

AI-generated ankids card — cartoon image, color-coded etymology, IPA, example sentences
AI-generated ankids card — cartoon image, color-coded etymology, IPA, example sentences
  • 🖼️ AI cartoon illustration
  • 🗣️ IPA pronunciation TTS audio
  • 🎨 Color-coded morpheme breakdown
  • 🧬 Origin chain back to Proto-Indo-European
  • 🌍 Cognates across four languages
  • 💡 Memory hook bilingual example sentence

A card that shows why a word looks the way it does is far easier to retain than one that just restates the definition.


4. How It Works

4.1 The 5-Step Pipeline

ankids 5-step pipeline diagram
ankids 5-step pipeline diagram
  • 📄 Each step reads/writes a shared cards.json
  • 🔁 Every step is independently restartable — hit a rate limit at card 847? Re-run and it resumes

4.2 Input Sources

  • 📥 Six source types: --words (CSV), Excel/CSV, PDFs, single images, image folders, Google Drive URLs
  • 📱 HEIC support (iPhone photos) built in — HEIC → Gemini Vision OCR → structured JSON via pillow-heif

4.3 External Services

External API services diagram
External API services diagram
  • 🤖 MiniMax — enrichment LLM primary image generator
  • 🔍 Google Gemini — OCR image generation fallback
  • 🔄 Auto-fallback on rate limits, progress saved between runs

4.4 Card Lifecycle

Card lifecycle state diagram
Card lifecycle state diagram

4.5 Workspace Layout

workspace/
└── a1b2c3d4/
├── cards.json ← single source of truth
├── media/
│ ├── {uuid}_audio.mp3 ← word TTS
│ ├── {uuid}_example_audio.mp3
│ └── {uuid}_image.png ← AI cartoon
└── MyDeck.apkg ← ready to import
  • 🔒 UUID-scoped isolation — multiple decks run in parallel, no collisions
  • 💾 Workspace persists between sessions — critical for large batches

5. What Makes a Card Memorable

5.1 The Etymology Engine

Etymology Engine — Component Decomposition
Etymology Engine — Component Decomposition
  • 🧠 Most Western European words share Latin or Greek roots
  • 🔍 A child who knows in- = “not” and croy = “believe” can decode incroyable without memorizing it
  • 💡 That’s linguistic intuition, not rote memorization

Four fields per card, each color-coded:

FieldColor schemePurpose
target_mnemonicBlue prefix · Coral root · Green suffixMorpheme breakdown
target_originGold tonesPIE → Latin → Old French chain
target_cognatesPurple tonesEN / DE / FR / Latin cognate family
target_memory_hookMorpheme colorsOne-line hook using the breakdown

Example — incroyable:

Mnemonic: <span color="#5b9bd5">in-</span> <span color="#e07b7b">croy</span> <span color="#6dba6d">-able</span>
Origin: <span color="#8B6914">PIE *krey-</span>
<span color="#B8860B">Latin credere</span>
<span color="#D4A854">Old French incroyable</span>
Cognates: <span color="#8e7cc3">EN</span> incredible,
<span color="#a78bfa">DE</span> unglaublich,
<span color="#7c3aed">LA</span> incredibilis
Memory hook: <span color="#5b9bd5">in-</span> = not
<span color="#e07b7b">croy</span> = believe
<span color="#6dba6d">-able</span> = can be → "not believable!"
  • 🎯 HTML renders directly inside Anki’s card template
  • ⏭️ Atomic words that don’t benefit from breakdown skip the block
  • 🎨 Gender coloring: masculine = blue, feminine = coral, neuter = grey — consistent everywhere

5.2 The Card Template

ankids card anatomy — front and back
ankids card anatomy — front and back
  • Front: target word, IPA, part-of-speech, AI cartoon, audio button
  • Back: etymology stack, example sentences with audio, source word, gender badge
  • ⌨️ “Type-in” variant — shows source word, prompts learner to type the answer

5.3 The Data Model

The Card Pydantic model is the contract across every pipeline stage:

class Card(BaseModel):
id: str = Field(default_factory=lambda: str(uuid.uuid4()))
unit: str | None = None # e.g. "Unité 3"
status: str = STATUS_EXTRACTED # extracted → enriched → complete
source_word: str
source_language: str = "de"
source_gender: str | None = None
target_word: str | None = None
target_language: str
target_pronunciation: str | None = None # IPA or Pinyin
target_part_of_speech: str | None = None
target_example_sentence: str | None = None
target_mnemonic: str | None = None
target_origin: str | None = None
target_cognates: str | None = None
target_memory_hook: str | None = None
typing: bool = False
  • 🔑 Merge key: (source_word, target_language)
  • 🛡️ Re-ingesting never overwrites existing enrichment or media — only adds new words

6. Building It: 126 Commits Over a Weekend

Weekend coding sprint commit chart
Weekend coding sprint commit chart
  • 📐 Monday — architecture: schema drafts, prompt specs, system design. No runtime code yet
  • 🚀 Thursday 21:30 — first feature live (image OCR via Gemini)
  • 🌙 Friday 00:30–03:19 — 14 commits in flow state: shared constants, API key validation, error handling
  • 🔧 Saturday — refinement: docs, workspace isolation, Gemini enrichment, incremental saves
  • Sunday morning — MVP in the hands of the first user

📊 Stats: 126 commits, heaviest work Thursday night → Saturday evening. Longest session: 00:30 to 03:19 while the house slept.

🤖 Working with 10 concurrent AI agents shifts the bottleneck. Agents produce code in seconds. The constraint becomes decisions — which design to accept, which edge case to prioritize, which abstraction to commit to. A different kind of exhaustion. It doesn’t live in your hands. It lives behind your eyes.

🧠 Planning with AI before coding — I used superpowers brainstorming heavily for architecture and design decisions before writing any code. Its new visual companion feature is insane — generates UI mockups and diagrams right in the terminal, so you can see what you’re building before you build it.


7. What I Learned the Hard Way

7.1 Your Laptop Is Not a GPU Server

Mac Mini M4 — great for coding, not for local image generation
Mac Mini M4 — great for coding, not for local image generation
  • 💻 Mac Mini M4 (16 GB) — ran 10 parallel AI coding agents fine
  • 🚫 Stable Diffusion XL, Flux, Gemma 4 — all need >16 GB VRAM
  • 🐢 On shared unified memory: 1 image every 3–4 minutes. For 1,000 words = days
  • ☁️ Lesson: use cloud APIs. This is a hardware constraint, not a software one

7.2 The $84 Bill

Google Gemini API cost breakdown after generating ~1,000 card images
Google Gemini API cost breakdown after generating ~1,000 card images
  • 💸 Gemini image generation aggressive async concurrency = $84 in one Hour
  • 😅 The pipeline worked exactly as designed — both impressive and alarming
  • 💰 Fix: switched primary provider to MiniMax image-01 — less quality, fraction of the cost
PROVIDERS = {
"minimax": _generate_minimax,
"gemini": _generate_gemini,
}
  • 🛑 Rate-limit (429) halts the batch gracefully and saves progress — nothing lost

7.3 AI Takes Words Literally

AI-generated illustration for 'Handschuhe' — a hand and a shoe, literally
AI-generated illustration for ‘Handschuhe’ — a hand and a shoe, literally
  • 😂 Handschuhe = “hand shoes” (gloves). The model drew a hand and a shoe. Smiling.
  • 🔧 Fix: pass the full language name tell the model to illustrate the concept, not decode the word
def _build_image_prompt(word: str, target_language: str) -> str:
lang = _lang_full_name(target_language) # "de" → "German"
return (
f"A single cute, kid-friendly cartoon illustration representing "
f"the {lang} word '{word}'. "
f"Pure illustration only — no text, letters, labels, or speech bubbles."
)
  • ✅ “German” instead of “de” dramatically reduced literal interpretations
  • ✅ “No text” constraint stopped the model from overlaying words on images

7.4 Deleted Lines Cost Money Too

MetricCount
Total commits126
Lines added15,558
Lines deleted9,760
Net codebase5,798 lines
Refactor / fix commits42 out of 126 (33%)
  • 🗑️ 9,760 deleted lines = the cost of moving fast with AI agents
  • 🔄 Agents generate code without checking if a module already exists — you end up unifying duplicates

Three biggest cleanups:

  • "Clean the docs"3,505 lines deleted, 143 added
  • "Create MVP in first day"3,224 lines deleted, 24 added
  • "refactor: eliminate duplication" — 95 deleted, 230 added

💡 Lesson: run code review before adding features, not after. “Does this already exist?” saves hours.

7.5 The New Bottleneck Is You

  • ⏱️ Every 10 minutes brings a design question that used to take a day of thinking
  • 🧠 Batch or per-card? Absolute or relative paths? Merge by string or UUID?
  • 😴 Decisions compound — bad call at midnight = refactoring at 2am
  • 🏁 Finished the MVP Sunday morning, slept most of the afternoon
The cognitive cost of AI-accelerated development — 10 agents, 1 engineer, every decision is yours
The cognitive cost of AI-accelerated development — 10 agents, 1 engineer, every decision is yours

8. Try It


git clone https://github.com/wuqunfei/anki-card-ai-builder
# Install (Python 3.12 )
uv sync
cp .env.example .env

Minimal .env — grab your API keys first:

MINIMAX_API_KEY=your_minimax_key
GOOGLE_API_KEY=your_google_key
LEARNER_PROFILE="ages 9-12, kid-friendly with emojis"

8.1 One command

# From a word list
ankids run --words "chat,maison,incroyable,bonjour" --lang-target fr --lang-source de
# From iPhone photos of textbook pages (HEIC supported)
ankids run --input ./input/french/ --lang-target fr --lang-source de --deck "Unité 3"
# From a PDF textbook
ankids run --input textbook.pdf --lang-target fr --lang-source de
# Incremental — add words to an existing workspace
ankids run --words "école,professeur" --lang-target fr --output workspace/a1b2c3d4

8.2 Step by step

ankids ingest --input ./input/french/ --lang-target fr --lang-source de
ankids enrich --output workspace/a1b2c3d4
ankids media --output workspace/a1b2c3d4
ankids review --output workspace/a1b2c3d4
ankids export --output workspace/a1b2c3d4 --deck "French Unit 3"

8.3 Cost controls

ankids run --words "chat,chien" --lang-target fr --no-images # text only
ankids run --words "chat,chien" --lang-target fr --no-audio # skip TTS
ankids run --words "chat,chien" --lang-target fr --typing # type-in cards
IMAGE_PROVIDER=minimax MEDIA_CONCURRENCY=2
ankids media --output workspace/a1b2c3d4

8.4 CLI reference

CommandWhat it does
runFull pipeline: ingest enrich media export
ingestExtract words from file, folder, or --words
enrichFill all card fields with AI
mediaGenerate TTS audio and AI images
reviewDisplay cards and media status
exportBundle into .apkg for Anki
cleanDelete workspace and start fresh

9. What Comes Next

The MVP works. One real user loves it. Here’s where everything stands — what’s tested, what’s not, and what’s still on the roadmap.

🔮 Planned Features — Roadmap

FeatureStatusDescription
🌍 More languages📋 PlannedSpanish, Italian, Mandarin — prompt adjustments for Pinyin and IPA
🎨 Smarter image prompts📋 PlannedUse target_memory_hook as image prompt to fix compound word failures (Handschuhe problem)
🔄 Deck diffing📋 PlannedDiff-and-patch workflow with stable UUID tracking across exports
📚 Curriculum planner💭 VisionGenerate a full semester of decks from a syllabus PDF, pre-scheduled to Anki

10. Stack

LayerTechnologyLink
🖥️ CLITyper Python 3.12typer.tiangolo.com
📐 Schema / validationPydantic v2docs.pydantic.dev
👁️ Image OCRGoogle Gemini 2.0 Flashai.google.dev
🧠 AI enrichmentMiniMax M2.5 (via Anthropic SDK)minimax.io
🎨 Image generationMiniMax image-01 / Gemini (fallback)minimax.io
🔊 AudiogTTSpypi.org/project/gTTS
🃏 Anki packaginggenankigithub.com/kerrickstaley/genanki
📄 PDF extractionPyMuPDFpymupdf.readthedocs.io
📱 HEIC supportpillow-heifgithub.com/bigcat88/pillow_heif
🌐 Async HTTPhttpx asynciowww.python-httpx.org
🧹 Linting / typesruff mypydocs.astral.sh/ruff
📦 Package managementuv hatchlingdocs.astral.sh/uv

Open source. Pull requests welcome — especially from parents who know the feeling.

github address: https://github.com/wuqunfei/anki-card-ai-builder

Posted in

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.