Working Notes

Building a Language-Learning App from Scratch

Tony Jiang · 15 min read

Demo Video

Notes

The Problem

I’ve been learning foreign languages for a while. What you notice after internalizing the basics (grammar, conjugations) is that what remains is a seemingly insurmountable wall of vocabulary, and climbing it is where the bulk of your learning time actually goes. In the past, that climb was tedious manual work: reading, clipping sentences, saving words, and creating flashcards.

The standard workflow was tedious. You had to look up definitions, but even then, finding distinct examples was hard (reverso.net simply shows usages, which can be duplicated or incomplete), and dictionary definitions wouldn’t always give you the right “senses.” Take the English verb “lead,” for example.

Such a list would have been very difficult for a non-native speaker to compile. Then came AI, which solved many of these problems: generating a list like that is now easy. So I thought: what if we had an SRS integrated with the reading itself, so you can see which words you’ve already learned and easily add new ones?

The gap I kept running into was this: Anki gives you great spaced-repetition scheduling, but it disconnects words from context. Duolingo gamifies learning but doesn’t let you study words you’ve actually encountered. I wanted something in between — a tool that lets me build a deck from real texts I read, then quizzes me on those exact words in context (and can generate additional contexts for a word, for comparison).

This is a write-up of what I built, the technical choices that mattered, and what I’d do differently.


What it does

The core loop is simple:

  1. You upload a text (an article, a chapter, anything).
  2. The app highlights the words in the text that are already in your deck.
  3. You can select any word or phrase and add it to your deck — with AI-generated example sentences and usage notes.
  4. Every day, the app shows you words due for review. You see the word and its example sentences, try to recall it, then self-rate (Again / Hard / Good / Easy).
  5. The SRS engine schedules the word further into the future based on how well you recalled it.
  6. You can also trigger a manual review whenever you want, since strict SRS scheduling doesn’t always match how you want to study.

Supporting features: lemmatized search across your text corpus, multi-language support (French, Spanish, English, German), multi-deck management, and a subscription paywall.


Stack


How to Use

The app is organized into eight pages, accessible from the sidebar.


The SRS engine

Spaced repetition is based on the insight that you should review something just before you’re about to forget it. Review too soon and you’re wasting time; review too late and you’ve already forgotten.

The algorithm I implemented is SM-2 (SuperMemo 2), the same algorithm Anki uses. Each word has two key numbers: an ease factor (a multiplier representing how difficult the word is for you) and an interval (days until the next review). After every review, both are updated from your self-rating: a successful recall multiplies the interval by the ease factor, a failure resets it, and the ease factor itself drifts down on Hard and up on Easy.

The ease factor is clamped at a minimum of 1.3 — words can only get so hard. After a few months, a word you know well might have an interval of 100+ days. You barely think about it.
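Sketched in code, the update looks something like this (a simplified SM-2; the exact constants and the 0–3 rating mapping are my assumptions, not the app’s actual values):

```python
# Simplified SM-2 update. Ratings: 0 = Again, 1 = Hard, 2 = Good, 3 = Easy.
def sm2_update(interval: float, ease: float, reps: int, rating: int):
    if rating == 0:
        # Failed recall: restart the streak with a short interval, penalize ease.
        return 1.0, max(1.3, ease - 0.2), 0
    # First two successful reviews use fixed intervals, as Anki does.
    if reps == 0:
        new_interval = 1.0
    elif reps == 1:
        new_interval = 6.0
    else:
        new_interval = interval * ease
    # Ease drifts down on Hard, stays put on Good, rises on Easy.
    ease_delta = {1: -0.15, 2: 0.0, 3: 0.15}[rating]
    return new_interval, max(1.3, ease + ease_delta), reps + 1
```

The `max(1.3, …)` clamp is the “words can only get so hard” floor mentioned above.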

The state for each word (interval, ease factor, due date, streak, repetitions) lives in an srs_state table. Every review is also logged to a separate reviews table — append-only history — so I can build analytics later without losing data.
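Sketched as SQLite DDL (only the table names and the columns named above come from the app; the types and defaults are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real app uses a file on a Fly.io volume
conn.executescript("""
CREATE TABLE srs_state (
    word_id     INTEGER PRIMARY KEY,
    interval    REAL    NOT NULL DEFAULT 0,
    ease_factor REAL    NOT NULL DEFAULT 2.5,
    due_date    TEXT,
    streak      INTEGER NOT NULL DEFAULT 0,
    repetitions INTEGER NOT NULL DEFAULT 0
);
-- Append-only log: one row per review, never updated or deleted.
CREATE TABLE reviews (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    word_id     INTEGER NOT NULL,
    rating      TEXT    NOT NULL,                       -- Again / Hard / Good / Easy
    reviewed_at TEXT    NOT NULL DEFAULT (datetime('now'))
);
""")
```

Keeping the mutable scheduling state and the immutable history in separate tables is what makes later analytics cheap: `srs_state` is overwritten on every review, `reviews` only ever grows.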

I don’t think the particulars of the scheduling algorithm matter as much as the quality of the generated cards (the algorithm is unchanged from Anki’s), but it’s still worth being precise about how it operates.


NLP: lemmatization and why it matters

When you search for “running” in a text, you usually want to find “run”, “runs”, “ran” too. This is lemmatization — reducing a word to its dictionary form.

I used spaCy for this. Each language has a separate model (fr_core_news_sm, es_core_news_sm, etc.) that gets loaded lazily on first use and cached in memory. On top of that, I pre-build a lemma map per deck — a dictionary from lemma string to word record — so that highlight and search queries can run fast without re-tokenizing everything.

# Simplified highlight logic
def highlight(text, deck_id):
    deck = _get_deck(deck_id)              # look up the deck to get its language
    nlp = _get_nlp(deck.language)          # cached spaCy pipeline for that language
    lemma_map = _get_lemma_map(deck_id)    # lemma → {"word_id": ..., "word": ...}
    doc = nlp(text)
    return [
        {
            "start": token.idx,
            "end": token.idx + len(token),
            "word_id": lemma_map[token.lemma_]["word_id"],
        }
        for token in doc
        if token.lemma_ in lemma_map
    ]

The lemma map gets invalidated whenever a word is added or deleted from the deck. It’s rebuilt on next use. Simple and effective.

One complication: the spaCy models aren’t bundled with the package — you have to download them separately. In the Dockerfile, this meant explicitly running python -m spacy download fr_core_news_sm (and the other three language models) at build time. That’s easy to miss, and I did miss it for the non-French models initially.
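The relevant Dockerfile lines look roughly like this (the text names only the French and Spanish packages; `en_core_web_sm` and `de_core_news_sm` are my guesses at the standard small spaCy models for the other two languages):

```dockerfile
# Install spaCy, then pull each language model into the image at build time.
# Miss one of these and the app works locally but 500s in production.
RUN pip install spacy
RUN python -m spacy download fr_core_news_sm \
 && python -m spacy download es_core_news_sm \
 && python -m spacy download en_core_web_sm \
 && python -m spacy download de_core_news_sm
```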

An alternative would be embedding-based search over a vector database, trading exact lemma matching for fuzzier semantic matches.

Lemmatization shows up in three places in the app: highlighting known words in uploaded texts, searching the corpus, and canonicalizing a selected word before it’s added to a deck.


Canonicalization: what form should I look up?

When a user selects a word in a text and wants to add it to their deck, they’re probably looking at an inflected form — “découvert” instead of “découvrir”. I need to canonicalize it to its dictionary form before adding it to the deck.

I built a two-tier approach:

  1. spaCy first: Fast, local, free. For single tokens, return the lemma. For multi-word phrases, strip function words and reflexive pronouns, then pick the most semantically significant word by POS priority (VERB > NOUN > ADJ > ADV).

  2. LLM fallback: If spaCy returns nothing useful (common with complex phrases), fall back to a local LLM with a language-aware prompt.
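Tier 1 can be sketched like this (the POS sets and everything beyond the stated VERB &gt; NOUN &gt; ADJ &gt; ADV priority are assumptions; `tokens` stands in for a spaCy `Doc`, whose tokens carry `.lemma_` and `.pos_`):

```python
POS_PRIORITY = {"VERB": 0, "NOUN": 1, "ADJ": 2, "ADV": 3}
# Function words and (reflexive) pronouns get stripped from phrases.
FUNCTION_POS = {"DET", "ADP", "PRON", "AUX", "CCONJ", "SCONJ", "PART", "PUNCT"}

def canonicalize(tokens):
    """Pick a dictionary form for a user selection, or None to trigger
    the LLM fallback. `tokens` is any sequence of objects exposing
    spaCy-style .lemma_ and .pos_ attributes."""
    content = [t for t in tokens if t.pos_ not in FUNCTION_POS]
    if not content:
        return None  # nothing useful -> tier 2 (LLM)
    if len(content) == 1:
        return content[0].lemma_
    # Multi-word phrase: keep the most semantically significant token.
    ranked = sorted(content, key=lambda t: POS_PRIORITY.get(t.pos_, 99))
    return ranked[0].lemma_
```

So a selection like “se découvrir” drops the reflexive pronoun and canonicalizes to “découvrir”.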

Right now a single LLM handles everything; in the future, I’d split this into a cheap model for canonicalization and a more sophisticated (and more expensive) one for the heavier prompts, like sense generation.


AI content generation

Use case generation (/api/generate): Given a word, ask the LLM to produce a comprehensive set of example sentences covering all meaningful senses — like a lexicographer would. Low temperature (0.2), structured JSON output. The prompt is explicit about what to include: core meanings, prepositional constructions, reflexive forms, idiomatic phrases.

I’m using OpenAI GPT-4o-mini — good enough for this task, cheap, fast. But I wrapped the LLM call in a pluggable client interface so the app can use Anthropic’s API or a local Ollama instance without changing any calling code. This came in handy during development when I was running everything locally.
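A minimal version of that pluggable interface might look like this (the `CompletionClient` name is the app’s; the `complete` method name, its signature, and the toy implementation are assumptions):

```python
from typing import Protocol

class CompletionClient(Protocol):
    """Anything that turns a prompt into text: OpenAI, Anthropic, or Ollama."""
    def complete(self, prompt: str, *, temperature: float = 0.2) -> str: ...

class EchoClient:
    """Toy implementation for tests; real ones wrap each provider's SDK."""
    def complete(self, prompt: str, *, temperature: float = 0.2) -> str:
        return f"echo: {prompt}"

def generate_examples(client: CompletionClient, word: str) -> str:
    # Calling code sees only the protocol, so providers are interchangeable.
    return client.complete(f"Give example sentences for '{word}'.")
```

Swapping providers is then a one-line change at the call site that constructs the client.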


Multi-user design and auth

The app is multi-user with a subscription paywall. Auth is handled by Clerk — I just validate JWTs on the backend with python-jose, extract the user ID from claims["sub"], and use that to scope all data.

Data ownership flows through a chain: subscriptions.user_id → decks.user_id → words.deck_id → srs_state.word_id. Endpoints that accept a word_id directly verify ownership before operating — there’s an explicit _get_word_for_user() check rather than trusting the client.

Billing is Stripe subscriptions. The flow:

  1. User clicks Subscribe → create a Stripe Checkout session → redirect.
  2. Payment completes → Stripe fires a webhook → backend writes status = "active" to subscriptions table.
  3. Every API call (except deck endpoints and billing endpoints) checks require_subscription, which chains off JWT verification.
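Step 2’s webhook handler can be sketched like this (the event type and `client_reference_id` field are Stripe’s; the handler itself, including how the user id reaches the session, is an assumption — in the real flow `stripe.Webhook.construct_event()` parses the payload and verifies the signing secret before any of this runs):

```python
import sqlite3

def handle_stripe_event(event: dict, db: sqlite3.Connection) -> None:
    """Apply an already signature-verified Stripe webhook event."""
    if event["type"] == "checkout.session.completed":
        session = event["data"]["object"]
        # client_reference_id would be set to our user id when the
        # Checkout session is created.
        db.execute(
            "UPDATE subscriptions SET status = 'active' WHERE user_id = ?",
            (session["client_reference_id"],),
        )
        db.commit()
    # Other event types (cancellations, failed payments) handled similarly.
```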

Deployment

Single container on Fly.io. FastAPI serves both the API and the built React frontend. SQLite and the text corpus live on a persistent Fly.io volume at /data.

The multi-stage Dockerfile builds the React frontend with Node first, then copies the dist/ into the Python image. VITE_CLERK_PUBLISHABLE_KEY is a Docker build arg (it’s a public key — safe to bake into the bundle at build time).


What I’d do differently

Text corpus in the database. Right now, texts are stored as .txt files in a per-user directory. It works, but it means backups require copying both the database and the files directory. Moving texts into a texts table would simplify everything.


What I learned

spaCy is powerful but has operational surface area. The models are separate downloads that have to be explicitly included in your Docker image. It’s easy to end up with the model working locally but missing in production.

SQLite is underrated for this kind of app. Multi-user, subscriptions, SRS state — all of it works fine in SQLite on a single machine. The migration system I built (numbered .sql files, applied once and tracked in a schema_migrations table) is dead simple and has never given me trouble.
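That migration runner fits in about a dozen lines; here’s a sketch of the idea (not the app’s exact code):

```python
import pathlib
import sqlite3

def migrate(conn: sqlite3.Connection, migrations_dir: str) -> list[str]:
    """Apply numbered .sql files once each, tracked in schema_migrations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (filename TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT filename FROM schema_migrations")}
    ran = []
    # Lexicographic sort is why the files are numbered: 001_, 002_, ...
    for path in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
        if path.name in applied:
            continue
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (path.name,))
        ran.append(path.name)
    conn.commit()
    return ran
```

Running it twice is safe: the second pass sees every file in `schema_migrations` and applies nothing.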

The LLM pluggability paid off. Being able to swap between OpenAI, Anthropic, and local Ollama during development meant I wasn’t burning API credits constantly. The CompletionClient protocol is maybe 10 lines of code and saved a lot of friction.

Clerk + Stripe is a pretty good combination. Clerk handles all the auth complexity (OAuth, magic links, session management), and the JWT validation on the backend is straightforward. Stripe’s webhook model is reliable once you understand that local testing uses a different webhook secret than production.


What’s next

A few things I want to build:
