Working Notes

Building a Language-Learning App from Scratch

Tony Jiang · 15 min read

Demo Video

Notes

The Problem

I’ve been learning foreign languages for a while. What you notice after internalizing the basics (grammar, conjugations) is that what remains is a seemingly insurmountable wall of vocabulary, and climbing it is where the bulk of your learning time actually goes. In the past, that climb was tedious manual work: reading, clipping sentences, saving words, and creating flashcards.

The standard workflow was tedious. You had to look up definitions, but even then, finding distinct examples was hard (reverso.net simply shows usages, which can be duplicated or incomplete), and dictionary definitions wouldn’t always give you the right “senses.” Take the English verb “lead,” for example.

Such a list would have been very difficult for a non-native speaker to compile. Then came AI, which solved many of these problems: generating a list like that is now easy. So I thought: what if we had an SRS integrated with the reading itself, so you can see which words you’ve already learned and easily add new ones?

The gap I kept running into was this: Anki gives you great spaced-repetition scheduling, but it disconnects words from context. Duolingo gamifies learning but doesn’t let you study words you’ve actually encountered. I wanted something in between — a tool that lets me build a deck from real texts I read, then quizzes me on those exact words in context (and can generate additional contexts for a word, for comparison).

This is a write-up of what I built, the technical choices that mattered, and what I’d do differently.


What it does

The core loop is simple:

  1. You upload a text (an article, a chapter, anything).
  2. The app highlights the words in the text that are already in your deck.
  3. You can select any word or phrase and add it to your deck — with AI-generated example sentences and usage notes.
  4. Every day, the app shows you words due for review. You see the word and its example sentences, try to recall it, then self-rate (Again / Hard / Good / Easy).
  5. The SRS engine schedules the word further into the future based on how well you recalled it.
  6. You can also trigger a manual review whenever you want, since strict SRS scheduling doesn’t always match how you want to study.

Supporting features: lemmatized search across your text corpus, multi-language support (French, Spanish, English, German), multi-deck management, and a subscription paywall.


Stack


How to Use

The app is organized into eight pages, accessible from the sidebar.


The SRS engine

Spaced repetition is based on the insight that you should review something just before you’re about to forget it. Review too soon and you’re wasting time; review too late and you’ve already forgotten.

The algorithm I implemented is SM-2 (SuperMemo 2), the same algorithm Anki uses. Each word has two key numbers: an ease factor (a multiplier representing how difficult the word is for you) and an interval (days until the next review). After every review, both are updated from your self-rating: a successful recall multiplies the interval by the ease factor, a failure resets it, and the ease factor itself drifts down on Hard and up on Easy.

The ease factor is clamped at a minimum of 1.3 — words can only get so hard. After a few months, a word you know well might have an interval of 100+ days. You barely think about it.
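Sketched in code, the update looks something like this (a simplified SM-2; the exact constants and the 0–3 rating mapping are my assumptions, not the app’s actual values):

```python
# Simplified SM-2 update. Ratings: 0 = Again, 1 = Hard, 2 = Good, 3 = Easy.
def sm2_update(interval: float, ease: float, reps: int, rating: int):
    if rating == 0:
        # Failed recall: restart the streak with a short interval, penalize ease.
        return 1.0, max(1.3, ease - 0.2), 0
    # First two successful reviews use fixed intervals, as Anki does.
    if reps == 0:
        new_interval = 1.0
    elif reps == 1:
        new_interval = 6.0
    else:
        new_interval = interval * ease
    # Ease drifts down on Hard, stays put on Good, rises on Easy.
    ease_delta = {1: -0.15, 2: 0.0, 3: 0.15}[rating]
    return new_interval, max(1.3, ease + ease_delta), reps + 1
```

The `max(1.3, …)` clamp is the “words can only get so hard” floor mentioned above.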

The state for each word (interval, ease factor, due date, streak, repetitions) lives in an srs_state table. Every review is also logged to a separate reviews table — append-only history — so I can build analytics later without losing data.
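Sketched as SQLite DDL (only the table names and the columns named above come from the app; the types and defaults are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real app uses a file on a Fly.io volume
conn.executescript("""
CREATE TABLE srs_state (
    word_id     INTEGER PRIMARY KEY,
    interval    REAL    NOT NULL DEFAULT 0,
    ease_factor REAL    NOT NULL DEFAULT 2.5,
    due_date    TEXT,
    streak      INTEGER NOT NULL DEFAULT 0,
    repetitions INTEGER NOT NULL DEFAULT 0
);
-- Append-only log: one row per review, never updated or deleted.
CREATE TABLE reviews (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    word_id     INTEGER NOT NULL,
    rating      TEXT    NOT NULL,                       -- Again / Hard / Good / Easy
    reviewed_at TEXT    NOT NULL DEFAULT (datetime('now'))
);
""")
```

Keeping the mutable scheduling state and the immutable history in separate tables is what makes later analytics cheap: `srs_state` is overwritten on every review, `reviews` only ever grows.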

I don’t think the particulars of the scheduling algorithm matter as much as the quality of the generated cards (the algorithm is unchanged from Anki’s), but it’s still worth being precise about how it operates.


NLP: lemmatization and why it matters

When you search for “running” in a text, you usually want to find “run”, “runs”, “ran” too. This is lemmatization — reducing a word to its dictionary form.

I used spaCy for this. Each language has a separate model (fr_core_news_sm, es_core_news_sm, etc.) that gets loaded lazily on first use and cached in memory. On top of that, I pre-build a lemma map per deck — a dictionary from lemma string to word record — so that highlight and search queries can run fast without re-tokenizing everything.

# Simplified highlight logic
def highlight(text, deck_id):
    deck = _get_deck(deck_id)              # look up the deck to get its language
    nlp = _get_nlp(deck.language)          # cached spaCy pipeline for that language
    lemma_map = _get_lemma_map(deck_id)    # lemma → {"word_id": ..., "word": ...}
    doc = nlp(text)
    return [
        {
            "start": token.idx,
            "end": token.idx + len(token),
            "word_id": lemma_map[token.lemma_]["word_id"],
        }
        for token in doc
        if token.lemma_ in lemma_map
    ]

The lemma map gets invalidated whenever a word is added or deleted from the deck. It’s rebuilt on next use. Simple and effective.

One complication: the spaCy models aren’t bundled with the package — you have to download them separately. In the Dockerfile, this meant explicitly running python -m spacy download fr_core_news_sm (and the other three language models) at build time. That’s easy to miss, and I did miss it for the non-French models initially.
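The relevant Dockerfile lines look roughly like this (the text names only the French and Spanish packages; `en_core_web_sm` and `de_core_news_sm` are my guesses at the standard small spaCy models for the other two languages):

```dockerfile
# Install spaCy, then pull each language model into the image at build time.
# Miss one of these and the app works locally but 500s in production.
RUN pip install spacy
RUN python -m spacy download fr_core_news_sm \
 && python -m spacy download es_core_news_sm \
 && python -m spacy download en_core_web_sm \
 && python -m spacy download de_core_news_sm
```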

An alternative would be embedding-based search over a vector database, trading exact lemma matching for fuzzier semantic matches.

Lemmatization shows up in three places in the app: highlighting known words in uploaded texts, searching the corpus, and canonicalizing a selected word before it’s added to a deck.


Canonicalization: what form should I look up?

When a user selects a word in a text and wants to add it to their deck, they’re probably looking at an inflected form — “découvert” instead of “découvrir”. I need to canonicalize it to its dictionary form before adding it to the deck.

I built a two-tier approach:

  1. spaCy first: Fast, local, free. For single tokens, return the lemma. For multi-word phrases, strip function words and reflexive pronouns, then pick the most semantically significant word by POS priority (VERB > NOUN > ADJ > ADV).

  2. LLM fallback: If spaCy returns nothing useful (common with complex phrases), fall back to a local LLM with a language-aware prompt.
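Tier 1 can be sketched like this (the POS sets and everything beyond the stated VERB &gt; NOUN &gt; ADJ &gt; ADV priority are assumptions; `tokens` stands in for a spaCy `Doc`, whose tokens carry `.lemma_` and `.pos_`):

```python
POS_PRIORITY = {"VERB": 0, "NOUN": 1, "ADJ": 2, "ADV": 3}
# Function words and (reflexive) pronouns get stripped from phrases.
FUNCTION_POS = {"DET", "ADP", "PRON", "AUX", "CCONJ", "SCONJ", "PART", "PUNCT"}

def canonicalize(tokens):
    """Pick a dictionary form for a user selection, or None to trigger
    the LLM fallback. `tokens` is any sequence of objects exposing
    spaCy-style .lemma_ and .pos_ attributes."""
    content = [t for t in tokens if t.pos_ not in FUNCTION_POS]
    if not content:
        return None  # nothing useful -> tier 2 (LLM)
    if len(content) == 1:
        return content[0].lemma_
    # Multi-word phrase: keep the most semantically significant token.
    ranked = sorted(content, key=lambda t: POS_PRIORITY.get(t.pos_, 99))
    return ranked[0].lemma_
```

So a selection like “se découvrir” drops the reflexive pronoun and canonicalizes to “découvrir”.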

Right now a single LLM handles everything; in the future, I’d split this into a cheap model for canonicalization and a more sophisticated (and more expensive) one for the heavier prompts, like sense generation.


AI content generation

Use case generation (/api/generate): Given a word, ask the LLM to produce a comprehensive set of example sentences covering all meaningful senses — like a lexicographer would. Low temperature (0.2), structured JSON output. The prompt is explicit about what to include: core meanings, prepositional constructions, reflexive forms, idiomatic phrases.

I’m using OpenAI GPT-4o-mini — good enough for this task, cheap, fast. But I wrapped the LLM call in a pluggable client interface so the app can use Anthropic’s API or a local Ollama instance without changing any calling code. This came in handy during development when I was running everything locally.
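A minimal version of that pluggable interface might look like this (the `CompletionClient` name is the app’s; the `complete` method name, its signature, and the toy implementation are assumptions):

```python
from typing import Protocol

class CompletionClient(Protocol):
    """Anything that turns a prompt into text: OpenAI, Anthropic, or Ollama."""
    def complete(self, prompt: str, *, temperature: float = 0.2) -> str: ...

class EchoClient:
    """Toy implementation for tests; real ones wrap each provider's SDK."""
    def complete(self, prompt: str, *, temperature: float = 0.2) -> str:
        return f"echo: {prompt}"

def generate_examples(client: CompletionClient, word: str) -> str:
    # Calling code sees only the protocol, so providers are interchangeable.
    return client.complete(f"Give example sentences for '{word}'.")
```

Swapping providers is then a one-line change at the call site that constructs the client.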


Multi-user design and auth

The app is multi-user with a subscription paywall. Auth is handled by Clerk — I just validate JWTs on the backend with python-jose, extract the user ID from claims["sub"], and use that to scope all data.

Data ownership flows through a chain: subscriptions.user_id → decks.user_id → words.deck_id → srs_state.word_id. Endpoints that accept a word_id directly verify ownership before operating — there’s an explicit _get_word_for_user() check rather than trusting the client.

Billing is Stripe subscriptions. The flow:

  1. User clicks Subscribe → create a Stripe Checkout session → redirect.
  2. Payment completes → Stripe fires a webhook → backend writes status = "active" to subscriptions table.
  3. Every API call (except deck endpoints and billing endpoints) checks require_subscription, which chains off JWT verification.
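Step 2’s webhook handler can be sketched like this (the event type and `client_reference_id` field are Stripe’s; the handler itself, including how the user id reaches the session, is an assumption — in the real flow `stripe.Webhook.construct_event()` parses the payload and verifies the signing secret before any of this runs):

```python
import sqlite3

def handle_stripe_event(event: dict, db: sqlite3.Connection) -> None:
    """Apply an already signature-verified Stripe webhook event."""
    if event["type"] == "checkout.session.completed":
        session = event["data"]["object"]
        # client_reference_id would be set to our user id when the
        # Checkout session is created.
        db.execute(
            "UPDATE subscriptions SET status = 'active' WHERE user_id = ?",
            (session["client_reference_id"],),
        )
        db.commit()
    # Other event types (cancellations, failed payments) handled similarly.
```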

Deployment

Single container on Fly.io. FastAPI serves both the API and the built React frontend. SQLite and the text corpus live on a persistent Fly.io volume at /data.

The multi-stage Dockerfile builds the React frontend with Node first, then copies the dist/ into the Python image. VITE_CLERK_PUBLISHABLE_KEY is a Docker build arg (it’s a public key — safe to bake into the bundle at build time).


What I’d do differently

Text corpus in the database. Right now, texts are stored as .txt files in a per-user directory. It works, but it means backups require copying both the database and the files directory. Moving texts into a texts table would simplify everything.


What I learned

spaCy is powerful but has operational surface area. The models are separate downloads that have to be explicitly included in your Docker image. It’s easy to end up with the model working locally but missing in production.

SQLite is underrated for this kind of app. Multi-user, subscriptions, SRS state — all of it works fine in SQLite on a single machine. The migration system I built (numbered .sql files, applied once and tracked in a schema_migrations table) is dead simple and has never given me trouble.
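That migration runner fits in about a dozen lines; here’s a sketch of the idea (not the app’s exact code):

```python
import pathlib
import sqlite3

def migrate(conn: sqlite3.Connection, migrations_dir: str) -> list[str]:
    """Apply numbered .sql files once each, tracked in schema_migrations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (filename TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT filename FROM schema_migrations")}
    ran = []
    # Lexicographic sort is why the files are numbered: 001_, 002_, ...
    for path in sorted(pathlib.Path(migrations_dir).glob("*.sql")):
        if path.name in applied:
            continue
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (path.name,))
        ran.append(path.name)
    conn.commit()
    return ran
```

Running it twice is safe: the second pass sees every file in `schema_migrations` and applies nothing.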

The LLM pluggability paid off. Being able to swap between OpenAI, Anthropic, and local Ollama during development meant I wasn’t burning API credits constantly. The CompletionClient protocol is maybe 10 lines of code and saved a lot of friction.

Clerk + Stripe is a pretty good combination. Clerk handles all the auth complexity (OAuth, magic links, session management), and the JWT validation on the backend is straightforward. Stripe’s webhook model is reliable once you understand that local testing uses a different webhook secret than production.


What’s next

A few things I want to build:
