Typosquat naming patterns on PyPI: a taxonomy
Typosquatting is the supply-chain attack where someone registers a package name that's a small variation on a popular real package — requets instead of requests, pillow-image instead of pillow, pyhton-dateutil instead of python-dateutil — and waits for the typos. This post is a taxonomy of the typosquat naming patterns observed on PyPI, with notes on what each pattern looks like and what catches it.
Why PyPI in particular
Every major package registry has a typosquatting problem, but PyPI's profile is distinctive for two reasons.
First, PyPI's namespace is relatively flat. Unlike npm (which has scoped packages, e.g. @org/pkg), PyPI package names are global: anyone who registers requests-typo owns it everywhere. There's no way for the real requests to claim "all near-misses of my name."
Second, the install command and the import statement are decoupled. A developer who runs pip install pyhton-dateutil and then import dateutil will often get a working dateutil: either a stub the squatter ships under that import name or, if the squatter has copied the legitimate package's code, a near-identical experience. Because the name you install and the name you import need not match, the typo stays silent until something goes wrong.
The patterns
Across the typosquats reported to PyPI's security team and tracked by independent researchers, a small set of naming patterns recurs.
Single-character transposition
Two adjacent letters swapped: reqeusts for requests, pyhton for python. Common in fast typing, when two keystrokes land in the wrong order.
Single-character omission or insertion
A letter dropped or added: equests for requests, requestss for requests. Easy to produce when a keystroke is missed or doubled.
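The single-character patterns above can be told apart mechanically. The following is a minimal sketch, not a production detector: classify_single_edit is a hypothetical helper name, and it only recognizes names that differ from the target by exactly one swap, one dropped character, or one added character.

```python
from typing import Optional


def classify_single_edit(candidate: str, target: str) -> Optional[str]:
    """Return 'transposition', 'omission', 'insertion', or None.

    Hypothetical helper for illustration: it recognizes exactly one
    edit of those three kinds and nothing else.
    """
    if candidate == target:
        return None

    # Transposition: same length, two adjacent characters swapped.
    if len(candidate) == len(target):
        for i in range(len(target) - 1):
            swapped = target[:i] + target[i + 1] + target[i] + target[i + 2:]
            if swapped == candidate:
                return "transposition"
        return None

    # Omission: the candidate is the target with one character removed.
    if len(candidate) == len(target) - 1:
        for i in range(len(target)):
            if target[:i] + target[i + 1:] == candidate:
                return "omission"
        return None

    # Insertion: the target is the candidate with one character removed.
    if len(candidate) == len(target) + 1:
        for i in range(len(candidate)):
            if candidate[:i] + candidate[i + 1:] == target:
                return "insertion"
        return None

    return None
```

For example, classify_single_edit("pyhton", "python") reports a transposition, while classify_single_edit("equests", "requests") reports an omission.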
Hyphenation variants
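PyPI normalizes project names for lookup: comparison is case-insensitive, and runs of hyphens, underscores, and periods are treated as equivalent (PEP 503). So python_dateutil and python-dateutil resolve to the same project and cannot be registered separately. The variants that normalization does not collapse are those that drop or add a separator, such as pythondateutil or python-date-util; these are sometimes registrable as separate packages depending on registry rules and timing.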
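To see which hyphenation variants collapse to one name and which do not, the normalization rule from PEP 503 can be applied directly. This sketch uses the regular expression given in that PEP; the function name normalize is just a local choice.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, and collapse runs of '-', '_', '.' to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Separator and case differences normalize to the same project name.
same = {normalize(n) for n in ["python-dateutil", "python_dateutil", "Python.DateUtil"]}

# Dropping or adding a separator produces a different normalized name.
different = {normalize(n) for n in ["python-dateutil", "pythondateutil", "python-date-util"]}
```

Here same contains a single entry, while different contains three, which is why separator-dropping variants are the ones worth watching.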
Prefix or suffix insertion
python-requests for requests, requests-python for requests, pillow-image for pillow. The squatter banks on the prefix being plausible — "python-" implies "the Python version of," which is what many developers expect.
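One way to look for this pattern is to strip plausible affix tokens and see whether what remains is a popular name. This is a rough sketch under stated assumptions: the COMMON_AFFIXES list and the popular set are illustrative placeholders, not data from PyPI, and a legitimate package can trip the check if its stripped form happens to be another popular name.

```python
import re
from typing import Iterable, Optional

# Assumed affix tokens for illustration only; tune for your own ecosystem.
COMMON_AFFIXES = {"python", "python3", "py", "py3", "lib", "image", "utils", "core"}


def strip_affixes(name: str) -> str:
    """Remove leading and trailing affix tokens, keeping at least one token."""
    tokens = [t for t in re.split(r"[-_.]+", name.lower()) if t]
    while len(tokens) > 1 and tokens[0] in COMMON_AFFIXES:
        tokens.pop(0)
    while len(tokens) > 1 and tokens[-1] in COMMON_AFFIXES:
        tokens.pop()
    return "-".join(tokens)


def affix_squat_target(name: str, popular: Iterable[str]) -> Optional[str]:
    """Return the popular name that `name` wraps in an affix, if any."""
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    stripped = strip_affixes(name)
    if stripped != normalized and stripped in set(popular):
        return stripped
    return None
```

With popular = {"requests", "pillow", "python-dateutil"}, the names python-requests, requests-python, and pillow-image all map back to a popular target, while python-dateutil itself does not, because its stripped form dateutil is not in the set.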
Visual lookalikes
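Less common on PyPI but observed: substituting visually similar characters (0 for o, 1 for l, a capital I for a lowercase l). PyPI project names are restricted to ASCII letters, digits, periods, underscores, and hyphens, so non-ASCII homoglyphs such as the Cyrillic а for the Latin a cannot be registered as part of a name; the lookalikes that matter are ASCII ones. PyPI now flags some confusable names at registration, but historical squats exist.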
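A common way to compare names for visual similarity is to reduce each one to a "skeleton" in which confusable characters map to a single representative. The sketch below assumes a deliberately tiny confusables table chosen for illustration; it is not PyPI's actual rule set, and the example names are hypothetical.

```python
# Assumed, deliberately small confusables table; a real checker needs a fuller one.
CONFUSABLES = str.maketrans({"0": "o", "1": "l", "i": "l", "5": "s"})


def skeleton(name: str) -> str:
    """Lowercase the name and map confusable characters to one representative."""
    return name.lower().translate(CONFUSABLES)


def looks_like(candidate: str, target: str) -> bool:
    """True if the names differ but are indistinguishable after skeletonizing."""
    return candidate.lower() != target.lower() and skeleton(candidate) == skeleton(target)


def has_non_ascii(name: str) -> bool:
    """Flag any character outside ASCII, e.g. in a name pasted from a web page."""
    return not name.isascii()
```

Under this table, looks_like("djang0", "django") and looks_like("p1llow", "pillow") are both true, and has_non_ascii catches a name containing a Cyrillic letter before it ever reaches an install command.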
Abandoned-package-name squatting
A package is abandoned and removed, and an attacker later re-registers the same name with new, malicious content. The history of the original name lends false credibility to the new package.
Common typos of multi-word names
Multi-word names produce more typo opportunities. python-dateutil can be mistyped in either word or at the separator (pyhton-dateutil, python-dateutill, pythondateutil). Maintainers of multi-word packages should be aware of their typo surface and consider claiming common variants on the registry.
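To get a feel for the typo surface of a name, the single-character variants can be enumerated. This is a minimal sketch: typo_variants is a hypothetical helper that covers only omissions, doubled characters, and adjacent swaps, and it drops variants that PEP 503 normalization would map back to the original name.

```python
import re


def _normalize(name: str) -> str:
    # PEP 503 normalization, used to discard variants that are the same project.
    return re.sub(r"[-_.]+", "-", name).lower()


def typo_variants(name: str) -> set:
    """Enumerate one-edit variants: omission, doubled character, adjacent swap."""
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])            # character dropped
        variants.add(name[:i] + name[i] + name[i:])      # character doubled
        if i + 1 < len(name):
            # adjacent characters swapped
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    original = _normalize(name)
    return {v for v in variants if _normalize(v) != original}
```

For python-dateutil the result includes pyhton-dateutil, python-dateutill, and pythondateutil. The set grows with name length, which is the sense in which longer, multi-word names have a larger typo surface.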
What gets caught quickly
PyPI's automated systems and security volunteers catch many typosquats within hours of registration. The clearest signals:
- Levenshtein-distance-1 from a top-100 package
- New account, no prior packages, new package in the same week
- Package contents that match a top package's contents nearly verbatim, with a small added file
These get pulled fast. The squats that linger are the ones that:
- Use less obvious naming patterns (prefix/suffix insertion that looks plausible)
- Sit unmodified for weeks before the malicious version is published
- Resurrect abandoned packages, where the name has organic search history
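The edit-distance signal listed earlier in this section is straightforward to sketch. One detail worth noting: under plain Levenshtein distance an adjacent swap such as pyhton costs 2, so the version below uses the optimal string alignment variant, which counts a swap as a single edit. This is an illustrative implementation, not a description of how PyPI's own checks work, and the popular list passed to near_popular is assumed to come from wherever you track top packages.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent swaps as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition counted as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def near_popular(name: str, popular, max_distance: int = 1) -> list:
    """Popular names within max_distance of `name`, excluding exact matches."""
    return sorted(p for p in popular if 0 < osa_distance(name, p) <= max_distance)
```

A check like near_popular("requets", ["requests", "flask"]) returns ["requests"]. Plausible prefix and suffix squats sit well outside distance 1, which is consistent with their lingering longer.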
What you can do as a developer
Read the install command before you run it. This sounds trivial, but it's the highest-yield defense. A second of attention catches single-character typos.
Check the publisher. PyPI shows the user account that uploaded a package. If it's brand-new, that's a signal worth pausing on for a package whose real version has been around for a decade.
Use a vetted allowlist for production. If your CI installs only from a curated requirements.txt you maintain (rather than from ad hoc pip install ... invocations developers run), a typosquat can reach the production tree only by getting through review of that file.
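A CI step can enforce the allowlist by comparing the names in requirements.txt against it. The sketch below is intentionally simple: the helper names are hypothetical, and the parser handles only plain name-based lines (comments, blank lines, and pip option lines are skipped), so a real check would likely use a proper requirement parser.

```python
import re
from typing import Iterable, Optional


def normalize(name: str) -> str:
    # PEP 503 normalization so Pillow, pillow, and PILLOW compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> Optional[str]:
    """Extract the project name from one simple requirements.txt line."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or pip option such as -r
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return normalize(match.group(0)) if match else None


def unlisted(requirements_text: str, allowlist: Iterable[str]) -> list:
    """Names present in the requirements text but absent from the allowlist."""
    allowed = {normalize(n) for n in allowlist}
    names = (requirement_name(line) for line in requirements_text.splitlines())
    return sorted({n for n in names if n and n not in allowed})
```

If unlisted(...) returns anything, the build can fail and ask for a human review of the new name before it is added to the allowlist.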
Cooling-period gates cover the first-install window. From the point of view of your environment, a typosquat is a brand-new package the first time it is installed. A cooling gate that holds new packages with no community history forces a review before the install runs.
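A cooling-period gate can be sketched as a check on how long a package has existed. The example assumes release metadata shaped like the "releases" field of PyPI's JSON API, where each file entry carries an upload_time_iso_8601 timestamp; the 14-day window and the function names are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def first_upload_time(releases: dict) -> Optional[datetime]:
    """Earliest upload time across all release files, or None if there are none.

    Assumes `releases` looks like {version: [{"upload_time_iso_8601": "..."}]}.
    """
    times = [
        # Replace a trailing "Z" so older Python versions can parse the value.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in releases.values()
        for f in files
    ]
    return min(times) if times else None


def should_hold(releases: dict, now: datetime,
                cooling: timedelta = timedelta(days=14)) -> bool:
    """Hold the install if the package has no files or is younger than `cooling`."""
    first = first_upload_time(releases)
    return first is None or now - first < cooling
```

A gate keyed only on first upload does not cover a squat that sits unmodified for weeks before the malicious version is published; applying the same age check to the specific version being installed is one way to narrow that gap.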
What you can do as a maintainer
Claim near-misses of your package name proactively. PyPI does not require a registered package to have content; you can publish empty placeholder packages claiming the most likely typo variants. Several large maintainers do this for requests, urllib3, and other top-tier packages.
Document the canonical install command prominently. A README with pip install requests (with requests linked to the canonical PyPI page) reduces typo incidence.
Takeaway
Typosquatting on PyPI follows a small set of naming patterns. The registry catches the obvious cases quickly; the long-tail squats remain a real risk. As a consumer, the cheapest defense is attention to install commands; the strongest is a cooling-period gate that turns the first install of an unknown package into a review.