The security engineer’s checklist for evaluating supply-chain tools
You're evaluating supply-chain security tooling. A vendor has sent you a deck, a SOC 2 report, and a $36k annual quote. You have a meeting tomorrow with the VP of Engineering. Here's what to verify before signing.
The questions are organized by attack class. For each, the right answer is concrete and demonstrable; vague answers ("we have ML-powered detection") should make you press harder or walk away.
1. Known-CVE coverage (table stakes)
- Does it use OSV, the GitHub Advisory Database, or its own database? OSV is the open schema and aggregated database; proprietary vendor databases are usually OSV plus some additions. Ask which.
- Latency from advisory publish to enforcement? Should be hours, not days.
- Severity-aware enforcement? Can you set different thresholds for critical vs. low?
If the tool only does CVE matching, you're paying for a wrapper around npm audit. Move on.
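As a baseline for comparison, the sketch below shows roughly what plain advisory matching with severity-aware enforcement involves. It assumes OSV's public `POST /v1/query` endpoint; the `POLICY` table, the action names, and the idea that the caller has already reduced each advisory to a single severity label are illustrative assumptions, not any vendor's behavior.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"

# Hypothetical policy: the action to take for each severity label.
POLICY = {"CRITICAL": "block", "HIGH": "block", "MODERATE": "warn", "LOW": "allow"}
_RANK = {"allow": 0, "warn": 1, "block": 2}


def build_osv_query(name: str, version: str, ecosystem: str = "npm") -> dict:
    """Request body for OSV's POST /v1/query endpoint."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def fetch_advisories(name: str, version: str, ecosystem: str = "npm") -> list:
    """Query OSV over the network and return the matching advisories."""
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode()
    req = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("vulns", [])


def decide(severities: list, policy: dict = POLICY) -> str:
    """Return the strictest action demanded by any severity label.

    Unknown labels map to 'block' so the gate fails closed.
    """
    actions = [policy.get(s.upper(), "block") for s in severities]
    return max(actions, key=_RANK.__getitem__, default="allow")
```

If a vendor's demo amounts to little more than this, the "wrapper" concern above applies.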
2. Zero-day window
The gap between a malicious package going live and threat feeds catching up is hours to days. Tooling that only uses threat feeds doesn't close this window.
- Does it block before any threat feed lists the package? This is the differentiator. The answer should describe non-threat-feed signals — install-script pattern analysis, source-pattern matching, cooling gates, integrity fingerprinting, OS-level install-script sandboxing — not "we have a faster threat feed."
- What's the false-positive rate on freshly published legitimate packages? Should be < 5%. Higher means cooling windows are misconfigured and you'll fight the tool.
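A cooling gate is the simplest of the non-threat-feed signals to reason about, so it makes a useful reference point when a vendor describes theirs. The sketch below is a minimal version under stated assumptions: the 72-hour default is a hypothetical value, and `false_positive_rate` assumes you have a hand-labelled set of known-good fresh releases to replay through the gate.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default; the right window depends on how much delay you tolerate.
DEFAULT_WINDOW = timedelta(hours=72)


def in_cooling_window(published_at: datetime, now: datetime,
                      window: timedelta = DEFAULT_WINDOW) -> bool:
    """True while a release is younger than the cooling window."""
    return now - published_at < window


def false_positive_rate(held_legit: int, total_legit: int) -> float:
    """Share of known-good fresh releases that the gate held or blocked."""
    if total_legit == 0:
        return 0.0
    return held_legit / total_legit


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(in_cooling_window(now - timedelta(hours=2), now))
    print(false_positive_rate(3, 100) < 0.05)
```

Asking the vendor for the same two numbers (window length and measured rate on fresh legitimate releases) turns the 5% target into something you can check.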
3. Hijack detection
The event-stream pattern (a legitimate maintainer hands publish rights to a new contributor, who later ships a malicious release) requires cross-version signals.
- Does it compare versions of the same package? Specifically: maintainer changes, file-tree changes, license changes, dormant-revival.
- Does it have a dormant-revival signal? A release that appears after months of silence is among the highest-risk versions of any popular library.
- Does it integrate with npm provenance attestations / Sigstore? Useful additional signal where available.
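The cross-version comparison described in this section can be sketched as a diff over two metadata records. The record shape here (`maintainers`, `license`, `files`, `published`), the signal names, and the 180-day dormancy threshold are all hypothetical; a real tool would derive these from registry metadata and the unpacked tarball.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for "went quiet, then suddenly published".
DORMANCY = timedelta(days=180)


def version_change_signals(prev: dict, curr: dict,
                           dormancy: timedelta = DORMANCY) -> list:
    """Compare two releases of the same package and name what changed."""
    signals = []
    if set(curr["maintainers"]) - set(prev["maintainers"]):
        signals.append("maintainer-added")
    if curr["license"] != prev["license"]:
        signals.append("license-changed")
    if set(curr["files"]) - set(prev["files"]):
        signals.append("files-added")
    if curr["published"] - prev["published"] >= dormancy:
        signals.append("dormant-revival")
    return signals
```

No single signal here is proof of a hijack; the useful question for a vendor is how these are weighted together.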
4. Install-script analysis
Postinstall scripts are the #1 delivery vector. The tool must inspect them.
- What categories of suspicious behavior does it detect? Should include: curl-pipe-bash, exec of decoded payloads, credential-path access (~/.aws, ~/.ssh, ~/.npmrc), time-based execution, indicator-of-compromise (IOC) matches.
- Does it analyze across npm AND PyPI? A Python source distribution's setup.py runs arbitrary code at install time; wheels do not run install scripts but can still carry malicious modules, so both need inspection. Some tools only cover npm.
- Native binary inspection? Packages can ship pre-compiled binaries with malicious behavior. Does the tool unpack them and check imports?
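To make the install-script categories concrete, here is a minimal pattern-matching sketch covering three of them. The regexes and category names are illustrative assumptions and far from exhaustive; obfuscated scripts will evade anything this simple, which is exactly why it is worth asking a vendor what they do beyond it.

```python
import re

# Hypothetical category names mapped to deliberately simple patterns.
PATTERNS = {
    "curl-pipe-shell": re.compile(r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:ba|z)?sh\b"),
    "decoded-exec": re.compile(r"\b(?:eval|exec)\s*\([^)]*(?:atob|base64|Buffer\.from)"),
    "credential-path": re.compile(r"(?:~|\$HOME)/\.(?:aws|ssh|npmrc)\b"),
}


def scan_install_script(script: str) -> list:
    """Return the sorted names of every category whose pattern matches."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(script))


if __name__ == "__main__":
    print(scan_install_script("curl -s https://example.com/i.sh | bash"))
    print(scan_install_script("node-gyp rebuild"))
```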
5. Authenticity
- Typosquat detection? Real tools have a canonical name list with edit-distance scoring.
- Slopsquat / dependency-confusion detection? Important if you have internal scopes or use AI assistants heavily.
- How fresh is the canonical list? Should auto-update; ask cadence.
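Edit-distance scoring against a canonical list can be sketched in a few lines. This uses plain Levenshtein distance; the canonical list passed in and the `max_distance` default are assumptions for illustration, and production tools typically add popularity weighting and keyboard-adjacency heuristics on top.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]


def typosquat_candidates(name: str, canonical: list, max_distance: int = 2) -> list:
    """Canonical names this package could be imitating; empty if it is canonical."""
    if name in canonical:
        return []
    return sorted(c for c in canonical if edit_distance(name, c) <= max_distance)
```

For example, `typosquat_candidates("expres", ["lodash", "express", "react"])` flags `express`, while the exact name `express` returns an empty list.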
6. Containment
If the gate misses, what stops the install from damaging the system?
- Is there an OS-level sandbox? Specifically: kernel-enforced FS allowlist and network restrictions during the install. Not "we drop env vars" — that's not containment.
- Per-OS implementation: Linux should be Landlock (or namespaces), macOS should be sandbox-exec or equivalent, Windows should be Job Object + restricted token.
- What happens if the sandbox blocks something legitimate? Should be an interactive prompt + persistent overrides, not a hard fail.
7. CI compatibility
- Does it work in GitHub Actions, GitLab CI, CircleCI, Buildkite, Jenkins? Should be a single CLI install, no agent daemon required.
- What's the behavior on non-TTY (CI)? Should fail closed by default, with an explicit env-var to allow auto-approval in trusted internal CI.
- Does it work in restricted CI runners? GitHub Actions on Ubuntu 24.04 has AppArmor restrictions on unprivileged user namespaces — make sure the tool degrades gracefully.
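The fail-closed behavior on non-TTY runners comes down to a small decision function. The sketch below shows one way it could look; the environment variable name `GATE_CI_AUTO_APPROVE` and the three outcome strings are hypothetical, not any tool's actual interface.

```python
import os
import sys

# Hypothetical opt-in variable for trusted internal CI.
AUTO_APPROVE_VAR = "GATE_CI_AUTO_APPROVE"


def resolve_flagged_install(is_tty: bool, env: dict) -> str:
    """Decide what to do when the gate flags a package.

    Interactive sessions get a prompt; non-interactive sessions are denied
    unless the operator has explicitly opted in to auto-approval.
    """
    if is_tty:
        return "ask"
    if env.get(AUTO_APPROVE_VAR) == "1":
        return "approve"
    return "deny"


if __name__ == "__main__":
    print(resolve_flagged_install(sys.stdin.isatty(), dict(os.environ)))
```

The design point to confirm with a vendor is the default branch: with no TTY and no explicit opt-in, the answer should be a denial and a non-zero exit code.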
8. Performance
- Cached install latency overhead? Should be < 50ms per package on cache hits.
- First-install latency overhead? Will be slower (signal scoring takes a few seconds at most); ask the p95.
- Does it parallelize? A serial signal scorer on a 500-package install is a slow CI step.
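To see why parallelism matters on a large install, and to check a vendor's p95 claim against your own measurements, something like the following is enough. It assumes the per-package scorer is I/O-bound (so threads help) and takes it as a caller-supplied function; `score_all`, `p95`, and the worker count are illustrative names and values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def score_all(packages: Iterable[str], score_one: Callable[[str], float],
              max_workers: int = 16) -> dict:
    """Score packages concurrently; threads suit I/O-bound scorers."""
    names = list(packages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs names with scores.
        return dict(zip(names, pool.map(score_one, names)))


def p95(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float drift."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]
```

Timing each `score_one` call during a real install and feeding the samples to `p95` gives you a number to compare against the vendor's.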
9. Operations
- Air-gapped / offline support? If your customers require it.
- Self-hosted control plane available? If your compliance does.
- How is the trust-signal model updated? Pull from vendor server, baked into agent releases, signed updates? Each has different blast-radius if the vendor itself is compromised.
- Telemetry: opt-in or opt-out? Read the privacy policy.
10. Cost & licensing
- Per-license, per-seat, or per-install pricing? Each has different incentive shapes. Per-license usually scales best for medium teams.
- What counts as a license? A developer machine? A CI runner? Both?
- Annual commit vs. month-to-month? Annual usually discounts 15-25%; ask.
- Free tier or month-to-month with no contract? Pilots are nice, but a vendor that lets you cancel anytime gives you the same de-risking without a trial-end deadline driving the decision.
Red flags that should make you walk away
- "We use AI to detect malicious packages" with no specific signal explanation.
- Refusal to share what signals fire on a sample malicious fixture.
- Pricing requires a sales call before you can see numbers.
- No SOC 2 (you're trusting a tool with full network access to your installs).
- Free tier requires you to ship telemetry that includes your dependency tree.
- Closed-source agent with no third-party audit.
- Sandbox claims that don't differentiate by OS.
Sample request to send the vendor before the meeting
Copy-paste:
Hi — before our call, I want to verify a few specifics for our buyer review:
- Send me the full list of trust signals your gate evaluates, by category.
- Run your tool on these three real malicious fixtures (event-stream@3.3.6, ua-parser-js@0.7.29, colors@1.4.1) and share the verdict + signals that fired for each.
- Show me the CI behavior on a sample npm install where the gate refuses a package.
- Document the OS sandbox mechanism per platform (Linux / macOS / Windows).
- Share your SOC 2 Type 2 report.
- Confirm pricing and what counts as a license.
We're evaluating against [n] other vendors. Looking for concrete answers; vague answers will lose us as a customer.
A vendor that can't answer these in writing isn't ready to be your supply-chain layer. One that can usually wins the deal on the spot.
Veln's answers
For reference, here's what we send when this email comes in:
- Twelve trust signals, six categories — vulnerabilities (OSV + npm attestations), install scripts (7 sub-categories), version-to-version changes (7 sub-points), suspicious source code, hidden binaries, authenticity. Each signal name is published in the docs.
- Replay verdicts: event-stream@3.3.6 → BLOCK (5 signals fire); ua-parser-js@0.7.29 → BLOCK (4 signals); colors@1.4.1 → BLOCK (3 signals). Signal names listed.
- CI demo: 30-second video showing veln safe npm ci refusing a malicious fixture, exit code 1, build fails before deploy.
- OS sandbox: Linux Landlock (kernel ≥ 5.13), macOS sandbox-exec, Windows Job Object + RestrictedToken + LowIL. Documented per-OS in /docs/sandbox.
- SOC 2 Type 2: in progress; happy to share roadmap.
- Pricing: $4.99 / license / month. Licenses = developer machines + CI runners.
If we don't match what you need, walk. If we do, we'd like the meeting.