The security engineer’s checklist for evaluating supply-chain tools

You're evaluating supply-chain security tooling. A vendor has sent you a deck, a SOC 2 report, and a $36k annual quote. You have a meeting tomorrow with the VP of Engineering. Here's what to verify before signing.

The questions are organized by attack class. For each, the right answer is concrete and demonstrable; vague answers ("we have ML-powered detection") should make you press harder or walk away.

1. Known-CVE coverage (table stakes)

Does it use OSV, the GitHub Advisory Database, or its own database? OSV is an open schema with a public aggregated database; proprietary vendor databases are usually OSV plus some additions. Ask which.
Latency from advisory publish to enforcement? Should be hours, not days.
Severity-aware enforcement? Can you set different thresholds for critical vs. low?
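
To make the severity question concrete, here is a minimal sketch of CVE matching with a configurable threshold. The request body follows OSV's public query format; the names SEVERITY_RANK, osv_query and should_block are illustrative, not any vendor's API, and the sketch assumes the severity labels have already been extracted from the advisories.

```python
import json

# Illustrative ranking; real advisories may use other labels or CVSS vectors.
SEVERITY_RANK = {"low": 1, "moderate": 2, "high": 3, "critical": 4}


def osv_query(name: str, version: str, ecosystem: str = "npm") -> str:
    """Build the JSON body for a POST to https://api.osv.dev/v1/query."""
    return json.dumps(
        {"version": version, "package": {"name": name, "ecosystem": ecosystem}}
    )


def should_block(advisory_severities: list[str], threshold: str = "high") -> bool:
    """Block when any advisory is at or above the configured threshold.

    Unknown labels rank 0 here, so they never block; a stricter policy
    might treat them as critical instead.
    """
    floor = SEVERITY_RANK[threshold]
    return any(SEVERITY_RANK.get(s.lower(), 0) >= floor for s in advisory_severities)
```

With the default threshold, should_block(["LOW", "CRITICAL"]) blocks and should_block(["low", "moderate"]) does not.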

If the tool only does CVE matching, you're paying for a wrapper around npm audit. Move on.

2. Zero-day window

The gap between a malicious package going live and threat feeds catching up is hours to days. Tooling that only uses threat feeds doesn't close this window.

Does it block before any threat feed lists the package? This is the differentiator. The answer should describe non-threat-feed signals — install-script pattern analysis, source-pattern matching, cooling gates, integrity fingerprinting, OS-level install-script sandboxing — not "we have a faster threat feed."
What's the false-positive rate on freshly published legitimate packages? Should be < 5%. Higher means cooling windows are misconfigured and you'll fight the tool.

3. Hijack detection

The event-stream pattern (the original maintainer hands publish rights to a new contributor, who then ships malicious code) requires cross-version signals.

Does it compare versions of the same package? Specifically: maintainer changes, file-tree changes, license changes, dormant-revival.
Does it have a dormant-revival signal? A package that goes quiet for months and then suddenly publishes is one of the highest-risk release patterns for a popular library.
Does it integrate with npm publish attestations / Sigstore? Useful additional signal where available.
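
The cross-version checks listed above can be sketched as a diff between two releases of the same package. The Release record, the signal names, and the 180-day dormancy threshold are all hypothetical; a real tool would populate these fields from registry metadata and the unpacked tarball.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Release:
    version: str
    published_at: datetime
    maintainers: frozenset[str]
    license: str
    files: frozenset[str]


def hijack_signals(
    prev: Release, new: Release, dormant_after: timedelta = timedelta(days=180)
) -> list[str]:
    """Compare two releases and name every cross-version signal that fires."""
    signals = []
    added_maintainers = new.maintainers - prev.maintainers
    if added_maintainers:
        signals.append("maintainer-added:" + ",".join(sorted(added_maintainers)))
    added_files = new.files - prev.files
    if added_files:
        signals.append("files-added:" + ",".join(sorted(added_files)))
    if new.license != prev.license:
        signals.append("license-changed")
    if new.published_at - prev.published_at >= dormant_after:
        signals.append("dormant-revival")
    return signals
```

No single signal here proves a hijack. The point to press the vendor on is whether the tool computes this kind of diff at all, and how it weighs signals that fire together.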

4. Install-script analysis

Install-time scripts, postinstall above all, are the most common delivery vector. The tool must inspect them.

What categories of suspicious behavior does it detect? Should include: curl-pipe-bash, exec of decoded payloads, credential-path access (~/.aws, ~/.ssh, ~/.npmrc), time-based execution, IOC matches.
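Does it analyze across npm AND PyPI? Installing a Python sdist runs setup.py (or another build backend), which can execute arbitrary code; wheels skip that step but can still ship code that runs at first import. Some tools only cover npm.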
Native binary inspection? Packages can ship pre-compiled binaries with malicious behavior. Does the tool unpack them and check imports?
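
As a rough picture of what category-level detection means, here is a minimal pattern scanner for three of the categories named above. The regexes and category names are illustrative assumptions; real detection needs proper parsing and handles obfuscation that simple patterns miss.

```python
import re

# Illustrative patterns only; each one is easy to evade in practice.
PATTERNS = {
    "curl-pipe-shell": re.compile(r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:ba|z)?sh\b"),
    "decoded-exec": re.compile(r"\beval\s*\(\s*(?:atob|Buffer\.from)\s*\("),
    "credential-path": re.compile(r"(?:~|\$HOME)/\.(?:aws|ssh|npmrc)\b"),
}


def scan_install_script(source: str) -> list[str]:
    """Return the sorted names of every category whose pattern matches."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(source))
```

For example, scan_install_script("curl -s https://example.invalid/x.sh | bash") reports curl-pipe-shell, while a plain "node-gyp rebuild" reports nothing.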

5. Authenticity

Typosquat detection? Real tools have a canonical name list with edit-distance scoring.
Slopsquat / dependency-confusion detection? Important if you have internal scopes or use AI assistants heavily.
How fresh is the canonical list? Should auto-update; ask cadence.
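
Edit-distance scoring against a canonical list can be sketched in a few lines. The four-name CANONICAL tuple and the max_distance default of 2 are placeholders; a real list holds thousands of popular names and needs tuning to avoid flagging legitimate near-neighbors.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete a character from a
                    curr[j - 1] + 1,           # insert a character into a
                    prev[j - 1] + (ca != cb),  # substitute, free on a match
                )
            )
        prev = curr
    return prev[-1]


# Placeholder list for illustration only.
CANONICAL = ("react", "lodash", "express", "requests")


def typosquat_candidates(
    name: str, canonical: tuple[str, ...] = CANONICAL, max_distance: int = 2
) -> list[str]:
    """Return canonical names the given name sits suspiciously close to."""
    if name in canonical:
        return []
    return [c for c in canonical if edit_distance(name, c) <= max_distance]
```

Here typosquat_candidates("lodahs") returns ["lodash"], and an exact canonical name returns an empty list.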

6. Containment

If the gate misses, what stops the install from damaging the system?

Is there an OS-level sandbox? Specifically: kernel-enforced FS allowlist and network restrictions during the install. Not "we drop env vars" — that's not containment.
Per-OS implementation: Linux should be Landlock (or namespaces), macOS should be sandbox-exec or equivalent, Windows should be Job Object + restricted token.
What happens if the sandbox blocks something legitimate? Should be an interactive prompt + persistent overrides, not a hard fail.
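
When you check a vendor's per-OS claims on your own machines, a small helper like the sketch below can tell you what to expect. The mechanism table restates the list above; the helper names are hypothetical, and the Linux check assumes securityfs is mounted so that /sys/kernel/security/lsm lists the active security modules.

```python
import platform
from pathlib import Path

# Restates the per-OS expectations from the checklist above.
MECHANISMS = {
    "Linux": "Landlock (or namespaces)",
    "Darwin": "sandbox-exec or equivalent",
    "Windows": "Job Object + restricted token",
}


def expected_mechanism(system: str) -> str:
    """Map a platform.system() value to the mechanism to ask about."""
    return MECHANISMS.get(system, "unknown: ask the vendor")


def landlock_listed(lsm_list: str) -> bool:
    """True when 'landlock' appears in a comma-separated LSM list."""
    return "landlock" in (entry.strip() for entry in lsm_list.split(","))


def landlock_available(path: str = "/sys/kernel/security/lsm") -> bool:
    """Best-effort check on Linux; returns False if the file is unreadable."""
    try:
        return landlock_listed(Path(path).read_text())
    except OSError:
        return False
```

Call expected_mechanism(platform.system()) on each platform you support. A positive landlock_available() only shows that the kernel offers Landlock, not that the tool uses it, so still ask for the vendor's documentation.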

7. CI compatibility

Does it work in GitHub Actions, GitLab CI, CircleCI, Buildkite, Jenkins? Should be a single CLI install, no agent daemon required.
What's the behavior on non-TTY (CI)? Should fail closed by default, with an explicit env-var to allow auto-approval in trusted internal CI.
Does it work in restricted CI runners? GitHub Actions on Ubuntu 24.04 has AppArmor restrictions on unprivileged user namespaces — make sure the tool degrades gracefully.
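
The fail-closed behavior described above comes down to a small decision function. This is a sketch under assumptions: the environment variable name GATE_AUTO_APPROVE and the three outcome strings are hypothetical, not any tool's real interface.

```python
from collections.abc import Mapping


def resolve_unapproved_package(is_tty: bool, env: Mapping[str, str]) -> str:
    """Decide what to do when a package needs approval.

    Interactive sessions get a prompt. Non-interactive sessions (CI)
    deny by default and allow only on an explicit opt-in.
    """
    if is_tty:
        return "prompt"
    if env.get("GATE_AUTO_APPROVE") == "1":  # hypothetical variable name
        return "allow"
    return "deny"
```

In a real CLI this would be called as resolve_unapproved_package(sys.stdin.isatty(), os.environ). What to verify with the vendor is that the default branch is "deny".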

8. Performance

Cached install latency overhead? Should be < 50ms per package on cache hits.
First-install latency overhead? Will be slower (signal scoring takes a few seconds at most); ask the p95.
Does it parallelize? A serial signal scorer on a 500-package install is a slow CI step.
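
The arithmetic behind the parallelism question: at an assumed 2 seconds per package, scoring 500 packages serially takes about 1,000 seconds, close to 17 minutes, while 16 workers bring the ideal case down to roughly a minute. A minimal sketch of the parallel version, with a placeholder scorer standing in for real signal evaluation:

```python
from concurrent.futures import ThreadPoolExecutor


def score_package(name: str) -> int:
    """Placeholder scorer; a real one would fetch metadata and run signals."""
    return len(name)


def score_all(packages: list[str], workers: int = 16) -> dict[str, int]:
    """Score packages concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(packages, pool.map(score_package, packages)))
```

Threads suit this workload when scoring is dominated by network and disk I/O; the worker count of 16 is an assumption to tune against registry rate limits.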

9. Operations

Air-gapped / offline support? If your customers require it.
Self-hosted control plane available? If your compliance requirements call for it.
How is the trust-signal model updated? Pull from vendor server, baked into agent releases, signed updates? Each has a different blast radius if the vendor itself is compromised.
Telemetry: opt-in or opt-out? Read the privacy policy.

10. Cost & licensing

Per-license, per-seat, or per-install pricing? Each has different incentive shapes. Per-license usually scales best for medium teams.
What counts as a license? A developer machine? A CI runner? Both?
Annual commit vs. month-to-month? Annual usually discounts 15-25%; ask.
Free tier or month-to-month with no contract? Pilots are nice, but a vendor that lets you cancel anytime gives you the same de-risking without a trial-end deadline driving the decision.
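
How much the license definition matters is easy to check with arithmetic. The sketch below uses hypothetical numbers throughout (a $10 monthly price, 40 developer machines, 20 CI runners); substitute the figures from your own quote.

```python
def annual_cost(
    price_per_license_month: float,
    dev_machines: int,
    ci_runners: int,
    count_ci: bool,
    annual_discount: float = 0.0,
) -> float:
    """Yearly cost given what the vendor counts as a license."""
    licenses = dev_machines + (ci_runners if count_ci else 0)
    return round(price_per_license_month * licenses * 12 * (1 - annual_discount), 2)
```

With those hypothetical inputs, counting only developer machines costs 4,800 a year and counting CI runners as well costs 7,200. A 20% annual discount on the larger figure brings it to 5,760.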

Red flags that should make you walk away

"We use AI to detect malicious packages" with no specific signal explanation.
Refusal to share what signals fire on a sample malicious fixture.
Pricing requires a sales call before you can see numbers.
No SOC 2 (you're trusting a tool with full network access to your installs).
Free tier requires you to ship telemetry that includes your dependency tree.
Closed-source agent with no third-party audit.
Sandbox claims that don't differentiate by OS.

Sample request to send the vendor before the meeting

Copy-paste:

Hi — before our call, I want to verify a few specifics for our buyer review:

Send me the full list of trust signals your gate evaluates, by category.

Run your tool on these three real malicious fixtures (event-stream@3.3.6, ua-parser-js@0.7.29, colors@1.4.1) and share the verdict + signals that fired for each.

Show me the CI behavior on a sample npm install where the gate refuses a package.

Document the OS sandbox mechanism per platform (Linux / macOS / Windows).

Share your SOC 2 Type 2 report.

Confirm pricing and what counts as a license.

We're evaluating against [n] other vendors. We're looking for concrete answers; vague ones will lose us as a customer.

A tool that can't answer those in writing isn't ready to be your supply-chain layer. A tool that can usually wins the deal on the spot.

Veln's answers

For reference, here's what we send when this email comes in:

Twelve trust signals, six categories — vulnerabilities (OSV + npm attestations), install scripts (7 sub-categories), version-to-version changes (7 sub-points), suspicious source code, hidden binaries, authenticity. Each signal name is published in the docs.
Replay verdicts: event-stream@3.3.6 → BLOCK (5 signals fire); ua-parser-js@0.7.29 → BLOCK (4 signals); colors@1.4.1 → BLOCK (3 signals). Signal names listed.
CI demo: 30-second video showing veln safe npm ci refusing a malicious fixture, exit code 1, build fails before deploy.
OS sandbox: Linux Landlock (kernel ≥ 5.13), macOS sandbox-exec, Windows Job Object + RestrictedToken + LowIL. Documented per-OS in /docs/sandbox.
SOC 2 Type 2: in progress; happy to share roadmap.
Pricing: $4.99 / license / month. Licenses = developer machines + CI runners.

If we don't match what you need, walk. If we do, we'd like the meeting.