circle-ir SAST Benchmarks

Static analysis only. No LLM. Reproducible.

circle-ir 3.19.4 · benchmark date April 22, 2026

circle-ir is an MIT-licensed neuro-symbolic static analysis library. These benchmarks measure the static analysis engine only—no LLM verification layer is involved. All benchmark code, harnesses, and raw outputs are available in the circle-ir repository for independent reproduction.

Results

Benchmarks by Language

Java (6 benchmarks)

Benchmark	Tests	TP	TN	FP	FN	TPR	FPR	Score
OWASP Benchmark	1,415	708	707	0	0	100%	0%	100%
Juliet Test Suite	243	122	121	0	0	100%	0%	100%
SecuriBench Micro	123	60	60	1	2	96.8%	1.6%	97.7%
CWE-Bench-Java	120	61	—	—	59	50.8%	—	50.8%
WebGoat	29	26	—	—	3	89.7%	—	89.3%
DVJA	7	7	—	—	0	100%	—	100%
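The TPR and FPR columns read as the usual confusion-matrix rates, assuming TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The minimal Python sketch below is illustrative and not part of the circle-ir harness. It recomputes both rates from the counts in the Java table. The Score column is left out because its formula is not given on this page.

```python
def tpr(tp: int, fn: int) -> float:
    """True positive rate (recall): share of vulnerable cases that were flagged."""
    return tp / (tp + fn)


def fpr(fp: int, tn: int) -> float:
    """False positive rate: share of safe cases that were wrongly flagged."""
    return fp / (fp + tn)


# (TP, TN, FP, FN) copied from the Java rows that report all four counts.
rows = {
    "OWASP Benchmark": (708, 707, 0, 0),
    "Juliet Test Suite": (122, 121, 0, 0),
    "SecuriBench Micro": (60, 60, 1, 2),
}

for name, (tp, tn, fp, fn) in rows.items():
    print(f"{name}: TPR={tpr(tp, fn):.1%} FPR={fpr(fp, tn):.1%}")
```

For SecuriBench Micro this gives 60 / 62 ≈ 96.8% and 1 / 61 ≈ 1.6%, which matches the table.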

Node.js / TypeScript (3 benchmarks)

Benchmark	Tests	TP	TN	FP	FN	TPR	FPR	Score
NodeGoat	14	14	—	—	0	100%	—	100%
Juice Shop	14	14	—	—	0	100%	—	100%
NodeJS Synthetic	25	23	—	—	2	92.0%	—	92.9%

Python (2 benchmarks)

Benchmark	Tests	TP	TN	FP	FN	TPR	FPR	Score
PyGoat	26	23	—	—	3	88.5%	—	90.0%
DVPWA	6	6	—	—	0	100%	—	100%

Rust (2 benchmarks)

Benchmark	Tests	TP	TN	FP	FN	TPR	FPR	Score
Rust Synthetic	50	46	—	—	4	92.0%	—	92.3%
CWE-Bench-Rust	30	28	—	—	2	93.3%	—	94.4%

Other Languages (3 benchmarks)

Benchmark	Tests	TP	TN	FP	FN	TPR	FPR	Score
Bash Synthetic	31	31	—	—	0	100%	—	100%
HTML/JS Synthetic	30	30	—	—	0	100%	—	100%
Firing Range	40	35	—	2	3	92.1%	—	92.1%
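Most rows outside the two Java test suites leave TN (and usually FP) empty, so only TPR can be computed for them. Firing Range reports 2 false positives but no true-negative count, so its FPR is also undefined. The sketch below is a hypothetical consistency check, not the project's harness. It treats empty (dash) cells as missing, confirms that the filled-in cells add up to the Tests column, and returns `None` for any FPR that cannot be computed.

```python
from typing import Optional


def rates(tp: int, fn: int, fp: Optional[int] = None,
          tn: Optional[int] = None) -> tuple[float, Optional[float]]:
    """Return (TPR, FPR). FPR is None when FP or TN was not reported."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn) if fp is not None and tn is not None else None
    return tpr, fpr


# A sample of rows from the tables above; dash cells are simply omitted.
rows = {
    "NodeJS Synthetic": dict(tests=25, tp=23, fn=2),
    "PyGoat": dict(tests=26, tp=23, fn=3),
    "CWE-Bench-Rust": dict(tests=30, tp=28, fn=2),
    "Firing Range": dict(tests=40, tp=35, fn=3, fp=2),
}

for name, r in rows.items():
    # Every test case should land in exactly one reported cell.
    assert r["tp"] + r["fn"] + r.get("fp", 0) == r["tests"], name
    tpr, fpr = rates(r["tp"], r["fn"], r.get("fp"))
    print(f"{name}: TPR={tpr:.1%} FPR={fpr}")
```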

Summary

Results by Language

Language	Perfect (100%)	Near-perfect (90%+, incl. 100%)	Total Benchmarks
Java	3	4	6
Node.js / TypeScript	2	3	3
Python	1	2	2
Rust	0	2	2
Bash	1	1	1
HTML/JS	1	1	1
Other (Firing Range)	0	1	1
Total	8	14	16
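The Java row shows how the summary counts work. It has 3 perfect scores and 4 near-perfect ones, so the near-perfect column also includes the perfect scores. The sketch below is illustrative and recomputes the tallies from the Score column of the per-language tables. Grouping Firing Range as its own "Other" row is an assumption made here, since the per-language tables do not assign it a language.

```python
# Score column (%) from the per-language tables, grouped by summary row.
# The "Other (Firing Range)" grouping is an assumption for illustration.
scores = {
    "Java": [100, 100, 97.7, 50.8, 89.3, 100],
    "Node.js / TypeScript": [100, 100, 92.9],
    "Python": [90.0, 100],
    "Rust": [92.3, 94.4],
    "Bash": [100],
    "HTML/JS": [100],
    "Other (Firing Range)": [92.1],
}


def tally(values: list[float]) -> tuple[int, int, int]:
    """(perfect, near-perfect, total); near-perfect includes perfect scores."""
    perfect = sum(1 for v in values if v == 100)
    near = sum(1 for v in values if v >= 90)
    return perfect, near, len(values)


totals = [0, 0, 0]
for lang, values in scores.items():
    row = tally(values)
    totals = [t + r for t, r in zip(totals, row)]
    print(lang, *row, sep="\t")
print("Total", *totals, sep="\t")
```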

Deep Dive

CWE-Bench-Java by Category

CWE	Category	Detected	Missed	Rate
CWE-022	Path Traversal	37 / 55	18	67.3%
CWE-078	Command Injection	6 / 13	7	46.2%
CWE-079	XSS	13 / 31	18	41.9%
CWE-094	Code Injection	5 / 21	16	23.8%
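The four categories add up to the headline CWE-Bench-Java row: 61 of 120 projects detected and 59 missed. The sketch below is illustrative only. It recomputes the per-category rates and shows that the overall 50.8% is pooled across projects, which is not the same as the plain mean of the four category rates (about 44.8%).

```python
# (detected, total projects) per CWE, copied from the table above.
categories = {
    "CWE-022 Path Traversal": (37, 55),
    "CWE-078 Command Injection": (6, 13),
    "CWE-079 XSS": (13, 31),
    "CWE-094 Code Injection": (5, 21),
}

for cwe, (hit, total) in categories.items():
    print(f"{cwe}: {hit}/{total} = {hit / total:.1%}, missed {total - hit}")

total_detected = sum(hit for hit, _ in categories.values())
total_projects = sum(total for _, total in categories.values())
# Pooled over projects, so larger categories (e.g. CWE-022) weigh more.
print(f"overall: {total_detected}/{total_projects} = "
      f"{total_detected / total_projects:.1%}")
```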

Methodology

How we measured

circle-ir is a neuro-symbolic static analyzer that combines traditional dataflow analysis with learned patterns
All results are from static analysis only—no LLM involvement in detection or verification
Benchmark datasets come from their original sources: OWASP Benchmark, NIST Juliet Test Suite, CWE-Bench-Java, and others
CWE-Bench-Java uses per-project binary detection: each project contains one CVE, scored as detected or not
All benchmark code, harnesses, and raw outputs are in cogniumhq/circle-ir/benchmarks/
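Per-project binary detection means each CWE-Bench-Java project counts as either 1 or 0, however many other findings the analyzer reports for it. The sketch below illustrates that rule under stated assumptions. The `Finding` and `Project` types, the exact-match criterion (same file, line, and CWE), and the sample file names are all hypothetical and are not taken from the circle-ir harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical analyzer finding: file, line, and CWE ID."""
    file: str
    line: int
    cwe: str


@dataclass(frozen=True)
class Project:
    """Hypothetical benchmark project with exactly one known CVE sink."""
    name: str
    cve_sink: Finding


def detected(project: Project, findings: list[Finding]) -> bool:
    """Binary score: True only if some finding matches the CVE sink exactly."""
    return project.cve_sink in findings


def detection_rate(results: dict[Project, list[Finding]]) -> float:
    """Share of projects whose CVE sink was found (no partial credit)."""
    hits = sum(detected(p, f) for p, f in results.items())
    return hits / len(results)


a = Project("proj-a", Finding("Files.java", 42, "CWE-022"))
b = Project("proj-b", Finding("Exec.java", 17, "CWE-078"))
results = {
    # Two findings, but neither is the CVE sink itself: scored as a miss.
    a: [Finding("Util.java", 10, "CWE-022"), Finding("Files.java", 40, "CWE-022")],
    b: [Finding("Exec.java", 17, "CWE-078")],
}
print(detection_rate(results))  # 0.5
```

A real harness may match findings more loosely (for example by method or line range). Exact matching is used here only to keep the sketch short.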

Limitations

Known gaps

SSTI (Server-Side Template Injection) is not currently in circle-ir's CWE coverage; this causes one of the three PyGoat false negatives
Firing Range has 2 false positives in the escape/ category (escaped output flagged) and 3 false negatives in cors/ (CORS misconfigurations not detected)
CWE-Bench-Java scoring is binary per project with no partial credit: missing the single CVE sink in a large, complex project counts as a full miss
These benchmarks test static analysis only; the full circle-ir pipeline with LLM verification (SAST+LLM) produces different results, which are published separately

Reproduce

Run it yourself

To reproduce these benchmark results, you need Git and Node.js (18+) installed.

git clone https://github.com/cogniumhq/circle-ir
cd circle-ir/benchmarks
npm install
npm run benchmark

If you cannot reproduce a result, please open an issue at github.com/cogniumhq/circle-ir/issues.