
circle-ir SAST Benchmarks

Static analysis only. No LLM. Reproducible.

circle-ir 3.19.4 · benchmark date April 22, 2026

circle-ir is an MIT-licensed neuro-symbolic static analysis library. These benchmarks measure the static analysis engine only—no LLM verification layer is involved. All benchmark code, harnesses, and raw outputs are available in the circle-ir repository for independent reproduction.

Benchmarks by Language

Java (6 benchmarks)

Benchmark            Tests   TP    TN    FP   FN   TPR     FPR    Score
OWASP Benchmark      1,415   708   707   0    0    100%    0%     100%
Juliet Test Suite    243     122   121   0    0    100%    0%     100%
SecuriBench Micro    123     60    60    1    2    96.8%   1.6%   97.7%
CWE-Bench-Java       120     61    -     -    59   50.8%   -      50.8%
WebGoat              29      26    -     -    3    89.7%   -      89.3%
DVJA                 7       7     -     -    0    100%    -      100%
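
The TPR and FPR columns can be recomputed from the confusion counts with the standard definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal Python sketch, using the SecuriBench Micro row as input; the function names are illustrative and are not part of circle-ir. The formula behind the Score column is not stated on this page, so it is not reproduced here.

```python
def tpr(tp: int, fn: int) -> float:
    """True-positive rate (recall): share of real vulnerabilities that were flagged."""
    return tp / (tp + fn)


def fpr(fp: int, tn: int) -> float:
    """False-positive rate: share of safe test cases that were flagged."""
    return fp / (fp + tn)


# SecuriBench Micro row from the table above: TP=60, TN=60, FP=1, FN=2.
securibench_tpr = tpr(tp=60, fn=2)
securibench_fpr = fpr(fp=1, tn=60)

print(f"TPR {securibench_tpr:.1%}, FPR {securibench_fpr:.1%}")  # TPR 96.8%, FPR 1.6%
```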

Node.js / TypeScript (3 benchmarks)

Benchmark          Tests   TP   TN   FP   FN   TPR     FPR   Score
NodeGoat           14      14   -    -    0    100%    -     100%
Juice Shop         14      14   -    -    0    100%    -     100%
NodeJS Synthetic   25      23   -    -    2    92.0%   -     92.9%

Python (2 benchmarks)

Benchmark   Tests   TP   TN   FP   FN   TPR     FPR   Score
PyGoat      26      23   -    -    3    88.5%   -     90.0%
DVPWA       6       6    -    -    0    100%    -     100%

Rust (2 benchmarks)

Benchmark        Tests   TP   TN   FP   FN   TPR     FPR   Score
Rust Synthetic   50      46   -    -    4    92.0%   -     92.3%
CWE-Bench-Rust   30      28   -    -    2    93.3%   -     94.4%

Other Languages (3 benchmarks)

Benchmark           Tests   TP   TN   FP   FN   TPR     FPR   Score
Bash Synthetic      31      31   -    -    0    100%    -     100%
HTML/JS Synthetic   30      30   -    -    0    100%    -     100%
Firing Range        40      35   -    2    3    92.1%   -     92.1%

Results by Language

Language               Perfect (100%)   Near-perfect (90%+)   Total Benchmarks
Java                   3                4                     6
Node.js / TypeScript   2                3                     3
Python                 1                2                     2
Rust                   0                2                     2
Bash                   1                1                     1
HTML/JS                1                1                     1
Total                  8                13                    16
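
The per-language counts can be rederived from the Score columns. The Python sketch below does so for the four language groups that have their own table above. It assumes that "Near-perfect (90%+)" is cumulative, meaning it also counts the 100% results; that reading reproduces the published rows. The `tally` helper is illustrative only.

```python
# Score column values copied from the per-language tables above.
scores = {
    "Java": [100.0, 100.0, 97.7, 50.8, 89.3, 100.0],
    "Node.js / TypeScript": [100.0, 100.0, 92.9],
    "Python": [90.0, 100.0],
    "Rust": [92.3, 94.4],
}


def tally(values: list[float]) -> tuple[int, int, int]:
    """Return (perfect, at_least_90, total); the 90%+ count includes perfect scores."""
    perfect = sum(1 for v in values if v == 100.0)
    at_least_90 = sum(1 for v in values if v >= 90.0)
    return perfect, at_least_90, len(values)


summary = {language: tally(values) for language, values in scores.items()}
for language, (perfect, at_least_90, total) in summary.items():
    print(f"{language}: {perfect} perfect, {at_least_90} at 90%+, {total} total")
```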

CWE-Bench-Java by Category

CWE       Category            Detected   Missed   Rate
CWE-022   Path Traversal      37 / 55    18       67.3%
CWE-078   Command Injection   6 / 13     7        46.2%
CWE-079   XSS                 13 / 31    18       41.9%
CWE-094   Code Injection      5 / 21     16       23.8%
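
As a cross-check, the per-category rates and the overall CWE-Bench-Java figure can be recomputed from the detected/total pairs in this table. A short Python sketch; the variable names are illustrative.

```python
# (detected, total) per CWE category, copied from the table above.
categories = {
    "CWE-022 Path Traversal": (37, 55),
    "CWE-078 Command Injection": (6, 13),
    "CWE-079 XSS": (13, 31),
    "CWE-094 Code Injection": (5, 21),
}

for name, (detected, total) in categories.items():
    print(f"{name}: {detected}/{total} = {detected / total:.1%}")

# The category totals add up to the CWE-Bench-Java row in the Java table.
detected_all = sum(d for d, _ in categories.values())
total_all = sum(t for _, t in categories.values())
print(f"Overall: {detected_all}/{total_all} = {detected_all / total_all:.1%}")
```

The sums come to 61 detected out of 120 projects, which matches the 50.8% reported for CWE-Bench-Java.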

How we measured

  • circle-ir is a neuro-symbolic static analyzer that combines traditional dataflow analysis with learned patterns
  • All results are from static analysis only—no LLM involvement in detection or verification
  • Each benchmark's source dataset is linked to its origin: OWASP Benchmark, NIST Juliet Test Suite, CWE-Bench-Java, and others
  • CWE-Bench-Java uses per-project binary detection: each project contains one CVE, scored as detected or not
  • All benchmark code, harnesses, and raw outputs are in cogniumhq/circle-ir/benchmarks/

Known gaps

  • SSTI (Server-Side Template Injection) is not currently in circle-ir's CWE coverage—this causes the PyGoat false negative
  • Firing Range has 2 false positives in the escape/ category (escaped output flagged) and 3 false negatives in cors/ (CORS misconfigurations not detected)
  • CWE-Bench-Java is scored per project rather than per sink: each project is either detected or missed, so a single missed sink in a complex project counts as a full miss
  • These benchmarks test static analysis only—the full circle-ir + LLM verification pipeline (SAST+LLM) produces different results, published separately
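
To illustrate why binary per-project scoring is stricter than giving partial credit for each sink found, here is a small Python sketch. The project names and sink counts are invented for illustration, and the rule "a project counts as detected only if none of its vulnerable sinks is missed" is an assumption drawn from the CWE-Bench-Java bullet above, not the harness's actual code.

```python
from dataclasses import dataclass


@dataclass
class ProjectResult:
    name: str          # hypothetical project identifier
    sinks_total: int   # vulnerable sinks belonging to the project's CVE
    sinks_found: int   # how many of those sinks the analyzer reported


def per_project_rate(results: list[ProjectResult]) -> float:
    """Binary scoring: a project scores 1 only if every sink was found."""
    detected = sum(1 for r in results if r.sinks_found == r.sinks_total)
    return detected / len(results)


def per_sink_rate(results: list[ProjectResult]) -> float:
    """Partial-credit scoring, shown only for contrast."""
    return sum(r.sinks_found for r in results) / sum(r.sinks_total for r in results)


results = [
    ProjectResult("project-a", sinks_total=1, sinks_found=1),
    ProjectResult("project-b", sinks_total=5, sinks_found=4),  # one missed sink
]

print(f"per-project: {per_project_rate(results):.1%}")  # 50.0%
print(f"per-sink:    {per_sink_rate(results):.1%}")     # 83.3%
```

In this made-up example, finding five of six sinks still yields only one detected project out of two.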

Run it yourself

To reproduce these benchmark results, you need Git and Node.js (18+) installed.

git clone https://github.com/cogniumhq/circle-ir
cd circle-ir/benchmarks
npm install
npm run benchmark
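
Before running the commands above, it can help to confirm the prerequisites. The following is an optional Python 3 sketch, not part of the circle-ir repository; it assumes `git`, `node`, and `npm` are expected on PATH and that `node --version` prints a string such as `v18.17.0`. The function names are illustrative.

```python
import shutil
import subprocess


def node_major(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v18.17.0'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def check_prerequisites(min_node_major: int = 18) -> list[str]:
    """Return a list of problems; an empty list means the prerequisites look fine."""
    problems = []
    for tool in ("git", "node", "npm"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which("node") is not None:
        out = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        ).stdout
        if node_major(out) < min_node_major:
            problems.append(f"Node.js {out.strip()} is older than {min_node_major}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty output means Git, Node.js 18 or later, and npm were all found.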

If you cannot reproduce a result, please open an issue at github.com/cogniumhq/circle-ir/issues.
