TL;DR: QWED uses deterministic solvers (SymPy, Z3, AST, SQLGlot) wherever possible. When LLM fallback is required, the response is explicitly marked as HEURISTIC.

Engine Classification

Every QWED verification engine falls into one of three categories:
| Category | Engines | Technology | Reproducible? |
|---|---|---|---|
| 100% Symbolic | Math, Logic, Code, SQL, Schema, Taint | SymPy, Z3, AST, SQLGlot | ✅ Yes |
| Hybrid | Fact, Stats | TF-IDF / Sandbox → LLM fallback | ⚠️ Conditional |
| Heuristic | Image, Consensus, Reasoning | VLM / Multi-LLM voting | ❌ No |

Detailed Breakdown

| Engine | Mode | Primary Technology | LLM Fallback? |
|---|---|---|---|
| Math | Symbolic | SymPy (symbolic algebra) | ❌ Never |
| Logic | Symbolic | Z3 Theorem Prover (SMT) | ❌ Never |
| Code | Symbolic | Python AST + Bandit | ❌ Never |
| SQL | Symbolic | SQLGlot AST parser | ❌ Never |
| Schema | Symbolic | Pydantic + SymPy | ❌ Never |
| Taint | Symbolic | Data-flow analysis (AST) | ❌ Never |
| Fact | Hybrid | TF-IDF similarity | ✅ When TF-IDF confidence < threshold |
| Stats | Hybrid | Wasm/Docker sandbox | ✅ When sandbox execution fails |
| Image | Heuristic | Vision LLM (GPT-4V, Claude) | ✅ Always |
| Consensus | Heuristic | Multi-LLM voting | ✅ Always |
| Reasoning | Heuristic | Chain-of-thought LLM | ✅ Always |

How to Know Which Mode Was Used

Every API response includes a verification_mode field:
```json
{
  "status": "VERIFIED",
  "verified": true,
  "engine": "math",
  "verification_mode": "SYMBOLIC",
  "result": { ... }
}
```

verification_mode Values

| Value | Meaning | Trust Level |
|---|---|---|
| SYMBOLIC | Result proven by deterministic solver (SymPy/Z3/AST) | 🟢 100% reproducible |
| HEURISTIC | Result from LLM fallback or multi-model voting | 🟡 Best-effort, not guaranteed |

Why This Matters

For Regulated Industries

Banks, healthcare, and legal systems require audit trails and reproducibility:
  • ✅ SYMBOLIC results can be independently verified
  • ⚠️ HEURISTIC results should be flagged for human review

For Production Systems

```python
result = client.verify_math("2+2=4")

if result.verification_mode == "SYMBOLIC":
    # Safe to use without human review
    process_result(result)
elif result.verification_mode == "HEURISTIC":
    # Flag for human review or additional validation
    queue_for_review(result)
```

Trust boundaries in translated queries

When a natural language query is verified through the math pipeline, the response status is INCONCLUSIVE — even when the expression evaluation itself is fully deterministic. This is because the LLM translation step (natural language → expression) is not formally verified. Each response includes a trust_boundary object that makes this explicit:
```python
result = client.verify("What is 15% of 200?")

print(result.status)  # "INCONCLUSIVE"
print(result.trust_boundary["deterministic_expression_evaluation"])  # True
print(result.trust_boundary["query_semantics_verified"])  # False
```
Use the trust_boundary fields to decide how to handle the result in your application. For example, you might accept the computed answer while logging that the translation was not formally proven. See Math Engine — Trust boundary for the full field reference.

Design philosophy

“Probabilistic systems should not be trusted with deterministic tasks.”
QWED treats LLMs as untrusted translators:
  1. LLM converts natural language → formal specification
  2. Deterministic solver verifies the formal specification
  3. If solver cannot verify, result is marked HEURISTIC
  4. If the translation itself cannot be formally verified, result is marked INCONCLUSIVE with a trust_boundary
```
┌──────────────────────────────────────────────────────────┐
│                      QWED Protocol                       │
├──────────────────────────────────────────────────────────┤
│  User Query → LLM (Translator) → DSL → Solver → Result   │
│                                                          │
│  Solver = SymPy/Z3/AST  →  SYMBOLIC                      │
│  Solver = LLM Fallback  →  HEURISTIC                     │
│  Translation unverified →  INCONCLUSIVE + trust_boundary │
└──────────────────────────────────────────────────────────┘
```
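The decision flow in the diagram can be sketched in a few lines of Python. Everything here is illustrative: `classify_result`, its arguments, and the `DETERMINISTIC_SOLVERS` set are hypothetical names, not part of the QWED API.

```python
# Illustrative sketch of the QWED decision flow (hypothetical names,
# not the actual QWED implementation).

DETERMINISTIC_SOLVERS = {"sympy", "z3", "ast", "sqlglot"}

def classify_result(solver: str, solver_succeeded: bool,
                    translation_verified: bool) -> dict:
    """Map a pipeline outcome to QWED-style response fields."""
    # SYMBOLIC only when a deterministic solver actually produced the result.
    if solver in DETERMINISTIC_SOLVERS and solver_succeeded:
        mode = "SYMBOLIC"
    else:
        mode = "HEURISTIC"  # LLM fallback or multi-model voting

    # An unverified LLM translation caps the top-level status at INCONCLUSIVE,
    # no matter how the downstream solver behaved. (Real statuses are richer;
    # this keeps only the distinction the diagram draws.)
    status = "VERIFIED" if translation_verified else "INCONCLUSIVE"
    return {"verification_mode": mode, "status": status}

print(classify_result("sympy", True, True))
# → {'verification_mode': 'SYMBOLIC', 'status': 'VERIFIED'}
```

Note that the mode and the status vary independently: a deterministic evaluation routed through an unverified translation is still SYMBOLIC in mode but INCONCLUSIVE in status.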

Trust boundaries in the natural language pipeline

When a query enters through POST /verify/natural_language, an LLM first translates the user’s natural language into a formal expression. Even when the downstream solver (e.g., SymPy) evaluates that expression deterministically, the overall result depends on whether the LLM correctly interpreted the user’s intent — and that step is not deterministic. To make this distinction explicit, the natural language math pipeline now:
  1. Returns INCONCLUSIVE as the top-level status instead of VERIFIED, even when the expression evaluation succeeds.
  2. Includes a trust_boundary object in the response that describes exactly what was and was not proven.
```json
{
  "trust_boundary": {
    "query_interpretation_source": "llm_translation",
    "query_semantics_verified": false,
    "verification_scope": "translated_expression_only",
    "deterministic_expression_evaluation": true,
    "formal_proof": false,
    "translation_claim_self_consistent": true,
    "provider_used": "openai",
    "overall_status": "INCONCLUSIVE"
  }
}
```
This means a VERIFIED result from the Math Engine’s direct endpoint (POST /verify/math) carries stronger guarantees than the same calculation routed through the natural language pipeline. See the Math Engine documentation for the full field reference.

Numerical sampling fallback

The identity verification fallback (numerical sampling at fixed test points) now returns UNKNOWN with confidence: 0.0 instead of LIKELY_EQUIVALENT with confidence: 0.99. Matching at a handful of sample points does not constitute a formal proof and should not be treated as one.
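A minimal sketch of why this change is correct: agreement at fixed sample points is evidence, not proof. The function name and return shape below are illustrative assumptions, not QWED's internals.

```python
def numerical_fallback(f, g, points, tol=1e-9):
    """Numerical-sampling fallback: compare f and g only at fixed test points.

    Agreement is NOT a proof of equivalence, so it reports UNKNOWN with zero
    confidence; a mismatch, by contrast, is a genuine counterexample.
    """
    if all(abs(f(x) - g(x)) <= tol for x in points):
        return {"status": "UNKNOWN", "confidence": 0.0}
    return {"status": "NOT_EQUIVALENT", "confidence": 1.0}

# A cubic that vanishes exactly at the sample points 0, 1, and 2 ...
f = lambda x: x * (x - 1) * (x - 2)
g = lambda x: 0.0
# ... matches the zero function at every sample, yet f != g elsewhere:
print(numerical_fallback(f, g, [0, 1, 2]))
# → {'status': 'UNKNOWN', 'confidence': 0.0}
```

The counterexample above is exactly the failure mode that made `LIKELY_EQUIVALENT` with 0.99 confidence misleading.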

Frequently Asked Questions

Q: Can I filter responses by verification_mode?

A: Yes! Use the require_symbolic option in your request:
```python
result = client.verify(
    query="Calculate NPV",
    options={"require_symbolic": True}
)
# Fails if solver cannot verify deterministically
```

Q: Why is Fact verification sometimes HEURISTIC?

A: Fact verification first uses TF-IDF similarity (deterministic) to find supporting evidence. If the TF-IDF confidence falls below the threshold, it falls back to LLM-based natural language inference (NLI), and the result is marked HEURISTIC.
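The hybrid flow can be sketched as follows. The cosine-over-word-counts similarity stands in for the real TF-IDF retrieval, and the threshold, function names, and mode labels are assumptions for illustration, not QWED's actual API.

```python
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts (a stand-in for real TF-IDF)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def verify_fact(claim: str, evidence: str, threshold: float = 0.5) -> dict:
    score = similarity(claim, evidence)
    if score >= threshold:
        # Deterministic retrieval was confident enough: no LLM involved.
        return {"method": "tfidf", "verification_mode": "SYMBOLIC"}
    # Low confidence: fall back to LLM-based NLI and mark HEURISTIC.
    return {"method": "nli_llm", "verification_mode": "HEURISTIC"}

print(verify_fact("the sky is blue", "the sky is blue today"))
print(verify_fact("the sky is blue", "quarterly revenue grew"))
```

The first call stays on the deterministic path; the second drops below the threshold and takes the LLM fallback.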

Q: Is Consensus useful if it’s HEURISTIC?

A: Yes! Consensus detects disagreement between multiple LLMs. While not deterministic, it catches cases where models are uncertain — useful as a safety net, not a formal verifier.
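As a sketch of that safety net, here is disagreement detection over hard-coded stand-ins for real LLM answers. The quorum value and field names are illustrative assumptions.

```python
from collections import Counter

def consensus(answers: list, quorum: float = 0.75) -> dict:
    """Multi-model voting: pick the majority answer, flag weak agreement.

    The result is always HEURISTIC -- voting detects uncertainty,
    it does not prove correctness.
    """
    top, votes = Counter(answers).most_common(1)[0]
    agreement = votes / len(answers)
    return {
        "answer": top,
        "agreement": round(agreement, 2),
        "verification_mode": "HEURISTIC",
        "needs_review": agreement < quorum,
    }

print(consensus(["4", "4", "4"]))  # unanimous: no review flag, still HEURISTIC
print(consensus(["4", "5", "4"]))  # disagreement: flagged for review
```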

See Also