> AI agent evaluation framework

Evaluate agents with terminal precision

No server. No signup. Multi-objective scoring from YAML specs. Deterministic code judges + customizable LLM judges, version-controlled in Git.

agentv
$ agentv eval ./evals/math.yaml
Running 3 eval cases...
PASS addition score: 1.0
PASS multiplication score: 1.0
FAIL division score: 0.4
Results: 2 passed 1 failed
$ agentv compare run-a run-b
Comparing 2 runs...
correctness +12.5% (0.72 -> 0.81)
latency -340ms (1.2s -> 0.86s)
cost +$0.02 ($0.05 -> $0.07)
Overall: improved
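Each PASS/FAIL line above rolls several objectives into one number. A minimal sketch of how multi-objective scores can be folded into a weighted overall score — the objective names and weights here are illustrative, not AgentV's actual schema:

```typescript
// Hypothetical sketch of multi-objective scoring: each objective is
// normalized to [0, 1], then combined with weights. Not AgentV's real API.
type ObjectiveScores = { correctness: number; latency: number; cost: number };

// Illustrative weights; a real spec would make these configurable.
const WEIGHTS: ObjectiveScores = { correctness: 0.6, latency: 0.2, cost: 0.2 };

function overallScore(scores: ObjectiveScores): number {
  return (Object.keys(WEIGHTS) as (keyof ObjectiveScores)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * scores[key],
    0,
  );
}

// A run that is fully correct but mediocre on latency and cost:
const score = overallScore({ correctness: 1.0, latency: 0.5, cost: 0.5 });
// 0.6*1.0 + 0.2*0.5 + 0.2*0.5 = 0.8
```

The key design point is that every objective is measured in the same run, so a correctness gain that doubles cost is visible immediately.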

Built for your workflow


Local Execution

No cloud dependency. All data stays on your machine. Zero overhead to get started.


Multi-Objective Scoring

Correctness, latency, cost, and safety measured in a single evaluation run.


Code + LLM Judges

Deterministic code validators and customizable LLM judges, composable and extensible.
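To illustrate the idea (the `Judge` type and helpers below are hypothetical, not AgentV's API): a deterministic code judge is just a pure function from agent output to a score, which is what makes it reproducible and composable:

```typescript
// Hypothetical judge sketch: pure functions over agent output, so the same
// output always yields the same score — unlike an LLM judge.
type Judge = (output: string) => number;

// Exact-match judge: full credit only for the expected answer.
const exactMatch = (expected: string): Judge =>
  (output) => (output.trim() === expected.trim() ? 1.0 : 0.0);

// Substring judge: did the expected number appear anywhere?
const containsNumber = (n: number): Judge =>
  (output) => (output.includes(String(n)) ? 1.0 : 0.0);

// Judges compose: average the scores of several deterministic checks.
const all = (...judges: Judge[]): Judge =>
  (output) => judges.reduce((sum, j) => sum + j(output), 0) / judges.length;

const judge = all(exactMatch("42"), containsNumber(42));
judge("42");               // both checks pass -> 1.0
judge("The answer is 42"); // only the substring check passes -> 0.5
```

An LLM judge would slot into the same shape, just with a model call inside — the composition layer doesn't care which kind it is.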


LLM & Agent Targets

Direct LLM providers, plus Claude Code, Codex, Pi, Copilot, and OpenCode agent targets.


Rubric Grading

Structured criteria with weights and auto-generation. Google ADK-style object rubrics.
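A weighted rubric reduces to a normalized weighted average over criteria. A minimal sketch, with made-up criteria and weights (not AgentV's rubric schema):

```typescript
// Hypothetical rubric sketch: each criterion has a weight and a score in [0, 1].
interface Criterion {
  name: string;
  weight: number;
  score: number; // graded result for this criterion, in [0, 1]
}

// Weighted average, normalized so weights need not sum to 1.
function gradeRubric(criteria: Criterion[]): number {
  const totalWeight = criteria.reduce((sum, c) => sum + c.weight, 0);
  return criteria.reduce((sum, c) => sum + c.weight * c.score, 0) / totalWeight;
}

const grade = gradeRubric([
  { name: "answer is correct", weight: 3, score: 1 },
  { name: "shows working",     weight: 1, score: 0 },
]);
// (3*1 + 1*0) / (3 + 1) = 0.75
```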


A/B Comparison

Compare evaluation runs side-by-side with statistical deltas and regression detection.
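The deltas in the `agentv compare` demo above reduce to per-metric differences plus a regression check. A sketch using the numbers from that demo (the `RunMetrics` shape is hypothetical):

```typescript
// Hypothetical A/B comparison sketch: subtract run A's metrics from run B's
// and flag a regression when correctness drops.
interface RunMetrics {
  correctness: number; // higher is better
  latencyMs: number;   // lower is better
  costUsd: number;     // lower is better
}

function compareRuns(a: RunMetrics, b: RunMetrics) {
  return {
    correctness: b.correctness - a.correctness,
    latencyMs: b.latencyMs - a.latencyMs,
    costUsd: b.costUsd - a.costUsd,
    regressed: b.correctness < a.correctness,
  };
}

// Numbers from the `agentv compare run-a run-b` demo above:
const delta = compareRuns(
  { correctness: 0.72, latencyMs: 1200, costUsd: 0.05 },
  { correctness: 0.81, latencyMs: 860,  costUsd: 0.07 },
);
// correctness up ~0.09, latency down 340ms, cost up ~$0.02, no regression
```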

Quick Start

1. Install

   npm install -g agentv

2. Initialize

   agentv init

3. Configure

   Copy .env.example to .env and add your API keys.

4. Create an eval

   description: Math evaluation
   execution:
     target: default

   evalcases:
     - id: addition
       expected_outcome: Correctly calculates 15 + 27 = 42
       input_messages:
         - role: user
           content: What is 15 + 27?

5. Run

   agentv eval ./evals/example.yaml
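Because everything runs locally from the CLI, the same two commands drop into CI with no API integration. A minimal GitHub Actions sketch — the workflow scaffolding is illustrative; only the two agentv commands come from the steps above:

```yaml
# Illustrative CI sketch (GitHub Actions); adapt to your pipeline.
name: evals
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g agentv
      # Expose whichever provider keys your .env.example lists as repo secrets.
      - run: agentv eval ./evals/example.yaml
```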

How AgentV Compares

| Feature         | AgentV              | LangWatch               | LangSmith               | LangFuse                |
| --------------- | ------------------- | ----------------------- | ----------------------- | ----------------------- |
| Setup           | npm install         | Cloud account + API key | Cloud account + API key | Cloud account + API key |
| Server          | None (local)        | Managed cloud           | Managed cloud           | Managed cloud           |
| Privacy         | All local           | Cloud-hosted            | Cloud-hosted            | Cloud-hosted            |
| CLI-first       | Yes                 | Limited                 | Limited                 | —                       |
| CI/CD ready     | Yes                 | Requires API calls      | Requires API calls      | Requires API calls      |
| Version control | YAML in Git         | —                       | —                       | —                       |
| Evaluators      | Code + LLM + Custom | LLM only                | LLM + Code              | LLM only                |