MorganaBench SDK (Version 0.1)

Welcome to MorganaBench SDK

High-level overview

The MorganaBench SDK provides typed, validated Python schemas (Pydantic models) for representing:

  • Benchmarks (evaluation datasets) for RAG and agentic systems, comprising inputs and expectations

  • Executed benchmark results (inputs + expectations + outputs)

This SDK focuses on interoperable data shapes rather than on executing benchmarks itself.
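To make the data shapes concrete, here is a minimal sketch using stdlib dataclasses as stand-ins for the SDK's Pydantic models. The field names (query, expected_answer, answer, citations) are illustrative assumptions, not the actual schema of mb.entities:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for the SDK's Pydantic models (mb.entities).
# Field names are illustrative only; consult the generated JSON schema
# docs for the real schema.

@dataclass
class Inputs:
    query: str  # the question posed to the RAG/agentic system

@dataclass
class Expectations:
    expected_answer: str  # ground truth used for evaluation

@dataclass
class Outputs:
    answer: str  # the system's response
    citations: list = field(default_factory=list)  # cited sources

@dataclass
class Example:
    inputs: Inputs
    expectations: Expectations
    outputs: Optional[Outputs] = None  # empty until the benchmark runs

# A benchmark example before execution: inputs and expectations only.
ex = Example(
    inputs=Inputs(query="What is the capital of France?"),
    expectations=Expectations(expected_answer="Paris"),
)
print(ex.outputs)  # None until outputs are recorded
```

Keeping outputs optional mirrors the split the overview describes: a benchmark file carries only inputs and expectations, while an executed result adds outputs.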

What it is for

  • Load MorganaBench benchmark files into typed Python mb.entities.Example objects;

  • Record executed benchmark results by populating Example.outputs for evaluation in MorganaBench;

  • Example shapes are designed to interoperate with MLflow and LangSmith.
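The load-then-record workflow above can be sketched as follows. This is a hypothetical illustration: the on-disk JSONL layout, the run_system helper, and the dict-based parsing are all assumptions; the real SDK parses lines into typed mb.entities.Example objects:

```python
import json
from io import StringIO

# Hypothetical workflow: load a JSONL benchmark, run each example through
# a system under test, and record outputs for later evaluation. Plain
# dicts stand in for mb.entities.Example to keep the sketch self-contained.

benchmark_jsonl = StringIO(
    '{"inputs": {"query": "2+2?"}, "expectations": {"expected_answer": "4"}}\n'
)

def run_system(inputs):
    # Stand-in for the RAG/agentic system being benchmarked.
    return {"answer": "4", "citations": []}

examples = [json.loads(line) for line in benchmark_jsonl]
for ex in examples:
    # Populate outputs so the result can be evaluated in MorganaBench.
    ex["outputs"] = run_system(ex["inputs"])

print(examples[0]["outputs"]["answer"])  # prints "4"
```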


Changelog

We follow Semantic Versioning (semver), where versions are written as x.y.z:

  • x: major version

  • y: minor version

  • z: patch version

Patch updates are always backwards compatible. While the project remains in the 0.y.z series, both major and minor updates may introduce breaking changes.

v0.1.3

  • Add retrieval trace event schema to outputs

  • Add citations to outputs schema

v0.1.1

  • Add the mb SDK package with Pydantic models for benchmark inputs, expectations, outputs, and examples

  • Provide schema and example generators plus a make schema target and generated JSON schema/JSONL docs

  • Add serialization tests for entity models and remove the demo test

  • Update packaging metadata, build backend, and type-checking includes for the SDK

  • Refresh README and add initial API documentation scaffold

v0.1.0

Initial version


Copyright: 2026, TII AIIR Team | Version: 0.1.4