MorganaBench SDK (Version 0.1)

Welcome to MorganaBench SDK

High-level overview

The MorganaBench SDK provides typed, validated Python schemas (Pydantic models) for representing:

  • Benchmarks (evaluation datasets) for RAG and agentic systems, comprising inputs and expectations

  • Executed benchmark results (inputs + expectations + outputs)

This SDK focuses on interoperable data shapes rather than on executing benchmarks itself.
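To make the data shapes concrete, here is a minimal sketch using stdlib dataclasses as stand-ins for the SDK's Pydantic models. The field names (query, expected_answer, answer, citations) are illustrative assumptions, not the actual schema of mb.entities:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for the SDK's Pydantic models (mb.entities).
# Field names are illustrative only; consult the generated JSON schema
# docs for the real schema.

@dataclass
class Inputs:
    query: str  # the question posed to the RAG/agentic system

@dataclass
class Expectations:
    expected_answer: str  # ground truth used for evaluation

@dataclass
class Outputs:
    answer: str  # the system's response
    citations: list = field(default_factory=list)  # cited sources

@dataclass
class Example:
    inputs: Inputs
    expectations: Expectations
    outputs: Optional[Outputs] = None  # empty until the benchmark runs

# A benchmark example before execution: inputs and expectations only.
ex = Example(
    inputs=Inputs(query="What is the capital of France?"),
    expectations=Expectations(expected_answer="Paris"),
)
print(ex.outputs)  # None until outputs are recorded
```

Keeping outputs optional mirrors the split the overview describes: a benchmark file carries only inputs and expectations, while an executed result adds outputs.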

What it is for

  • Load MorganaBench benchmark files into typed Python mb.entities.Example objects;

  • Record executed benchmark results by populating Example.outputs for evaluation in MorganaBench;

  • Example shapes are designed to interoperate with MLflow and LangSmith.
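The load-then-record workflow above can be sketched as follows. This is a hypothetical illustration: the on-disk JSONL layout, the run_system helper, and the dict-based parsing are all assumptions; the real SDK parses lines into typed mb.entities.Example objects:

```python
import json
from io import StringIO

# Hypothetical workflow: load a JSONL benchmark, run each example through
# a system under test, and record outputs for later evaluation. Plain
# dicts stand in for mb.entities.Example to keep the sketch self-contained.

benchmark_jsonl = StringIO(
    '{"inputs": {"query": "2+2?"}, "expectations": {"expected_answer": "4"}}\n'
)

def run_system(inputs):
    # Stand-in for the RAG/agentic system being benchmarked.
    return {"answer": "4", "citations": []}

examples = [json.loads(line) for line in benchmark_jsonl]
for ex in examples:
    # Populate outputs so the result can be evaluated in MorganaBench.
    ex["outputs"] = run_system(ex["inputs"])

print(examples[0]["outputs"]["answer"])  # prints "4"
```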


Changelog

We follow Semantic Versioning (semver), where versions are written as x.y.z:

  • x: major version

  • y: minor version

  • z: patch version

Patch updates are always backwards compatible. While the project remains in the 0.y.z series, both major and minor updates may introduce breaking changes.

v0.1.3

  • Add retrieval trace event schema to outputs

  • Add citations to outputs schema

v0.1.1

  • Add the mb SDK package with Pydantic models for benchmark inputs, expectations, outputs, and examples

  • Provide schema and example generators plus a make schema target and generated JSON schema/JSONL docs

  • Add serialization tests for entity models and remove the demo test

  • Update packaging metadata, build backend, and type-checking includes for the SDK

  • Refresh README and add initial API documentation scaffold

v0.1.0

Initial version


Copyright: 2026, TII AIIR Team | Version: 0.1.4