MorganaBench SDK

User Guide:

  • MorganaBench JSONL format overview
    • Before we start: JSONL format
    • 1) Hello world: one turn, no tools
      • Benchmark JSONL (provided by MorganaBench): inputs + expectations
      • Executed benchmark JSONL: add outputs.response
    • 2) Add a trace for retrieval evaluation
    • 3) Assertions about tool calls
    • 4) To verify tool assertions, add tool events to the trace (executed benchmark)
    • 5) Multi-turn conversations: history vs the last turn
    • 6) Citations: how to represent them in executed benchmarks
    • Python SDK: load and write JSONL with Example
    • Additional resources
  • Tutorial: OpenAI ADK + MLflow, end-to-end
    • Install dependencies
    • 1) Local: load benchmark JSONL, run an OpenAI ADK agent, write only outputs.response
      • Example script
    • 2) MLflow: upload the benchmark, run the OpenAI ADK agent from predict_fn (no trace)
      • Start MLflow with a SQL backend (SQLite)
      • Create the dataset and evaluate
    • 3) Add tracing: modify predict_fn to emit a MorganaBench trace
      • Trace-building predict_fn (from streamed run events)
      • What is included (and what is not)
  • Schema and examples (non-Python usage)
    • JSON Schema (Example)
    • Example benchmark JSONL (unexecuted)
    • Example executed benchmark JSONL

API Reference:

  • API Reference
    • ChatMessage
    • DateTimeMatcher
      • DateTimeMatcher.match_as
      • DateTimeMatcher.value
    • EmailMatcher
      • EmailMatcher.match_as
      • EmailMatcher.value
    • Environment
      • Environment.user_time
    • EqualsMatcher
      • EqualsMatcher.match_as
      • EqualsMatcher.value
    • Example
      • Example.inputs
      • Example.expectations
      • Example.outputs
    • Expectations
      • Expectations.expected_response
      • Expectations.assertions
    • FreeTextMatcher
      • FreeTextMatcher.match_as
      • FreeTextMatcher.value
    • InputMetadata
      • InputMetadata.turns
      • InputMetadata.categories
    • Inputs
      • Inputs.messages
      • Inputs.metadata
      • Inputs.tools
      • Inputs.message_dicts()
    • MissingMatcher
      • MissingMatcher.match_as
    • NoToolCallAssertion
    • OneParameterAssertion
      • OneParameterAssertion.param
      • OneParameterAssertion.matcher
    • OptionalMatcher
      • OptionalMatcher.match_as
      • OptionalMatcher.default
    • Outputs
      • Outputs.response
      • Outputs.citations
      • Outputs.environment
      • Outputs.trace
    • ParameterGroupAssertion
      • ParameterGroupAssertion.params
      • ParameterGroupAssertion.matcher
    • ToolCall
      • ToolCall.event
      • ToolCall.id
      • ToolCall.tool
      • ToolCall.params
    • ToolCallAssertion
    • ToolResult
      • ToolResult.event
      • ToolResult.id
      • ToolResult.result
    • TurnMetadata
      • TurnMetadata.categories
      • TurnMetadata.resources
    • ValueMatcher
    • parameter_assertion()

All modules for which code is available

  • mb.entities.example
  • mb.entities.expectations
  • mb.entities.inputs
  • mb.entities.outputs

© Copyright 2026, TII AIIR Team.
