API Reference

class mb.entities.ChatMessage(*, role: str, content: str)[source]: An OpenAI-style message in the conversation.

class mb.entities.DateTimeMatcher(*, match_as: Literal['date_time'] = 'date_time', value: str)[source]

Matches an argument if it matches a given date and / or time.

The match is decided by semantic equivalence, given the actual argument value and the request time provided in the output environment.

match_as: Literal['date_time']: Discriminator field

value: str: A natural language description of the date and / or time to compare the argument to.

class mb.entities.EmailMatcher(*, match_as: Literal['email'] = 'email', value: str)[source]

Matches an argument if it matches the given email address.

match_as: Literal['email']: Discriminator field

value: str: The email address to compare the argument to.

class mb.entities.Environment(*, user_time: datetime | None = None, **extra_data: Any)[source]

user_time: datetime | None: The time the user sent the request.

class mb.entities.EqualsMatcher(*, match_as: Literal['equality'] = 'equality', value: str | int | float | bool)[source]

Matches an argument if it equals a given value.

match_as: Literal['equality']: Discriminator field

value: str | int | float | bool: The value to compare the argument to.

class mb.entities.Example(*, inputs: Inputs, expectations: Expectations, outputs: Outputs | None = None, **extra_data: Any)[source]

A benchmark (eval dataset) example.

inputs: Inputs: The input data for the example.

expectations: Expectations: The expectations for the example’s output.

outputs: Outputs | None: The actual output data for the example. Populated after the benchmark has been run.

class mb.entities.Expectations(*, expected_response: str | None = None, assertions: list[Annotated[ToolCallAssertion | NoToolCallAssertion, FieldInfo(annotation=NoneType, required=True, discriminator='assert_that')]] = <factory>, **extra_data: Any)[source]

Agent expectations description.

expected_response: str | None

The expected response to the user’s question.

If not provided, the agent’s response will not be evaluated against any expected response.

assertions: list[Annotated[ToolCallAssertion | NoToolCallAssertion, FieldInfo(annotation=NoneType, required=True, discriminator='assert_that')]]

Assertions about the agent’s output, such as tool call correctness, abstention, and guidelines.

Currently, only tool call correctness assertions are supported.

class mb.entities.FreeTextMatcher(*, match_as: Literal['free_text'] = 'free_text', value: str)[source]

Matches an argument if it semantically achieves the same goal as the given free text.

match_as: Literal['free_text']: Discriminator field

value: str: The free text to compare the argument to.

class mb.entities.InputMetadata(*, turns: list[TurnMetadata] | None = None, categories: dict[str, str] | None = None)[source]

turns: list[TurnMetadata] | None

Metadata associated with each conversation turn.

Each pair of user-assistant messages forms a turn, except for the last turn, which has only a user message. This list contains the metadata for each turn.

categories: dict[str, str] | None

Categories associated with the entire input. They remain the same for all turns.

For example, user persona attributes.
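The turn structure described for `turns` can be sketched as a small grouping function over OpenAI-style message dicts. `split_into_turns` is a hypothetical helper, not part of `mb`, and it assumes a strictly alternating user/assistant history with no system messages.

```python
def split_into_turns(messages: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Group messages into turns: (user, assistant) pairs, then a final
    turn holding only the last user message."""
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the last message must have the user role")

    turns = []
    # Walk everything before the final message two messages at a time.
    for i in range(0, len(messages) - 1, 2):
        user, assistant = messages[i], messages[i + 1]
        if user["role"] != "user" or assistant["role"] != "assistant":
            raise ValueError("expected alternating user/assistant messages")
        turns.append([user, assistant])

    # The last turn has only the user's pending request.
    turns.append([messages[-1]])
    return turns


conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "How many people live there?"},
]
turns = split_into_turns(conversation)
```

Under this reading, a `turns` metadata list for the conversation above would have two entries, one per turn.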

class mb.entities.Inputs(*, messages: Annotated[list[ChatMessage], MinLen(min_length=1)], metadata: InputMetadata | None = None, tools: list[str] | None = None, **extra_data: Any)[source]

Agent input description.

messages: list[ChatMessage]

The chat messages to be processed by the agent, in OpenAI-style format.

The last message must have a user role, and represents the user’s request for the agent.

Example:

{
  "messages": [
    {"role": "user", "content": "Who is the King of England?"},
    {"role": "assistant", "content": "The King of England is King Charles III."},
    {"role": "user", "content": "When was he born?"}
  ]
}

metadata: InputMetadata | None: Additional metadata about the input, such as categories and resources used for generation.

tools: list[str] | None: A subset of the tools available to the agent. If not provided, the agent will use all available tools.

message_dicts() → list[dict[str, str]][source]

Return messages as OpenAI-style dicts.

This is a convenience for integrations that expect messages in the shape [{"role": "...", "content": "..."}, ...] (for example, agent runners).
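The conversion that `message_dicts()` describes can be sketched with a stand-in type. `Message` and `to_message_dicts` below are hypothetical names used so the example runs without `mb` installed; they show the output shape only and are not the library's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for ChatMessage (role and content only)."""
    role: str
    content: str


def to_message_dicts(messages: list[Message]) -> list[dict[str, str]]:
    """Return the messages as OpenAI-style dicts, preserving order."""
    return [{"role": m.role, "content": m.content} for m in messages]


history = [
    Message(role="user", content="Who is the King of England?"),
    Message(role="assistant", content="The King of England is King Charles III."),
    Message(role="user", content="When was he born?"),
]
dicts = to_message_dicts(history)
```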

class mb.entities.MissingMatcher(*, match_as: Literal['missing'] = 'missing')[source]

Matches an argument if it is missing (not provided).

match_as: Literal['missing']: Discriminator field

class mb.entities.NoToolCallAssertion(*, assert_that: Literal['no_tool_called'] = 'no_tool_called')[source]: Assert that no tool call was made.

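class mb.entities.OneParameterAssertion(*, param: str, matcher: EqualsMatcher | FreeTextMatcher | DateTimeMatcher | EmailMatcher | MissingMatcher | OptionalMatcher)[source]

Passes when a single argument matches the given matcher.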

param: str: The name of the parameter to match.

matcher: Annotated[EqualsMatcher | FreeTextMatcher | DateTimeMatcher | EmailMatcher | MissingMatcher | OptionalMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]: The matcher representing a passing assertion.

class mb.entities.OptionalMatcher(*, match_as: Literal['optional'] = 'optional', default: T)[source]

Matches an argument if it is either missing (not provided) or has a value that matches the given matcher.

match_as: Literal['optional']: Discriminator field

default: T: The matcher to use to match the argument.

class mb.entities.Outputs(*, response: str, citations: list[Citation] | None = None, environment: Environment | None = None, trace: list[Annotated[ToolCall | ToolResult | RetrievalResults, FieldInfo(annotation=NoneType, required=True, discriminator='event')]] = [], **extra_data: Any)[source]

response: str: The agent’s response to the user.

citations: list[Citation] | None: The citations the agent made to justify its response.

environment: Environment | None: Additional environment information, such as the time the user sent the request.

trace: list[Annotated[ToolCall | ToolResult | RetrievalResults, FieldInfo(annotation=NoneType, required=True, discriminator='event')]]

The trace of the agent’s execution events, such as tool calls, tool call results, search, etc.

This is required for some assertions to work correctly, such as tool-call correctness assertions.

class mb.entities.ParameterGroupAssertion(*, params: list[str], matcher: FreeTextMatcher | DateTimeMatcher | OptionalMatcher[Annotated[FreeTextMatcher | DateTimeMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]])[source]

Passes when a group of arguments matches the given matcher.

params: list[str]: The names of the parameters to assert.

matcher: Annotated[FreeTextMatcher | DateTimeMatcher | OptionalMatcher[Annotated[FreeTextMatcher | DateTimeMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]], FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]: The matcher representing a passing assertion.

class mb.entities.ToolCall(*, event: Literal['tool_call'] = 'tool_call', id: str, tool: str, params: dict[str, JsonValue])[source]

event: Literal['tool_call']: The type of event.

id: str: The ID of the tool call.

tool: str: The name of the tool called. Must correspond to one of the tools in the benchmark description file.

params: dict[str, JsonValue]: The parameters passed to the tool.

class mb.entities.ToolCallAssertion(*, assert_that: Literal['tool_called'] = 'tool_called', tool: str, parameters: list[Annotated[Annotated[OneParameterAssertion, Tag(tag=one)] | Annotated[ParameterGroupAssertion, Tag(tag=group)], Discriminator(discriminator=_parameter_assertion_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)]] = [])[source]: Assert that a tool call was made with given parameters.
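To illustrate how the two assertion kinds relate to a trace, the sketch below checks only whether a tool was called, ignoring `parameters` entirely. The dict form, the `tool_level_check` function, and the tool names are assumptions for illustration; how the library evaluates assertions is not described in this reference.

```python
def tool_level_check(assertion: dict, trace: list[dict]) -> bool:
    """Illustrative only: evaluate the tool-name part of an assertion."""
    called = [event["tool"] for event in trace if event["event"] == "tool_call"]
    kind = assertion["assert_that"]
    if kind == "no_tool_called":
        return not called
    if kind == "tool_called":
        return assertion["tool"] in called
    raise ValueError(f"unknown assertion kind: {kind!r}")


# A made-up trace with a single call to a made-up tool.
trace = [
    {"event": "tool_call", "id": "call-1", "tool": "send_email",
     "params": {"to": "john@example.com"}},
    {"event": "tool_result", "id": "call-1", "result": {"status": "sent"}},
]

called_send_email = tool_level_check(
    {"assert_that": "tool_called", "tool": "send_email"}, trace
)
called_nothing = tool_level_check({"assert_that": "no_tool_called"}, trace)
```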

class mb.entities.ToolResult(*, event: Literal['tool_result'] = 'tool_result', id: str, result: JsonValue)[source]

event: Literal['tool_result']: The type of event.

id: str: The ID of the tool call that yielded this result.

result: JsonValue: The result of the tool call.

class mb.entities.TurnMetadata(*, categories: dict[str, str], resources: list[JsonValue])[source]

categories: dict[str, str]

Categories associated with one turn in the conversation.

For example, whether the query is open-ended or factoid, or whether it is concise or verbose.

resources: list[JsonValue]: Resources used for generation, such as documents, API calls, etc.

class mb.entities.ValueMatcher[source]

mb.entities.parameter_assertion(*, param: str, matcher: EqualsMatcher | FreeTextMatcher | DateTimeMatcher | EmailMatcher | MissingMatcher | OptionalMatcher) → Annotated[Annotated[OneParameterAssertion, Tag(tag=one)] | Annotated[ParameterGroupAssertion, Tag(tag=group)], Discriminator(discriminator=_parameter_assertion_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)][source]

mb.entities.parameter_assertion(*, params: list[str], matcher: FreeTextMatcher | DateTimeMatcher | OptionalMatcher[Annotated[FreeTextMatcher | DateTimeMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]]) → Annotated[Annotated[OneParameterAssertion, Tag(tag=one)] | Annotated[ParameterGroupAssertion, Tag(tag=group)], Discriminator(discriminator=_parameter_assertion_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)]

A convenience function to create a parameter assertion.

Examples:

```python
parameter_assertion(param="name", matcher=EqualsMatcher(value="John"))
parameter_assertion(params=["name", "age"], matcher=FreeTextMatcher(value="John, 20 years old"))
```