API Reference
- class mb.entities.ChatMessage(*, role: str, content: str)
An OpenAI-style message in the conversation.
- class mb.entities.DateTimeMatcher(*, match_as: Literal['date_time'] = 'date_time', value: str)
Matches an argument if it matches a given date and / or time.
The match is determined by semantic equivalence, given the actual argument value and the request time provided in the output environment.
- match_as: Literal['date_time']
Discriminator field
- value: str
A natural language description of the date and / or time to compare the argument to.
- class mb.entities.EmailMatcher(*, match_as: Literal['email'] = 'email', value: str)
Matches an argument if it matches the given email address.
- match_as: Literal['email']
Discriminator field
- value: str
The email address to compare the argument to.
- class mb.entities.Environment(*, user_time: datetime | None = None, **extra_data: Any)
- user_time: datetime | None
The time the user sent the request.
- class mb.entities.EqualsMatcher(*, match_as: Literal['equality'] = 'equality', value: str | int | float | bool)
Matches an argument if it equals a given value.
- match_as: Literal['equality']
Discriminator field
- value: str | int | float | bool
The value to compare the argument to.
- class mb.entities.Example(*, inputs: Inputs, expectations: Expectations, outputs: Outputs | None = None, **extra_data: Any)
A benchmark (eval dataset) example.
- expectations: Expectations
The expectations for the example’s output.
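As a rough illustration of how these pieces nest, the sketch below builds one example as a plain dictionary using only field names from this reference. The serialized layout is an assumption inferred from the constructor signatures, and the tool name send_email and its parameter to are hypothetical.

```python
import json

# Plain-dict sketch of one benchmark example. Keys mirror the documented
# field names; the exact serialized form is an assumption.
example = {
    "inputs": {
        "messages": [
            {"role": "user", "content": "Email Dana the meeting notes."},
        ],
    },
    "expectations": {
        "expected_response": None,  # optional: no expected response to compare
        "assertions": [
            {
                "assert_that": "tool_called",
                "tool": "send_email",  # hypothetical tool name
                "parameters": [
                    {
                        "param": "to",  # hypothetical parameter name
                        "matcher": {"match_as": "email", "value": "dana@example.com"},
                    },
                ],
            },
        ],
    },
}

# The structure contains only JSON-compatible values, so it serializes as-is.
serialized = json.dumps(example, indent=2)
```

The outputs field is omitted here because it is optional in the Example signature.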
- class mb.entities.Expectations(*, expected_response: str | None = None, assertions: list[Annotated[ToolCallAssertion | NoToolCallAssertion, FieldInfo(annotation=NoneType, required=True, discriminator='assert_that')]] = <factory>, **extra_data: Any)
Agent expectations description.
- expected_response: str | None
The expected response to the user’s question.
If not provided, the agent’s response will not be evaluated against any expected response.
- assertions: list[Annotated[ToolCallAssertion | NoToolCallAssertion, FieldInfo(annotation=NoneType, required=True, discriminator='assert_that')]]
Assertions about the agent’s output, such as tool call correctness, abstention, guidelines, etc.
Currently, only tool call correctness assertions are supported.
- class mb.entities.FreeTextMatcher(*, match_as: Literal['free_text'] = 'free_text', value: str)
Matches an argument if it semantically achieves the same goal as the given free text.
- match_as: Literal['free_text']
Discriminator field
- value: str
The free text to compare the argument to.
- class mb.entities.InputMetadata(*, turns: list[TurnMetadata] | None = None, categories: dict[str, str] | None = None)
- turns: list[TurnMetadata] | None
Metadata associated with each conversation turn.
Each pair of user-assistant messages forms a turn, except for the last turn, which has only a user message. This list contains the metadata for each turn.
- categories: dict[str, str] | None
Categories associated with the entire input, which remain the same for all turns.
For example, user persona attributes.
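The turn structure described above can be sketched as a small helper that groups messages into turns. The function name split_into_turns is hypothetical and not part of mb.entities, and the sketch assumes the conversation contains only alternating user and assistant messages (no system messages).

```python
def split_into_turns(messages):
    """Group OpenAI-style messages into turns.

    Every turn is a [user, assistant] pair except the last one, which
    holds only the final user message.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the last message must have the 'user' role")
    turns = []
    # Walk the history two messages at a time, stopping before the last one.
    for i in range(0, len(messages) - 1, 2):
        user, assistant = messages[i], messages[i + 1]
        if user["role"] != "user" or assistant["role"] != "assistant":
            raise ValueError(f"messages {i} and {i + 1} are not a user-assistant pair")
        turns.append([user, assistant])
    turns.append([messages[-1]])
    return turns


history = [
    {"role": "user", "content": "Who is the King of England?"},
    {"role": "assistant", "content": "The King of England is King Charles III."},
    {"role": "user", "content": "When was he born?"},
]
turns = split_into_turns(history)
```

Under this reading, a turns metadata list for the history above would be expected to carry two entries, one per turn.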
- class mb.entities.Inputs(*, messages: Annotated[list[ChatMessage], MinLen(min_length=1)], metadata: InputMetadata | None = None, tools: list[str] | None = None, **extra_data: Any)
Agent input description.
- messages: list[ChatMessage]
The chat messages to be processed by the agent, in OpenAI-style format.
The last message must have the user role, and represents the user’s request for the agent.
Example:
{
  "messages": [
    {"role": "user", "content": "Who is the King of England?"},
    {"role": "assistant", "content": "The King of England is King Charles III."},
    {"role": "user", "content": "When was he born?"}
  ]
}
- metadata: InputMetadata | None
Additional metadata about the input, such as categories and resources used for generation.
- tools: list[str] | None
A subset of the tools available to the agent. If not provided, the agent will use all available tools.
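The constraints on messages and tools can be sketched as a standalone check over a plain dictionary. The helper name resolve_tools is hypothetical, and rejecting tool names outside the available set is an assumption drawn from the word "subset"; the reference does not say how the library itself reacts.

```python
def resolve_tools(inputs, available_tools):
    """Validate an inputs dict and return the tools the agent may use."""
    messages = inputs.get("messages") or []
    if len(messages) < 1:
        raise ValueError("messages must contain at least one message")
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must have the 'user' role")

    requested = inputs.get("tools")
    if requested is None:
        # No subset given: the agent uses every available tool.
        return list(available_tools)

    # Assumption: names outside the available set are treated as errors.
    unknown = [name for name in requested if name not in available_tools]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    return list(requested)


available = ["search", "calendar"]  # hypothetical tool names
all_tools = resolve_tools({"messages": [{"role": "user", "content": "hi"}]}, available)
subset = resolve_tools(
    {"messages": [{"role": "user", "content": "hi"}], "tools": ["search"]},
    available,
)
```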
- class mb.entities.MissingMatcher(*, match_as: Literal['missing'] = 'missing')
Matches an argument if it is missing (not provided).
- match_as: Literal['missing']
Discriminator field
- class mb.entities.NoToolCallAssertion(*, assert_that: Literal['no_tool_called'] = 'no_tool_called')
Assert that no tool call was made.
- class mb.entities.OneParameterAssertion(*, param: str, matcher: EqualsMatcher | FreeTextMatcher | DateTimeMatcher | EmailMatcher | MissingMatcher | OptionalMatcher)
Passes when a single argument matches the given matcher.
- param: str
The name of the parameter to match.
- matcher: Annotated[EqualsMatcher | FreeTextMatcher | DateTimeMatcher | EmailMatcher | MissingMatcher | OptionalMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]
The matcher representing a passing assertion.
- class mb.entities.OptionalMatcher(*, match_as: Literal['optional'] = 'optional', default: T)
Matches an argument either if it is missing (not provided) or its value matches the given matcher.
- match_as: Literal['optional']
Discriminator field
- default: T
The matcher to use to match the argument.
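To make the difference between the missing, optional, and equality matchers concrete, here is a minimal sketch over plain dictionaries keyed by the documented field names. It is not the library's evaluation code, which this reference does not show. The free_text and date_time matchers are described as semantic comparisons, and the comparison rules for email are not specified here, so all three are left out.

```python
_ABSENT = object()  # sentinel: distinguishes "not provided" from a None value


def matches(matcher, params, name):
    """Return True if params[name] satisfies the matcher dict (a sketch)."""
    actual = params.get(name, _ABSENT)
    kind = matcher["match_as"]
    if kind == "missing":
        return actual is _ABSENT
    if kind == "optional":
        # Passes when the argument is absent; otherwise defer to the wrapped matcher.
        return actual is _ABSENT or matches(matcher["default"], params, name)
    if kind == "equality":
        return actual is not _ABSENT and actual == matcher["value"]
    raise NotImplementedError(f"{kind!r} is outside the scope of this sketch")


optional_five = {
    "match_as": "optional",
    "default": {"match_as": "equality", "value": 5},
}
absent_ok = matches(optional_five, {}, "limit")            # argument not provided
equal_ok = matches(optional_five, {"limit": 5}, "limit")    # provided and equal
wrong = matches(optional_five, {"limit": 6}, "limit")       # provided but different
```

The parameter name limit is hypothetical and only serves the illustration.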
- class mb.entities.Outputs(*, response: str, citations: list[Citation] | None = None, environment: Environment | None = None, trace: list[Annotated[ToolCall | ToolResult | RetrievalResults, FieldInfo(annotation=NoneType, required=True, discriminator='event')]] = [], **extra_data: Any)
- response: str
The agent’s response to the user.
- citations: list[Citation] | None
The citations the agent made to justify its response.
- environment: Environment | None
Additional environment information, such as the time the user sent the request.
- trace: list[Annotated[ToolCall | ToolResult | RetrievalResults, FieldInfo(annotation=NoneType, required=True, discriminator='event')]]
The trace of the agent’s execution events, such as tool calls, tool call results, search, etc.
This is required for some assertions to work correctly, such as tool-call correctness assertions.
- class mb.entities.ParameterGroupAssertion(*, params: list[str], matcher: FreeTextMatcher | DateTimeMatcher | OptionalMatcher[Annotated[FreeTextMatcher | DateTimeMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]])
Passes when a group of arguments matches the given matcher.
- params: list[str]
The names of the parameters to assert.
- matcher: Annotated[FreeTextMatcher | DateTimeMatcher | OptionalMatcher[Annotated[FreeTextMatcher | DateTimeMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]], FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]
The matcher representing a passing assertion.
- class mb.entities.ToolCall(*, event: Literal['tool_call'] = 'tool_call', id: str, tool: str, params: dict[str, JsonValue])
- event: Literal['tool_call']
The type of event.
- id: str
The ID of the tool call.
- tool: str
The name of the tool called. Must correspond to one of the tools in the benchmark description file.
- params: dict[str, JsonValue]
The parameters passed to the tool.
- class mb.entities.ToolCallAssertion(*, assert_that: Literal['tool_called'] = 'tool_called', tool: str, parameters: list[Annotated[Annotated[OneParameterAssertion, Tag(tag=one)] | Annotated[ParameterGroupAssertion, Tag(tag=group)], Discriminator(discriminator=_parameter_assertion_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)]] = [])
Assert that a tool call was made with given parameters.
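The parameters list accepts two assertion shapes, tagged one and group in the signature and selected by a private discriminator whose behavior is not documented here. One plausible reading, offered purely as a guess, is that the choice depends on whether the raw entry carries param or params. The function name below is hypothetical.

```python
def parameter_assertion_tag(raw):
    """Guess the tag ('one' or 'group') for a raw parameter-assertion dict.

    This mirrors the documented field names only; the library's private
    discriminator may behave differently.
    """
    has_param = "param" in raw
    has_params = "params" in raw
    if has_param == has_params:
        # Either both keys or neither key is present: the shape is ambiguous.
        raise ValueError("expected exactly one of 'param' or 'params'")
    return "one" if has_param else "group"


single = parameter_assertion_tag(
    {"param": "name", "matcher": {"match_as": "equality", "value": "John"}}
)
grouped = parameter_assertion_tag(
    {"params": ["name", "age"], "matcher": {"match_as": "free_text", "value": "John, 20 years old"}}
)
```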
- class mb.entities.ToolResult(*, event: Literal['tool_result'] = 'tool_result', id: str, result: JsonValue)
- event: Literal['tool_result']
The type of event.
- id: str
The ID of the tool call that yielded this result.
- result: JsonValue
The result of the tool call.
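Because a tool result refers to its tool call by id, a trace can be joined on that field. The sketch below does this over plain dictionaries that use the documented event, id, tool, params, and result keys. The helper name pair_calls_with_results and the get_weather tool are hypothetical.

```python
def pair_calls_with_results(trace):
    """Return (tool_call, result) tuples, joined on the shared id.

    A call with no recorded result is paired with None; note that None is
    also what a JSON null result would look like, so callers that care
    about the difference need an explicit membership check.
    """
    results = {e["id"]: e["result"] for e in trace if e["event"] == "tool_result"}
    # Other event types in the trace (for example retrieval results) are skipped.
    return [(e, results.get(e["id"])) for e in trace if e["event"] == "tool_call"]


trace = [
    {"event": "tool_call", "id": "c1", "tool": "get_weather", "params": {"city": "Oslo"}},
    {"event": "tool_result", "id": "c1", "result": {"temp_c": 4}},
    {"event": "tool_call", "id": "c2", "tool": "get_weather", "params": {"city": "Rome"}},
]
pairs = pair_calls_with_results(trace)
```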
- class mb.entities.TurnMetadata(*, categories: dict[str, str], resources: list[JsonValue])
- categories: dict[str, str]
Categories associated with one turn in the conversation.
For example, whether the query is open-ended or factoid, or whether it is concise or verbose.
- resources: list[JsonValue]
Resources used for generation, such as documents, API calls, etc.
- mb.entities.parameter_assertion(*, param: str, matcher: EqualsMatcher | FreeTextMatcher | DateTimeMatcher | EmailMatcher | MissingMatcher | OptionalMatcher) Annotated[Annotated[OneParameterAssertion, Tag(tag=one)] | Annotated[ParameterGroupAssertion, Tag(tag=group)], Discriminator(discriminator=_parameter_assertion_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)]
- mb.entities.parameter_assertion(*, params: list[str], matcher: FreeTextMatcher | DateTimeMatcher | OptionalMatcher[Annotated[FreeTextMatcher | DateTimeMatcher, FieldInfo(annotation=NoneType, required=True, discriminator='match_as')]]) Annotated[Annotated[OneParameterAssertion, Tag(tag=one)] | Annotated[ParameterGroupAssertion, Tag(tag=group)], Discriminator(discriminator=_parameter_assertion_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)]
A convenience function to create a parameter assertion.
Examples:
```python
parameter_assertion(param="name", matcher=EqualsMatcher(value="John"))
parameter_assertion(params=["name", "age"], matcher=FreeTextMatcher(value="John, 20 years old"))
```