yival.evaluators.string_expected_result_evaluator

Module: string_expected_result_evaluator.py

This module defines the StringExpectedResultEvaluator class, which is used for evaluating string expected results.

Classes: StringExpectedResultEvaluator: Class for evaluating string expected results.

is_valid_json

def is_valid_json(s: str) -> bool

Check whether the given string is valid JSON.

Arguments:

  • s str - The input string to check.

Returns:

  • bool - True if the input string is valid JSON, False otherwise.
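
For reference, such a check needs nothing beyond the standard library; the sketch below is illustrative and not necessarily the module's exact implementation:

import json

def is_valid_json(s: str) -> bool:
    # Try to parse; a JSONDecodeError (a ValueError subclass) or a TypeError
    # for non-string input means the value is not valid JSON.
    try:
        json.loads(s)
        return True
    except (ValueError, TypeError):
        return False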

StringExpectedResultEvaluator Objects

class StringExpectedResultEvaluator(BaseEvaluator)

Class for evaluating string expected results.

This class extends the BaseEvaluator and provides specific implementation for evaluating string expected results using different matching techniques.

Attributes:

  • config ExpectedResultEvaluatorConfig - Configuration object for the evaluator.

__init__

def __init__(config: ExpectedResultEvaluatorConfig)

Initialize the StringExpectedResultEvaluator with the provided configuration.

Arguments:

  • config ExpectedResultEvaluatorConfig - Configuration object for the evaluator.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the expected result against the actual result using the specified matching technique.

Returns:

  • EvaluatorOutput - An EvaluatorOutput object containing the evaluation result.
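
Conceptually, the matching step dispatches on the technique named in the config. The sketch below is illustrative only: the technique names (EXACT_MATCH, JSON_MATCH, FUZZY_MATCH) and the helper matches are assumptions, not names taken from the module, while is_valid_json and fuzzy_match_util are the helpers documented on this page:

import json
from yival.evaluators.string_expected_result_evaluator import is_valid_json
from yival.evaluators.utils import fuzzy_match_util

def matches(technique: str, raw_output: str, expected: str) -> bool:
    # Illustrative dispatch; the real config values may differ.
    if technique == "EXACT_MATCH":
        return raw_output.strip() == expected.strip()
    if technique == "JSON_MATCH":
        # Both sides must be valid JSON and parse to equal values.
        return (is_valid_json(raw_output) and is_valid_json(expected)
                and json.loads(raw_output) == json.loads(expected))
    if technique == "FUZZY_MATCH":
        return fuzzy_match_util(raw_output, expected)
    raise ValueError(f"Unknown matching technique: {technique}")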

yival.evaluators.alpaca_eval_evaluator

yival.evaluators.python_validation_evaluator

Python Validation Evaluator Module.

This module provides an implementation of the PythonValidationEvaluator, which evaluates the raw output of an experiment using Python's exec function. The evaluator is designed to validate Python code snippets and determine whether they can be executed without any errors.

Classes: - PythonValidationEvaluator: Evaluates the raw output of an experiment.

PythonValidationEvaluator Objects

class PythonValidationEvaluator(BaseEvaluator)

Python Validation Evaluator.

Evaluates the raw output of an experiment by attempting to execute it as Python code. If the code executes without any errors, a positive result is returned. Otherwise, a negative result is returned.
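
At its core, this kind of check can be an exec call wrapped in a try/except; the snippet below is a hedged sketch of that idea, not the module's exact code:

def runs_without_error(code: str) -> bool:
    # Execute the snippet in a fresh, empty namespace; any raised exception
    # means the output is not executable Python.
    try:
        exec(code, {})
        return True
    except Exception:
        return False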

yival.evaluators.base_evaluator

Evaluators Module.

This module contains the base class and common methods for evaluators used in experiments. Evaluators are essential components in the system that interpret the results of experiments and provide quantitative or qualitative feedback. Specific evaluators are expected to inherit from the base class and implement custom evaluation logic as needed.

BaseEvaluator Objects

class BaseEvaluator(ABC)

Base class for all evaluators.

This class provides the basic structure and methods for evaluators. Specific evaluators should inherit from this class and implement the necessary methods.

__init__

def __init__(config: BaseEvaluatorConfig)

Initialize the evaluator with its configuration.

Arguments:

  • config BaseEvaluatorConfig - The configuration for the evaluator.

register

@classmethod
def register(cls, name: str)

Decorator to register new evaluators.
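
Because register is a decorator factory keyed by name, a custom evaluator is typically wired in as in this sketch (the evaluator name "my_custom_evaluator" is illustrative, and the method body is elided rather than guessed):

from yival.evaluators.base_evaluator import BaseEvaluator

@BaseEvaluator.register("my_custom_evaluator")
class MyCustomEvaluator(BaseEvaluator):
    def evaluate(self, experiment_result):
        # Custom evaluation logic producing an EvaluatorOutput goes here.
        ...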

get_evaluator

@classmethod
def get_evaluator(cls, name: str) -> Optional[Type['BaseEvaluator']]

Retrieve an evaluator class from the registry by its name.

get_default_config

@classmethod
def get_default_config(cls, name: str) -> Optional[BaseEvaluatorConfig]

Retrieve the default configuration of an evaluator by its name.

get_config_class

@classmethod
def get_config_class(cls, name: str) -> Optional[Type[BaseEvaluatorConfig]]

Retrieve the configuration class of an evaluator by its name.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result and produce an evaluator output.

Arguments:

  • experiment_result ExperimentResult - The result of an experiment to be evaluated.

Returns:

  • EvaluatorOutput - The result of the evaluation.

aevaluate

async def aevaluate(experiment_result: ExperimentResult) -> Any

Asynchronously evaluate the experiment result and produce an evaluator output.

Arguments:

  • experiment_result ExperimentResult - The result of an experiment to be evaluated.

Returns:

  • EvaluatorOutput - The result of the evaluation.

evaluate_comparison

def evaluate_comparison(group_data: List[ExperimentResult]) -> None

Evaluate and compare a list of experiment results.

This method is designed to evaluate multiple experiment results together, allowing for comparisons and potentially identifying trends, anomalies, or other patterns in the set of results.

Arguments:

  • group_data List[ExperimentResult] - A list of experiment results to be evaluated together.

Notes:

Implementations of this method in subclasses should handle the specifics of how multiple experiments are evaluated and compared.

evaluate_based_on_all_results

def evaluate_based_on_all_results(experiment: List[Experiment]) -> None

Evaluate based on the entirety of experiment results.

This method evaluates an entire list of experiments, potentially taking into account all available data to produce a comprehensive evaluation.

Arguments:

  • experiment List[Experiment] - A list of all experiments to be evaluated.

Notes:

Implementations of this method in subclasses should determine how to best utilize all available experiment data for evaluation.

yival.evaluators.utils

fuzzy_match_util

def fuzzy_match_util(generated: str,
                     expected: str,
                     threshold: int = 80) -> bool

Matches the generated string against the expected answer using fuzzy matching.

Arguments:

  • generated str - The generated string.
  • expected str - The expected answer to match against.
  • threshold int, optional - The threshold for fuzzy matching. Defaults to 80.

Returns:

  • bool - True if the generated string matches the expected answer at or above the threshold, False otherwise.
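
One common way to implement this is with a similarity ratio from a fuzzy-matching library such as rapidfuzz; which library yival actually uses is not stated on this page, so treat the sketch as illustrative:

from rapidfuzz import fuzz

def fuzzy_match_util(generated: str, expected: str, threshold: int = 80) -> bool:
    # fuzz.ratio returns a similarity score in [0, 100]; it is a match when
    # the score meets or exceeds the threshold.
    return fuzz.ratio(generated, expected) >= threshold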

yival.evaluators.openai_elo_evaluator

Elo Evaluators Module.

This module contains the OpenAIEloEvaluator class, which implements an ELO-based evaluation system. The ELO system is used to rank different model outputs based on human evaluations, and this specific implementation interfaces with the OpenAI API for those evaluations.

K

Elo rating K-factor: the maximum rating adjustment applied after a single pairwise comparison.

OpenAIEloEvaluator Objects

class OpenAIEloEvaluator(BaseEvaluator)

OpenAIEloEvaluator is an evaluator that uses the ELO rating system to rank model outputs.

expected_score

def expected_score(r1, r2)

Calculate the expected score between two ratings.
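
Under the standard Elo model, the expected score of a player rated r1 against one rated r2 is 1 / (1 + 10 ** ((r2 - r1) / 400)), and after each comparison a rating moves by K times the gap between the actual and expected score. The sketch below shows that arithmetic; the update helper and the K value of 32 are illustrative, not taken from the module:

K = 32  # K-factor; the module's constant may differ

def expected_score(r1: float, r2: float) -> float:
    # Probability-like expected score of the r1-rated output against r2.
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def update_ratings(r1: float, r2: float, score1: float) -> tuple[float, float]:
    # score1 is 1.0 if output 1 wins, 0.5 for a tie, 0.0 if it loses.
    new_r1 = r1 + K * (score1 - expected_score(r1, r2))
    new_r2 = r2 + K * ((1.0 - score1) - expected_score(r2, r1))
    return new_r1, new_r2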

yival.evaluators.rouge_evaluator

The ROUGE evaluator assesses the quality of responses generated by dialogue models by measuring lexical overlap (n-gram and longest-common-subsequence overlap) between a generated response and a reference response.

This makes it useful for developers and researchers working on dialogue systems who want a reproducible, reference-based measure of how closely their models' outputs match the expected answers and where improvement is needed.

RougeEvaluator Objects

class RougeEvaluator(BaseEvaluator)

Evaluator that uses ROUGE to compute an overlap score for the generated output.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result using the ROUGE metric.
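
For context, ROUGE scores for a generated/reference pair can be computed with the rouge package roughly as below; whether RougeEvaluator uses this exact package, or which ROUGE variant it reports, is not stated on this page:

from rouge import Rouge

scorer = Rouge()
scores = scorer.get_scores(
    "the cat sat on the mat",       # generated output (hypothesis)
    "a cat was sitting on the mat"  # expected output (reference)
)
# scores[0] holds precision ('p'), recall ('r') and F1 ('f') for
# rouge-1, rouge-2 and rouge-l.
rouge_l_f1 = scores[0]["rouge-l"]["f"]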

main

def main()

Main function to test the rouge evaluator

yival.evaluators.bertscore_evaluator

BERTScore is a language model evaluation metric based on the BERT language model. It leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity. It has been shown to correlate with human judgment on sentence-level and system-level evaluation.

BertScoreEvaluator Objects

class BertScoreEvaluator(BaseEvaluator)

Evaluator that calculates BERTScore for the generated output.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result using BERTScore.
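
As a point of reference, BERTScore for candidate/reference pairs can be computed with the bert-score package as sketched here; the evaluator's actual model and language settings are not documented on this page:

from bert_score import score

candidates = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

# Returns precision, recall and F1 tensors with one entry per pair.
P, R, F1 = score(candidates, references, lang="en")
print(F1.mean().item())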

main

def main()

Main function to test the bertscore evaluator

yival.evaluators.openai_prompt_based_evaluator

OpenAIPromptBasedEvaluator is an evaluator that uses OpenAI's prompt-based system for evaluations.

The evaluator interfaces with the OpenAI API to present tasks and interpret the model's responses to determine the quality or correctness of a given experiment result.

extract_choice_from_response

def extract_choice_from_response(response: str,
                                 choice_strings: Iterable[str]) -> str

Extracts the choice from the response string.
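
A straightforward way to do this is to look for any allowed choice string in the model's reply; the sketch below is illustrative, and its fallback behaviour (returning the stripped response when no choice is found) is an assumption rather than the module's documented rule:

import re
from typing import Iterable

def extract_choice(response: str, choice_strings: Iterable[str]) -> str:
    # Return the first allowed choice that appears as a whole word in the
    # response; otherwise fall back to the stripped response itself.
    for choice in choice_strings:
        if re.search(rf"\b{re.escape(choice)}\b", response):
            return choice
    return response.strip()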

calculate_choice_score

def calculate_choice_score(
        choice: str,
        choice_scores: Optional[Dict[str, float]] = None) -> Optional[float]

Calculates the score for the given choice.

format_template

def format_template(
        template: Union[str, List[Dict[str, str]]],
        content: Dict[str, Any]) -> Union[str, List[Dict[str, str]]]

Formats a string or list template with the provided content.
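
Since the template may be either a plain string or a list of chat-style message dicts, the formatting typically branches on the type; this is a hedged sketch of that idea, not necessarily the templating scheme the module actually uses:

from typing import Any, Dict, List, Union

def format_template_sketch(
        template: Union[str, List[Dict[str, str]]],
        content: Dict[str, Any]) -> Union[str, List[Dict[str, str]]]:
    # Plain string templates are formatted directly; for message lists, each
    # message's "content" field is formatted and other keys are kept as-is.
    if isinstance(template, str):
        return template.format(**content)
    return [{**msg, "content": msg["content"].format(**content)}
            for msg in template]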

choices_to_string

def choices_to_string(choice_strings: Iterable[str]) -> str

Converts a list of choices into a formatted string.

OpenAIPromptBasedEvaluator Objects

class OpenAIPromptBasedEvaluator(BaseEvaluator)

Evaluator using OpenAI's prompt-based evaluation.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result using OpenAI's prompt-based evaluation.

main

def main()

Main function to test the OpenAIPromptBasedEvaluator.