yival.evaluators.string_expected_result_evaluator

Module: string_expected_result_evaluator.py

This module defines the StringExpectedResultEvaluator class, which is used for evaluating string expected results.

Classes: StringExpectedResultEvaluator: Class for evaluating string expected results.

is_valid_json

def is_valid_json(s: str) -> bool

Check whether the given string is valid JSON.

Arguments:

  • s str - The input string to check.

Returns:

  • bool - True if the input string is valid JSON, False otherwise.
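
For reference, such a check needs nothing beyond the standard library; the sketch below is illustrative and not necessarily the module's exact implementation:

import json

def is_valid_json(s: str) -> bool:
    # Try to parse; a JSONDecodeError (a ValueError subclass) or a TypeError
    # for non-string input means the value is not valid JSON.
    try:
        json.loads(s)
        return True
    except (ValueError, TypeError):
        return False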

StringExpectedResultEvaluator Objects

class StringExpectedResultEvaluator(BaseEvaluator)

Class for evaluating string expected results.

This class extends the BaseEvaluator and provides specific implementation for evaluating string expected results using different matching techniques.

Attributes:

  • config ExpectedResultEvaluatorConfig - Configuration object for the evaluator.

__init__

def __init__(config: ExpectedResultEvaluatorConfig)

Initialize the StringExpectedResultEvaluator with the provided configuration.

Arguments:

  • config ExpectedResultEvaluatorConfig - Configuration object for the evaluator.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the expected result against the actual result using the specified matching technique.

Returns:

  • EvaluatorOutput - An EvaluatorOutput object containing the evaluation result.
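
Conceptually, the matching step dispatches on the technique named in the config. The sketch below is illustrative only: the technique names (EXACT_MATCH, JSON_MATCH, FUZZY_MATCH) and the helper matches are assumptions, not names taken from the module, while is_valid_json and fuzzy_match_util are the helpers documented on this page:

import json
from yival.evaluators.string_expected_result_evaluator import is_valid_json
from yival.evaluators.utils import fuzzy_match_util

def matches(technique: str, raw_output: str, expected: str) -> bool:
    # Illustrative dispatch; the real config values may differ.
    if technique == "EXACT_MATCH":
        return raw_output.strip() == expected.strip()
    if technique == "JSON_MATCH":
        # Both sides must be valid JSON and parse to equal values.
        return (is_valid_json(raw_output) and is_valid_json(expected)
                and json.loads(raw_output) == json.loads(expected))
    if technique == "FUZZY_MATCH":
        return fuzzy_match_util(raw_output, expected)
    raise ValueError(f"Unknown matching technique: {technique}")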

yival.evaluators.alpaca_eval_evaluator

yival.evaluators.python_validation_evaluator

Python Validation Evaluator Module.

This module provides an implementation of the PythonValidationEvaluator, which evaluates the raw output of an experiment using Python's exec function. The evaluator is designed to validate Python code snippets and determine whether they can be executed without any errors.

Classes: - PythonValidationEvaluator: Evaluates the raw output of an experiment.

PythonValidationEvaluator Objects

class PythonValidationEvaluator(BaseEvaluator)

Python Validation Evaluator.

Evaluates the raw output of an experiment by attempting to execute it as Python code. If the code executes without any errors, a positive result is returned. Otherwise, a negative result is returned.
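
At its core, this kind of check can be an exec call wrapped in a try/except; the snippet below is a hedged sketch of that idea, not the module's exact code:

def runs_without_error(code: str) -> bool:
    # Execute the snippet in a fresh, empty namespace; any raised exception
    # means the output is not executable Python.
    try:
        exec(code, {})
        return True
    except Exception:
        return False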

yival.evaluators.base_evaluator

Evaluators Module.

This module contains the base class and common methods for evaluators used in experiments. Evaluators are essential components in the system that interpret the results of experiments and provide quantitative or qualitative feedback. Specific evaluators are expected to inherit from the base class and implement custom evaluation logic as needed.

BaseEvaluator Objects

class BaseEvaluator(ABC)

Base class for all evaluators.

This class provides the basic structure and methods for evaluators. Specific evaluators should inherit from this class and implement the necessary methods.

__init__

def __init__(config: BaseEvaluatorConfig)

Initialize the evaluator with its configuration.

Arguments:

  • config BaseEvaluatorConfig - The configuration for the evaluator.

register

@classmethod
def register(cls, name: str)

Decorator to register new evaluators.
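
Because register is a decorator factory keyed by name, a custom evaluator is typically wired in as in this sketch (the evaluator name "my_custom_evaluator" is illustrative, and the method body is elided rather than guessed):

from yival.evaluators.base_evaluator import BaseEvaluator

@BaseEvaluator.register("my_custom_evaluator")
class MyCustomEvaluator(BaseEvaluator):
    def evaluate(self, experiment_result):
        # Custom evaluation logic producing an EvaluatorOutput goes here.
        ...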

get_evaluator

@classmethod
def get_evaluator(cls, name: str) -> Optional[Type['BaseEvaluator']]

Retrieve an evaluator class from the registry by its name.

get_default_config

@classmethod
def get_default_config(cls, name: str) -> Optional[BaseEvaluatorConfig]

Retrieve the default configuration of an evaluator by its name.

get_config_class

@classmethod
def get_config_class(cls, name: str) -> Optional[Type[BaseEvaluatorConfig]]

Retrieve the configuration class of an evaluator by its name.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result and produce an evaluator output.

Arguments:

  • experiment_result ExperimentResult - The result of an experiment to be evaluated.

Returns:

  • EvaluatorOutput - The result of the evaluation.

aevaluate

async def aevaluate(experiment_result: ExperimentResult) -> Any

Asynchronously evaluate the experiment result and produce an evaluator output.

Arguments:

  • experiment_result ExperimentResult - The result of an experiment to be evaluated.

Returns:

  • EvaluatorOutput - The result of the evaluation.

evaluate_comparison

def evaluate_comparison(group_data: List[ExperimentResult]) -> None

Evaluate and compare a list of experiment results.

This method is designed to evaluate multiple experiment results together, allowing for comparisons and potentially identifying trends, anomalies, or other patterns in the set of results.

Arguments:

  • group_data List[ExperimentResult] - A list of experiment results to be evaluated together.

Notes:

Implementations of this method in subclasses should handle the specifics of how multiple experiments are evaluated and compared.

evaluate_based_on_all_results

def evaluate_based_on_all_results(experiment: List[Experiment]) -> None

Evaluate based on the entirety of experiment results.

This method evaluates an entire list of experiments, potentially taking into account all available data to produce a comprehensive evaluation.

Arguments:

  • experiment List[Experiment] - A list of all experiments to be evaluated.

Notes:

Implementations of this method in subclasses should determine how to best utilize all available experiment data for evaluation.

yival.evaluators.utils

fuzzy_match_util

def fuzzy_match_util(generated: str,
                     expected: str,
                     threshold: int = 80) -> bool

Matches the generated string against the expected answer using fuzzy matching.

Arguments:

  • generated str - The generated string.
  • expected str - The expected answer to match against.
  • threshold int, optional - The threshold for fuzzy matching. Defaults to 80.

Returns:

  • bool - True if the generated string matches the expected answer at or above the threshold, False otherwise.
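
One common way to implement this is with a similarity ratio from a fuzzy-matching library such as rapidfuzz; which library yival actually uses is not stated on this page, so treat the sketch as illustrative:

from rapidfuzz import fuzz

def fuzzy_match_util(generated: str, expected: str, threshold: int = 80) -> bool:
    # fuzz.ratio returns a similarity score in [0, 100]; it is a match when
    # the score meets or exceeds the threshold.
    return fuzz.ratio(generated, expected) >= threshold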

yival.evaluators.openai_elo_evaluator

Elo Evaluators Module.

This module contains the OpenAIEloEvaluator class, which implements an ELO-based evaluation system. The ELO system is used to rank different model outputs based on human evaluations, and this specific implementation interfaces with the OpenAI API for those evaluations.

K

Elo rating K-factor: the maximum rating adjustment applied after a single pairwise comparison.

OpenAIEloEvaluator Objects

class OpenAIEloEvaluator(BaseEvaluator)

OpenAIEloEvaluator is an evaluator that uses the ELO rating system to rank model outputs.

expected_score

def expected_score(r1, r2)

Calculate the expected score between two ratings.
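
Under the standard Elo model, the expected score of a player rated r1 against one rated r2 is 1 / (1 + 10 ** ((r2 - r1) / 400)), and after each comparison a rating moves by K times the gap between the actual and expected score. The sketch below shows that arithmetic; the update helper and the K value of 32 are illustrative, not taken from the module:

K = 32  # K-factor; the module's constant may differ

def expected_score(r1: float, r2: float) -> float:
    # Probability-like expected score of the r1-rated output against r2.
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def update_ratings(r1: float, r2: float, score1: float) -> tuple[float, float]:
    # score1 is 1.0 if output 1 wins, 0.5 for a tie, 0.0 if it loses.
    new_r1 = r1 + K * (score1 - expected_score(r1, r2))
    new_r2 = r2 + K * ((1.0 - score1) - expected_score(r2, r1))
    return new_r1, new_r2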

yival.evaluators.rouge_evaluator

The ROUGE evaluator assesses the quality of responses generated by dialogue models by measuring lexical overlap (n-gram and longest-common-subsequence overlap) between a generated response and a reference response.

This makes it useful for developers and researchers working on dialogue systems who want a reproducible, reference-based measure of how closely their models' outputs match the expected answers and where improvement is needed.

RougeEvaluator Objects

class RougeEvaluator(BaseEvaluator)

Evaluator that uses ROUGE to compute an overlap score for the generated output.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result using the ROUGE metric.
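
For context, ROUGE scores for a generated/reference pair can be computed with the rouge package roughly as below; whether RougeEvaluator uses this exact package, or which ROUGE variant it reports, is not stated on this page:

from rouge import Rouge

scorer = Rouge()
scores = scorer.get_scores(
    "the cat sat on the mat",       # generated output (hypothesis)
    "a cat was sitting on the mat"  # expected output (reference)
)
# scores[0] holds precision ('p'), recall ('r') and F1 ('f') for
# rouge-1, rouge-2 and rouge-l.
rouge_l_f1 = scores[0]["rouge-l"]["f"]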

main

def main()

Main function to test the rouge evaluator

yival.evaluators.bertscore_evaluator

BERTScore is a language model evaluation metric based on the BERT language model. It leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity. It has been shown to correlate with human judgment on sentence-level and system-level evaluation.

BertScoreEvaluator Objects

class BertScoreEvaluator(BaseEvaluator)

Evaluator that calculates BERTScore for the generated output.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result using BERTScore.
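
As a point of reference, BERTScore for candidate/reference pairs can be computed with the bert-score package as sketched here; the evaluator's actual model and language settings are not documented on this page:

from bert_score import score

candidates = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

# Returns precision, recall and F1 tensors with one entry per pair.
P, R, F1 = score(candidates, references, lang="en")
print(F1.mean().item())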

main

def main()

Main function to test the bertscore evaluator

yival.evaluators.openai_prompt_based_evaluator

OpenAIPromptBasedEvaluator is an evaluator that uses OpenAI's prompt-based system for evaluations.

The evaluator interfaces with the OpenAI API to present tasks and interpret the model's responses to determine the quality or correctness of a given experiment result.

extract_choice_from_response

def extract_choice_from_response(response: str,
                                 choice_strings: Iterable[str]) -> str

Extracts the choice from the response string.
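
A straightforward way to do this is to look for any allowed choice string in the model's reply; the sketch below is illustrative, and its fallback behaviour (returning the stripped response when no choice is found) is an assumption rather than the module's documented rule:

import re
from typing import Iterable

def extract_choice(response: str, choice_strings: Iterable[str]) -> str:
    # Return the first allowed choice that appears as a whole word in the
    # response; otherwise fall back to the stripped response itself.
    for choice in choice_strings:
        if re.search(rf"\b{re.escape(choice)}\b", response):
            return choice
    return response.strip()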

calculate_choice_score

def calculate_choice_score(
        choice: str,
        choice_scores: Optional[Dict[str, float]] = None) -> Optional[float]

Calculates the score for the given choice.

format_template

def format_template(
        template: Union[str, List[Dict[str, str]]],
        content: Dict[str, Any]) -> Union[str, List[Dict[str, str]]]

Formats a string or list template with the provided content.
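
Since the template may be either a plain string or a list of chat-style message dicts, the formatting typically branches on the type; this is a hedged sketch of that idea, not necessarily the templating scheme the module actually uses:

from typing import Any, Dict, List, Union

def format_template_sketch(
        template: Union[str, List[Dict[str, str]]],
        content: Dict[str, Any]) -> Union[str, List[Dict[str, str]]]:
    # Plain string templates are formatted directly; for message lists, each
    # message's "content" field is formatted and other keys are kept as-is.
    if isinstance(template, str):
        return template.format(**content)
    return [{**msg, "content": msg["content"].format(**content)}
            for msg in template]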

choices_to_string

def choices_to_string(choice_strings: Iterable[str]) -> str

Converts a list of choices into a formatted string.

OpenAIPromptBasedEvaluator Objects

class OpenAIPromptBasedEvaluator(BaseEvaluator)

Evaluator using OpenAI's prompt-based evaluation.

evaluate

def evaluate(experiment_result: ExperimentResult) -> EvaluatorOutput

Evaluate the experiment result using OpenAI's prompt-based evaluation.

main

def main()

Main function to test the OpenAIPromptBasedEvaluator.