Evaluation Metrics
A curated list of projects related to evaluation metrics for machine learning, information retrieval, and language models
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
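Ranking evaluation of this kind typically scores a ranked result list against graded relevance judgments with measures such as nDCG. As a minimal, library-independent sketch (the function names here are illustrative, not this library's API):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each graded relevance is
    # discounted by log2 of its (1-based) rank + 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    # Normalize DCG@k by the DCG of the ideal (descending) ordering,
    # so a perfect ranking scores 1.0.
    ideal = sorted(relevances, reverse=True)
    idcg = dcg(ideal[:k])
    return dcg(relevances[:k]) / idcg if idcg > 0 else 0.0

# relevance grades of the documents in the order the system ranked them
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6))
```

Libraries in this space add vectorized implementations, statistical significance testing for comparing runs, and rank-fusion methods on top of metrics like this.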
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
A Python package of functions that compute the character error rate (CER) and word error rate (WER) of Korean STT (speech-to-text) transcription output
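Both CER and WER are edit-distance rates: the Levenshtein distance between hypothesis and reference (over characters or words) divided by the reference length. A minimal sketch, independent of this package's API:

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over a
    # single rolling row (substitution, insertion, deletion all cost 1).
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    # Word error rate: edit distance over word tokens / reference word count.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character error rate: edit distance over characters / reference length.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

The same computation applies to Korean text; only the tokenization (characters vs. whitespace-separated words) differs between the two rates.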
The most comprehensive Python package for evaluating survival analysis models.
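A core survival-analysis metric is Harrell's concordance index: among comparable pairs of subjects, the fraction where the model assigns higher risk to the subject who experienced the event earlier. A minimal sketch (illustrative, not this package's API):

```python
def concordance_index(times, events, risk_scores):
    # Harrell's C-index. A pair (i, j) is comparable when subject i
    # had an observed event (events[i] == 1) strictly before time[j];
    # it is concordant when the model gave i the higher risk score.
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # ties count as half-concordant
    return concordant / comparable
```

Censored subjects (event = 0) contribute only as the later member of a pair, since their true event time is unknown; full-featured packages extend this with censoring-robust variants and time-dependent metrics.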