
Decensoring

Several open-source projects and initiatives focus on developing standards and benchmarks for testing and evaluating large language models (LLMs) across a range of tasks, including prompt-based evaluations.

Hugging Face Evaluate

  • Open-source library provided by Hugging Face, a popular platform for natural language processing (NLP) models and tools
  • Includes a collection of evaluation modules and metrics for assessing NLP models, including LLMs
  • Not focused exclusively on prompt-based evaluation, but offers a range of tools and resources for standardized model testing (see the usage sketch after this list)
  • GitHub: https://github.com/huggingface/evaluate
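
The snippet below is a minimal sketch of the library's load/compute workflow, assuming the evaluate package is installed; the metric chosen and the example predictions/references are illustrative assumptions, not taken from the library's documentation.

```python
# Minimal sketch of scoring model outputs with Hugging Face Evaluate.
# Assumes `pip install evaluate` has been run; the metric choice and the
# example data below are hypothetical, for illustration only.
import evaluate

# Load a metric module by name; "exact_match" checks whether each
# prediction string matches its reference string exactly.
exact_match = evaluate.load("exact_match")

predictions = ["Paris", "4", "blue"]   # hypothetical model outputs
references = ["Paris", "4", "green"]   # hypothetical gold answers

# compute() returns a dict keyed by the metric name.
results = exact_match.compute(predictions=predictions, references=references)
print(results)  # e.g. {'exact_match': 0.666...}
```

The same load/compute pattern applies to other metrics bundled with the library, such as accuracy, ROUGE, or BLEU.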

These projects and initiatives contribute to standardized methods and benchmarks for evaluating LLMs, enabling researchers and developers to compare models, identify areas for improvement, and push the boundaries of LLM capabilities.