BLUFF-1000 is a benchmark that evaluates the factuality, faithfulness, and uncertainty expression of retrieval-augmented generation (RAG) models. It contains 500 questions and 1000 evaluation instances.