Evading Data Contamination Detection for Language Models is (too) Easy

Abstract: Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another. However, the vast amount of data these models are trained on can inadvertently lead to contamination with public benchmarks, thus compromising performance measurements. While recently developed contamination detection methods try to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection.
