
Table 2 Cohen’s kappa coefficient for agreement between AI model pairs

From: Performance of the Large Language Models in African rheumatology: a diagnostic test accuracy study of ChatGPT-4, Gemini, Copilot, and Claude artificial intelligence

 

| Model pair | Cohen's kappa coefficient | 95% CI |
|---|---|---|
| ChatGPT-4 vs. Gemini | 0.45 | [0.265–0.652] |
| ChatGPT-4 vs. Copilot | 0.59 | [0.405–0.786] |
| ChatGPT-4 vs. Claude AI | 0.47 | [0.235–0.721] |
| Gemini vs. Copilot | 0.44 | [0.254–0.643] |
| Gemini vs. Claude AI | 0.43 | [0.241–0.633] |
| Copilot vs. Claude AI | 0.57 | [0.377–0.766] |
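For readers unfamiliar with the statistic, pairwise agreement tables like this one are commonly computed from per-case ratings using Cohen's kappa with a resampling-based confidence interval. The following is a minimal sketch, not the authors' code: it assumes binary correct/incorrect ratings per clinical vignette and uses a percentile bootstrap for the 95% CI; the `gpt4` and `gemini` arrays are hypothetical placeholder data, not the study's.

```python
# Minimal sketch of pairwise Cohen's kappa with a bootstrap 95% CI.
# The rating data below are randomly generated placeholders, NOT the
# ratings from the study; only the method is illustrated.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical correctness ratings (1 = correct, 0 = incorrect) for two
# models over the same set of clinical vignettes.
gpt4 = rng.integers(0, 2, size=100)
gemini = rng.integers(0, 2, size=100)

# Point estimate of agreement beyond chance.
kappa = cohen_kappa_score(gpt4, gemini)

# Percentile bootstrap: resample vignettes with replacement and
# recompute kappa on each resample.
n = len(gpt4)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(cohen_kappa_score(gpt4[idx], gemini[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"kappa = {kappa:.2f}, 95% CI [{lo:.3f}–{hi:.3f}]")
```

With the study's actual per-vignette ratings in place of the placeholder arrays, repeating this for each model pair would reproduce a table of the form shown above.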