
Table 2 Cohen’s kappa coefficient for agreement between AI model pairs

From: Performance of the Large Language Models in African rheumatology: a diagnostic test accuracy study of ChatGPT-4, Gemini, Copilot, and Claude artificial intelligence

 

| Model pair | Cohen's kappa coefficient | 95% CI |
|---|---|---|
| ChatGPT-4 vs. Gemini | 0.45 | [0.265–0.652] |
| ChatGPT-4 vs. Copilot | 0.59 | [0.405–0.786] |
| ChatGPT-4 vs. Claude AI | 0.47 | [0.235–0.721] |
| Gemini vs. Copilot | 0.44 | [0.254–0.643] |
| Gemini vs. Claude AI | 0.43 | [0.241–0.633] |
| Copilot vs. Claude AI | 0.57 | [0.377–0.766] |
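For readers unfamiliar with the statistic, pairwise agreement tables like this one are commonly computed from per-case ratings using Cohen's kappa with a resampling-based confidence interval. The following is a minimal sketch, not the authors' code: it assumes binary correct/incorrect ratings per clinical vignette and uses a percentile bootstrap for the 95% CI; the `gpt4` and `gemini` arrays are hypothetical placeholder data, not the study's.

```python
# Minimal sketch of pairwise Cohen's kappa with a bootstrap 95% CI.
# The rating data below are randomly generated placeholders, NOT the
# ratings from the study; only the method is illustrated.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical correctness ratings (1 = correct, 0 = incorrect) for two
# models over the same set of clinical vignettes.
gpt4 = rng.integers(0, 2, size=100)
gemini = rng.integers(0, 2, size=100)

# Point estimate of agreement beyond chance.
kappa = cohen_kappa_score(gpt4, gemini)

# Percentile bootstrap: resample vignettes with replacement and
# recompute kappa on each resample.
n = len(gpt4)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(cohen_kappa_score(gpt4[idx], gemini[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"kappa = {kappa:.2f}, 95% CI [{lo:.3f}–{hi:.3f}]")
```

With the study's actual per-vignette ratings in place of the placeholder arrays, repeating this for each model pair would reproduce a table of the form shown above.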