Comparative Study of the Response Efficacy of Generative Artificial Intelligence Large Language Models on Elderly Diabetes Mellitus
Keywords:
generative artificial intelligence, chat robot, large language model, senile diabetes mellitus, medical informatics
Abstract
We aimed to evaluate how accurately different generative artificial intelligence (GAI) large language models respond to common questions about diabetes in the elderly, and thereby to compare the quality of the medical information these models provide.
A standardized evaluation question pool of 10 questions related to elderly diabetes was constructed. Four GAI chatbots, each built on a different large language model, were then asked to answer the questions, and the accuracy of every answer was scored. The questions were also grouped into two dimensions, “diagnosis and evaluation” and “control and treatment”, and the four models were compared within each dimension.
Overall, the Moonshot and Lark models performed significantly better than the DeepSeek LLM and SparkDesk models in answering common questions about elderly diabetes, with higher accuracy and greater stability; there was no significant difference in response performance between the Moonshot and Lark models. In both the “diagnosis and evaluation” and “control and treatment” dimensions, the Moonshot and Lark models also outperformed the DeepSeek LLM and SparkDesk models.
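The scoring procedure described above (per-answer accuracy scores, compared overall and within each of the two dimensions) could be aggregated along the following lines. This is only an illustrative sketch: the record layout, the function name `mean_scores`, the generic model labels, and the numeric scores are all assumptions made for the example. They are not the study's data, scoring scale, or analysis code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (model, question_id, dimension, accuracy_score).
# Placeholder values for illustration only; NOT the study's data or scale.
records = [
    ("Model A", 1, "diagnosis and evaluation", 4),
    ("Model A", 2, "control and treatment", 5),
    ("Model B", 1, "diagnosis and evaluation", 3),
    ("Model B", 2, "control and treatment", 2),
]


def mean_scores(records):
    """Return mean score per model and per (model, dimension) pair."""
    by_model = defaultdict(list)
    by_model_dim = defaultdict(list)
    for model, _question_id, dimension, score in records:
        by_model[model].append(score)
        by_model_dim[(model, dimension)].append(score)
    overall = {m: mean(s) for m, s in by_model.items()}
    per_dimension = {k: mean(s) for k, s in by_model_dim.items()}
    return overall, per_dimension


overall, per_dimension = mean_scores(records)
for model, avg in sorted(overall.items()):
    print(f"{model}: overall mean score = {avg}")
for (model, dimension), avg in sorted(per_dimension.items()):
    print(f"{model} / {dimension}: mean score = {avg}")
```

Keeping one record per answer makes it straightforward to compute both the overall comparison and the per-dimension comparison from the same table; any significance testing between models would be a separate step that this sketch does not cover.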