The model now answers a mathematical question correctly in only 2.4% of cases, down from 97.6% back in March
It seems the fear that ChatGPT will "take over the world and put everyone out of work" can be set aside, at least for now. A recent study showed that, as of June 2023, the GPT-4 language model underlying the chatbot had become noticeably "dumber" than the same model was in March 2023. The GPT-3.5 language model, by contrast, improved on most tasks, although it has problems of its own.
Researchers at Stanford University asked the chatbot various questions and evaluated the correctness of its answers. These were not abstract questions but quite concrete ones: for example, the AI had to say whether the number 17,077 is prime. To better understand the AI's "thinking" process, and to improve the result, the chatbot was asked to describe its calculations step by step. In this mode, the AI usually answers correctly more often.
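For reference, the question itself is trivial to verify in a few lines of code. The sketch below is only an illustration, not part of the study's harness; it uses simple trial division to confirm that 17,077 is indeed prime.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division: test divisors up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

print(is_prime(17077))  # True: 17,077 is prime, so the correct answer is "yes"
```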
ChatGPT has abruptly become "dumber" after a recent update
However, this did not help. Where the GPT-4 model gave the correct answer in 97.6% of cases back in March, by June the figure had dropped to 2.4%. In other words, the chatbot almost never answered the question correctly. For GPT-3.5, the indicator moved in the opposite direction, rising from 7.4% to 86.8%.
Code generation has also deteriorated. The scientists compiled a data set of 50 simple tasks from LeetCode and measured how many of GPT-4's answers ran without any changes. The March version produced directly executable code for 52% of the problems; with the June model, the figure dropped to 10%.
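The study's exact evaluation harness is not reproduced here, but a minimal sketch of such a "runs without changes" check (hypothetical helper name, assuming each answer is a self-contained Python snippet) might look like this:

```python
def runs_without_changes(generated_code: str) -> bool:
    """Return True if the model's answer executes as-is, without edits."""
    code = generated_code.strip()
    # Strip a Markdown code fence if the model wrapped its answer in one.
    if code.startswith("```"):
        code = "\n".join(code.splitlines()[1:-1])
    try:
        exec(compile(code, "<generated>", "exec"), {})  # run in a fresh namespace
        return True
    except Exception:
        return False

# Hypothetical model outputs: one valid snippet, one with a syntax error.
answers = ["print(sum(range(10)))", "def f(:\n    pass"]
executable_share = sum(runs_without_changes(a) for a in answers) / len(answers)
print(f"{executable_share:.0%} of answers ran without changes")  # 50%
```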
Users, too, have complained in recent weeks about a decline in ChatGPT's "mental abilities." It is not yet clear why this happened or whether OpenAI will do anything about it.