"The judges did not use ChatGPT in an informed or responsible manner"
"The short answer is that they used ChatGPT as if it was an oracle: a trustworthy source of knowledge that did not require any sort of verification. While the judges were transparent about the fact that they used the tool and included quotation marks to distinguish the content produced by ChatGPT, their use was not informed nor responsible. There are three main reasons why the way ChatGPT was used by the judiciary in these cases is very concerning for Colombians and beyond." "First, the stakes in judicial rulings are too high – especially when human rights are involved – to justify the use of unreliable and insufficiently tested technologies. Due to the way that LLMs like ChatGPT are developed and operate, these tools tend to produce incorrect and imprecise answers and confuse reality with fiction<https://dl.acm.org/doi/10.1145/3442188.3445922>. Even the CEO of OpenAI acknowledged in December 2022<https://twitter.com/sama/status/1601731295792414720?s=20>, that “ChatGPT is incredibly limited (…) it’s a mistake to be relying on it for anything important right now”. Furthermore, due to structural reasons, it is unlikely that these problems of LLMs<https://www.theverge.com/2023/2/9/23592647/ai-search-bing-bard-chatgpt-microsoft-google-problems-challenges> will be solved soon." "Secondly, ChatGPT’s answers should not have been accepted and taken at face value, but rather contrasted with other more reliable sources" "The judgement states that the information offered by ChatGPT would be “corroborated”. However, there is no explicit trace in the text that allows us to conclude that judge Padilla or his clerk effectively checked whether ChatGPT’s responses were accurate. In fact, I replicated the four queries<https://twitter.com/JuanDGut/status/1620579558742429697?s=20> posed by judge Padilla and the chatbot answered slightly differently, a result that is not surprising given how the tool works. 
Furthermore, when I prompted ChatGPT to provide examples of case law from the Constitutional Court<https://twitter.com/JuanDGut/status/1620589338085171200?s=20> that justified its answers, the chatbot invented the facts and ratio decidendi of one ruling, and cited a judgement that did not exist (inventing the facts and the verdict)."

"If ChatGPT and other LLMs currently available are evidently unreliable, since their outputs tend to include incorrect and false information, then judges would require significant time to check the validity of the AI-generated content, thereby undoing any significant “time savings”. As it happens with AI in other areas, under the narrative of supposed “efficiencies”, fundamental rights can be put at risk. Finally, there is a risk that the judges and their clerks over-rely on the AI’s recommendations, incurring what is known as “automation bias”."

https://verfassungsblog.de/colombian-chatgpt/ (bold emphasis mine)

Have a good day,
Daniela

________________________________
From: nexa <nexa-boun...@server-nexa.polito.it> on behalf of Guido Noto La Diega <noto.la.di...@gmail.com>
Sent: Wednesday, 15 March 2023 21:25
To: nexa@server-nexa.polito.it
Subject: [nexa] GPT-4 passes the bar exam

"In this paper, we experimentally evaluate the zero-shot performance of a preliminary version of GPT-4 against prior generations of GPT on the entire Uniform Bar Examination (UBE), including not only the multiple-choice Multistate Bar Examination (MBE), but also the open-ended Multistate Essay Exam (MEE) and Multistate Performance Test (MPT) components. On the MBE, GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. On the MEE and MPT, which have not previously been evaluated by scholars, GPT-4 scores an average of 4.2/6.0 as compared to much lower scores for ChatGPT.
Graded across the UBE components, in the manner in which a human test-taker would be, GPT-4 scores approximately 297 points, significantly in excess of the passing threshold for all UBE jurisdictions. These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society."

https://es.sonicurlprotection-fra.com/click?PV=2&MSGID=202303152025510605658&URLID=2&ESV=10.0.19.7431&IV=041A13A38538CA0E3BDA180205C19E01&TT=1678911952061&ESN=EHBGvdq0gWgsXAWrWfJu6k7TkYUmKr%2FY26vTm9%2BcS04%3D&KV=1536961729280&B64_ENCODED_URL=aHR0cHM6Ly9wYXBlcnMuc3Nybi5jb20vc29sMy9wYXBlcnMuY2ZtP2Fic3RyYWN0X2lkPTQzODkyMzM&HK=95F77368E2EDECBCE251BA624CD4EB22E0B5D2A5C0BBF45E310C2D0A5AA8B885
_______________________________________________ nexa mailing list nexa@server-nexa.polito.it https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa