Re: [nexa] GPT-4 passes the bar exam

Daniela Tafani Thu, 16 Mar 2023 00:34:57 -0700

"The judges did not use ChatGPT in an informed or responsible manner"


"The short answer is that they used ChatGPT as if it was an oracle: a 
trustworthy source of knowledge that did not require any sort of verification. 
While the judges were transparent about the fact that they used the tool and 
included quotation marks to distinguish the content produced by ChatGPT, their 
use was not informed nor responsible.

There are three main reasons why the way ChatGPT was used by the judiciary in 
these cases is very concerning for Colombians and beyond."


"First, the stakes in judicial rulings are too high – especially when human 
rights are involved – to justify the use of unreliable and insufficiently 
tested technologies. Due to the way that LLMs like ChatGPT are developed and 
operate, these tools tend to produce incorrect and imprecise answers and 
confuse reality with fiction<https://dl.acm.org/doi/10.1145/3442188.3445922>. 
Even the CEO of OpenAI acknowledged in December 
2022<https://twitter.com/sama/status/1601731295792414720?s=20>, that “ChatGPT 
is incredibly limited (…) it’s a mistake to be relying on it for anything 
important right now”. Furthermore, due to structural reasons, it is unlikely 
that these problems of 
LLMs<https://www.theverge.com/2023/2/9/23592647/ai-search-bing-bard-chatgpt-microsoft-google-problems-challenges>
 will be solved soon."


"Secondly, ChatGPT’s answers should not have been accepted and taken at face 
value, but rather contrasted with other more reliable sources"


"The judgement states that the information offered by ChatGPT would be 
“corroborated”. However, there is no explicit trace in the text that allows us 
to conclude that judge Padilla or his clerk effectively checked whether 
ChatGPT’s responses were accurate. In fact, I replicated the four 
queries<https://twitter.com/JuanDGut/status/1620579558742429697?s=20> posed by 
judge Padilla and the chatbot answered slightly differently, a result that is 
not surprising given how the tool works. Furthermore, when I prompted ChatGPT 
to provide examples of case law from the Constitutional 
Court<https://twitter.com/JuanDGut/status/1620589338085171200?s=20> that 
justified its answers, the chatbot invented the facts and ratio decidendi of 
one ruling, and cited a judgement that did not exist (inventing the facts and 
the verdict)."

"If ChatGPT and other LLMs currently available are evidently unreliable, since 
their outputs tend to include incorrect and false information, then judges 
would require significant time to check the validity of the AI-generated 
content, thereby undoing any significant “time savings”. As it happens with AI 
in other areas, under the narrative of supposed “efficiencies”, fundamental 
rights can be put at risk.
Finally, there is a risk that the judges and its clerks over-rely on the AI’s 
recommendations, incurring what is known as “automation bias”."


https://verfassungsblog.de/colombian-chatgpt/ (bold emphasis mine)


Have a nice day,
Daniela


________________________________
From: nexa <nexa-boun...@server-nexa.polito.it> on behalf of Guido Noto La Diega 
<noto.la.di...@gmail.com>
Sent: Wednesday, 15 March 2023 21:25
To: nexa@server-nexa.polito.it
Subject: [nexa] GPT-4 passes the bar exam

"In this paper, we experimentally evaluate the zero-shot performance of a 
preliminary version of GPT-4 against prior generations of GPT on the entire 
Uniform Bar Examination (UBE), including not only the multiple-choice 
Multistate Bar Examination (MBE), but also the open-ended Multistate Essay Exam 
(MEE) and Multistate Performance Test (MPT) components. On the MBE, GPT-4 
significantly outperforms both human test-takers and prior models, 
demonstrating a 26% increase over ChatGPT and beating humans in five of seven 
subject areas. On the MEE and MPT, which have not previously been evaluated by 
scholars, GPT-4 scores an average of 4.2/6.0 as compared to much lower scores 
for ChatGPT. Graded across the UBE components, in the manner in which a human 
test-taker would be, GPT-4 scores approximately 297 points, significantly in 
excess of the passing threshold for all UBE jurisdictions. These findings 
document not just the rapid and remarkable advance of large language model 
performance generally, but also the potential for such models to support the 
delivery of legal services in society."
https://es.sonicurlprotection-fra.com/click?PV=2&MSGID=202303152025510605658&URLID=2&ESV=10.0.19.7431&IV=041A13A38538CA0E3BDA180205C19E01&TT=1678911952061&ESN=EHBGvdq0gWgsXAWrWfJu6k7TkYUmKr%2FY26vTm9%2BcS04%3D&KV=1536961729280&B64_ENCODED_URL=aHR0cHM6Ly9wYXBlcnMuc3Nybi5jb20vc29sMy9wYXBlcnMuY2ZtP2Fic3RyYWN0X2lkPTQzODkyMzM&HK=95F77368E2EDECBCE251BA624CD4EB22E0B5D2A5C0BBF45E310C2D0A5AA8B885


_______________________________________________
nexa mailing list
nexa@server-nexa.polito.it
https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa
