On Sun, Mar 19, 2023 at 02:48:12AM -0700, Lauren Worden wrote:
> They have, and LLMs absolutely do encode a verbatim copy of their
> training data, which can be produced intact with little effort.
> https://arxiv.org/pdf/2205.10770.pdf
> https://bair.berkeley.edu/blog/2020/12/20/lmmem/

My understanding so far is that encoding a verbatim copy of training data is typically a symptom of overfitting. This is considered a type of bug: it is undesirable for technical, ethical, and legal reasons, and models are (supposed to be) trained to prevent it as much as possible. Clearly there was still work to be done as of December 2020, at the least.

sincerely,
	Kim Bruning

_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/5PNCR3KVBCEEKYT6I3J6VZKFE7NFIGB2/
To unsubscribe send an email to [email protected]
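[Editorial aside: to make the "verbatim copy" claim concrete, here is a toy sketch of how one might test whether model output reproduces training text verbatim, using a word-level k-gram overlap check. This is not the method from the linked papers; the function name, example strings, and threshold k are all illustrative assumptions.]

```python
def verbatim_spans(training_text, generated_text, k=6):
    """Return word-level k-gram spans from generated_text that appear
    verbatim in training_text -- a crude check for memorized output.

    NOTE: toy illustration only; real extraction studies use far more
    sophisticated matching over tokenized corpora.
    """
    train_words = training_text.split()
    gen_words = generated_text.split()
    if len(train_words) < k or len(gen_words) < k:
        return []
    # All k-word sequences seen in the training text.
    train_grams = {
        tuple(train_words[i:i + k])
        for i in range(len(train_words) - k + 1)
    }
    # Collect each generated k-gram that matches a training k-gram.
    return [
        " ".join(gen_words[i:i + k])
        for i in range(len(gen_words) - k + 1)
        if tuple(gen_words[i:i + k]) in train_grams
    ]


training = "the quick brown fox jumps over the lazy dog and runs away fast"
generated = "he said the quick brown fox jumps over the lazy dog today"
print(verbatim_spans(training, generated, k=6))
```

With k=6 on these made-up strings, the check flags the four overlapping six-word windows of "the quick brown fox jumps over the lazy dog"; a higher k reduces false positives from common phrases, a lower k catches shorter copied spans.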
