Good evening,

Giuseppe Attardi <atta...@di.unipi.it> writes:
> There is a whole area of research, which goes under the name of
> BERTology, that analyses this question and shows, for example, that
> from the relations present in the attention matrices of a sentence
> one can recover the entire syntactic tree of that sentence:
> https://aclanthology.org/N19-1419.pdf

«A Structural Probe for Finding Syntax in Word Representations»
John Hewitt, Christopher D. Manning
Published 1 June 2019
(via https://www.semanticscholar.org/paper/A-Structural-Probe-for-Finding-Syntax-in-Word-Hewitt-Manning/455a8838cde44f288d456d01c76ede95b56dc675)

--8<---------------cut here---------------start------------->8---
Recent work has improved our ability to detect linguistic knowledge in
word representations. However, current methods for detecting syntactic
knowledge do not test whether syntax trees are represented in their
entirety. In this work, we propose a structural probe, which evaluates
whether syntax trees are embedded in a linear transformation of a
neural network’s word representation space. The probe identifies a
linear transformation under which squared L2 distance encodes the
distance between words in the parse tree, and one in which squared L2
norm encodes depth in the parse tree. Using our probe, we show that
such transformations exist for both ELMo and BERT but not in
baselines, providing evidence that entire syntax trees are embedded
implicitly in deep models’ vector geometry.
--8<---------------cut here---------------end--------------->8---

(A minimal sketch of what such a structural probe computes is appended
at the end of this message.)

Interesting, but if we really want to talk about BERTology then I
repeat [1] that Vol. 8 of the Transactions of the Association for
Computational Linguistics (MIT Press, 2020) published this article:

«A Primer in BERTology: What We Know About How BERT Works»
by Anna Rogers, Olga Kovaleva, Anna Rumshisky
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT

which cites the paper above among its sources for describing BERT's
syntactic knowledge, while further on, in the sections "3.3 World
Knowledge" and "3.4 Limitations", it writes:

--8<---------------cut here---------------start------------->8---
However, BERT cannot reason based on its world knowledge. Forbes et
al. (2019) show that BERT can “guess” the affordances and properties
of many objects, but cannot reason about the relationship between
properties and affordances. For example, it “knows” that people can
walk into houses, and that houses are big, but it cannot infer that
houses are bigger than people. Zhou et al. (2020) and Richardson and
Sabharwal (2019) also show that the performance drops with the number
of necessary inference steps. Some of BERT’s world knowledge success
comes from learning stereotypical associations (Poerner et al., 2019),
for example, a person with an Italian-sounding name is predicted to be
Italian, even when it is incorrect.

3.4 Limitations

Multiple probing studies in section 3 and section 4 report that BERT
possesses a surprising amount of syntactic, semantic, and world
knowledge. However, Tenney et al. (2019a) remark, “the fact that a
linguistic pattern is not observed by our probing classifier does not
guarantee that it is not there, and the observation of a pattern does
not tell us how it is used.” There is also the issue of how complex a
probe should be allowed to be (Liu et al., 2019a). If a more complex
probe recovers more information, to what extent are we still relying
on the original model?
Furthermore, different probing methods may lead to complementary or
even contradictory conclusions, which makes a single test (as in most
studies) insufficient (Warstadt et al., 2019). A given method might
also favor one model over another, for example, RoBERTa trails BERT
with one tree extraction method, but leads with another (Htut et al.,
2019). The choice of linguistic formalism also matters (Kuznetsov and
Gurevych, 2020). [...]
--8<---------------cut here---------------end--------------->8---

In 2020 the researchers cited above argued that BERT is not able to
reason. The same issue of the journal also contains this article:

«What BERT Is Not: Lessons from a New Suite of Psycholinguistic
Diagnostics for Language Models»
by Allyson Ettinger
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00298/43535/What-BERT-Is-Not-Lessons-from-a-New-Suite-of

I quote from the conclusions:

--8<---------------cut here---------------start------------->8---
In this paper we have introduced a suite of diagnostic tests for
language models to better our understanding of the linguistic
competencies acquired by pre-training via language modeling. We draw
our tests from psycholinguistic studies, allowing us to target a range
of linguistic capacities by testing word prediction accuracies and
sensitivity of model probabilities to linguistic distinctions. As a
case study, we apply these tests to analyze strengths and weaknesses
of the popular BERT model, finding that it shows sensitivity to role
reversal and same-category distinctions, albeit less than humans, and
it succeeds with noun hypernyms, but it struggles with challenging
inferences and role-based event prediction—and it shows clear failures
with the meaning of negation. We make all test sets and experiment
code available (see Footnote 1), for further experiments. The
capacities targeted by these test sets are by no means comprehensive,
and future work can build on the foundation of these datasets to
expand to other aspects of language processing. Because these sets are
small, we must also be conservative in the strength of our
conclusions—different formulations may yield different performance,
and future work can expand to verify the generality of these results.
In parallel, we hope that the weaknesses highlighted by these
diagnostics can help to identify areas of need for establishing robust
and generalizable models for language understanding.
--8<---------------cut here---------------end--------------->8---

(A toy example of this kind of cloze diagnostic is also appended at
the end of this message.)

So I repeat my question: are there new studies showing that the
limitations highlighted by these tests of BERT's logical/linguistic
competence have been resolved by other LLMs? Because I believe that
nobody has anything at all to object to regarding the syntactic
_performance_ of LLMs.

Regards, 380°

[...]

[1] Message-id: 87o7ux2i16....@xelera.eu
https://server-nexa.polito.it/pipermail/nexa/2022-September/049508.html

--
380° (Giovanni Biscuolo public alter ego)

«We, incompetent as we are, have no standing whatsoever to suggest
anything»

Disinformation flourishes because many people care deeply about
injustice but very few check the facts.  Ask me about
<https://stallmansupport.org>.
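
Appendix, as promised above. First, a minimal sketch of what the
Hewitt & Manning structural probe computes: it learns a linear map B
such that the squared L2 distance between transformed word vectors
approximates the distance between the corresponding words in the
parse tree. The toy data, shapes, and training loop below are my own
simplification in PyTorch, not the authors' released code (their loss,
for instance, is also normalized by sentence length):

--8<---------------cut here---------------start------------->8---
import torch

torch.manual_seed(0)

n_words, hidden_dim, probe_rank = 6, 768, 64

# Stand-ins for the contextual word vectors of one sentence (e.g. one
# BERT layer) and for the gold parse-tree distances between its words.
H = torch.randn(n_words, hidden_dim)
tree_dist = torch.randint(1, 5, (n_words, n_words)).float()
tree_dist = (tree_dist + tree_dist.T) / 2   # symmetric toy "tree" distances
tree_dist.fill_diagonal_(0)

# The probe itself is just a matrix B, trained by gradient descent.
B = torch.randn(probe_rank, hidden_dim, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-3)

def probe_distances(B, H):
    """Squared L2 distances ||B(h_i - h_j)||^2 for all word pairs."""
    diffs = H.unsqueeze(1) - H.unsqueeze(0)   # (n, n, dim)
    transformed = diffs @ B.T                 # (n, n, rank)
    return (transformed ** 2).sum(-1)         # (n, n)

for step in range(100):
    # Train B so that probe distances match the tree distances (L1 gap).
    loss = (probe_distances(B, H) - tree_dist).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
--8<---------------cut here---------------end--------------->8---

The paper's claim is that for real BERT/ELMo vectors and real parse
trees such a B can be found, while for the baselines it cannot.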
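
Second, a toy version of the kind of cloze diagnostic Ettinger
describes, comparing BERT's top predictions for an affirmative frame
and its negated counterpart. The sentences and the use of the Hugging
Face fill-mask pipeline are my own illustration, not the paper's
experimental code:

--8<---------------cut here---------------start------------->8---
from transformers import pipeline

# bert-base-uncased is an assumption here; the paper evaluates several
# BERT variants.
fill = pipeline("fill-mask", model="bert-base-uncased")

for frame in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    print(frame)
    for candidate in fill(frame, top_k=5):
        print(f"  {candidate['token_str']:>10}  p={candidate['score']:.3f}")
--8<---------------cut here---------------end--------------->8---

If the model handled negation, the two lists of completions should
differ sharply; the "clear failures with the meaning of negation"
mentioned in the conclusions refer to the fact that they barely do.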