Good evening,

Giuseppe Attardi <atta...@di.unipi.it> writes:

> There is a whole area of research, which goes by the name of
> BERTology, that analyzes this question and shows, for example, that
> from the relations present in the attention matrices for a sentence
> one can recover the sentence's entire syntax tree:
> https://aclanthology.org/N19-1419.pdf

«A Structural Probe for Finding Syntax in Word Representations»
John Hewitt, Christopher D. Manning
Published 1 June 2019
(via
https://www.semanticscholar.org/paper/A-Structural-Probe-for-Finding-Syntax-in-Word-Hewitt-Manning/455a8838cde44f288d456d01c76ede95b56dc675)

--8<---------------cut here---------------start------------->8---

Recent work has improved our ability to detect linguistic knowledge in
word representations. However, current methods for detecting syntactic
knowledge do not test whether syntax trees are represented in their
entirety. In this work, we propose a structural probe, which evaluates
whether syntax trees are embedded in a linear transformation of a neural
network’s word representation space. The probe identifies a linear
transformation under which squared L2 distance encodes the distance
between words in the parse tree, and one in which squared L2 norm
encodes depth in the parse tree. Using our probe, we show that such
transformations exist for both ELMo and BERT but not in baselines,
providing evidence that entire syntax trees are embedded implicitly in
deep models’ vector geometry.

--8<---------------cut here---------------end--------------->8---

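Incidentally, the probe itself is conceptually simple: learn a single
linear map B and check whether the squared L2 distance between
B-projected word vectors tracks the distance between the corresponding
words in the gold parse tree. Below is a minimal sketch in PyTorch of
the distance variant, my own reconstruction rather than the authors'
released code, with random toy data standing in for real ELMo/BERT
representations and gold parses:

import torch

def probe_loss(B, H, tree_dist):
    """B: (rank, dim) probe matrix; H: (seq_len, dim) word vectors for
    one sentence; tree_dist: (seq_len, seq_len) gold parse-tree
    distances.  Returns the L1 gap between predicted and gold
    distances, normalized by the squared sentence length."""
    proj = H @ B.T                                # (seq_len, rank)
    diff = proj.unsqueeze(1) - proj.unsqueeze(0)  # pairwise differences
    pred = (diff ** 2).sum(-1)                    # squared L2 distances
    n = H.shape[0]
    return torch.abs(pred - tree_dist).sum() / (n * n)

# Toy training loop, just to show the shape of the optimization; in the
# paper H would be frozen ELMo/BERT layers and tree_dist would come
# from gold dependency parses.
dim, rank, seq_len = 768, 64, 10
B = torch.randn(rank, dim, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-3)

H = torch.randn(seq_len, dim)
tree_dist = torch.randint(1, 6, (seq_len, seq_len)).float()
tree_dist = (tree_dist + tree_dist.T) / 2   # symmetric toy "tree" distances
tree_dist.fill_diagonal_(0)

for step in range(200):
    opt.zero_grad()
    loss = probe_loss(B, H, tree_dist)
    loss.backward()
    opt.step()

If such a B can be trained to low error on held-out sentences, that is
the paper's evidence that tree structure is (linearly) embedded in the
representation space.
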
Interesting, but if we really want to talk about BERTology then I
repeat [1] that Vol. 8 of the Transactions of the Association for
Computational Linguistics (MIT Press, 2020) contains this article:

«A Primer in BERTology: What We Know About How BERT Works»
by Anna Rogers, Olga Kovaleva, Anna Rumshisky

https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT

which cites the paper above among its sources when describing BERT's
syntactic knowledge, while further on, in the sections "3.3 World
Knowledge" and "3.4 Limitations", it writes:

--8<---------------cut here---------------start------------->8---

However, BERT cannot reason based on its world knowledge. Forbes et
al. (2019) show that BERT can “guess” the affordances and properties of
many objects, but cannot reason about the relationship between
properties and affordances. For example, it “knows” that people can walk
into houses, and that houses are big, but it cannot infer that houses
are bigger than people. Zhou et al. (2020) and Richardson and Sabharwal
(2019) also show that the performance drops with the number of necessary
inference steps. Some of BERT’s world knowledge success comes from
learning stereotypical associations (Poerner et al., 2019), for example,
a person with an Italian-sounding name is predicted to be Italian, even
when it is incorrect.

3.4 Limitations

Multiple probing studies in section 3 and section 4 report that BERT
possesses a surprising amount of syntactic, semantic, and world
knowledge. However, Tenney et al. (2019a) remark, “the fact that a
linguistic pattern is not observed by our probing classifier does not
guarantee that it is not there, and the observation of a pattern does
not tell us how it is used.” There is also the issue of how complex a
probe should be allowed to be (Liu et al., 2019a). If a more complex
probe recovers more information, to what extent are we still relying on
the original model?

Furthermore, different probing methods may lead to complementary or even
contradictory conclusions, which makes a single test (as in most
studies) insufficient (Warstadt et al., 2019). A given method might also
favor one model over another, for example, RoBERTa trails BERT with one
tree extraction method, but leads with another (Htut et al., 2019). The
choice of linguistic formalism also matters (Kuznetsov and Gurevych,
2020).

[...]

--8<---------------cut here---------------end--------------->8---

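Just to make concrete what this kind of cloze-style world-knowledge
probing looks like in practice, here is a minimal sketch using the
Hugging Face transformers fill-mask pipeline; the prompts are my own
illustrative wording of the houses/people example above, not actual
test items from Forbes et al. (2019):

from transformers import pipeline

# Assumes the transformers package and the public bert-base-uncased
# checkpoint are available.
fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT completes property/affordance statements fairly readily...
for pred in fill("People can [MASK] into houses.", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))

# ...the question is whether it also handles the comparative that
# requires combining "houses are big" with "people are smaller".
for pred in fill("A house is [MASK] than a person.", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))

A couple of cherry-picked prompts prove nothing by themselves, of
course; they only show the mechanics behind the systematic evaluations
the survey cites.
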
In 2020 the researchers cited above argued that BERT is not capable of
reasoning.

The same issue of the journal also contains this article:

«What BERT Is Not: Lessons from a New Suite of Psycholinguistic
Diagnostics for Language Models»
by Allyson Ettinger

https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00298/43535/What-BERT-Is-Not-Lessons-from-a-New-Suite-of

quoting from the conclusions:

--8<---------------cut here---------------start------------->8---

In this paper we have introduced a suite of diagnostic tests for
language models to better our understanding of the linguistic
competencies acquired by pre-training via language modeling. We draw our
tests from psycholinguistic studies, allowing us to target a range of
linguistic capacities by testing word prediction accuracies and
sensitivity of model probabilities to linguistic distinctions. As a case
study, we apply these tests to analyze strengths and weaknesses of the
popular BERT model, finding that it shows sensitivity to role reversal
and same-category distinctions, albeit less than humans, and it succeeds
with noun hypernyms, but it struggles with challenging inferences and
role-based event prediction—and it shows clear failures with the meaning
of negation. We make all test sets and experiment code available (see
Footnote 1), for further experiments.

The capacities targeted by these test sets are by no means
comprehensive, and future work can build on the foundation of these
datasets to expand to other aspects of language processing. Because
these sets are small, we must also be conservative in the strength of
our conclusions—different formulations may yield different performance,
and future work can expand to verify the generality of these results. In
parallel, we hope that the weaknesses highlighted by these diagnostics
can help to identify areas of need for establishing robust and
generalizable models for language understanding.

--8<---------------cut here---------------end--------------->8---

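For concreteness, the "sensitivity of model probabilities" style of
test mentioned above can be sketched with the same fill-mask machinery;
the robin/bird negation item is the example usually discussed around
Ettinger's NEG test sets, but the exact wording and this ad hoc scoring
are mine, far cruder than the paper's controlled setup:

from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Compare the probability assigned to the same completion with and
# without negation; a model that is insensitive to negation scores
# "bird" about the same in both sentences.
for sentence in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    (pred,) = fill(sentence, targets=["bird"])
    print(f"{sentence!r:35} P(bird) = {pred['score']:.4f}")
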
So let me repeat my question: are there newer studies showing that the
limitations highlighted by these tests of BERT's logical/linguistic
competence have been overcome by other LLMs?

Because I believe that nobody has anything at all to object to when it
comes to the syntactic _performance_ of LLMs.

Regards, 380°

[...]

[1] Message-id: 87o7ux2i16....@xelera.eu
https://server-nexa.polito.it/pipermail/nexa/2022-September/049508.html


-- 
380° (Giovanni Biscuolo public alter ego)

«We, incompetent as we are,
 have no standing whatsoever to suggest anything»

Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>.

