[Corpora-List] PhD in ML/NLP – Efficient, Fair, robust and knowledge informed self-supervised learning for speech processing

François Portet Mon, 22 Aug 2022 01:20:27 -0700

PhD in ML/NLP – Efficient, Fair, robust and knowledge informed self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)

Application deadline: September 5th, 2022

Interviews (tentative): September 19th, 2022

Salary: ~2000€ gross/month (social security included)


Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, natural language processing, self-supervised learning, knowledge informed learning, robustness, fairness
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies) will start on November 1st, 2022. Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.
*PROJECT OBJECTIVES*
Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an impressive impact on downstream task performance. This is mainly due to their ability to benefit from a large amount of data, at the cost of a tremendous carbon footprint, rather than to improvements in the efficiency of the learning. Another issue with SSL models is their unpredictable results once applied to realistic scenarios, which expose their lack of robustness. Furthermore, as for any pre-trained model deployed in society, it is important to be able to measure the bias of such models, since they can increase social unfairness.
The goals of this PhD position are threefold:

- to design new evaluation metrics for speech SSL models;

- to develop knowledge-driven SSL algorithms;

- to propose methods for learning robust and unbiased representations.
SSL models are evaluated with downstream task-dependent metrics, e.g., word error rate for speech recognition. This couples the evaluation of the universality of SSL representations to a potentially biased and costly fine-tuning that also hides the efficiency information related to the pre-training cost. In practice, we will seek to measure training efficiency as the ratio between the amount of data, computation and memory needed and the performance gain observed on a metric of interest (downstream dependent or not). The first step will be to document standard markers that can be used as robust measurements of these quantities at training time. Potential candidates are, for instance, floating point operations for computational intensity, number of neural parameters coupled with precision for storage, online measurement of memory consumption for training, and cumulative input sequence length for data.
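As a rough illustration of the candidate markers listed above, the following Python sketch computes a storage footprint from parameter count and precision, and expresses each resource as a cost per unit of performance gain. It is only a sketch under assumptions: the class and function names, the choice of seconds of audio as the unit for input length, and the example figures are all hypothetical and are not part of the E-SSL project.

```python
from dataclasses import dataclass

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


@dataclass
class TrainingCost:
    """Hypothetical container for the markers named in the text."""
    flops: float             # total floating point operations used for pre-training
    peak_memory_bytes: int   # peak memory measured online during training
    input_seconds: float     # cumulative input sequence length (assumed unit: seconds)


def storage_bytes(num_params: int, precision: str) -> int:
    """Storage marker: number of parameters coupled with precision."""
    return num_params * BYTES_PER_PARAM[precision]


def cost_per_unit_gain(cost: TrainingCost, gain: float) -> dict:
    """Resources spent per unit of gain on the metric of interest.

    `gain` is the improvement on any chosen metric (for example, absolute
    word error rate reduction). Lower values mean more efficient training.
    """
    if gain <= 0:
        raise ValueError("gain must be positive to define a ratio")
    return {
        "flops_per_gain": cost.flops / gain,
        "memory_bytes_per_gain": cost.peak_memory_bytes / gain,
        "input_seconds_per_gain": cost.input_seconds / gain,
    }


# Usage with made-up numbers (not measurements of any real model).
example_cost = TrainingCost(flops=1e18, peak_memory_bytes=16 * 2**30, input_seconds=3.6e6)
example_ratios = cost_per_unit_gain(example_cost, gain=2.0)
```

Reporting the three ratios separately, rather than collapsing them into a single score, keeps the trade-off between data, computation and memory visible; how to weight them against each other is left open here.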
Most state-of-the-art SSL models for speech rely on masked prediction, e.g. HuBERT and WavLM, or contrastive losses, e.g. wav2vec 2.0. Such prevalence in the literature is mostly linked to the model size, amount of data and computational resources invested by the companies producing these models. In fact, vanilla masking approaches and contrastive losses may be regarded as uninformed solutions, as they do not benefit from in-domain expertise. For instance, it has been demonstrated that blindly masking frames in the input signal, as in HuBERT and WavLM, results in much worse downstream performance than applying unsupervised phonetic boundaries [Yue2021] to generate informed masks. Recently, some studies have demonstrated the superiority of an informed multitask learning strategy, which carefully selects self-supervised pretext tasks with respect to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive learning loss [Zaiem2022]. In this PhD project, our objectives are: 1. to continue to develop knowledge-driven SSL algorithms reaching higher efficiency ratios and better results at the convergence, data consumption and downstream performance levels; and 2. to scale these novel approaches to a point enabling comparison with current state-of-the-art systems, thereby motivating a paradigm change in SSL for the wider speech community.
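To make the contrast between blind and informed masking concrete, here is a minimal Python sketch. It is not the procedure of [Yue2021] nor the actual masking code of HuBERT or WavLM: the function names, the span parameters and the example boundary positions are hypothetical, and the boundaries are assumed to come from some external unsupervised segmenter.

```python
import random


def blind_span_mask(num_frames, span, num_spans, rng):
    """Mask fixed-length spans at random positions, ignoring signal content."""
    mask = [False] * num_frames
    for _ in range(num_spans):
        start = rng.randrange(0, max(1, num_frames - span + 1))
        for t in range(start, min(num_frames, start + span)):
            mask[t] = True
    return mask


def boundary_informed_mask(num_frames, boundaries, num_segments, rng):
    """Mask whole segments delimited by (assumed) phonetic boundary frames."""
    inner = sorted({b for b in boundaries if 0 < b < num_frames})
    edges = [0] + inner + [num_frames]
    segments = list(zip(edges[:-1], edges[1:]))
    chosen = rng.sample(segments, min(num_segments, len(segments)))
    mask = [False] * num_frames
    for start, end in chosen:
        for t in range(start, end):
            mask[t] = True
    return mask


# Usage: 20 frames, hypothetical boundaries at frames 5, 9 and 14.
rng = random.Random(0)
informed = boundary_informed_mask(20, [5, 9, 14], num_segments=2, rng=rng)
blind = blind_span_mask(20, span=4, num_spans=2, rng=rng)
```

The informed mask always covers complete segments, so a masked region never cuts a unit in half, whereas the blind mask may start and stop anywhere in the signal.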
Despite remarkable performance on academic benchmarks, SSL-powered technologies (e.g. speech and speaker recognition, speech synthesis and many others) may exhibit highly unpredictable results once applied to realistic scenarios. This can translate into a global accuracy drop due to a lack of robustness to adversarial acoustic conditions, or into biased and discriminatory behaviors with respect to different pools of end users. Documenting and facilitating the control of such aspects prior to the deployment of SSL models in real life is necessary for the industrial market. To address such aspects, within the project, we will create novel robustness regularization and debiasing techniques along two axes: 1. debiasing and regularizing speech representations at the SSL level; 2. debiasing and regularizing downstream-adapted models (e.g. using a pre-trained model).
To ensure the creation of fair and robust SSL pre-trained models, we propose to act at both the optimization and data levels, following some of our previous work on adversarial protected attribute disentanglement and the NLP literature on data sampling and augmentation [Noé2021]. Here, we wish to extend this technique to more complex SSL architectures and more realistic conditions by increasing the disentanglement complexity, i.e. the sex attribute studied in [Noé2021] is particularly discriminatory. Then, to benefit from the expert knowledge induced by the scope of the task of interest, we will build on a recent introduction of task-dependent counterfactual equal odds criteria [Sari2021] to minimize the downstream performance gap observed between individuals with different values of certain protected attributes and to maximize the overall accuracy. Following this multi-objective optimization scheme, we will then inject further identified constraints, as inspired by previous NLP work [Zhao2017]. Intuitively, constraints are injected so that the predictions are calibrated towards a desired distribution, i.e. an unbiased one.
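The downstream performance gap mentioned above can be illustrated with a very simple measurement. The Python sketch below computes a per-group error rate and the largest difference between groups. It is deliberately much simpler than the counterfactual equal odds criteria of [Sari2021] and is not taken from that work; the function names, the group labels and the example data are hypothetical.

```python
from collections import defaultdict


def per_group_error_rate(errors, groups):
    """Error rate for each value of a protected attribute.

    errors: iterable of 0/1 flags, 1 meaning the prediction was wrong.
    groups: iterable of group labels aligned with `errors`.
    """
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for err, group in zip(errors, groups):
        totals[group] += 1
        wrong[group] += int(err)
    return {group: wrong[group] / totals[group] for group in totals}


def performance_gap(rates):
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Usage with made-up outcomes for two hypothetical groups "a" and "b".
example_errors = [1, 0, 0, 0, 1, 1, 0, 0]
example_groups = ["a"] * 4 + ["b"] * 4
example_rates = per_group_error_rate(example_errors, example_groups)
example_gap = performance_gap(example_rates)
```

A gap of this kind could serve as the second term of a multi-objective criterion alongside overall accuracy; how the two are combined is an open design choice that this sketch does not settle.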
*SKILLS*

 * Master 2 in Natural Language Processing, Speech Processing, computer science or data science
 * Good command of Python programming and deep learning frameworks
 * Previous experience in Self-Supervised Learning, acoustic modeling or ASR would be a plus
 * Very good communication skills in English
 * Good command of French would be a plus but is not mandatory

*SCIENTIFIC ENVIRONMENT*
The thesis will be conducted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/) and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team and the LIA have strong expertise and a track record in Natural Language Processing and speech processing. The recruited person will be welcomed within the teams, which offer a stimulating, multinational and pleasant working environment.
The means to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and the LIA. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.
The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon) and Benjamin Lecouteux and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Moreover, the PhD student will collaborate with several team members involved in the project, in particular the two other PhD candidates who will be recruited and the partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore, the project will involve one of the founders of SpeechBrain, Titouan Parcollet, with whom the candidate will interact closely.
*INSTRUCTIONS FOR APPLYING*
Applications must contain: CV + letter/message of motivation + Master's grade transcripts; applicants should be ready to provide letter(s) of recommendation. Applications must be addressed to Mickael Rouvier ([email protected]), Benjamin Lecouteux ([email protected]) and François Portet ([email protected]). We celebrate diversity and are committed to creating an inclusive environment for all employees.
*REFERENCES:*
[Noé2021] Noé, P.-G., Mohammadamini, M., Matrouf, D., Parcollet, T., Nautsch, A. & Bonastre, J.-F. Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation in Proc. Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3515–3525 (2021).
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection for Multitask Self-Supervised Speech Representation in AAAI, The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing, 2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W. Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017), 2979–2989.

--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE

Phone: +33 (0)4 57 42 15 44
Email: [email protected]
www: http://membres-liglab.imag.fr/portet/
