Hi Jan, thanks a lot!

I had tried "typeName" and "fieldName" too, and neither worked, so you helped me realize I was missing something beyond HTMLStripFieldUpdateProcessorFactory itself.
This made me read this guide <https://solr.apache.org/guide/solr/9_4/configuration-guide/update-request-processors.html#update-request-processor-configuration> with fresh eyes, and I learned a crucial bit, which goes:

  WARNING: RunUpdateProcessorFactory
  Do not forget to add RunUpdateProcessorFactory at the end of any chains you define in solrconfig.xml. Otherwise update requests processed by that chain will not actually affect the indexed data.

So all 3 methods (typeClass, typeName and fieldName) worked once the chain ended with RunUpdateProcessorFactory, like so:

<updateRequestProcessorChain>
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="typeName">text_pt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Thank you for the push!

Regards,
Gino

On Tue, Feb 20, 2024 at 11:24, Jan Høydahl <jan....@cominvent.com> wrote:

> I have never used the "typeClass" option. Have you tried with "typeName"
> or "fieldName" as an alternative?
>
> Jan
>
> > On 20 Feb 2024 at 13:44, Gino Rodrigues <ginorodrig...@gmail.com> wrote:
> >
> > I left a small mismatch on the field type: the fields I am trying to clean
> > are all "text_general" (class solr.TextField).
> >
> > On Tue, Feb 20, 2024 at 09:38, Gino Rodrigues <ginorodrig...@gmail.com> wrote:
> >
> >> Hello everyone,
> >>
> >> I am trying to clean source fields from HTML markup before indexing, using
> >> an Update Request Processor.
> >>
> >> But no variation I try seems to work, and HTML markup is still being
> >> indexed.
> >>
> >> Would anyone have an idea about it?
> >>
> >> Thanks in advance!
> >>
> >> *indexing command*
> >> curl -X POST -H "Content-Type: application/csv" --data-binary @myfile.csv "http://localhost:8983/solr/mycore/update?commit=true"
> >>
> >> *managed-schema.xml*
> >> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
> >>   <analyzer type="index">
> >>     <tokenizer name="standard"/>
> >>     <filter words="stopwords.txt" ignoreCase="true" name="stop"/>
> >>     <filter name="lowercase"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer name="standard"/>
> >>     <filter words="stopwords.txt" ignoreCase="true" name="stop"/>
> >>     <filter name="synonymGraph" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>     <filter name="lowercase"/>
> >>   </analyzer>
> >> </fieldType>
> >> <field name="body" type="text_pt" indexed="true" stored="true"/>
> >> <copyField source="body" dest="catchall"/>
> >>
> >> *solrconfig.xml*
> >> <updateRequestProcessorChain>
> >>   <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
> >>     <str name="typeClass">solr.TextField</str>
> >>   </processor>
> >> </updateRequestProcessorChain>
> >>
> >> References
> >>
> >> https://solr.apache.org/guide/solr/9_4/configuration-guide/update-request-processors.html
> >>
> >> https://solr.apache.org/docs/9_4_1/core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
> >>
> >> https://solr.apache.org/docs/9_4_1/core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
> >>
> >
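For comparison, here is a sketch of the same working chain using the "fieldName" selector instead of "typeName". It assumes the HTML lives in the "body" field from the schema quoted in this thread; substitute your own field names. Whichever selector is used (typeClass, typeName or fieldName), RunUpdateProcessorFactory still has to be the last processor, otherwise the documents never reach the index.

```xml
<!-- Sketch only: assumes the field holding HTML is named "body" -->
<updateRequestProcessorChain>
  <!-- Strip HTML markup from the "body" field before indexing -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">body</str>
  </processor>
  <!-- Log the update request -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Must come last: without it the chain does not change the indexed data -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Selecting by field name targets only the listed fields, while typeName or typeClass applies the stripping to every field of that type, which may be broader than intended.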