FYI, https://issues.apache.org/jira/browse/SOLR-16682 ("MLT component: SyntaxError: Cannot parse") has been committed. It should be released in 9.2.
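For anyone reading the archive on a release without that fix, here is a minimal sketch of the backslash escaping suggested further down in the thread. It only illustrates why `search:subj:cedation` fails to parse while `search:subj\:cedation` does not; it is not Solr's own code. The names `SPECIAL_CHARS`, `escape_query_chars`, and `field_clause` are hypothetical, and the character list is an assumption based on the commonly documented query-parser special characters.

```python
# Assumption: characters the classic Lucene/Solr query parser treats as
# syntax. This list is illustrative and is not copied from Solr's source.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(term: str) -> str:
    """Prefix every query-syntax character (and whitespace) with a backslash."""
    out = []
    for ch in term:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def field_clause(field: str, term: str) -> str:
    """Build a field:term clause with the term escaped (hypothetical helper)."""
    return f"{field}:{escape_query_chars(term)}"


# Unescaped, the parser reads "search:subj" and then hits an unexpected ":".
print(field_clause("search", "subj:cedation"))  # search:subj\:cedation
```

Note that this only helps where the client builds the query string itself; in the case reported in this thread the failing query was generated on the server side, which is what the JIRA issue addresses.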
On Fri, Dec 2, 2022 at 7:00 PM Wu, Hansen [USA] <wu_han...@bah.com> wrote:

> Hi Alessandro and Mikhail,
> Thanks for your tips first.
> I discussed with my team about escaping : characters. They don't agree with the ideas. They think we should NOT make any changes to the original data. They think, the problem is in solr's MoreLikeThis functionality, which fails to break subj:cedation into 2 terms. What do you guys think?
> Thanks for your further helps.
>
> FYI
> In our schema.xml, search field is defined as follows:
> <field name="search" type="text_en" indexed="true" stored="true" multiValued="true" termVectors="true"/>
> ......
> <copyField source="original_text" dest="search" />
> <copyField source="title" dest="search" />
> .....
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> --H. Wu
> Email: wu_han...@bah.com
> Office: 703-995-3027
> Cell: 703-732-1612
>
> -----Original Message-----
> From: Alessandro Benedetti <a.benede...@sease.io>
> Sent: Thursday, November 24, 2022 1:00 PM
> To: wu_han...@bah.com.invalid
> Cc: users@solr.apache.org
> Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions
>
> I agree with Mikhail, escaping should solve the problem!
> If after that it has to do with More Like This, please let us know and I'll be happy to take a look at it (been working in the last few years quite extensively on the More Like This feature)
>
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <https://urldefense.com/v3/__http://sease.io/__;!!May37g!N__68yFgO9BtSJbveA06P4n95C3mfFiizqZguIKbkfe4OS1z09C_1Kqa12n6pPrFth3s1WRWpu4sJ3VvWNFP$>
> LinkedIn <https://urldefense.com/v3/__https://linkedin.com/company/sease-ltd__;!!May37g!N__68yFgO9BtSJbveA06P4n95C3mfFiizqZguIKbkfe4OS1z09C_1Kqa12n6pPrFth3s1WRWpu4sJ7eQtI1I$> | Twitter <https://urldefense.com/v3/__https://twitter.com/seaseltd__;!!May37g!N__68yFgO9BtSJbveA06P4n95C3mfFiizqZguIKbkfe4OS1z09C_1Kqa12n6pPrFth3s1WRWpu4sJwY_bkV_$> | Youtube <https://urldefense.com/v3/__https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ__;!!May37g!N__68yFgO9BtSJbveA06P4n95C3mfFiizqZguIKbkfe4OS1z09C_1Kqa12n6pPrFth3s1WRWpu4sJ5S1DZrh$> | Github <https://urldefense.com/v3/__https://github.com/seaseltd__;!!May37g!N__68yFgO9BtSJbveA06P4n95C3mfFiizqZguIKbkfe4OS1z09C_1Kqa12n6pPrFth3s1WRWpu4sJ9Y9nRLc$>
>
> On Thu, 24 Nov 2022 at 16:16, Wu, Hansen [USA] <wu_han...@bah.com.invalid> wrote:
>
> > Hi Mikhail,
> > Thank you for your suggestions.
> > Happy Thanksgiving Day!
> > --H.Wu
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: Thursday, November 24, 2022 2:33 AM
> > To: users@solr.apache.org
> > Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions
> >
> > For me it doesn't seem like MoreLikeThis, but it can be fixed with escaping '\', I believe search:subj\:cedation
> >
> > On Wed, Nov 23, 2022 at 11:08 PM Wu, Hansen [USA] <wu_han...@bah.com.invalid> wrote:
> >
> > > Hi Solr folks,
> > > Below is the info related to the bug I reported. Thanks for taking a look.
> > > Happy holidays!
> > > --H.Wu
> > >
> > > ---------- ERROR MESSAGE: ----------
> > > Failed to search for document id: 123456/2022080916.txt
> > > Error from server at https://urldefense.com/v3/__http://xxx.xxx.xxx:8983/solr/documents__;!!May37g!ICxqmf0VKbK0xEC4nt4o6-ZmdyXpWjOONYBV2QFRU5dwU2kdooGoF3vCgcudXyIMNcszDC99no0$ : Error from server at null:
> > > org.apache.solr.search.SyntaxError: Cannot parse '+(search:item search:date search:dte search:report search:subj:cedation search:0154 search:20220808) -id:123456/2022080916.txt': Encountered " ":" ": "" at line 1, column 62.
> > > Was expecting one of:
> > > <AND> ...
> > > <OR> ...
> > > <NOT> ...
> > > "+" ...
> > > "-" ...
> > > ......
> > > at org.yyy.data.service.solr.proxy.SolrProxyService.query (ssssss.java:90)
> > > at ...... (******.java:51)
> > > at ...... (aaaaaa.java:163)
> > >
> > > ---------- SOLR FIELD DATA: ----------
> > > "search":["......\nsubj:cedation report date 20220809\n
> > > ......\n
> > > <contains many lines in the format of name:value>\n
> > > ......
> > > "],
> > > "cedation report date 20220809"]
> > >
> > > -----Original Message-----
> > > From: Mikhail Khludnev <m...@apache.org>
> > > Sent: Wednesday, November 23, 2022 7:40 AM
> > > To: users@solr.apache.org
> > > Subject: [External] Re: Seeking tips about MoreLikeThis exceptions
> > >
> > > Hello,
> > > I can only speculate that some doc ids have a colon in the value. In this case you need to either escape it with a backslash or use the more explicit syntax described in the Solr Local Params topic.
> > > But I might be wrong. Can you provide a problem query and/or stacktrace?
> > >
> > > On Tue, Nov 22, 2022 at 9:09 PM Wu, Hansen [USA] <wu_han...@bah.com.invalid> wrote:
> > >
> > > > Hello Solr folks,
> > > > I have a Web Service app, which uses CloudSolrClient to connect to a solr server to query for similar documents against field search in a collection through MoreLikeThis function by given docId. The content in the search field is copied from original text of a document.
> > > > The Web Service app works fine with most documents. But whenever a document contains colon, e.g. "subj:cedation", it throws out exceptions, complaining about ":". Have you folks ever seen such problems? Any tips to resolve the issue?
> > > >
> > > > In my dev env, I have a newer version of solr installed, which uses HttpSolrClient to do the same query. The problem didn't happen. I was wondering whether the issues only happen with old versions of solr or not.
> > > >
> > > > I wish to get ideas from solr community.
> > > > Thanks a lot in advance.
> > > >
> > > > Hansen Wu
> > > > Associate | Lead Data Scientist
> > > > Booz | Allen | Hamilton
> > > >
> > > > Email: wu_han...@bah.com<mailto:wu_han...@bah.com>
> > > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!