Thanks.
Will test to verify the fix.

From: Mikhail Khludnev <m...@apache.org>
Sent: Wednesday, March 1, 2023 1:22 AM
To: users@solr.apache.org
Cc: a.benede...@sease.io; Wu, Hansen [USA] <wu_han...@bah.com>
Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions

FYI,
https://issues.apache.org/jira/browse/SOLR-16682
"MLT component: SyntaxError: Cannot parse"
has been committed. It should be released in 9.2.

On Fri, Dec 2, 2022 at 7:00 PM Wu, Hansen [USA] <wu_han...@bah.com> wrote:
Hi Alessandro and Mikhail,
First, thanks for your tips.
I discussed escaping ':' characters with my team. They don't agree with the 
idea. They think we should NOT make any changes to the original data. They 
think the problem is in Solr's MoreLikeThis functionality, which fails to 
break subj:cedation into 2 terms. What do you think?
Thanks for your further help.

FYI
In our schema.xml, the search field is defined as follows:
<field name="search" type="text_en" indexed="true" stored="true" multiValued="true" termVectors="true"/>
......
<copyField source="original_text" dest="search" />
<copyField source="title" dest="search" />
.....
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPossessiveFilterFactory"/>
                <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
                <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPossessiveFilterFactory"/>
                <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
                <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
</fieldType>
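
The error message quoted further down in this thread suggests that the term subj:cedation reached the generated query as one unescaped token. The sketch below is only an illustration of that reading, not the actual MoreLikeThis code: it rebuilds the query string from the terms shown in the error and locates the first colon that the parser could not accept. The function names are hypothetical, single spaces between clauses are assumed (the original message was line-wrapped), and reading the reported "column 62" as a 0-based offset is also an assumption.

```python
def build_unescaped_query(field, terms, exclude_id):
    """Join terms into field:term clauses with no escaping (illustration only)."""
    clauses = " ".join(f"{field}:{term}" for term in terms)
    return f"+({clauses}) -id:{exclude_id}"


def first_stray_colon(query):
    """Return the 0-based offset of the first unescaped colon that follows an
    earlier unescaped colon in the same space-separated clause, or -1."""
    pos = 0
    for clause in query.split(" "):
        seen = 0
        for i, ch in enumerate(clause):
            # A colon preceded by a backslash counts as escaped and is skipped.
            if ch == ":" and (i == 0 or clause[i - 1] != "\\"):
                seen += 1
                if seen == 2:
                    return pos + i
        pos += len(clause) + 1  # +1 for the space removed by split
    return -1


# Terms and document id copied from the error message in this thread.
TERMS = ["item", "date", "dte", "report", "subj:cedation", "0154", "20220808"]
QUERY = build_unescaped_query("search", TERMS, "123456/2022080916.txt")

if __name__ == "__main__":
    print(QUERY)
    print(first_stray_colon(QUERY))
```

Under those assumptions the offset of the second colon in search:subj:cedation lines up with the column reported in the SyntaxError, which is consistent with the term being passed through as a single token.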


--H. Wu
Email: wu_han...@bah.com
Office: 703-995-3027
Cell: 703-732-1612

-----Original Message-----
From: Alessandro Benedetti <a.benede...@sease.io>
Sent: Thursday, November 24, 2022 1:00 PM
To: wu_han...@bah.com.invalid
Cc: users@solr.apache.org
Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions

I agree with Mikhail, escaping should solve the problem!
If after that it still turns out to be related to More Like This, please let 
us know and I'll be happy to take a look at it (I have been working quite 
extensively on the More Like This feature over the last few years).
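
As a rough illustration of the escaping suggested above, here is a minimal Python sketch. It assumes the client assembles the query string itself, which may not be the case when the MoreLikeThis component builds the query on the server. The character set is meant to mirror what SolrJ's ClientUtils.escapeQueryChars handles; treat that as an assumption and check it against your Solr version. The helper names are hypothetical.

```python
# Characters treated as special by the classic Lucene/Solr query parser
# (assumed to match SolrJ's ClientUtils.escapeQueryChars).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(term: str) -> str:
    """Backslash-escape special characters and whitespace in a single term."""
    out = []
    for ch in term:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def field_clause(field: str, term: str) -> str:
    """Build a field:term clause with the term escaped (hypothetical helper)."""
    return f"{field}:{escape_query_chars(term)}"


if __name__ == "__main__":
    # Prints search:subj\:cedation
    print(field_clause("search", "subj:cedation"))
```

Escaping of this kind is applied only to the query string; the indexed and stored text is not modified.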

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
 | LinkedIn <https://linkedin.com/company/sease-ltd>
 | Twitter <https://twitter.com/seaseltd>
 | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ>
 | Github <https://github.com/seaseltd>


On Thu, 24 Nov 2022 at 16:16, Wu, Hansen [USA] 
<wu_han...@bah.com.invalid> wrote:

> Hi Mikhail,
> Thank you for your suggestions.
> Happy Thanksgiving Day!
> --H.Wu
>
> -----Original Message-----
> From: Mikhail Khludnev <m...@apache.org>
> Sent: Thursday, November 24, 2022 2:33 AM
> To: users@solr.apache.org
> Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions
>
> To me it doesn't look like a MoreLikeThis problem, but I believe it can
> be fixed by escaping with '\':  search:subj\:cedation
>
> On Wed, Nov 23, 2022 at 11:08 PM Wu, Hansen [USA]
> <wu_han...@bah.com.invalid> wrote:
>
> > Hi Solr folks,
> > Below is the info related to the bug I reported. Thanks for taking a
> > look.
> > Happy holidays!
> > --H.Wu
> >
> > ---------- ERROR MESSAGE: ----------
> > Failed to search for document id: 123456/2022080916.txt
> > Error from server at http://xxx.xxx.xxx:8983/solr/documents:
> > Error from server at null:
> > org.apache.solr.search.SyntaxError: Cannot parse '+(search:item
> > search:date search:dte search:report search:subj:cedation
> > search:0154 search:20220808)
> > -id:123456/2022080916.txt': Encountered " ":" ": "" at line 1, column 62.
> > Was expecting one of:
> >   <AND> ...
> >   <OR> ...
> >   <NOT> ...
> >   "+" ...
> >   "-" ...
> >   ......
> > at org.yyy.data.service.solr.proxy.SolrProxyService.query
> > (ssssss.java:90) at ...... (******.java:51) at ......
> > (aaaaaa.java:163)
> >
> >
> > ---------- SOLR FIELD DATA: ----------
> > "search":["......\nsubj:cedation report date 20220809\n
> >     ......\n
> >     <contains many lines in the format of name:value>\n
> >     ......
> > "],
> > "cedation report date 20220809"]
> >
> >
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: Wednesday, November 23, 2022 7:40 AM
> > To: users@solr.apache.org
> > Subject: [External] Re: Seeking tips about MoreLikeThis exceptions
> >
> > Hello,
> > I can only speculate that some doc ids have a colon in their value. In
> > that case you need to either escape it with a backslash or use the more
> > explicit syntax described in the Solr Local Params topic.
> > But I might be wrong. Can you provide a problem query and/or stacktrace?
> >
> > On Tue, Nov 22, 2022 at 9:09 PM Wu, Hansen [USA]
> > <wu_han...@bah.com.invalid> wrote:
> >
> > > Hello Solr folks,
> > > I have a Web Service app which uses CloudSolrClient to connect to
> > > a Solr server. Given a docId, it queries a collection for similar
> > > documents against the field search through the MoreLikeThis function.
> > > The content of the search field is copied from the original text of a
> > > document. The Web Service app works fine with most documents. But
> > > whenever a document contains a colon, e.g. "subj:cedation", it throws
> > > exceptions, complaining about ":". Have you folks ever seen such
> > > problems? Any tips to resolve the issue?
> > >
> > > In my dev env, I have a newer version of Solr installed, which
> > > uses HttpSolrClient to do the same query. The problem didn't
> > > happen there. I was wondering whether the issue only happens with
> > > old versions of Solr.
> > >
> > > I would appreciate ideas from the Solr community.
> > > Thanks a lot in advance.
> > >
> > > Hansen Wu
> > > Associate | Lead Data Scientist
> > > Booz | Allen | Hamilton
> > >
> > > Email: wu_han...@bah.com
> > >
> > >
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
