FYI,
https://issues.apache.org/jira/browse/SOLR-16682 ("MLT component:
SyntaxError: Cannot parse") has been committed. It should be released in 9.2.
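
For anyone on a release without this fix who builds query strings on the client side, the backslash escaping suggested earlier in this thread can be sketched as below. This is only an illustration under assumptions: the helper names escape_query_term and build_field_query are made up for this example, and the character list is modeled on the classic Lucene query syntax (SolrJ users would normally call ClientUtils.escapeQueryChars instead). It may not help when the query is assembled inside Solr itself, which appears to be the case the JIRA issue addresses.

```python
from typing import Iterable

# Characters the classic Lucene/Solr query parser treats as syntax.
# Assumption: this list is modeled on the documented query syntax; it is
# not copied from any particular Solr release.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_term(term: str) -> str:
    """Backslash-escape syntax characters and whitespace in one term."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch.isspace() else ch
        for ch in term
    )


def build_field_query(field: str, terms: Iterable[str]) -> str:
    """Join terms into 'field:term' clauses, escaping each term first.

    Hypothetical helper for illustration only.
    """
    return " ".join(f"{field}:{escape_query_term(t)}" for t in terms)


if __name__ == "__main__":
    # The colon inside the term is escaped, so the parser no longer
    # reads "subj" as a field name.
    print(build_field_query("search", ["report", "subj:cedation"]))
    # prints: search:report search:subj\:cedation
```

Escaping happens only in the query string, so the indexed data stays unchanged.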

On Fri, Dec 2, 2022 at 7:00 PM Wu, Hansen [USA] <wu_han...@bah.com> wrote:

> Hi Alessandro and Mikhail,
> Thanks for your tips.
> I discussed escaping ':' characters with my team. They don't agree
> with the idea. They think we should NOT make any changes to the original
> data. They think the problem is in Solr's MoreLikeThis functionality,
> which fails to break subj:cedation into 2 terms. What do you guys think?
> Thanks for your further help.
>
> FYI
> In our schema.xml, search field is defined as follows:
> <field name="search" type="text_en" indexed="true" stored="true"
> multiValued="true" termVectors="true"/>
> ......
> <copyField source="original_text" dest="search" />
> <copyField source="title" dest="search" />
> .....
> <fieldType name="text_en" class="solr.TextField"
> positionIncrementGap="100">
>         <analyzer type="index">
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="lang/stopwords_en.txt"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.EnglishPossessiveFilterFactory"/>
>                 <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>                 <filter class="solr.PorterStemFilterFactory"/>
>         </analyzer>
>         <analyzer type="query">
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="lang/stopwords_en.txt"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.EnglishPossessiveFilterFactory"/>
>                 <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>                 <filter class="solr.PorterStemFilterFactory"/>
>         </analyzer>
> </fieldType>
>
>
> --H. Wu
> Email: wu_han...@bah.com
> Office: 703-995-3027
> Cell: 703-732-1612
>
> -----Original Message-----
> From: Alessandro Benedetti <a.benede...@sease.io>
> Sent: Thursday, November 24, 2022 1:00 PM
> To: wu_han...@bah.com.invalid
> Cc: users@solr.apache.org
> Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions
>
> I agree with Mikhail, escaping should solve the problem!
> If after that it has to do with More Like This, please let us know and
> I'll be happy to take a look at it (been working in the last few years
> quite extensively on the More Like This feature)
>
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/> LinkedIn
> <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Thu, 24 Nov 2022 at 16:16, Wu, Hansen [USA] <wu_han...@bah.com.invalid>
> wrote:
>
> > Hi Mikhail,
> > Thank you for your suggestions.
> > Happy Thanksgiving Day!
> > --H.Wu
> >
> > -----Original Message-----
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: Thursday, November 24, 2022 2:33 AM
> > To: users@solr.apache.org
> > Subject: Re: [External] Re: Seeking tips about MoreLikeThis exceptions
> >
> > To me it doesn't seem like a MoreLikeThis problem, but I believe it can
> > be fixed by escaping with '\':  search:subj\:cedation
> >
> > On Wed, Nov 23, 2022 at 11:08 PM Wu, Hansen [USA]
> > <wu_han...@bah.com.invalid>
> > wrote:
> >
> > > Hi Solr folks,
> > > Below is the info related to the bug I reported. Thanks for taking a
> > > look.
> > > Happy holidays!
> > > --H.Wu
> > >
> > > ---------- ERROR MESSAGE: ----------
> > > Failed to search for document id: 123456/2022080916.txt
> > > Error from server at http://xxx.xxx.xxx:8983/solr/documents:
> > > Error from server at null:
> > > org.apache.solr.search.SyntaxError: Cannot parse '+(search:item
> > > search:date search:dte search:report search:subj:cedation
> > > search:0154 search:20220808)
> > > -id:123456/2022080916.txt': Encountered " ":" ": "" at line 1, column 62.
> > > Was expecting one of:
> > >   <AND> ...
> > >   <OR> ...
> > >   <NOT> ...
> > >   "+" ...
> > >   "-" ...
> > >   ......
> > > at org.yyy.data.service.solr.proxy.SolrProxyService.query
> > > (ssssss.java:90) at ...... (******.java:51) at ......
> > > (aaaaaa.java:163)
> > >
> > >
> > > ---------- SOLR FIELD DATA: ----------
> > > "search":["......\nsubj:cedation report date 20220809\n
> > >     ......\n
> > >     <contains many lines in the format of name:value>\n
> > >     ......
> > > "],
> > > "cedation report date 20220809"]
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Mikhail Khludnev <m...@apache.org>
> > > Sent: Wednesday, November 23, 2022 7:40 AM
> > > To: users@solr.apache.org
> > > Subject: [External] Re: Seeking tips about MoreLikeThis exceptions
> > >
> > > Hello,
> > > > I can only speculate that some doc ids have a colon in their value. In
> > > > this case you need to either escape it with a backslash or use the more
> > > > explicit syntax described in the Solr Local Params topic.
> > > > But I might be wrong. Can you provide a problem query and/or
> > > > stacktrace?
> > >
> > > > On Tue, Nov 22, 2022 at 9:09 PM Wu, Hansen [USA]
> > > > <wu_han...@bah.com.invalid> wrote:
> > >
> > > > Hello Solr folks,
> > > > > I have a Web Service app that uses CloudSolrClient to connect to
> > > > > a Solr server and, given a docId, query for similar documents
> > > > > against the field "search" in a collection through the MoreLikeThis
> > > > > function. The content of the search field is copied from the
> > > > > original text of a document.
> > > > > The Web Service app works fine with most documents. But whenever a
> > > > > document contains a colon, e.g. "subj:cedation", it throws
> > > > > exceptions complaining about ":". Have you folks ever seen such
> > > > > problems? Any tips to resolve the issue?
> > > >
> > > > > In my dev env, I have a newer version of Solr installed, where
> > > > > HttpSolrClient is used to do the same query. The problem didn't
> > > > > happen there. I was wondering whether the issue only happens with
> > > > > old versions of Solr.
> > > >
> > > > I wish to get ideas from solr community.
> > > > Thanks a lot in advance.
> > > >
> > > > Hansen Wu
> > > > Associate | Lead Data Scientist
> > > > Booz | Allen | Hamilton
> > > >
> > > > Email: wu_han...@bah.com<mailto:wu_han...@bah.com>
> > > >
> > > >
> > > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
