Re: Seeking tips about MoreLikeThis exceptions

2022-12-03 Thread Mikhail Khludnev
Hi,
It's a bug in MLTComponent: it can't handle field values that contain a colon, etc.
As a workaround, we can use the [subquery] result transformer with the mlt query
parser.
For example, in the Solr admin UI
I added the fl param as fl=workaround:[subquery], then in raw parameters:
workaround.q={!mlt qf=manu v=$row.id mintf=0 mindf=0}
This query adds the subquery result to every doc, like:
"workaround":{"numFound":2,"start":0,"numFoundExact":true,"docs":[ { "id":
"VS1GB400C3",

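The same request can also be built outside the admin UI. The following is only a minimal sketch of how the parameters above might be assembled into a select URL; the base URL, the collection name "techproducts", the main query on id, and the extra "id" in fl are assumptions for illustration and are not from the thread.

```python
from urllib.parse import urlencode


def build_mlt_workaround_url(base_url, collection, doc_id, field="manu"):
    """Build a /select URL that attaches an mlt subquery result to each doc.

    base_url, collection and doc_id are hypothetical inputs; only the
    fl=workaround:[subquery] and workaround.q parameters come from the thread.
    """
    params = {
        # Assumption: select the source document by its id.
        "q": f'id:"{doc_id}"',
        # [subquery] transformer; "id" is added here only for readability.
        "fl": "id,workaround:[subquery]",
        # mlt query parser, fed with the id of each parent row.
        "workaround.q": f"{{!mlt qf={field} v=$row.id mintf=0 mindf=0}}",
    }
    # urlencode escapes the braces, "!", "$" and spaces for us.
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_mlt_workaround_url(
    "http://localhost:8983/solr", "techproducts", "VS1GB400C3"
)
print(url)
```

Letting urlencode do the escaping avoids hand-quoting the local-params syntax, which is easy to get wrong when the query is pasted into a URL.
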
On Tue, Nov 22, 2022 at 9:09 PM Wu, Hansen [USA] 
wrote:

> Hello Solr folks,
> I have a Web Service app, which uses CloudSolrClient to connect to a Solr
> server to query for similar documents against field search in a collection
> through the MoreLikeThis function for a given docId. The content in the search
> field is copied from the original text of a document.
> The Web Service app works fine with most documents. But whenever a
> document contains a colon, e.g. "subj:cedation", it throws exceptions,
> complaining about ":". Have you folks ever seen such problems? Any tips to
> resolve the issue?
>
> In my dev env, I have a newer version of Solr installed, which uses
> HttpSolrClient to do the same query. The problem didn't happen there. I was
> wondering whether the issue only happens with old versions of Solr or not.
>
> I would welcome ideas from the Solr community.
> Thanks a lot in advance.
>
> Hansen Wu
> Associate | Lead Data Scientist
> Booz | Allen | Hamilton
>
> Email: wu_han...@bah.com
>
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: SOLR adding ,​ to strings erroneously

2022-12-03 Thread dmitri maziuk

On 2022-12-02 7:41 PM, Shawn Heisey wrote:

> I'm curious as to why those entities are displaying as text instead of
> being interpreted by the browser as a zero-width space.


I am curious as to why Matthew and I are apparently the only people 
seeing it.


Dima



Re: SOLR adding ,​ to strings erroneously

2022-12-03 Thread Shawn Heisey

On 12/3/22 10:38, dmitri maziuk wrote:

> On 2022-12-02 7:41 PM, Shawn Heisey wrote:
>
>> I'm curious as to why those entities are displaying as text instead
>> of being interpreted by the browser as a zero-width space.
>
> I am curious as to why Matthew and I are apparently the only people
> seeing it.


I see it on my install, 9.2.0-SNAPSHOT compiled 2022/11/30, and it was 
also happening on a version compiled a few days earlier.  I have no idea 
when it first started happening.  I tend to glance at the logs every now 
and then, and only look closer at logs that pertain to whatever I am 
working on at that moment.  And I use solr.log a lot more than the 
logging tab in the UI ... this problem does not occur in the actual logfile.


Thanks,
Shawn



Re: CVE-2022-40153 com.fasterxml.woodstox_woodstox-core

2022-12-03 Thread Gus Heck
Hi Billy,

Thanks for bringing this up. The CVE you link is rejected (
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-40153). However,
reading through the report here:
https://github.com/x-stream/xstream/issues/304 it seems that this was part
of a series of low-quality, auto-generated CVE reports, and 4/6 of them were
rejected, but annoyingly NVD only reflects the rejected status for 3 out of
4, having missed it for the one you linked. In any case,
https://nvd.nist.gov/vuln/detail/CVE-2022-40152 did eventually stick to
woodstox after initially being reported against x-stream and can be fixed
by an upgrade to woodstox 6.4. Main branch is on 6.3.1 presently and Solr
will receive this upgrade to 6.4 as part of the Caffeine Cache upgrade, so
you can follow https://issues.apache.org/jira/browse/SOLR-16562 (I have
added a comment, so hopefully it at least shows up in searches for the
correct CVE soon).

Sorry the response took so long. For my part, I missed the first mail you
sent. It's not my job any more than anyone else's on the PMC to respond, but
I do appreciate the way you have been following our requested process on
the security page, which I helped revise. Once I saw your second mail, I
initiated a small private list discussion to try to ensure a coherent
response, since it didn't seem to have been addressed previously. It doesn't
look like there is much risk from this one, since it's at most a DoS and
would only be encountered by users that are using text tagging
functionality, since that is the only place we use this library directly. I
also see it as a transitive dependency in some of the S3-related code, but
it is not directly used by us in that module. While there is processing of
external data in this path, this would generally be indexing related, which
makes a DoS a bit difficult to achieve. This is based on an initial quick
look/discussion, but there has been no serious attempt to
find/verify/exclude an exploit, since it's getting fixed soon as part of
other work, so YMMV.

-Gus
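
For anyone who wants to check which woodstox-core jar an install ships, here is a rough sketch that reads the version from the jar file name and compares it with 6.4.0, the fixed version mentioned in this thread. The helper names and the idea of scanning a lib directory are assumptions for illustration; a file-name check says nothing about whether the issue is actually exploitable.

```python
import re
from pathlib import Path

# Fixed version as stated in the thread (woodstox 6.4 / 6.4.0).
FIXED_VERSION = (6, 4, 0)


def woodstox_version(jar_name):
    """Return the version tuple from a woodstox-core jar name, or None."""
    match = re.fullmatch(r"woodstox-core-(\d+(?:\.\d+)*)\.jar", jar_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(version, fixed=FIXED_VERSION):
    """True if version is older than fixed; short versions are zero-padded."""
    width = max(len(version), len(fixed))
    padded = version + (0,) * (width - len(version))
    padded_fixed = fixed + (0,) * (width - len(fixed))
    return padded < padded_fixed


def scan_lib_dir(lib_dir):
    """Map each woodstox-core jar found in lib_dir to its upgrade status."""
    results = {}
    for jar in Path(lib_dir).glob("woodstox-core-*.jar"):
        version = woodstox_version(jar.name)
        if version is not None:
            results[jar.name] = needs_upgrade(version)
    return results


print(needs_upgrade(woodstox_version("woodstox-core-6.2.8.jar")))
```

scan_lib_dir can be pointed at the WEB-INF/lib directory quoted in the original mail below.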

On Tue, Nov 29, 2022 at 8:30 AM Billy Kidwell 
wrote:

> https://nvd.nist.gov/vuln/detail/CVE-2022-40153
>
> Our container scan found a potential security vulnerability in Solr 9.0.0
> and 9.1.0 for woodstox-core.
>
> I checked the security page, the official list of non-exploitable
> vulnerabilities and the user mailing list.  I also checked jira.  There are
> a number of tickets concerning woodstox, but they seem to be prior issues.
>
> For 9.1.0, the package version seems to be 6.2.8
>
> /solr/server/solr-webapp/webapp/WEB-INF/lib/woodstox-core-6.2.8.jar
>
> This vulnerability is addressed in 6.4.0.
>
> Does anyone know if this vulnerability is exploitable in Solr?
> If so, under what circumstances?
>
> Thanks,
>
> Bill
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)