Re: Searching for special characters in documents

2024-08-22 Thread Thomas Corthals
Hi Thorsten,

How exactly are you executing the queries? If you're using shell commands,
keep in mind that the shell applies its own escaping rules to your command
parameters first, so if you're not careful a backslash might already be
"eaten" before the actual request is fired off. The same goes for the string
escaping rules of whatever programming language you might be implementing
your test script in.
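
As an illustration of the language-level escaping, here is a minimal Python
sketch. It assumes the test script is written in Python and builds the query
string itself; the variable names are made up for this example. It only shows
that a doubled backslash in a normal string literal is a single character,
and that the backslash has to be percent-encoded to survive the HTTP layer.

```python
from urllib.parse import urlencode

# In a regular Python literal "\\" is ONE backslash character;
# a raw string (r"...") avoids having to double it.
escaped_literal = "ab\\-dc"
raw_literal = r"ab\-dc"
assert escaped_literal == raw_literal
assert len(raw_literal) == 6  # a, b, \, -, d, c

# urlencode percent-encodes the backslash (%5C) so it reaches Solr intact.
query_string = urlencode({"q": raw_literal, "q.op": "AND"})
print(query_string)  # q=ab%5C-dc&q.op=AND
```

If the same value is pasted into a shell command instead, the shell's own
quoting rules apply before any of this, so the result can differ.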

With echoParams=explicit you can see what ended up being sent to Solr, but
keep in mind that JSON also uses backslash as an escape character. This is
one case where wt=xml can actually help avoid confusion.
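
To see why the JSON echo can be confusing, here is a small Python sketch
(Python is my assumption, any JSON encoder behaves the same way). A query
with a single backslash shows up with two backslashes in JSON text, even
though the value itself has not changed.

```python
import json

sent = r"*b\-d*"  # six characters, exactly one backslash

# JSON escapes the backslash, so the serialized text contains two of them.
echoed = json.dumps({"q": sent})
print(echoed)  # {"q": "*b\\-d*"}

# Decoding the JSON gives back the original single-backslash value.
assert json.loads(echoed)["q"] == sent
```

So a doubled backslash in a JSON response does not by itself mean that two
backslashes were sent.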

Thomas


On Wed, 14 Aug 2024 at 15:18, Thorsten Heit wrote:

> Hi,
>
> this is the first time I'm writing to this list, so hi to all :-)
>
> I'm having problems querying text having special characters inside (see
>
> https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html#escaping-special-charaters
> ).
>
> My setup:
> Solr 9.6.1 running as a standalone server system under Java 21 on an
> internal Linux VM (Ubuntu 24.04).
>
> For testing purposes I created a new core "test" and uploaded a few
> sample documents to it:
>
>
> {
>"responseHeader":{
>  "status":0,
>  "QTime":0,
>  "params":{
>"q":"*:*",
>"indent":"true",
>"q.op":"OR",
>"useParams":"",
>"_":"1723547755451"
>  }
>},
>"response":{
>  "numFound":7,
>  "start":0,
>  "numFoundExact":true,
>  "docs":[{
>"id":"70",
>"resourcename":"beispiel.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n
> Suchtext:\r\nab_dc\r\n \n  "],
>"_version_":1801822834550374400
>  },{
>"id":"71",
>"resourcename":"beispiel2.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n
> Suchtext:\r\nab-dc\r\n \n  "],
>"_version_":1801823062283255808
>  },{
>"id":"72",
>"resourcename":"beispiel3.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n  Dies ist ein
> langer Suchtext:\r\nab-de\r\ndef+hi\r\nkl-nop\r\n \n  "],
>"_version_":1806915982686420992
>  },{
>"id":"73",
>"resourcename":"beispiel4.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n
> ab-cd\r\nde-fg\r\n \n  "],
>"_version_":1806917322172006400
>  },{
>"id":"74",
>"resourcename":"beispiel2-1.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n  Dies ist ein
> langer Suchtext:\r\nabedc\r\ndef+ghi\r\n \n  "],
>"_version_":1807270704395059200
>  },{
>"id":"75",
>"resourcename":"beispiel2-2.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n  Dies ist ein
> langer Suchtext:\r\nab-dc\r\ndefxghi\r\n \n  "],
>"_version_":1807270722296348672
>  },{
>"id":"76",
>"resourcename":"beispiel2-3.txt",
>"content_type":["text/plain; charset=windows-1252"],
>"content":[" \n \n  \n  \n  \n  \n  \n  \n  \n \n  Dies ist ein
> langer Suchtext:\r\nabedc\r\ndefxghi\r\n \n  "],
>"_version_":1807270740219658240
>  }]
>}
> }
>
>
> The problem is that I haven't found out how to correctly search for
> documents with a "-" in it by using wildcards (* and ?). Some queries
> seem to work while others don't...
>
> The query itself is basically the same:
>
> q=...&q.op=AND&fl=id,resourcename&sort=id+asc&start=0&rows=2147483647
>
> and differs only in the value of "q".
>
> My queries:
>
> q: *uchtex*
> => ok, 6 documents found (#70, #71, #72, #74, #75, #76)
>
> q: uchtex*
> => ok, 0 documents found
>
> q: Suchtex*
> => ok, 6 documents found (#70, #71, #72, #74, #75, #76)
>
> q: b?d
> => ok, 0 documents found
>
> q: b?d*
> => ok, 0 documents found
>
> q: *b-d*
> => ok, 0 documents found (because "-" isn't quoted, right?)
>
> q: *b?d*
> => not ok, only 3 documents found: #70, #74, #76
> => missing:  #71, #72, #75
>
> q: *b*d*
> => not ok, only 3 documents found: #70, #74, #76
> => (all 7 expected)
>
> q: ?b?d?
> => not ok, only 3 documents found: #70, #74, #76
> => missing:  #71, #72, #75
>
> q: ab*
> => ok, all 7 documents found
>
> q: ab*d
> => not ok, 0 documents found
> => missing: #73
>
> q: ab??d
> => not ok, 0 documents found
> => missing: #73
>
> q: ab\-dc
> => ok, 2 documents found: #71, #75
>
> q: ab\-d*
> => not ok, 0 documents found
> => missing: #71, #72, #75
>
> q: ab?d*
> => not ok, 3 documents found: #70, #74, #76
> => missing: #71, #72, #75
>
> q: *b\-d*
> => not ok, 0 documents found
> => missing: #71, #72, #75
>
> q: *b\\-d*
> => 0
>
>
> Can someone enlighten me what I'm doing wrong? A

Re: SOLR-13510 patch installation

2024-08-22 Thread Shawn Heisey

On 8/22/2024 11:38, Raju Vaddeh wrote:
> Below are the steps we have performed on the local solr environment for
> the patch install.
>
> *1. Clone the solr repo*
>
> git clone GitHub - apache/solr: Apache Solr open-source search software


There's your problem right there.  You are cloning the Solr repo, which 
does not have Solr 8.x.  It only has code for Solr 9.0 and later.


If you instead clone the lucene-solr repo, your other steps will work.

git clone https://github.com/apache/lucene-solr.git

You really should upgrade to at least 8.11.3 if not the latest 9.x 
version.  Version 8.1.1 was released five years ago.


Thanks,
Shawn



Re: SOLR-13510 patch installation

2024-08-22 Thread Shawn Heisey

On 8/22/2024 17:40, Shawn Heisey wrote:
> On 8/22/2024 11:38, Raju Vaddeh wrote:
>> Below are the steps we have performed on the local solr environment
>> for the patch install.
>>
>> *1. Clone the solr repo*
>>
>> git clone GitHub - apache/solr: Apache Solr open-source search
>> software
>
> There's your problem right there.  You are cloning the Solr repo, which
> does not have Solr 8.x.  It only has code for Solr 9.0 and later.


But you will need a couple of things before you can compile 8.x.  You 
will need ant, perl, and a Java JDK.  For Solr 8.x, I would use Java 8 
or Java 11.  I do not know whether it will work properly with Java 17 or 
later.


https://ant.apache.org/bindownload.cgi
https://strawberryperl.com/

These two programs and the Java JDK will need to be on the path.

Then you will need these commands, starting from the root of the checkout:

ant ivy-bootstrap
cd solr
ant clean package

The last command will create the .tar.gz and .zip archives which are 
very similar to what you get when you download Solr.


It is generally easier to compile Solr on an OS like Linux than on Windows.

Thanks,
Shawn



Re: SOLR-13510 patch installation

2024-08-22 Thread Shawn Heisey

On 8/22/2024 17:55, Shawn Heisey wrote:
> Then you will need these commands, starting from the root of the checkout:
>
> ant ivy-bootstrap
> cd solr
> ant clean package


I can confirm that Java 17 does NOT work.

I installed Java 11 (it was missing on my Windows 11 machine) and added 
this command before trying the final command:


set JAVA_HOME=C:\Program Files\Java\jdk-11

But it actually still failed.  It was trying to download artifacts from 
https://maven.restlet.com but that web server is using an expired 
certificate.


I managed to fix this by changing the following file:

lucene/default-nested-ivy-settings.xml

In that file, I replaced three instances of "maven.restlet.com" with 
"maven.restlet.talend.com".
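
For anyone who would rather script that edit than do it by hand, a minimal
Python sketch follows. The function name is made up for this example and
Python is only my choice of tool; the two host names and the file path are
the ones given above. It works on raw bytes so line endings are left alone.

```python
from pathlib import Path

OLD_HOST = b"maven.restlet.com"
NEW_HOST = b"maven.restlet.talend.com"


def repoint_restlet(settings_file: Path) -> int:
    """Replace the old Restlet host in the file; return how many were replaced."""
    data = settings_file.read_bytes()
    count = data.count(OLD_HOST)
    if count:
        # Only rewrite the file when there is something to change.
        settings_file.write_bytes(data.replace(OLD_HOST, NEW_HOST))
    return count
```

Called from the root of the checkout as
repoint_restlet(Path("lucene/default-nested-ivy-settings.xml")), it should
report the three replacements mentioned above. Running it a second time
changes nothing, because the new host name does not contain the old one.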


Thanks,
Shawn