Re: Problem with eDisMax and multi-word synonyms

2021-04-29 Thread Jan Høydahl
I think you are hitting this bug 
https://issues.apache.org/jira/browse/SOLR-12779

Jan

> On 29 Apr 2021, at 08:51, Ere Maijala wrote:
> 
> Hello Markus,
> 
> Thanks for the reply. I'm not sure I understand. The docs state the following:
> 
> "The default value of mm is 0% (all clauses optional), unless q.op is 
> specified as "AND", in which case mm defaults to 100% (all clauses required)."
> (https://solr.apache.org/guide/8_8/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter)
> 
> And obviously it has an effect. You can also replace q.op=AND with mm=100%25 in 
> my examples with the same results. With the multi-word synonym, the query 
> shown by debugQuery=true seems wrong to me in that it requires all terms 
> to match in the same field, whereas normally the match can be found in any of 
> the fields listed in qf. For example, this is the query from my first example:
> 
> +(+DisjunctionMaxQuery((name:corsair | manu:corsair | cat:corsair)) 
> +DisjunctionMaxQuery((name:microsystems | manu:microsystems | 
> cat:microsystems)) +DisjunctionMaxQuery((name:memory | manu:memory | 
> cat:memory)))
> 
> Using the synonym instead of `corsair microsystems` produces this:
> 
> +(+((+name:corsair +name:microsystems +name:memory) | (+manu:corsair 
> +manu:microsystems +manu:memory) | (+cat:cmi +cat:memory)))
> 
> We don't use stopwords. mm.autoRelax does not make a difference here.
> 
> Best,
> Ere
> 
> Markus Jelsma wrote on 28.4.2021 at 16.20:
>> Hello Ere,
>> The q.op parameter is not a dismax parameter. Instead, I think you are being
>> bitten by the mm parameter [1], which by default is 100%, meaning all terms
>> must match. Multi-word synonym handling and mm are not a very intuitive
>> match, and can lead to crazy problems. Also beware of mm and stopword
>> handling and check out mm.autoRelax [2]. But it is best not to use
>> stopwords at all.
>> Check it out,
>> Markus
>> [1] https://solr.apache.org/guide/6_6/the-dismax-query-parser.html
>> [2] https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html
>> On Wed 28 Apr 2021 at 15:02, Ere Maijala wrote:
>>> Hi,
>>> 
>>> Here's one that I can't wrap my head around. The main question is: why
>>> are the search terms treated differently in eDisMax if the query expands
>>> to a multi-word synonym, and there are different field types and q.op=AND?
>>> 
>>> This gets complicated quickly, so I tried to reproduce the results with
>>> the techproducts example:
>>> 
>>> 1. Start with vanilla Solr 8.8.2
>>> 
>>> 2. echo "cor => Corsair" >> server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt
>>> 
>>> 3. echo "cmi => Corsair Microsystems" >> server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt
>>> 
>>> 4. bin/solr start -e techproducts
>>> 
>>> 
>>> Now, a basic query that works fine produces 2 results:
>>> 
>>> 
>>> http://localhost:8983/solr/techproducts/select?q=corsair+microsystems+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
>>> 
>>> But if I use the synonym, I don't get any results:
>>> 
>>> 
>>> http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
>>> 
>>> If I leave the cat field out, however, I get 2 results:
>>> 
>>> 
>>> http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu&q.op=AND
>>> 
>>> Also, if I leave q.op out and add AND between the terms, I get 2 results
>>> even with the cat field:
>>> 
>>> 
>>> http://localhost:8983/solr/techproducts/select?q=cmi+AND+memory&debugQuery=true&defType=edismax&qf=name+manu+cat
>>> 
>>> The single-word synonym works just fine:
>>> 
>>> 
>>> http://localhost:8983/solr/techproducts/select?q=cor+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
>>> 
>>> 
>>> Can anyone shine a light on what's happening here?
>>> 
>>> Additional notes:
>>> 
>>> 1. This is a simplified example, and the real-world case is much more
>>> complicated. It has our custom class create the synonyms for compound
>>> words in Finnish, and the queries come from users.
>>> 
>>> 2. As far as I can see mm doesn't affect the results in any meaningful
>>> way, but I just might be doing something wrong.
>>> 
>>> 3. I included the debugQuery parameter so that it's easy to see how
>>> different the queries become.
>>> 
>>> Best Regards,
>>> Ere
>>> 
>>> --
>>> Ere Maijala
>>> Kansalliskirjasto / The National Library of Finland
>>> 
> 
> -- 
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
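
The difference between the two parsed queries quoted above can be modelled with a short sketch. It is only an illustration, not Solr's implementation: the toy document and its tokens are assumptions loosely based on the techproducts sample data, and `term_centric` and `field_centric` are hypothetical helper names.

```python
# Toy document: field name -> set of indexed tokens.
# Hypothetical and simplified; not read from a real Solr index.
doc = {
    "name": {"corsair", "valueselect", "system", "memory"},
    "manu": {"corsair", "microsystems", "inc"},
    "cat": {"electronics", "memory"},
}


def term_centric(doc, terms, fields):
    """Every term must match, but each term may match in a different field.

    Mirrors the shape +DisMax(term1 in any field) +DisMax(term2 in any field) ...
    """
    return all(any(term in doc[field] for field in fields) for term in terms)


def field_centric(doc, terms_by_field):
    """At least one field must contain all of the terms required for that field.

    Mirrors the shape (+f1:a +f1:b) | (+f2:a +f2:b) | ...
    """
    return any(
        all(term in doc[field] for term in terms)
        for field, terms in terms_by_field.items()
    )


fields = ["name", "manu", "cat"]
expanded = ["corsair", "microsystems", "memory"]

# Per-field terms as they appear in the second parsed query:
# name and manu get the expanded synonym, cat keeps the literal "cmi".
per_field = {
    "name": expanded,
    "manu": expanded,
    "cat": ["cmi", "memory"],
}

print(term_centric(doc, expanded, fields))   # True
print(field_centric(doc, per_field))         # False
```

In this toy model no single field holds all three terms, so the per-field form rejects a document that the per-term form accepts, which is the behaviour described in the message above.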



Re: Problem with eDisMax and multi-word synonyms

2021-04-29 Thread Ere Maijala
Ah, yes, that seems to be the case. Thanks for the pointer! Also the 
discussion is enlightening. It looks like there hasn't been much 
happening in that bug so I'll need to consider any other options.


--Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Problem with eDisMax and multi-word synonyms

2021-04-29 Thread Atita Arora
I struggled with something very similar today, which turned our search
model upside down: in Solr 8+ edismax uses sow=false by default, whereas it
was true in Solr 6 (which we upgraded from).
After adding sow=true to my handler, things were back to normal.
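
For anyone wanting to try the sow setting per request rather than in the handler defaults, here is a minimal sketch of building the request URL with an explicit sow parameter. The base URL and the other parameters are taken from the reproduction earlier in this thread; `build_url` is a hypothetical helper, and whether sow=true changes the outcome of that reproduction is not confirmed in this thread.

```python
from urllib.parse import urlencode

# Base URL from the techproducts reproduction earlier in the thread.
BASE = "http://localhost:8983/solr/techproducts/select"


def build_url(q, qf, sow=None, q_op="AND"):
    """Build an eDisMax select URL; sow is only sent when given explicitly."""
    params = {
        "q": q,
        "debugQuery": "true",
        "defType": "edismax",
        "qf": qf,
    }
    if q_op:
        params["q.op"] = q_op
    if sow is not None:
        params["sow"] = "true" if sow else "false"
    # urlencode encodes spaces as "+", matching the URLs used in the thread.
    return BASE + "?" + urlencode(params)


print(build_url("cmi memory", "name manu cat", sow=True))
```

Comparing the debugQuery output of the same request with sow=True and sow=False is a quick way to see how the parsed query changes.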


Solr 8.6 Indexing Issue

2021-04-29 Thread Anuj Bhargava
Ever since we installed 8.6 a couple of months ago, indexing had been working
fine. All of a sudden we are getting the following error:

4/29/2021, 12:01:59 PM
ERROR false
DocBuilder
Exception while processing: hotels document : SolrInputDocument(fields:
[]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
='PK' OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK'
OR country ='MV' OR country ='BT' Processing Document # 1

4/29/2021, 12:01:59 PM
ERROR false
DataImporter
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT * FROM hotels WHERE country = 'IN' OR country ='PK'
OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK' OR
country ='MV' OR country ='BT' Processing Document # 1

Please help


Re: Problem with eDisMax and multi-word synonyms

2021-04-29 Thread Ere Maijala
Thanks for sharing your experience. We've been running sow=false until 
now and got away with it as most searches hit a catch-all field. That 
won't be the case in the long run anymore, so now I'm contemplating 
switching back to sow=true. It's just a bit of a scary change to make at this 
point, since it might affect something I've failed to consider.


--Ere


Re: Solr 8.6 Indexing Issue

2021-04-29 Thread Charlie Hull

What happens if you run exactly that SELECT query on your source database?

Charlie


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited 

Founding member of The Search Network  
and co-author of Searching the Enterprise 


tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828


Re: Problem with eDisMax and multi-word synonyms

2021-04-29 Thread Eric Pugh
This is where having a really nice regression test across a wide variety of 
representative queries comes in super helpful, to understand the impact of 
changes not just on one query, but across all your queries.   See 
https://quepid.com/ or https://github.com/SeaseLtd/rated-ranking-evaluator for 
some tooling that can help!
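
As a rough idea of what such a regression check involves, here is a minimal sketch that compares result sets for a list of queries before and after a configuration change. It is not how Quepid or RRE work; the document ids, the result lists, and the helper names `jaccard` and `changed_queries` are all hypothetical, and in practice the lists would come from real search responses.

```python
def jaccard(a, b):
    """Overlap of two result lists, ignoring order: 1.0 means identical sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def changed_queries(baseline, candidate, threshold=1.0):
    """Return {query: overlap} for queries whose overlap falls below threshold."""
    report = {}
    for query, before in baseline.items():
        after = candidate.get(query, [])
        score = jaccard(before, after)
        if score < threshold:
            report[query] = score
    return report


# Hypothetical result ids captured with the current configuration ...
baseline = {
    "corsair microsystems memory": ["doc1", "doc2"],
    "cmi memory": [],
    "cor memory": ["doc1", "doc2"],
}
# ... and with the changed configuration.
candidate = {
    "corsair microsystems memory": ["doc1", "doc2"],
    "cmi memory": ["doc1", "doc2"],
    "cor memory": ["doc1", "doc2"],
}

print(changed_queries(baseline, candidate))  # {'cmi memory': 0.0}
```

Even a small query set run this way shows which searches a change touches, which can then be reviewed by hand.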




Re: Problem with eDisMax and multi-word synonyms

2021-04-29 Thread Ere Maijala
Thanks for the pointers. I suppose the first hurdle is to collect the 
"wide variety of representative queries". :) But even a small set would 
provide some indication of the impact, I think, so it wouldn't hurt to 
start using a proper tool for it.


--Ere


Re: Solr 8.6 Indexing Issue

2021-04-29 Thread Anuj Bhargava
{
  "responseHeader":{
    "status":0,
    "QTime":11,
    "params":{
      "q":"*:*",
      "_":"1619701169621"}},
  "response":{"numFound":179573,"start":0,"numFoundExact":true,"docs":[
      {
        "country":["AU"],
        "date_c":"2019-03-14T18:30:00Z",



Re: Solr 8.6 Indexing Issue

2021-04-29 Thread Anuj Bhargava
Saw the following in the logs -

2021-04-29 11:19:58.330 INFO  (main) [   ] o.e.j.s.Server jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_292-b10
2021-04-29 11:19:58.352 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider Deployment monitor [file:///opt/solr-8.6.2/server/contexts/] at interval 0
2021-04-29 11:19:58.559 INFO  (main) [   ] o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not find org.apache.jasper.ser.servlet.JspServlet
2021-04-29 11:19:58.568 INFO  (main) [   ] o.e.j.s.session DefaultSessionIdManager workerName=node0
2021-04-29 11:19:58.568 INFO  (main) [   ] o.e.j.s.session No SessionScavenger set, using defaults
2021-04-29 11:19:58.570 INFO  (main) [   ] o.e.j.s.session node0 Scavenging every 66ms
2021-04-29 11:19:58.693 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2021-04-29 11:19:58.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter ___  _   Welcome to Apache Solr™ version 8.6.2
2021-04-29 11:19:58.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| ___| |_ _   Starting in standalone mode on port 8983
2021-04-29 11:19:58.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ \ | '_|  Install dir: /opt/solr
2021-04-29 11:19:58.699 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter |___/\___/_|_| Start time: 2021-04-29T11:19:58.699Z
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary

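Changed com.mysql.jdbc.Driver to com.mysql.cj.jdbc.Driver in
dataconfig.xml

 driver="com.mysql.cj.jdbc.Driver"
batchSize="-1"

and also uploaded latest version - mysql-connector-java-8.0.24.jar

On Thu, 29 Apr 2021 at 18:30, Anuj Bhargava wrote: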

> { "responseHeader":{ "status":0, "QTime":11, "params":{ "q":"*:*", "_":
> "1619701169621"}}, "response":{"numFound":179573,"start":0,"numFoundExact
> ":true,"docs":[ { "country":["AU"], "date_c":"2019-03-14T18:30:00Z",
>
> On Thu, 29 Apr 2021 at 17:18, Charlie Hull <
> ch...@opensourceconnections.com> wrote:
>
>> What happens if you run exactly that SELECT query on your source database?
>>
>> Charlie
>>
>> On 29/04/2021 12:08, Anuj Bhargava wrote:
>> > Ever since installing 8.6 a couple of months ago, the indexing was
>> working
>> > fine. All of a sudden getting the following error -
>> >
>> > 4/29/2021, 12:01:59 PM
>> > ERROR false
>> > DocBuilder
>> > Exception while processing: hotels document : SolrInputDocument(fields:
>> > []):org.apache.solr.handler.dataimport.DataImportHandlerException:
>> Unable
>> > to execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
>> > ='PK' OR country ='BD' OR country ='AF' OR country ='NP' OR country
>> ='LK'
>> > OR country ='MV' OR country ='BT' Processing Document # 1
>> >
>> > 4/29/2021, 12:01:59 PM
>> > ERROR false
>> > DataImporter
>> > Full Import failed:java.lang.RuntimeException:
>> java.lang.RuntimeException:
>> > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
>> > execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
>> ='PK'
>> > OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK' OR
>> > country ='MV' OR country ='BT' Processing Document # 1
>> >
>> > Please help
>> >
>>
>> --
>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
>> 
>> Founding member of The Search Network 
>> and co-author of Searching the Enterprise
>> 
>> tel/fax: +44 (0)8700 118334
>> mobile: +44 (0)7767 825828
>>
>


Re: Solr 8.6 Indexing Issue

2021-04-29 Thread Anuj Bhargava
It is not fetching

Requests: 1 , Fetched: 0 , Skipped: 0 , Processed: 0


On Thu, 29 Apr 2021 at 18:35, Anuj Bhargava  wrote:

> Saw the following in the logs -
>
> *2021-04-29 11:19:58.330 INFO  (main) [   ] o.e.j.s.Server
> jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git:
> a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_292-b10*
> *2021-04-29 11:19:58.352 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider
> Deployment monitor [file:///opt/solr-8.6.2/server/contexts/] at interval 0*
> *2021-04-29 11:19:58.559 INFO  (main) [   ]
> o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not find
> org.apache.jasper.ser.servlet.JspServlet*
> *2021-04-29 11:19:58.568 INFO  (main) [   ] o.e.j.s.session
> DefaultSessionIdManager workerName=node0*
> *2021-04-29 11:19:58.568 INFO  (main) [   ] o.e.j.s.session No
> SessionScavenger set, using defaults*
> *2021-04-29 11:19:58.570 INFO  (main) [   ] o.e.j.s.session node0
> Scavenging every 66ms*
> *2021-04-29 11:19:58.693 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory*
> *2021-04-29 11:19:58.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> ___  _   Welcome to Apache Solr™ version 8.6.2*
> *2021-04-29 11:19:58.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
> __| ___| |_ _   Starting in standalone mode on port 8983*
> *2021-04-29 11:19:58.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__
> \/ _ \ | '_|  Install dir: /opt/solr*
> *2021-04-29 11:19:58.699 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> |___/\___/_|_|Start time: 2021-04-29T11:19:58.699Z*
> *Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver
> class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered
> via the SPI and manual loading of the driver class is generally unnecessary*
>
> Changed *com.mysql.jdbc.Drive*r with com.*mysql.cj.jdbc.Driver * in
> datacofig.xml
>
>  driver="com.mysql.cj.jdbc.Driver"
> batchSize="-1"
>
> and also uploaded latest version - mysql-connector-java-8.0.24.jar
>
> On Thu, 29 Apr 2021 at 18:30, Anuj Bhargava  wrote:
>
>> { "responseHeader":{ "status":0, "QTime":11, "params":{ "q":"*:*", "_":
>> "1619701169621"}}, "response":{"numFound":179573,"start":0,"numFoundExact
>> ":true,"docs":[ { "country":["AU"], "date_c":"2019-03-14T18:30:00Z",
>>
>> On Thu, 29 Apr 2021 at 17:18, Charlie Hull <
>> ch...@opensourceconnections.com> wrote:
>>
>>> What happens if you run exactly that SELECT query on your source
>>> database?
>>>
>>> Charlie
>>>
>>> On 29/04/2021 12:08, Anuj Bhargava wrote:
>>> > Ever since installing 8.6 a couple of months ago, the indexing was
>>> working
>>> > fine. All of a sudden getting the following error -
>>> >
>>> > 4/29/2021, 12:01:59 PM
>>> > ERROR false
>>> > DocBuilder
>>> > Exception while processing: hotels document : SolrInputDocument(fields:
>>> > []):org.apache.solr.handler.dataimport.DataImportHandlerException:
>>> Unable
>>> > to execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
>>> > ='PK' OR country ='BD' OR country ='AF' OR country ='NP' OR country
>>> ='LK'
>>> > OR country ='MV' OR country ='BT' Processing Document # 1
>>> >
>>> > 4/29/2021, 12:01:59 PM
>>> > ERROR false
>>> > DataImporter
>>> > Full Import failed:java.lang.RuntimeException:
>>> java.lang.RuntimeException:
>>> > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
>>> to
>>> > execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
>>> ='PK'
>>> > OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK' OR
>>> > country ='MV' OR country ='BT' Processing Document # 1
>>> >
>>> > Please help
>>> >
>>>
>>> --
>>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
>>> 
>>> Founding member of The Search Network 
>>> and co-author of Searching the Enterprise
>>> 
>>> tel/fax: +44 (0)8700 118334
>>> mobile: +44 (0)7767 825828
>>>
>>


Re: Streaming expression and matrix creation from numerical field

2021-04-29 Thread Joel Bernstein
That's interesting, it seems like you've indexed a matrix into a field.

If that's the case I think you'll need to access the arrays using the index
as described here:

https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index

Then you can create a matrix from the arrays.

I guess we need to add a way to materialize the matrix directly from a
multidimensional array.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER 
wrote:

> Hello everyone,
>
> I am currently trying to create a system for performing distance
> computation of different documents based on some pre-computed numerical
> feature vector.
>
> I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> documents as such, with the feature field being pfloat with multiValued set
> to True:
>
>   {
> "id":"1",
> "feature":[
>   0.1,
>   0.5,
>   0.6,
>   1.7],
>   },
>   {
> "id":"2",
> "feature":[
>   0.5,
>   0.1,
>   0.7,
>   0.9],
>   },
>   {
> "id":"3",
> "feature":[
>  -0.5,
>   0.9,
>   1.5,
>   0.2],
>   },
>
> I want to create a matrix so I can then use the distance() function to
> compute the distances for the columns of a matrix. The documentation
> provides an example of what I am interested in, by defining the vectors on
> the fly:
>
> let(a=array(20, 30, 40),
> b=array(21, 29, 41),
> c=array(31, 40, 50),
> d=matrix(a, b, c),
> c=distance(d))
>
> By transposing the matrix I can easily perform the distance between the
> rows, so I can get what I want.
>
> However, now I want to extract the numerical features from a feature field
> indexed in Solr. The documentation explains how to create a matrix from
> numerical values stored in some fields:
>
> let(
> a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> e=col(a, price_f),
> f=col(b, price_f),
> g=col(c, price_f),
> h=col(d, price_f),
> i=matrix(e, f, g, h),
> j=sumRows(i))
>
> However, in my case, I already have an array of float values for each
> document. So I try to do it that way:
>
> let(
> s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> m=matrix(f1,f2,f3)
> )
>
> But I get this error:
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) -
> Numeric value expected but found type java.util.ArrayList for value
> [0.1,0.5,0.6,1.7]",
> "EOF": true,
> "RESPONSE_TIME": 5
>   }
> ]
>   }
> }
>
> When I inspect what I get as f3, I see that I have an array of arrays, which
> is why I think it is failing here to create the matrix. I've been searching
> a lot on how to create a matrix from float vectors stored in a field of my
> documents, and I still cannot find any solution. What I could do is extract
> the vectors, create them on the fly, and construct the vectors and matrix,
> but I would like to be able to do it in one request. Moreover, I find it
> really curious that I cannot directly create the matrix from the results of
> a normal search. For instance, I would prefer to do something like this:
>
> let(s=search(test,q="*",fl="feature,id"), m=col(s,feature))
>
> which returns:
>
> {
>   "result-set": {
>     "docs": [
>       {
>         "m": [
>           [0.1, 0.5, 0.6, 1.7],
>           [0.5, 0.1, 0.7, 0.9],
>           [-0.5, 0.9, 1.5, 0.2]
>         ]
>       },
>       {
>         "EOF": true,
>         "RESPONSE_TIME": 3
>       }
>     ]
>   }
> }
>
> and be able to use the matrix I obtain here. But again, I was not able to
> perform matrix operations on "m".
>
> Does anyone know any elegant way to create a matrix from my numerical
> vectors stored in my feature field?
>
>
> Thank you.
> --
> Xavier Favory
> Music Technology Group
> Universitat Pompeu Fabra
>


Cannot import documents in Tutorial, exercise 3

2021-04-29 Thread Thomas Hühn

Hello,

I'm trying to follow the Solr Tutorial and need some pointers where I'm 
going wrong.


In exercise 3 I cannot import any documents in my localDocs collection, 
neither using the example\exampledocs\post.jar helper, nor the File 
Upload in the Admin UI. Neither PDF, nor XML, nor plain text.


The /update endpoint doesn't seem to be reachable. If so, why?

This is on Windows 10. All command line input and output is a 
transcript, so no copy&paste errors.
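
One way to test the /update endpoint without post.jar or the Admin UI is to send it a single small JSON document directly. The following is a minimal sketch, not a definitive diagnosis tool: the helper name update_request and the document id probe-1 are made up for illustration, and it assumes the node on port 8983 from the transcript below and a collection named localDocs.

```python
import json
from urllib.request import Request


def update_request(collection, docs, host="localhost", port=8983):
    """Build (but do not send) a POST request for a collection's /update handler."""
    url = f"http://{host}:{port}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = update_request("localDocs", [{"id": "probe-1"}])
print(req.get_method(), req.full_url)

# To actually send it (needs a running Solr):
#   from urllib.request import urlopen
#   with urlopen(req, timeout=10) as resp:
#       print(resp.status, resp.read().decode("utf-8"))
# An HTTPError or URLError raised here would show how the endpoint fails.
```

The status code and response body from such a request usually say more than the helper tools do about whether the handler is missing, the collection name is wrong, or the node is not reachable at all.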


First, the Java version:

--
PS D:\Solr\solr-8.8.2> java -version
openjdk version "16.0.1" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-16.0.1+9 (build 16.0.1+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-16.0.1+9 (build 16.0.1+9, mixed 
mode, sharing)

--

I'm not comfortable taking shortcuts, so I'm starting Solr, creating 
collections etc. from exercises 1 and 2, as well.


--
PS D:\Solr\solr-8.8.2> .\bin\solr start -e cloud
"java version info is 16.0.1"
"Extracted major version is 16"

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on 
your local workstation.
To begin, how many Solr nodes would you like to run in your local 
cluster? (specify 1-4 nodes) [2]:


Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:

Please enter the port for node2 [7574]:

Creating Solr home directory D:\Solr\solr-8.8.2\example\cloud\node1\solr
Cloning D:\Solr\solr-8.8.2\example\cloud\node1 into
   D:\Solr\solr-8.8.2\example\cloud\node2

Starting up Solr on port 8983 using command:
"D:\Solr\solr-8.8.2\bin\solr.cmd" start -cloud -p 8983 -s 
"D:\Solr\solr-8.8.2\example\cloud\node1\solr"


"java version info is 16.0.1"
"Extracted major version is 16"
OpenJDK 64-Bit Server VM warning: JVM cannot use large page memory
because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!

Starting up Solr on port 7574 using command:
"D:\Solr\solr-8.8.2\bin\solr.cmd" start -cloud -p 7574 -s
"D:\Solr\solr-8.8.2\example\cloud\node2\solr" -z localhost:9983

"java version info is 16.0.1"
"Extracted major version is 16"
OpenJDK 64-Bit Server VM warning: JVM cannot use large page memory
because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 7574
Started Solr server on port 7574. Happy searching!
INFO  - 2021-04-29 15:22:53.249;
org.apache.solr.common.cloud.ConnectionManager; Waiting for client to
connect to ZooKeeper
INFO  - 2021-04-29 15:22:53.275; 
org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
INFO  - 2021-04-29 15:22:53.275; 
org.apache.solr.common.cloud.ConnectionManager; Client is connected to 
ZooKeeper
INFO  - 2021-04-29 15:22:53.290;
org.apache.solr.common.cloud.ZkStateReader; Updated live nodes from
ZooKeeper... (0) -> (2)
INFO  - 2021-04-29 15:22:53.290;
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster
at localhost:9983 ready

Now let's create a new collection for indexing documents in your 2-node 
cluster.

Please provide a name for your new collection: [gettingstarted]
techproducts
How many shards would you like to split techproducts into? [2]

How many replicas per shard would you like to create? [2]

Please choose a configuration for the techproducts collection, available 
options are:

_default or sample_techproducts_configs [_default]
sample_techproducts_configs
Created collection 'techproducts' with 2 shard(s), 2 replica(s) with 
config-set 'techproducts'


Enabling auto soft-commits with maxTime 3 secs using the Config API

POSTing request to Config API: 
http://localhost:8983/solr/techproducts/config

{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000


SolrCloud example running, please visit: http://localhost:8983/solr
--

Now I'm deleting it again, as documented in the wrap-up:

--
PS D:\Solr\solr-8.8.2> .\bin\solr delete -c techproducts
"java version info is 16.0.1"
"Extracted major version is 16"
{
  "responseHeader":{
"status":0,
"QTime":237},
  "success":{
"192.168.178.59:8983_solr":{"responseHeader":{
"status":0,
"QTime":45}},
"192.168.178.59:7574_solr":{"responseHeader":{
"status":0,
"QTime":60


Deleted collection 'techproducts' using command:
http://192.168.178.59:7574/solr/admin/collections?action=DELETE&name=techproducts
--

Enriching the schema:

--

Re: Streaming expression and matrix creation from numerical field

2021-04-29 Thread FAVORY , XAVIER
Hi Joel,

Thank you for pointing me to that part of the documentation. valueAt() is
exactly what I needed here.
However, as you point out, there seems to be no way to directly get the
matrix from a multidimensional array.
As a consequence, my streaming expression is very verbose and quite long
for my purpose (I perform this over a thousand documents), but it actually
works when done this way (and I get rid of the extra queries to get the
ids from a text search, for instance):

let(
s=search(test,q="*",fl="feature"),
f1=valueAt(col(s, feature ),0),
f2=valueAt(col(s, feature ),1),
f3=valueAt(col(s, feature ),2),
m=transpose(matrix(f1,f2,f3)),
d=distance(m,cosine())
)
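
Since the expression needs one valueAt() variable per document, writing it by hand for a thousand documents is impractical. A small sketch of generating it on the client side is shown here; the function name build_distance_expression is hypothetical, and it assumes n_docs equals the number of documents the search() call actually returns.

```python
def build_distance_expression(collection, field, n_docs, q="*"):
    """Return a let() expression with one valueAt() variable per document."""
    lines = ["let(", f's=search({collection},q="{q}",fl="{field}"),']
    names = []
    for i in range(n_docs):
        name = f"f{i + 1}"  # f1, f2, ... as in the expression above
        names.append(name)
        lines.append(f"{name}=valueAt(col(s, {field}),{i}),")
    # Each fN becomes one matrix row; the transpose and distance calls are kept as in the original.
    lines.append(f"m=transpose(matrix({','.join(names)})),")
    lines.append("d=distance(m,cosine())")
    lines.append(")")
    return "\n".join(lines)


expression = build_distance_expression("test", "feature", 3)
print(expression)
```

With n_docs=3 this reproduces the same variables and calls as the hand-written expression; the generated text would then be sent to the collection's /stream handler as usual.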


Thank you again,
Best,

Xavier

On Thu, 29 Apr 2021 at 16:04, Joel Bernstein  wrote:

> That's interesting, it seems like you've indexed a matrix into a field.
>
> If that's the case I think you'll need to access the arrays using the index
> as described here:
>
> https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
>
> Then you can create a matrix from the arrays.
>
> I guess we need to add a way to materialize the matrix directly from a
> multidimensional array.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER 
> wrote:
>
> > Hello everyone,
> >
> > I am currently trying to create a system for performing distance
> > computation of different documents based on some pre-computed numerical
> > feature vector.
> >
> > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > documents as such, with the feature field being pfloat with multiValued
> set
> > to True:
> >
> >   {
> > "id":"1",
> > "feature":[
> >   0.1,
> >   0.5,
> >   0.6,
> >   1.7],
> >   },
> >   {
> > "id":"2",
> > "feature":[
> >   0.5,
> >   0.1,
> >   0.7,
> >   0.9],
> >   },
> >   {
> > "id":"3",
> > "feature":[
> >  -0.5,
> >   0.9,
> >   1.5,
> >   0.2],
> >   },
> >
> > I want to create a matrix so I can then use the distance() function to
> > compute the distances for the columns of a matrix. The documentation
> > provides an example of what I am interested in, by defining the vectors
> on
> > the fly:
> >
> > let(a=array(20, 30, 40),
> > b=array(21, 29, 41),
> > c=array(31, 40, 50),
> > d=matrix(a, b, c),
> > c=distance(d))
> >
> > By transposing the matrix I can easily perform the distance between the
> > rows, so I can get what I want.
> >
> > However, now I want to extract the numerical features from a feature
> field
> > indexed in Solr. The documentation explains how to create a matrix from
> > numerical values stored in some fields:
> >
> > let(
> > a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > e=col(a, price_f),
> > f=col(b, price_f),
> > g=col(c, price_f),
> > h=col(d, price_f),
> > i=matrix(e, f, g, h),
> > j=sumRows(i))
> >
> > However, in my case, I already have an array of float values for each
> > document. So I try to do it that way:
> >
> > let(
> > s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> > s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> > s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> > m=matrix(f1,f2,f3)
> > )
> >
> > But I get this error:
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) -
> > Numeric value expected but found type java.util.ArrayList for value
> > [0.1,0.5,0.6,1.7]",
> > "EOF": true,
> > "RESPONSE_TIME": 5
> >   }
> > ]
> >   }
> > }
> >
> > When I inspect what I get as f3, I see that I have an array of array,
> which
> > is why I think it is failing here to create the matrix. I've been
> searching
> > a lot on how to create a matrix from float vectors stored in a field of
> my
> > documents, and I still cannot find any solution. What I could do is
> extract
> > the vectors, create them on the fly, and construct the vectors and
> matrix,
> > but I would like to be able to do it in one request. Moreover, I find it
> > really curious that I cannot directly create the matrix on the results
> of a
> > a normal search. For instance, I would prefer to do something like that:
> >
> > s=search(test,q="*",fl="feature,id"), m=col(s,feature))
> >
> > which returns:
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "m": [
> >   [
> > 0.1,
> > 0.5,
> > 0.6,
> > 1.7
> >   ],
> >   [
> > 0.5,
> > 

Re: Streaming expression and matrix creation from numerical field

2021-04-29 Thread Joel Bernstein
I agree this is very verbose. I didn't even realize you could index a
multidimensional array into a multi-value field until now. Knowing this, it
makes sense to support matrix creation directly from multi-value arrays.
I'll add this when I get some time.




Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER 
wrote:

> Hi Joel,
>
> Thank you for pointing me to that part of the documentation. valueAt() is
> exactly what I needed here.
> However, as you point out, there seems to be no way to directly get the
> matrix from a multidimensional array.
> As a consequence, my streaming expression is very verbose and quite long
> for my purpose (I perform this over a thousand documents), but it actually
> works by doing it that way (and I get rid of an extra queries to get the
> ids from a text search for instance):
>
> let(
> s=search(test,q="*",fl="feature"),
> f1=valueAt(col(s, feature ),0),
> f2=valueAt(col(s, feature ),1),
> f3=valueAt(col(s, feature ),2),
> m=transpose(matrix(f1,f2,f3)),
> d=distance(m,cosine())
> )
>
>
> Thank you again,
> Best,
>
> Xavier
>
> On Thu, 29 Apr 2021 at 16:04, Joel Bernstein  wrote:
>
> > That's interesting, it seems like you've indexed a matrix into a field.
> >
> > If that's the case I think you'll need to access the arrays using the
> index
> > as described here:
> >
> >
> https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> >
> > Then you can create a matrix from the arrays.
> >
> > I guess we need to add a way to materialize the matrix directly from a
> > multidimensional array.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER 
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I am currently trying to create a system for performing distance
> > > computation of different documents based on some pre-computed numerical
> > > feature vector.
> > >
> > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > > documents as such, with the feature field being pfloat with multiValued
> > set
> > > to True:
> > >
> > >   {
> > > "id":"1",
> > > "feature":[
> > >   0.1,
> > >   0.5,
> > >   0.6,
> > >   1.7],
> > >   },
> > >   {
> > > "id":"2",
> > > "feature":[
> > >   0.5,
> > >   0.1,
> > >   0.7,
> > >   0.9],
> > >   },
> > >   {
> > > "id":"3",
> > > "feature":[
> > >  -0.5,
> > >   0.9,
> > >   1.5,
> > >   0.2],
> > >   },
> > >
> > > I want to create a matrix so I can then use the distance() function to
> > > compute the distances for the columns of a matrix. The documentation
> > > provides an example of what I am interested in, by defining the vectors
> > on
> > > the fly:
> > >
> > > let(a=array(20, 30, 40),
> > > b=array(21, 29, 41),
> > > c=array(31, 40, 50),
> > > d=matrix(a, b, c),
> > > c=distance(d))
> > >
> > > By transposing the matrix I can easily perform the distance between the
> > > rows, so I can get what I want.
> > >
> > > However, now I want to extract the numerical features from a feature
> > field
> > > indexed in Solr. The documentation explains how to create a matrix from
> > > numerical values stored in some fields:
> > >
> > > let(
> > > a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > > b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > > c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > > d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > > e=col(a, price_f),
> > > f=col(b, price_f),
> > > g=col(c, price_f),
> > > h=col(d, price_f),
> > > i=matrix(e, f, g, h),
> > > j=sumRows(i))
> > >
> > > However, in my case, I already have an array of float values for each
> > > document. So I try to do it that way:
> > >
> > > let(
> > > s1=search(test,q="id:1",fl="feature"), f1=col(s1, feature),
> > > s2=search(test,q="id:2",fl="feature"), f2=col(s2, feature),
> > > s3=search(test,q="id:3",fl="feature"), f3=col(s3, feature),
> > > m=matrix(f1,f2,f3)
> > > )
> > >
> > > But I get this error:
> > >
> > > {
> > >   "result-set": {
> > > "docs": [
> > >   {
> > > "EXCEPTION": "Failed to evaluate expression matrix(f1,f2,f3) -
> > > Numeric value expected but found type java.util.ArrayList for value
> > > [0.1,0.5,0.6,1.7]",
> > > "EOF": true,
> > > "RESPONSE_TIME": 5
> > >   }
> > > ]
> > >   }
> > > }
> > >
> > > When I inspect what I get as f3, I see that I have an array of array,
> > which
> > > is why I think it is failing here to create the matrix. I've been
> > searching
> > > a lot on how to create a matrix from float vectors stored in a field of
> > my
> > > documents, and I still cannot find any solution. What I could do is
> > ex

Re: Streaming expression and matrix creation from numerical field

2021-04-29 Thread FAVORY , XAVIER
Well, I actually index an array in my field.
But when I use f1=col(s1, feature), it extracts it as a multi-valued field.
I understand that col() is used to extract a field value from multiple
retrieved instances, so it kind of puts it into an array, forming a
multidimensional array.

Could it be possible that I am not using the most adequate field type to
store my features? I just want to store some arrays (for instance one
128-dim feature vector for each document).
Also, as it is now, I need to perform an extra request to know the number
of results I get from the query. This way I can then create the right
streaming expression, with the right number of "fn" variables.
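
As a possible workaround for the extra request, and only under the assumption that the vectors are small enough to pull to the client, the nested array returned by col() could be processed outside Solr. The sketch below computes pairwise cosine distances in plain Python using the usual definition (1 minus cosine similarity); Solr's own cosine() evaluator may differ in details such as rounding, and the function names are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(vectors):
    """Return the full n x n distance matrix for a list of vectors."""
    n = len(vectors)
    return [
        [cosine_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# The three example feature vectors from the original question,
# in the nested shape that col(s, feature) returns.
features = [
    [0.1, 0.5, 0.6, 1.7],
    [0.5, 0.1, 0.7, 0.9],
    [-0.5, 0.9, 1.5, 0.2],
]

distances = pairwise_distances(features)
for row in distances:
    print([round(x, 4) for x in row])
```

This works for any number of returned documents without knowing the count in advance, at the cost of moving the computation out of the streaming expression.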




On Thu, 29 Apr 2021 at 16:58, Joel Bernstein  wrote:

> I agree this is very verbose. I didn't even realize you could index a
> multidimensional array into a multi-value field until now. Knowing this it
> makes sense to support matrix creation directly from multi-value arrays.
> I'll add this when i get some time.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER 
> wrote:
>
> > Hi Joel,
> >
> > Thank you for pointing me to that part of the documentation. valueAt() is
> > exactly what I needed here.
> > However, as you point out, there seems to be no way to directly get the
> > matrix from a multidimensional array.
> > As a consequence, my streaming expression is very verbose and quite long
> > for my purpose (I perform this over a thousand documents), but it
> actually
> > works by doing it that way (and I get rid of an extra queries to get the
> > ids from a text search for instance):
> >
> > let(
> > s=search(test,q="*",fl="feature"),
> > f1=valueAt(col(s, feature ),0),
> > f2=valueAt(col(s, feature ),1),
> > f3=valueAt(col(s, feature ),2),
> > m=transpose(matrix(f1,f2,f3)),
> > d=distance(m,cosine())
> > )
> >
> >
> > Thank you again,
> > Best,
> >
> > Xavier
> >
> > On Thu, 29 Apr 2021 at 16:04, Joel Bernstein  wrote:
> >
> > > That's interesting, it seems like you've indexed a matrix into a field.
> > >
> > > If that's the case I think you'll need to access the arrays using the
> > index
> > > as described here:
> > >
> > >
> >
> https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> > >
> > > Then you can create a matrix from the arrays.
> > >
> > > I guess we need to add a way to materialize the matrix directly from a
> > > multidimensional array.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER  >
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I am currently trying to create a system for performing distance
> > > > computation of different documents based on some pre-computed
> numerical
> > > > feature vector.
> > > >
> > > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I
> have
> > > > documents as such, with the feature field being pfloat with
> multiValued
> > > set
> > > > to True:
> > > >
> > > >   {
> > > > "id":"1",
> > > > "feature":[
> > > >   0.1,
> > > >   0.5,
> > > >   0.6,
> > > >   1.7],
> > > >   },
> > > >   {
> > > > "id":"2",
> > > > "feature":[
> > > >   0.5,
> > > >   0.1,
> > > >   0.7,
> > > >   0.9],
> > > >   },
> > > >   {
> > > > "id":"3",
> > > > "feature":[
> > > >  -0.5,
> > > >   0.9,
> > > >   1.5,
> > > >   0.2],
> > > >   },
> > > >
> > > > I want to create a matrix so I can then use the distance() function
> to
> > > > compute the distances for the columns of a matrix. The documentation
> > > > provides an example of what I am interested in, by defining the
> vectors
> > > on
> > > > the fly:
> > > >
> > > > let(a=array(20, 30, 40),
> > > > b=array(21, 29, 41),
> > > > c=array(31, 40, 50),
> > > > d=matrix(a, b, c),
> > > > c=distance(d))
> > > >
> > > > By transposing the matrix I can easily perform the distance between
> the
> > > > rows, so I can get what I want.
> > > >
> > > > However, now I want to extract the numerical features from a feature
> > > field
> > > > indexed in Solr. The documentation explains how to create a matrix
> from
> > > > numerical values stored in some fields:
> > > >
> > > > let(
> > > > a=random(collection1, q="market:A", rows="5000", fl="price_f"),
> > > > b=random(collection1, q="market:B", rows="5000", fl="price_f"),
> > > > c=random(collection1, q="market:C", rows="5000", fl="price_f"),
> > > > d=random(collection1, q="market:D", rows="5000", fl="price_f"),
> > > > e=col(a, price_f),
> > > > f=col(b, price_f),
> > > > g=col(c, price_f),
> > > > h=col(d, price_f),
> > > > i=matrix(e, f, g, h),
> > > > j=sumRows(i))
> > > >
> > > > However, in my case, I already have an array of float values for each
> > > > document. So 

Re: Streaming expression and matrix creation from numerical field

2021-04-29 Thread Joel Bernstein
If you store it as a multi-value double or float, you would in theory just
get the array. It may be the way you are indexing the data rather than
defining the field that is creating the outer array.


Joel Bernstein
http://joelsolr.blogspot.com/
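
The quoted messages in this thread build the expression by hand, with one
valueAt() variable per retrieved document. Until matrix creation from
multi-value arrays is supported directly, the expression text could be
generated instead of typed. The following is only a minimal sketch: the helper
name build_distance_expression and its parameters are made up for
illustration, and it simply reproduces the shape of the expression quoted in
this thread. The document count still has to be supplied by the caller.

```python
def build_distance_expression(collection, field, num_docs, query="*"):
    """Return a let(...) streaming expression with one valueAt() per document.

    Hypothetical helper: it only assembles text in the same shape as the
    hand-written expression from the thread. It does not talk to Solr.
    """
    if num_docs < 2:
        raise ValueError("need at least two documents to compute distances")

    lines = [f's=search({collection},q="{query}",fl="{field}"),']
    names = []
    for i in range(num_docs):
        name = f"f{i + 1}"
        names.append(name)
        # valueAt(..., i) picks the i-th document's array out of col(s, field)
        lines.append(f"{name}=valueAt(col(s, {field}),{i}),")

    # Rows are documents; transpose so distance() compares documents as columns
    lines.append(f"m=transpose(matrix({','.join(names)})),")
    lines.append("d=distance(m,cosine())")
    return "let(\n" + "\n".join(lines) + "\n)"


print(build_distance_expression("test", "feature", 3))
```

The generated text can then be sent to the collection's /stream handler in
the same way as the hand-written version.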


On Thu, Apr 29, 2021 at 11:06 AM FAVORY , XAVIER 
wrote:

> Well, I actually index an array in my field.
> But when I use f1=col(s1, feature), it extracts it as a multi-valued field.
> I understand that col() is used to extract a field value from multiple
> retrieved instances, so it kind of puts it into an array, forming a
> multidimensional array.
>
> Could it be possible that I am not using the most adequate field type to
> store my features? I just want to store some arrays (for instance one
> 128-dim feature vector for each document).
> Also, as it is now, I need to perform an extra request to know the number
> of results I get from the query. This way I can then create the right
> streaming expression, with the right number of "fn" variables.
>
>
>
>
> On Thu, 29 Apr 2021 at 16:58, Joel Bernstein  wrote:
>
> > I agree this is very verbose. I didn't even realize you could index a
> > multidimensional array into a multi-value field until now. Knowing this,
> > it makes sense to support matrix creation directly from multi-value arrays.
> > I'll add this when I get some time.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Thu, Apr 29, 2021 at 10:46 AM FAVORY , XAVIER 
> > wrote:
> >
> > > Hi Joel,
> > >
> > > Thank you for pointing me to that part of the documentation. valueAt()
> > > is exactly what I needed here.
> > > However, as you point out, there seems to be no way to directly get the
> > > matrix from a multidimensional array.
> > > As a consequence, my streaming expression is very verbose and quite long
> > > for my purpose (I perform this over a thousand documents), but it
> > > actually works by doing it that way (and I get rid of the extra query to
> > > get the ids from a text search, for instance):
> > >
> > > let(
> > > s=search(test,q="*",fl="feature"),
> > > f1=valueAt(col(s, feature ),0),
> > > f2=valueAt(col(s, feature ),1),
> > > f3=valueAt(col(s, feature ),2),
> > > m=transpose(matrix(f1,f2,f3)),
> > > d=distance(m,cosine())
> > > )
> > >
> > >
> > > Thank you again,
> > > Best,
> > >
> > > Xavier
> > >
> > > On Thu, 29 Apr 2021 at 16:04, Joel Bernstein 
> wrote:
> > >
> > > > That's interesting, it seems like you've indexed a matrix into a field.
> > > >
> > > > If that's the case I think you'll need to access the arrays using the
> > > > index as described here:
> > > >
> > > > https://solr.apache.org/guide/8_8/vector-math.html#getting-values-by-index
> > > >
> > > > Then you can create a matrix from the arrays.
> > > >
> > > > I guess we need to add a way to materialize the matrix directly from a
> > > > multidimensional array.
> > > >
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > >
> > > > On Tue, Apr 27, 2021 at 6:00 PM FAVORY , XAVIER <
> xavier.fav...@upf.edu
> > >
> > > > wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > > I am currently trying to create a system for performing distance
> > > > > > computation of different documents based on some pre-computed
> > > > > > numerical feature vector.
> > > > > >
> > > > > > I set up Solr (cloud) 8.7 and I am using streaming expressions. I have
> > > > > > documents as such, with the feature field being pfloat with multiValued
> > > > > > set to True:
> > > > >
> > > > >   {
> > > > > "id":"1",
> > > > > "feature":[
> > > > >   0.1,
> > > > >   0.5,
> > > > >   0.6,
> > > > >   1.7],
> > > > >   ,
> > > > >   {
> > > > > "id":"2",
> > > > > "feature":[
> > > > >   0.5,
> > > > >   0.1,
> > > > >   0.7,
> > > > >   0.9],
> > > > >   },
> > > > >   {
> > > > > "id":"3",
> > > > > "feature":[
> > > > >  -0.5,
> > > > >   0.9,
> > > > >   1.5,
> > > > >   0.2],
> > > > >   },
> > > > >
> > > > > > I want to create a matrix so I can then use the distance() function to
> > > > > > compute the distances for the columns of a matrix. The documentation
> > > > > > provides an example of what I am interested in, by defining the vectors
> > > > > > on the fly:
> > > > >
> > > > > let(a=array(20, 30, 40),
> > > > > b=array(21, 29, 41),
> > > > > c=array(31, 40, 50),
> > > > > d=matrix(a, b, c),
> > > > > c=distance(d))
> > > > >
> > > > > > By transposing the matrix I can easily perform the distance between the
> > > > > > rows, so I can get what I want.
> > > > > >
> > > > > > However, now I want to extract the numerical features from a feature
> > > > > > field indexed in Solr. The documentation explains how to create a matrix
> > > > > > from
> > > > > numerical val

Re: Solr 8.6 Indexing Issue

2021-04-29 Thread Charlie Hull
I meant can you run it on the database directly, without Solr, and what 
happens?


Best

Charlie
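
One way to run that check outside Solr is a few lines of script against the
source database. The sketch below is only an illustration under stated
assumptions: the thread does not say which database is used, so Python's
built-in sqlite3 module and three made-up rows stand in for the real driver
and data, and the helper names are hypothetical. The IN (...) list is
equivalent to the chain of OR conditions in the failing query, and COUNT(*)
replaces SELECT * to keep the output short.

```python
import sqlite3

# Country codes taken from the query in the error message
COUNTRIES = ["IN", "PK", "BD", "AF", "NP", "LK", "MV", "BT"]


def count_hotels(conn, countries):
    """Count rows matching the same country filter the import query uses."""
    placeholders = ",".join("?" for _ in countries)
    sql = f"SELECT COUNT(*) FROM hotels WHERE country IN ({placeholders})"
    return conn.execute(sql, countries).fetchone()[0]


def make_demo_db():
    """Build an in-memory stand-in table; the rows are invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (id INTEGER PRIMARY KEY, country TEXT)")
    conn.executemany(
        "INSERT INTO hotels (country) VALUES (?)",
        [("IN",), ("AU",), ("NP",)],
    )
    return conn


print(count_hotels(make_demo_db(), COUNTRIES))
```

If the same statement succeeds through the real driver with the credentials
the import uses, the problem is more likely on the connection or
configuration side than in the SQL itself.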

On 29/04/2021 14:00, Anuj Bhargava wrote:

{ "responseHeader":{ "status":0, "QTime":11, "params":{ "q":"*:*", "_":
"1619701169621"}}, "response":{"numFound":179573,"start":0,"numFoundExact":
true,"docs":[ { "country":["AU"], "date_c":"2019-03-14T18:30:00Z",

On Thu, 29 Apr 2021 at 17:18, Charlie Hull 
wrote:


What happens if you run exactly that SELECT query on your source database?

Charlie

On 29/04/2021 12:08, Anuj Bhargava wrote:

Ever since installing 8.6 a couple of months ago, the indexing was working
fine. All of a sudden getting the following error -

4/29/2021, 12:01:59 PM
ERROR false
DocBuilder
Exception while processing: hotels document : SolrInputDocument(fields:
[]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
='PK' OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK'
OR country ='MV' OR country ='BT' Processing Document # 1

4/29/2021, 12:01:59 PM
ERROR false
DataImporter
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT * FROM hotels WHERE country = 'IN' OR country ='PK'
OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK' OR
country ='MV' OR country ='BT' Processing Document # 1

Please help


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited

Founding member of The Search Network 
and co-author of Searching the Enterprise

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828





Learning to Rank - within solr or outside?

2021-04-29 Thread Nitin Arora
Hello, can someone share the pros and cons of using Solr's Learning to Rank
versus having an external reranker after fetching Solr's top results? Which
option would you recommend?

Thanks in advance,


Permission "all" gets evaluated before more specific ones

2021-04-29 Thread Luca Fregolon
Hello,
I am trying to configure Solr authentication using Basic
Authentication and Role Based Authorization. I've been facing issues
configuring the authorization part, while the authentication part
works fine. My goal is to define three groups, containing one user
each. One user (chatbot) should have read permission on all
collections and should be able to write on only one collection.
Another user should have read permissions on all the collections and
write permissions on all the collections but one, which is the one the
other user is allowed to write on.
Then there is a user (superadmin) that should be able to do everything.

I am using Solr 8, in standalone mode.
I tried to write the following security.json file but every request
made by chatbot and console users gets rejected and the log points out
that superadmin is the only group allowed to perform the request.
If I delete the "all" rule, everything works as expected, but then I
cannot have a privileged user. In my opinion, this seems inconsistent
with what is written in the reference guide about permission
priority
(https://solr.apache.org/guide/8_8/rule-based-authorization-plugin.html).
I did a lot of research before posting here but I didn't find any
solutions, so I would appreciate any help to sort it out.

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": true,
    "credentials": {
      "superadmin-user":"...",
      "chatbot-user":"...",
      "console-user":"..."
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": {
      "chatbot-user": "chatbot",
      "console-user": "console",
      "superadmin-user": "superadmin"
    },
    "permissions": [
      {"collection":["col1", "col2", "col3", "col4", "col5"],
       "role":["chatbot","console"], "path":"/select"},
      {"collection":"col5", "role":"chatbot", "path":"/update"},
      {"collection":["col1", "col2", "col3", "col4"],
       "role":"console", "path":"/update"},
      {"name":"all", "role":"superadmin"}
    ]
  }
}

Luca
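
To make the expectation in this message concrete, here is a small model of a
plain "first matching rule in listed order wins" evaluation over the four
permissions above. This is an assumption written down for illustration and is
not Solr's actual resolution algorithm, which the post reports behaves
differently here. The function names are hypothetical.

```python
# The four permissions from the posted security.json, in the same order
PERMISSIONS = [
    {"collection": ["col1", "col2", "col3", "col4", "col5"],
     "role": ["chatbot", "console"], "path": "/select"},
    {"collection": "col5", "role": "chatbot", "path": "/update"},
    {"collection": ["col1", "col2", "col3", "col4"],
     "role": "console", "path": "/update"},
    {"name": "all", "role": "superadmin"},
]


def as_list(value):
    """Normalise a string-or-list JSON value to a list."""
    return value if isinstance(value, list) else [value]


def first_match_roles(collection, path):
    """Roles of the first rule that matches, scanning top to bottom."""
    for perm in PERMISSIONS:
        if perm.get("name") == "all":
            # Simplification: the "all" rule matches every request
            return as_list(perm["role"])
        if collection in as_list(perm["collection"]) and path == perm["path"]:
            return as_list(perm["role"])
    return None


def is_allowed(role, collection, path):
    roles = first_match_roles(collection, path)
    return roles is not None and role in roles


print(is_allowed("chatbot", "col1", "/select"))   # read on any collection
print(is_allowed("chatbot", "col1", "/update"))   # write outside col5
print(is_allowed("console", "col5", "/update"))   # write on chatbot's collection
```

Note that under this simplified model superadmin would only pass requests that
reach the "all" rule, so it would also have to be listed on the more specific
rules. Whether Solr behaves the same way should be checked against the
reference guide.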


childFilter Solr 6.6

2021-04-29 Thread Roopa Rao
Hi -
I am on version Solr 6.6 and using parent and child construct

Query as follows:
 [child parentFilter='field1:value1' childFilter='((field1:(value1)) AND
({!terms f=field2}value2))' limit=1]

Here I want to sort these children by field3 and return only the top 1
element (hence limit=1). Is there a way to do this?

I notice that when the children are returned, they are returned in the
order they were indexed. Is this guaranteed?
If they are always returned in the order in which they were indexed, that
will work in my case too, as I index the field3 data in the order in which
it should be shown.

Thanks,
Roopa


Re: SecureRandom algorithm 'NativePRNG' is in use

2021-04-29 Thread Chris Hostetter


: > I intermittently face this issue sometimes while running the unit tests.

How exactly are you running the tests? ant? IDE? ... It's very strange 
that this would be an intermittent problem.

Can you please post the actual log details from the test so we can see the 
INFO & WARN level logging from assertNonBlockingRandomGeneratorAvailable() 
(just before this assertion would fail) ... I'm very curious what your 
java.security.egd value is (and where/why/how it's getting set).

: One more thing,  -Dtest.solr.allowed.securerandom=NativePRNG doesn't seem
: to help and I haven't tried the other option yet.

If test.solr.allowed.securerandom is being set properly (so that the 
forked test JVM is getting it) then that assertion can't even be run (a 
different assertion runs instead, to verify that what you specify is what 
your JVM is using) ... one thing that may not be well explained in the docs 
is that when running tests from ant, you need to use '-Dargs=...' in order 
to pass "extra" arguments to the forked test VMs...

ant test -Dtestcase=SampleTest -Dargs='-Dtest.solr.allowed.securerandom=BogusPRNG'
...
   [junit4]   2> 1420 INFO  (SUITE-SampleTest-seed#[DDDB05C007992358]-worker) [ ] o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: test.solr.allowed.securerandom=BogusPRNG & java.security.egd=file:/dev/./urandom
...
   [junit4]> Throwable #1: org.junit.ComparisonFailure: Algorithm specified using test.solr.allowed.securerandom system property does not match actual algorithm expected:<[Bogus]PRNG> but was:<[SHA1]PRNG>





-Hoss
http://www.lucidworks.com/


Re: Lucene File Fingerprint & ML

2021-04-29 Thread Furkan KAMACI
Hi Ecem,

1.a) What do you mean by File Fingerprint? If you mean a hash of a file, you
should check here:

https://solr.apache.org/guide/8_8/de-duplication.html

1.b) Solr has limited machine learning capabilities. You can find the list
here:

https://solr.apache.org/guide/8_8/machine-learning.html

2) Solr/Lucene are libraries, not desktop applications. You can use them as
dependencies in your desktop applications.
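
Regarding 1.a, if "fingerprint" does mean a hash of the file contents, it can
be computed on the client side before a document is sent for indexing. The
sketch below is only one possible reading of the question: it uses Python's
standard hashlib with SHA-256, it is independent of the de-duplication feature
linked above, and the function names are made up for illustration.

```python
import hashlib


def bytes_fingerprint(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def file_fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large files
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(bytes_fingerprint(b"example file contents"))
```

The resulting hex string could be stored in a string field next to the
document, so that identical files end up with identical values.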

Kind Regards,
Furkan KAMACI


On Tue, Apr 20, 2021 at 4:42 PM ECEM YAMAN <16008117...@ogr.bozok.edu.tr>
wrote:

> Hi,
> I just started working on Solr and Lucene. I have some questions on my
> mind, can you help me?
>
> - How to apply File Fingerprint and Machine learning applications to
> lucene?
> - What are the dependencies in the Solr and Lucene desktop application?
>
> Also I would appreciate if you recommend a book.
>
> Best regards,
> Ecem
>


Re: Solr 8.6 Indexing Issue

2021-04-29 Thread Anuj Bhargava
Running it directly on the database, without Solr, also works.

Only the indexing no longer works.

On Thu, 29 Apr 2021 at 21:40, Charlie Hull 
wrote:

> I meant can you run it on the database directly, without Solr, and what
> happens?
>
> Best
>
> Charlie
>
> On 29/04/2021 14:00, Anuj Bhargava wrote:
> > { "responseHeader":{ "status":0, "QTime":11, "params":{ "q":"*:*", "_":
> > "1619701169621"}}, "response":{"numFound":179573,"start":0,"numFoundExact":
> > true,"docs":[ { "country":["AU"], "date_c":"2019-03-14T18:30:00Z",
> >
> > On Thu, 29 Apr 2021 at 17:18, Charlie Hull <
> ch...@opensourceconnections.com>
> > wrote:
> >
> >> What happens if you run exactly that SELECT query on your source
> >> database?
> >>
> >> Charlie
> >>
> >> On 29/04/2021 12:08, Anuj Bhargava wrote:
> >>> Ever since installing 8.6 a couple of months ago, the indexing was
> >>> working fine. All of a sudden getting the following error -
> >>>
> >>> 4/29/2021, 12:01:59 PM
> >>> ERROR false
> >>> DocBuilder
> >>> Exception while processing: hotels document : SolrInputDocument(fields:
> >>> []):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
> >>> to execute query: SELECT * FROM hotels WHERE country = 'IN' OR country
> >>> ='PK' OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK'
> >>> OR country ='MV' OR country ='BT' Processing Document # 1
> >>>
> >>> 4/29/2021, 12:01:59 PM
> >>> ERROR false
> >>> DataImporter
> >>> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
> >>> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> >>> execute query: SELECT * FROM hotels WHERE country = 'IN' OR country ='PK'
> >>> OR country ='BD' OR country ='AF' OR country ='NP' OR country ='LK' OR
> >>> country ='MV' OR country ='BT' Processing Document # 1
> >>>
> >>> Please help
> >>>
> >> --
> >> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> >> 
> >> Founding member of The Search Network 
> >> and co-author of Searching the Enterprise
> >> 
> >> tel/fax: +44 (0)8700 118334
> >> mobile: +44 (0)7767 825828
> >>
>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> 
> Founding member of The Search Network 
> and co-author of Searching the Enterprise
> 
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
>