Re: Solr scale function

2022-06-01 Thread Vincenzo D'Amore
Hi Mikhail,

sorry for not being clear, I'll try again.
To my understanding, the Solr scale function, once applied to a field,
needs the min and max for that field.
Those min and max values by default are calculated by all the existing
documents, I don't know exactly how this is implemented internally in Solr.
I assume that, in the worst case scenario, all the documents have to be
traversed reading all the values for the given field and then somehow
saving the min/max.
The Solr scale function documentation also states:
> The current implementation cannot distinguish when documents have been
> deleted or documents that have no value. It uses 0.0 values for these cases.
This means that often the min value can be 0 if you have only positive
values.

But what happens if I need to scale the values of a field only within the
documents that are the result of a query? Only a few hundreds or thousands
of documents?
First of all, min and max have to be calculated only on the result set of
your query.
That is what I was trying to say when I wrote "apply the scale function
only to the result set (and not to the entire collection)".

For example, if you apply the scale function to the price field in the Solr
techproducts example, min and max are 0.0 and 2199.0, as this stats query shows:

http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field=price

So even if a filter query is added - fq=popularity:(1 OR 7) - the values
are scaled between 0.0 and 2199.0.

http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(price,%200,%201)

{
  "responseHeader":{
"status":0,
"QTime":30,
"params":{
  "q":"*:*",
  "fl":"price,scale(price, 0, 1)",
  "fq":"popularity:(1 OR 7)",
  "rows":"100"}},
  "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
  {
"price":74.99,
"scale(price, 0, 1)":0.034101862},
  {
"price":19.95,
"scale(price, 0, 1)":0.009072306},
  {
"price":11.5,
"scale(price, 0, 1)":0.0052296496},
  {
"price":329.95,
"scale(price, 0, 1)":0.15004548},
  {
"price":479.95,
"scale(price, 0, 1)":0.2182583},
  {
"price":649.99,
"scale(price, 0, 1)":0.29558435}]
  }}

As you can see in the results of this query, prices are between 11.5 and
649.99.
What if I want the prices scaled with 11.5 as the min and 649.99 as the max?
Or, in other words, what is the easiest way to scale all the values of a
field using the min and max of the current query results?

Right now I'm investigating what's the best way to scale the values of one
or more fields within Solr, but only within the documents that are in the
current result set.
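
One possible workaround (an untested sketch, not necessarily the best
approach): two round trips, first asking the stats component, which does
honor fq, for min/max over the filtered set, then scaling manually with
sub/div using those numbers as literals:

curl http://localhost:8983/solr/techproducts/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=popularity:(1 OR 7)' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'stats=true' \
  --data-urlencode 'stats.field=price'

# with min=11.5 and max=649.99 taken from the stats response:
curl http://localhost:8983/solr/techproducts/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=popularity:(1 OR 7)' \
  --data-urlencode 'rows=100' \
  --data-urlencode 'fl=price,div(sub(price,11.5),sub(649.99,11.5))'

This maps 11.5 to 0 and 649.99 to 1 for exactly the documents in the
result set, at the cost of a second request.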

Hope this helps to make things clearer.

Best regards,
Vincenzo




On Tue, May 31, 2022 at 9:27 PM Mikhail Khludnev  wrote:

> Vincenzo,
> Can you elaborate what it means ' apply the scale function only to the
> result set (and not to
> the entire collection).'  ?
>
> On Tue, May 31, 2022 at 4:33 PM Vincenzo D'Amore 
> wrote:
>
> > Hi Mikhail,
> >
> > I'm trying to apply the scale function only to the result set (and not to
> > the entire collection).
> > And I discovered that adding "query($q)" to the scale function does the
> > trick.
> > In other words, adding "query($q)" forces solr to restrict the scale
> > function only to the result set.
> >
> > But if I add an fq to the query parameters the scale function applies
> only
> > to the q param.
> > For example:
> >
> >
> >
> http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(price,query($q)),%200,%201),manu_id_s
> >
> > {
> >   "responseHeader":{
> > "status":0,
> > "QTime":8,
> > "params":{
> >   "q":"*:*",
> >   "fl":"price,scale(sum(price,query($q)), 0, 1)",
> >   "fq":"popularity:(1 OR 7)",
> >   "rows":"100"}},
> >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> >   {
> > "price":74.99,
> > "scale(sum(price,query($q)), 0, 1)":0.034101862},
> >   {
> > "price":19.95,
> > "scale(sum(price,query($q)), 0, 1)":0.009072306},
> >   {
> > "price":11.5,
> > "scale(sum(price,query($q)), 0, 1)":0.0052296496},
> >   {
> > "price":329.95,
> > "scale(sum(price,query($q)), 0, 1)":0.15004548},
> >   {
> > "price":479.95,
> > "scale(sum(price,query($q)), 0, 1)":0.2182583},
> >   {
> > "price":649.99,
> > "scale(sum(price,query($q)), 0, 1)":0.29558435}]
> >   }}
> >
> > I can avoid this problem by adding a new parameter query($fq) to the
> scale
> > function, but this solution is cumbersome and not maintainable.
> > For example:
> >
> >
> >
> http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(sum(price,query($q)),query($f

concatenating fields via schema

2022-06-01 Thread Yirmiyahu Fischer
How do I create a concatenated field via schema.xml?

I am using Solr version 8.2.
In my schema, fields ending in "_s" are of string type, fields ending in
"_t" are of text type, and fields ending in "_txt" are multivalued text type.

[dynamicField definitions stripped by the mail archive: *_s (string), *_t (text), *_txt (multivalued text)]

I need to create a field that concatenates the fields BrandName_s and
ManufacturerNo_s

I have tried
[copyField definitions stripped by the archive: BrandName_s and ManufacturerNo_s each copied to BrandMfgConcat_t]
However, when indexing, I get an error.
Error adding field 'ManufacturerNo_s'='6852806' msg=Multiple values
encountered for non multiValued copy field BrandMfgConcat_t: 6852806

I tried
[copyField definitions stripped by the archive: BrandName_s and ManufacturerNo_s each copied to BrandMfgConcat_txt]
However, the resulting field BrandMfgConcat_txt is an array of the 2
values, not a concatenation.

I tried
[update request processor chain XML stripped by the archive: a clone processor copying BrandName_s and ManufacturerNo_s into BrandMfgConcat_t, followed by a concat processor on BrandMfgConcat_t]

However, after indexing, the field BrandMfgConcat_t does not appear in the
record


Yirmiyahu Fischer
Senior Developer | Tel: 03-9578900
Email: yfisc...@signature-it.com | Web: www.signature-it.com


Re: concatenating fields via schema

2022-06-01 Thread Shawn Heisey

On 6/1/2022 3:27 AM, Yirmiyahu Fischer wrote:

I tried

 [quoted update processor chain XML, stripped by the mail archive]

However, after indexing, the field BrandMfgConcat_t does not appear in the
record


I bet that you haven't done anything to actually tell Solr to use that 
update processor.


I put this in my config:

    [update processor chain XML stripped by the mail archive: it defined a
    clone processor copying fields s1 and s2 into s3, a concat processor on
    s3, and registered the chain as the default]
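
A plausible reconstruction of that chain, for readers of the archive (the
element and attribute names follow current Solr docs, but the chain name
and delimiter below are guesses, not Shawn's actual values):

<updateProcessor class="solr.CloneFieldUpdateProcessorFactory" name="clonefields">
  <arr name="source">
    <str>s1</str>
    <str>s2</str>
  </arr>
  <str name="dest">s3</str>
</updateProcessor>
<updateProcessor class="solr.ConcatFieldUpdateProcessorFactory" name="concatfields">
  <str name="fieldName">s3</str>
  <!-- the delimiter value is a guess -->
  <str name="delimiter">-</str>
</updateProcessor>
<updateProcessorChain name="clone-concat" default="true"
                      processor="clonefields,concatfields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateProcessorChain>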

And then indexed a doc with s1 and s2 fields, and very deliberately 
excluding the s3 field.  This is what I get when querying for that doc:


[XML query response stripped by the mail archive; the returned doc showed
the s1 and s2 values plus the concatenated s3 field]

The major difference between your config and mine is that I marked the
update processor as default, which means that it will always be used
unless another is specified.

Thanks,
Shawn




Re: Solr scale function

2022-06-01 Thread Mikhail Khludnev
From looking at
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ScaleFloatFunction.java#L70
I conclude that min,max are obtained from all docs in the index.
But if you specify query() as an argument for scale(), it takes only
matching docs for evaluating min & max. So, what I gather so far: you are
looking for a query which matches the intersection of $q AND $fq but yields
the price field value as its score.
It seems I've got the problem definition. I'll come up with a proposal a
little bit later.

On Wed, Jun 1, 2022 at 11:33 AM Vincenzo D'Amore  wrote:

> Hi Mikhail,
>
> sorry for not being clear, I'll try again.
> For my understanding the solr scale function, once applied to a field,
> needs min and max for that field.
> Those min and max values by default are calculated by all the existing
> documents, I don't know exactly how this is implemented internally in Solr.
> I assume that, in the worst case scenario, all the documents have to be
> traversed reading all the values for the given field and then somehow
> saving the min/max.
> In the Solr scale function documentation is also written:
> > The current implementation cannot distinguish when documents have been
> deleted or documents that have no value. It uses 0.0 values for these
> cases.
> This means that often the min value can be 0 if you have only positive
> values.
>
> But what happens if I need to scale the values of a field only within the
> documents that are the result of a query? Only a few hundreds or thousands
> of documents?
> First of all min and max has to be calculated only on the result set of
> your query.
> That is what I was trying to say when I wrote "apply the scale function
> only to the result set (and not to the entire collection)".
>
> For example, if you apply the scale function to the field price in Solr
> techproducts example, "min" and "max" are between 0.0 and 2199.0
>
>
> http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field=price
>
> So even if a filter query is added - fq=popularity:(1 OR 7) - the values
> are scaled between 0.0 and 2199.0.
>
>
> http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(price,%200,%201)
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":30,
> "params":{
>   "q":"*:*",
>   "fl":"price,scale(price, 0, 1)",
>   "fq":"popularity:(1 OR 7)",
>   "rows":"100"}},
>   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
>   {
> "price":74.99,
> "scale(price, 0, 1)":0.034101862},
>   {
> "price":19.95,
> "scale(price, 0, 1)":0.009072306},
>   {
> "price":11.5,
> "scale(price, 0, 1)":0.0052296496},
>   {
> "price":329.95,
> "scale(price, 0, 1)":0.15004548},
>   {
> "price":479.95,
> "scale(price, 0, 1)":0.2182583},
>   {
> "price":649.99,
> "scale(price, 0, 1)":0.29558435}]
>   }}
>
> As you can see in the results of this query, prices are between 11.5 and
> 649.99.
> What if I want to scale the prices between 11.5 and 649.99?
> Or, in other words, what is the easiest way to scale all the values of a
> field with the min and max of the current query results?
>
> Right now I'm investigating what's the best way to scale the values of one
> or more fields within Solr, but only within the documents that are in the
> current result set.
>
> Hope this helps to make things clearer.
>
> Best regards,
> Vincenzo
>
>
>
>
> On Tue, May 31, 2022 at 9:27 PM Mikhail Khludnev  wrote:
>
> > Vincenzo,
> > Can you elaborate what it means ' apply the scale function only to the
> > result set (and not to
> > the entire collection).'  ?
> >
> > On Tue, May 31, 2022 at 4:33 PM Vincenzo D'Amore 
> > wrote:
> >
> > > Hi Mikhail,
> > >
> > > I'm trying to apply the scale function only to the result set (and not
> to
> > > the entire collection).
> > > And I discovered that adding "query($q)" to the scale function does the
> > > trick.
> > > In other words, adding "query($q)" forces solr to restrict the scale
> > > function only to the result set.
> > >
> > > But if I add an fq to the query parameters the scale function applies
> > only
> > > to the q param.
> > > For example:
> > >
> > >
> > >
> >
> http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(price,query($q)),%200,%201),manu_id_s
> > >
> > > {
> > >   "responseHeader":{
> > > "status":0,
> > > "QTime":8,
> > > "params":{
> > >   "q":"*:*",
> > >   "fl":"price,scale(sum(price,query($q)), 0, 1)",
> > >   "fq":"popularity:(1 OR 7)",
> > >   "rows":"100"}},
> > >   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
> > >   {
> > > "price":74.99,
> > > "scale(sum(price,query($q)), 0, 1)":0.034101862},
> > >   {
> > >   

Param Substitution in Stream Expressions?

2022-06-01 Thread James Greene
My application relies on local parameter substitution pretty heavily due to
escaping issues and being able to re-use clauses.

Is there any way to use param substitution in a /stream expression?

(This doesn't work):

POST /stream {
  'expr': 'search(collection1,q=$test,fl="doc_id",sort="doc_id
asc",qt="/export")',   'test': 'contains:blah'
}



Cheers,
JAG


Re: Create a core via SolrClient, single server

2022-06-01 Thread Christopher Schultz

Clemens,

On 5/30/22 02:02, Clemens WYSS (Helbling Technik) wrote:

Given a connection to Solr ( e.g. adminSolrConnection )
CoreAdminRequest.Create createCoreRequest = new CoreAdminRequest.Create();
createCoreRequest.setCoreName( coreName );
createCoreRequest.process( adminSolrConnection );


What is an "admin solr connection"? Is that any different than just a 
plain-old HttpSolrClient instance?


How can I provide the schema for the core once it's been created? Can I 
use the API for that, or do I have to resort to pushing the config file 
directly, similar to these kinds of curl commands:


curl -H 'Content-Type: application/json' -d "{ ... config }" \
   ${SCHEME}://localhost:${PORT}/solr/${CORENAME}/config

curl -H 'Content-Type: application/json' --data-binary '{ ... schema ... }' \
   "${SCHEME}://localhost:${PORT}/solr/${CORENAME}/schema"

The CoreAdminRequest class has a createCore() method which takes a whole 
bunch of arguments, but the javadoc doesn't say what those arguments 
are. With parameter names like "configFile" I assume it's expecting a 
configuration file /name/ and not the actual configuration; same with 
schemaFile. I'm happy to make direct calls to the REST API, but if the 
SolrJ client will do it for me, I'd prefer that.


I'm also happy to write patches for CoreAdminRequest to that end.

-chris


On 2022/05/25 21:25:09 Christopher Schultz wrote:

All,

I have a non-clustered/ZK Solr instance and I'd like to create a core
using the Java SolrClient library. Is that currently possible? I only
see methods for working with documents in the current core (selected
when the client object is initially created, based upon the URL which
contains the core name).

I'm using Solr 7.7.3 and the vanilla SolrJ client library.

Thanks,
-chris





Re: Create a core via SolrClient, single server

2022-06-01 Thread Christopher Schultz

Clemens,

On 6/1/22 13:41, Christopher Schultz wrote:

Clemens,

On 5/30/22 02:02, Clemens WYSS (Helbling Technik) wrote:

Given a connection to Solr ( e.g. adminSolrConnection )
CoreAdminRequest.Create createCoreRequest = new 
CoreAdminRequest.Create();

createCoreRequest.setCoreName( coreName );
createCoreRequest.process( adminSolrConnection );


What is an "admin solr connection"? Is that any different than just a 
plain-old HttpSolrClient instance?


I found that I needed a client that wasn't pointing to any existing 
core, which wasn't a problem.


But I do get this error when trying to create the core:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at ${SOLR_BASE_URL}/solr: Error CREATEing SolrCore 
'test_remote_create_core': Can't find resource 'solrconfig.xml' in 
classpath or '${SOLR_HOME}/server/solr/test_remote_create_core'


Here's the code I used to create the core:

CoreAdminRequest car = new CoreAdminRequest.Create();
car.setCoreName("test_remote_create_core");
CoreAdminResponse response = car.process(solr);

I'd like to be able to bootstrap a core from my application if it 
doesn't exist. I can supply whatever information is necessary, but I 
need to be able to do it without doing anything other than making API 
calls, either via SolrJ or directly via HTTP/REST.


Thanks,
-chris


Re: Param Substitution in Stream Expressions?

2022-06-01 Thread Joel Bernstein
Hi,

You'll need to set a Java system property at startup to run macro expansion
in Streaming Expressions.

See the commit below which references the jira with the security concerns:

https://github.com/apache/solr/commit/9edc557f4526ffbbf35daea06972eb2c595e692b

The parameter setting is as follows:

-DStreamingExpressionMacros=true
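
A minimal usage sketch (untested; it assumes Solr's usual ${param} macro
syntax applies inside the expression, so it references ${test} rather than
$test):

bin/solr start -DStreamingExpressionMacros=true

curl http://localhost:8983/solr/collection1/stream \
  --data-urlencode 'expr=search(collection1,q=${test},fl="doc_id",sort="doc_id asc",qt="/export")' \
  --data-urlencode 'test=contains:blah'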


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jun 1, 2022 at 1:22 PM James Greene 
wrote:

> My application relies on local parameter substitution pretty heavily due to
> escaping issues and being able to re-use clauses.
>
> Is there any way to use param substitution in a /stream expression?
>
> (This doesn't work):
>
> POST /stream {
>   'expr': 'search(collection1,q=$test,fl="doc_id",sort="doc_id
> asc",qt="/export")',   'test': 'contains:blah'
> }
>
>
>
> Cheers,
> JAG
>


Re: Solr Cloud and /export

2022-06-01 Thread Joel Bernstein
There is no configuration for this but the Stream Expression export/shuffle
function does this automatically.

https://solr.apache.org/guide/8_11/stream-source-reference.html#shuffle

The "export" function name is also mapped to the "shuffle" function so you
can use either name.

This function is also basically the same functionality as calling the
"search" function with the param qt=/export.

In Solr 8.11 there is also the drill function:

https://solr.apache.org/guide/8_11/stream-source-reference.html#drill

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, May 31, 2022 at 5:32 PM James Greene 
wrote:

> What do I have to configure or add to the request to have an /export call
> respond with docs from all shards?  I currently only get documents from the
> leader when making an /export call (solr 8.11.1).
>
> Cheers,
> JAG
>


Re: Create a core via SolrClient, single server

2022-06-01 Thread Shawn Heisey

On 6/1/2022 11:41 AM, Christopher Schultz wrote:
How can I provide the schema for the core once it's been created? Can 
I use the API for that, or do I have to resort to pushing the config 
file directly, similar to these kinds of curl commands:


curl -H 'Content-Type: application/json' -d "{ ... config }" \
   ${SCHEME}://localhost:${PORT}/solr/${CORENAME}/config

curl -H 'Content-Type: application/json' --data-binary '{ ... schema ... }' \
   "${SCHEME}://localhost:${PORT}/solr/${CORENAME}/schema"


There's a chicken and egg problem.  You can't use those endpoints until 
the core is created.  And you can't create the core without providing 
the config and schema.


https://solr.apache.org/guide/8_9/coreadmin-api.html#coreadmin-create

(the large WARNING box in this section of the docs is the part I am 
referring you to)


Assuming you're not going to be using cloud mode (in which case all 
configs are in zookeeper) you have two choices:  Create the core 
directory with a conf subdirectory that contains a config and a schema 
before calling the CoreAdmin API, or use the ConfigSets feature.


https://solr.apache.org/guide/8_9/config-sets.html#configsets-in-standalone-mode

Checking SolrJ, you would use createCore("corename", "corename", client) 
if you go with the first option and name the directory "corename" in the 
solr home.  It doesn't look like CoreAdminRequest has a convenience 
method for using a configSet when creating a core.


I worked out how to do it as a generic request with SolrJ if you want to 
use the configsets feature:


https://paste.elyograg.org/view/3cd9aac2
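
In case that paste expires: the gist is to drive the CoreAdmin CREATE
action, with a configSet parameter, through SolrJ's generic request. A
sketch along those lines (untested; the core and configset names are just
examples):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.client.solrj.response.SimpleSolrResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateCoreWithConfigSet {
  public static void main(String[] args) throws Exception {
    // client pointed at the Solr root URL, not at any particular core
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("action", "CREATE");
      params.set("name", "test_core2");
      // any configset present under SOLR_HOME/configsets on the server
      params.set("configSet", "_default");
      GenericSolrRequest req =
          new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/cores", params);
      SimpleSolrResponse rsp = req.process(client);
      System.out.println(rsp.getResponse());
    }
  }
}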

Thanks,
Shawn



RE: Re: SOLR v7 Security Issues Caused Denial of Use - Sonatype Application Composition Report

2022-06-01 Thread Abbas Agakasiri
On 2019/01/04 18:27:42 Gus Heck wrote:
> Hi Bob,
>
> Wrt licensing keep in mind that multi licensed software allows you to
> choose which license you are using the software under. Also there's some
> good detail on the Apache policy here:
>
>
https://www.apache.org/legal/resolved.html#what-can-we-not-include-in-an-asf-project-category-x
>
> One has to be careful with license scanners, often they have very
> conservative settings. I had to spend untold hours getting jfrog's license
> plugin to select the correct license and hunting down missing licenses
when
> I finally sorted out licensing for JesterJ. (though MANY fewer hours than
> if I had done this by hand!)
>
> On Fri, Jan 4, 2019, 11:17 AM Bob Hathaway wrote:
> > The most important feature of any software running today is that it can be
> > run at all. Security vulnerabilities can preclude software from running in
> > enterprise environments. Today software must be free of critical and severe
> > security vulnerabilities or it can't be run at all under Information
> > Security policies. Enterprises today run security scan software to check
> > for security and licensing vulnerabilities because today most organizations
> > are using open source software where this has become most relevant.
> > Forrester has a good summary on the need for software composition analysis
> > tools, which virtually all enterprises run today before allowing software to
> > run in production environments:
> >
> >
https://www.blackducksoftware.com/sites/default/files/images/Downloads/Reports/USA/ForresterWave-Rpt.pdf
> >
> > Solr version 6.5 passes security scans showing no critical security
> > issues.  Solr version 7 fails security scans with over a dozen critical and
> > severe security vulnerabilities starting with version 7.1.  Then we ran
> > scans against the latest Solr version 7.6, which failed as well.  Most of
> > the issues are due to using old libraries, including the Jackson JSON
> > framework, dom4j and Xerces, and should be easy to bring up to date.  Only
> > the latest version of SimpleXML has severe security vulnerabilities.  Derby
> > leads the most severe security violations at Level 9.1 by using an
> > out-of-date version.
> >
> > What good is software or any features if enterprises can't run them?
> > Today software cybersecurity is a top priority and risk for enterprises.
> > Solr version 6.5 is very old, exposing the zookeeper backend from the SolrJ
> > client, which is a differentiating capability.
> >
> > Is security and remediation a priority for SolrJ?  I believe this should be
> > a top feature, to allow SolrJ to continue providing search features to
> > enterprises, with a security roadmap and plan to keep Solr secure and usable
> > by continually adapting and improving in the ever-changing security
> > landscape and ecosystem.  The Derby vulnerability issue CVE-2015-1832 was a
> > passing medium Level 6.2 issue in CVSS 2.0 last year but is the most
> > critical issue with Solr 7.6 at Level 9.1 in this year's CVSS 3.0.  These
> > changes need to be tracked, and updates and fixes incorporated into new Solr
> > versions.
> > https://nvd.nist.gov/vuln/detail/CVE-2015-1832
> >
> > On Thu, Jan 3, 2019 at 12:19 PM Bob Hathaway  wrote:
> >
> > > Critical and Severe security vulnerabilities against Solr v7.1.  Many of
> > > these appear to be from old open source framework versions.
> > >
> > > *9* CVE-2017-7525 com.fasterxml.jackson.core : jackson-databind : 2.5.4 Open
> > >    CVE-2016-131 commons-fileupload : commons-fileupload : 1.3.2 Open
> > >    CVE-2015-1832 org.apache.derby : derby : 10.9.1.0 Open
> > >    CVE-2017-7525 org.codehaus.jackson : jackson-mapper-asl : 1.9.13 Open
> > >    CVE-2017-7657 org.eclipse.jetty : jetty-http : 9.3.20.v20170531 Open
> > >    CVE-2017-7658 org.eclipse.jetty : jetty-http : 9.3.20.v20170531 Open
> > >    CVE-2017-1000190 org.simpleframework : simple-xml : 2.7.1 Open
> > >
> > > *7* sonatype-2016-0397 com.fasterxml.jackson.core : jackson-core : 2.5.4 Open
> > >    sonatype-2017-0355 com.fasterxml.jackson.core : jackson-core : 2.5.4 Open
> > >    CVE-2014-0114 commons-beanutils : commons-beanutils : 1.8.3 Open
> > >    CVE-2018-1000632 dom4j : dom4j : 1.6.1 Open
> > >    CVE-2018-8009 org.apache.hadoop : hadoop-common : 2.7.4 Open
> > >    CVE-2017-12626 org.apache.poi : poi : 3.17-beta1 Open
> > >    CVE-2017-12626 org.apache.poi : poi-scratchpad : 3.17-beta1 Open
> > >    CVE-2018-1308 org.apache.solr : solr-dataimporthandler : 7.1.0 Open
> > >    CVE-2016-4434 org.apache.tika : tika-core : 1.16 Open
> > >    CVE-2018-11761 org.apache.tika : tika-core : 1.16 Open
> > >    CVE-2016-1000338 org.bouncycastle : bcprov-jdk15 : 1.45 Open
> > >    CVE-2016-1000343 org.bouncycastle : bcprov-jdk15 : 1.45 Open
> > >    CVE-2018-1000180 org.bouncycastle : bcprov-jdk15 : 1.45 Open
> > >    CVE-2017-7656 org

Re: Collapsing on a field works, but expand=true does nothing

2022-06-01 Thread Joel Bernstein
I just did a quick check on Solr 9 and expand / collapse was working. Here
is the output:

https://gist.github.com/joel-bernstein/6f7f3ee12d5375630f3311c5dbd693ee

Is it possible that the expand component isn't registered in your
deployment? The expand component is a default component but have you
overridden the defaults?

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, May 31, 2022 at 12:49 PM Andy Lester  wrote:

> I’m working on the collapse/expand functionality. I query for books, and I
> want to collapse my search results on their tf1flrid, basically a family
> ID.  I also want to do an expand so I can see what titles were collapsed
> out.  I’m looking at the docs here:
> https://solr.apache.org/guide/8_11/collapse-and-expand-results.html
>
> Here’s a gist of my working query at
> https://gist.github.com/petdance/34259dee2944a455341748a0e2ef2092
>
> I make the query, I get back 18 titles.  There are rows in the result that
> have duplicated tf1flrid fields. This is what I expect.
>
> Then I try it again,
>
> curl "$URL" --silent --show-error \
> -X POST \
> -F "q=title_tracings_t:\"$Q\"" \
> -F 'fl=title,flrid,tf1flrid' \
> -F 'fq={!collapse field=tf1flrid nullPolicy=expand}' \
> -F 'expand=true' \
> -F 'rows=50' \
> -F 'sort=tf1flrid asc' \
> -F 'wt=json' \
> | jq -S .
>
> and here’s the results:
> https://gist.github.com/petdance/f203c7c2bf0178e0d0c1596999801ae5
>
> I get back 12 titles, and the rows whose tf1flrid values were
> duplicated in the first query (1638JX7 and 1638PQ3) have been
> collapsed. That’s also as I expect. However, I don’t have an “expand”
> section that shows what the collapsed fields were.  I don’t see any errors
> in my solr.log.
>
> What am I doing wrong?  What do I need to do to get the expand section to
> be returned?
>
> Thanks,
> Andy


Re: Collapsing on a field works, but expand=true does nothing

2022-06-01 Thread Andy Lester
> Is it possible that the expand component isn't registered in your
> deployment? The expand component is a default component but have you
> overridden the defaults?

Yes, that’s exactly what happened.  Turns out that I had pulled out the unused 
handlers.

Thanks,
Andy 
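
For anyone hitting the same thing: if the default components list was
overridden, the expand component can be added back to the handler in
solrconfig.xml along these lines (a sketch, names per the stock config):

<searchComponent name="expand" class="solr.ExpandComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>expand</str>
  </arr>
</requestHandler>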

Re. "sow" parameter value not changing and shingle filter

2022-06-01 Thread Rashmeet Kaur Khanuja
Hello,

We use Apache Solr 8.8 and are trying to find ways to reduce recall (and 
improve precision) for our products. While looking into various options, we 
came across the “split on whitespace” (sow) parameter, whose effective value 
changes when multi-word synonyms and stopwords come into the picture. The concern 
here, though, is that for query terms where sow=false by default, we are 
able to set it to true. But for the terms where sow=true by default (reasons 
unknown), we are not able to flip it by setting the param to false. It still 
remains a term-centric search. We also tried setting the uf query in 
solrconfig.xml.
Another point that we need clarification on is ShingleFilterFactory: how does 
minimum match apply when shingles are in the picture? That is something we also 
wanted to learn.
Please help us with the same.

Thanks,
Rashmeet


Re: Create a core via SolrClient, single server

2022-06-01 Thread Christopher Schultz

Shawn,

On 6/1/22 15:18, Shawn Heisey wrote:

On 6/1/2022 11:41 AM, Christopher Schultz wrote:
How can I provide the schema for the core once it's been created? Can 
I use the API for that, or do I have to resort to pushing the config 
file directly, similar to these kinds of curl commands:


curl -H 'Content-Type: application/json' -d "{ ... config }" \
   ${SCHEME}://localhost:${PORT}/solr/${CORENAME}/config

curl -H 'Content-Type: application/json' --data-binary '{ ... schema ... }' \
   "${SCHEME}://localhost:${PORT}/solr/${CORENAME}/schema"


There's a chicken and egg problem.  You can't use those endpoints until 
the core is created.  And you can't create the core without providing 
the config and schema.


https://solr.apache.org/guide/8_9/coreadmin-api.html#coreadmin-create

(the large WARNING box in this section of the docs is the part I am 
referring you to)


I'll have a look.

Assuming you're not going to be using cloud mode (in which case all 
configs are in zookeeper) you have two choices:  Create the core 
directory with a conf subdirectory that contains a config and a schema 
before calling the CoreAdmin API, or use the ConfigSets feature.


https://solr.apache.org/guide/8_9/config-sets.html#configsets-in-standalone-mode 


When I used the CoreAdminRequest as suggested by Clemens, the Solr 
server did create the core directory, but it's empty and I got that error.


Would it not be theoretically possible to accept the core-name and 
config and schema all at once and provision the whole thing? This seems 
like a big missing feature after 9 major versions. Maybe Solr standalone 
is only for children :)


My notes for creating the core include:

   $ sudo -u solr ${SOLR_HOME}/bin/solr create -c [corename]

I've never tried to do that from a remote server, but I don't specify a 
config file or a schema in that command, and it works. Of course, I get 
this warning:


WARNING: Using _default configset with data driven schema functionality. 
NOT RECOMMENDED for production use.


My next steps are to provide a small config and a schema using two 
separate curl commands, which obviously only communicate over the REST API.


What magic is "solr create" performing that I can't use via the API? OR 
can I simulate it in some way?


Checking SolrJ, you would use createCore("corename", "corename", client) 
if you go with the first option and name the directory "corename" in the 
solr home.  It doesn't look like CoreAdminRequest has a convenience 
method for using a configSet when creating a core.


I worked out how to do it as a generic request with SolrJ if you want to 
use the configsets feature:


https://paste.elyograg.org/view/3cd9aac2


Cool, though this is basically using SolrClient as an HttpClient ;)


Would this work with configSet=_default ? It would be great if I didn't 
have to prepare a Solr installation other than making sure it's running 
before my application is able to create cores.


Thanks,
-chris


Re: Re: DocValues usage

2022-06-01 Thread Mikhail Khludnev
I don't think so. I'd rather suspect the distributed search reduce phase, if you
use one. I don't know.

On Wed, Jun 1, 2022 at 6:58 AM Poorna Murali  wrote:

> Thanks Mikhail! Regarding the id field, is it possible that while doing
> faceting or sorting, the id fields will be loaded  in  field cache by
> default?
> On 2022/05/31 21:27:57 Mikhail Khludnev wrote:
> > Hi, Poorna.
> > Pls check inline below.
> >
> > On Tue, May 31, 2022 at 4:21 PM Poorna Murali 
> > wrote:
> >
> > > Hi,
> > >
> > > We are planning to introduce docValues and reduce the usage of
> fieldcache
> > > in our application.
> > >
> > > I have a few doubts on these. Please help me to clarify them.
> > >
> > > 1) Will there be a performance impact if we use docValues as it does
> I/O
> > > read? Especially at the time of index generation?
> > >
> > Sure, it needs to write more files during indexing. Then, query
> > performance depends on the particular IO. It definitely makes sense to
> > limit the JVM heap, leaving some RAM free for memory-mapping index files.
> >
> >
> > >
> > > 2) Are there any list of recommended fields/ category of fields that
> are
> > > suitable for docValues?
> > >
> > Short enums and numbers. Texts (in terms of analysis) are not supported
> > anyway.
> >
> > >
> > > 3) There is a field named "id" which is used as unique key in solr.
> > > Although it's not used for faceting or sorting, it still comes up in
> > > fieldcache and occupies lot of memory. Please help me understand how
> this
> > > id field came in field cache.
> > >
> > Some internal routines can trigger its loading. I'm afraid only debugger
> > can give a definite answer.  Maybe declaring it as uninvertible=false
> can
> > find a code path, but I'm not sure.
> >
> > >
> > > 4) I understand that by looking to field cache in solr admin page, we
> will
> > > get the  current entries in it. But, is it possible to get the whole
> list
> > > of attributes that can go into field cache?
> > >
> > I don't think it's possible, since field cache is lazy/runtime. ie one
> can
> > trigger new field caching by requesting a facet over it.
> >
> > >
> > > Thanks in advance,
> > > Poorna
> > >
> > Welcome!
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev
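
For reference, the uninvertible flag mentioned above is a per-field (or
per-fieldType) schema attribute, available since Solr 7.6; a sketch:

<field name="id" type="string" indexed="true" stored="true" uninvertible="false"/>

With uninvertible="false", a request that would need to uninvert the field
into the field cache fails with an error instead of quietly building the
cache entry, which can make the triggering code path easier to spot.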


Re: Create a core via SolrClient, single server

2022-06-01 Thread Christopher Schultz

Shawn,

On 6/1/22 16:34, Christopher Schultz wrote:

Shawn,

On 6/1/22 15:18, Shawn Heisey wrote:

On 6/1/2022 11:41 AM, Christopher Schultz wrote:
How can I provide the schema for the core once it's been created? Can 
I use the API for that, or do I have to resort to pushing the config 
file directly, similar to these kinds of curl commands:


curl -H 'Content-Type: application/json' -d "{ ... config }" \
   ${SCHEME}://localhost:${PORT}/solr/${CORENAME}/config

curl -H 'Content-Type: application/json' --data-binary '{ ... schema ... }' \
   "${SCHEME}://localhost:${PORT}/solr/${CORENAME}/schema"


There's a chicken and egg problem.  You can't use those endpoints 
until the core is created.  And you can't create the core without 
providing the config and schema.


https://solr.apache.org/guide/8_9/coreadmin-api.html#coreadmin-create

(the large WARNING box in this section of the docs is the part I am 
referring you to)


I'll have a look.

Assuming you're not going to be using cloud mode (in which case all 
configs are in zookeeper) you have two choices:  Create the core 
directory with a conf subdirectory that contains a config and a schema 
before calling the CoreAdmin API, or use the ConfigSets feature.


https://solr.apache.org/guide/8_9/config-sets.html#configsets-in-standalone-mode 



When I used the CoreAdminRequest as suggested by Clemens, the Solr 
server did create the core directory, but it's empty and I got that error.


Would it not be theoretically possible to accept the core-name and 
config and schema all at once and provision the whole thing? This seems 
like a big missing feature after 9 major versions. Maybe Solr standalone 
is only for children :)


My notes for creating the core include:

    $ sudo -u solr ${SOLR_HOME}/bin/solr create -c [corename]

I've never tried to do that from a remote server, but I don't specify a 
config file or a schema in that command, and it works. Of course, I get 
this warning:


WARNING: Using _default configset with data driven schema functionality. 
NOT RECOMMENDED for production use.


My next steps are to provide a small config and a schema using two 
separate curl commands, which obviously only communicate over the REST API.


What magic is "solr create" performing that I can't use via the API? OR 
can I simulate it in some way?


Checking SolrJ, you would use createCore("corename", "corename", 
client) if you go with the first option and name the directory 
"corename" in the solr home.  It doesn't look like CoreAdminRequest 
has a convenience method for using a configSet when creating a core.


I worked out how to do it as a generic request with SolrJ if you want 
to use the configsets feature:


https://paste.elyograg.org/view/3cd9aac2


Cool, though this is basically using SolrClient as an HttpClient ;)


Would this work with configSet=_default ? It would be great if I didn't 
have to prepare a Solr installation other than making sure it's running 
before my application is able to create cores.


So I tried this with configSet=_default and I /did/ get a core created. 
I didn't get the same thing I got from the CLI:


This is what I get from "solr create -c test_core":

$ find ${SOLR_HOME}/server/solr/test_core

test_core
test_core/core.properties
test_core/data
test_core/data/tlog
test_core/data/snapshot_metadata
test_core/data/index
test_core/data/index/segments_1
test_core/data/index/write.lock
test_core/conf
test_core/conf/managed-schema
test_core/conf/params.json
test_core/conf/lang
test_core/conf/lang/stopwords_gl.txt
test_core/conf/lang/stopwords_es.txt
test_core/conf/lang/stopwords_fi.txt
test_core/conf/lang/stopwords_da.txt
test_core/conf/lang/stopwords_hu.txt
test_core/conf/lang/stopwords_id.txt
test_core/conf/lang/hyphenations_ga.txt
test_core/conf/lang/contractions_it.txt
test_core/conf/lang/stopwords_ro.txt
test_core/conf/lang/stopwords_eu.txt
test_core/conf/lang/stopwords_pt.txt
test_core/conf/lang/stopwords_de.txt
test_core/conf/lang/stoptags_ja.txt
test_core/conf/lang/stopwords_it.txt
test_core/conf/lang/contractions_ca.txt
test_core/conf/lang/stopwords_ca.txt
test_core/conf/lang/stopwords_th.txt
test_core/conf/lang/stopwords_bg.txt
test_core/conf/lang/stopwords_lv.txt
test_core/conf/lang/userdict_ja.txt
test_core/conf/lang/stopwords_cz.txt
test_core/conf/lang/stopwords_ar.txt
test_core/conf/lang/stopwords_tr.txt
test_core/conf/lang/stemdict_nl.txt
test_core/conf/lang/stopwords_no.txt
test_core/conf/lang/stopwords_nl.txt
test_core/conf/lang/stopwords_fa.txt
test_core/conf/lang/stopwords_sv.txt
test_core/conf/lang/stopwords_el.txt
test_core/conf/lang/stopwords_ja.txt
test_core/conf/lang/stopwords_hi.txt
test_core/conf/lang/stopwords_en.txt
test_core/conf/lang/contractions_ga.txt
test_core/conf/lang/contractions_fr.txt
test_core/conf/lang/stopwords_ru.txt
test_core/conf/lang/stopwords_ga.txt
test_core/conf/lang/stopwords_fr.txt
test_core/conf/lang/stopwords_hy.txt
test_core/conf/protwords.txt
test_core/conf/synonyms.txt
test_core/conf/solrconfig.xml
test_core/conf/stopwords.txt

If I

Re: Solr scale function

2022-06-01 Thread Mikhail Khludnev
OK, what if you try something like:

q=*:*&fq=popularity:(1 OR 7)&rows=100&fl=price,scale(query($scopeq),0,1)&scopeq={!filters param=$fq}{!func}price

It passes the price field values to the scale function, limiting the scope of
the min/max calculation to the documents matching the fq.
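
As a runnable sketch of that idea against the techproducts example (untested):

curl http://localhost:8983/solr/techproducts/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=popularity:(1 OR 7)' \
  --data-urlencode 'rows=100' \
  --data-urlencode 'fl=price,scale(query($scopeq),0,1)' \
  --data-urlencode 'scopeq={!filters param=$fq}{!func}price'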

On Wed, Jun 1, 2022 at 4:11 PM Mikhail Khludnev  wrote:

> From looking at
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ScaleFloatFunction.java#L70
> I conclude that min,max are obtained from all docs in the index.
> But if you specify query() as an argument for scale() it takes only
> matching docs for evaluating min&max. So, what I get so far you a looking
> for a query which matches an intersection of $q AND $fq but yield price
> field value as its score.
> It seems I've got the problem definition. I'll come up with a proposal a
> little bit later.
>
> On Wed, Jun 1, 2022 at 11:33 AM Vincenzo D'Amore 
> wrote:
>
>> Hi Mikhail,
>>
>> sorry for not being clear, I'll try again.
>> For my understanding the solr scale function, once applied to a field,
>> needs min and max for that field.
>> Those min and max values by default are calculated by all the existing
>> documents, I don't know exactly how this is implemented internally in
>> Solr.
>> I assume that, in the worst case scenario, all the documents have to be
>> traversed reading all the values for the given field and then somehow
>> saving the min/max.
>> In the Solr scale function documentation is also written:
>> > The current implementation cannot distinguish when documents have been
>> deleted or documents that have no value. It uses 0.0 values for these
>> cases.
>> This means that often the min value can be 0 if you have only positive
>> values.
>>
>> But what happens if I need to scale the values of a field only within the
>> documents that are the result of a query? Only a few hundreds or thousands
>> of documents?
>> First of all min and max has to be calculated only on the result set of
>> your query.
>> That is what I was trying to say when I wrote "apply the scale function
>> only to the result set (and not to the entire collection)".
>>
>> For example, if you apply the scale function to the field price in Solr
>> techproducts example, "min" and "max" are between 0.0 and 2199.0
>>
>>
>> http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field=price
>>
>> So even if a filter query is added - fq=popularity:(1 OR 7) - the values
>> are scaled between 0.0 and 2199.0.
>>
>>
>> http://localhost:8983/solr/techproducts/select?q=*:*&fq=popularity:(1%20OR%207)&rows=100&fl=price,scale(price,%200,%201)
>>
>> {
>>   "responseHeader":{
>> "status":0,
>> "QTime":30,
>> "params":{
>>   "q":"*:*",
>>   "fl":"price,scale(price, 0, 1)",
>>   "fq":"popularity:(1 OR 7)",
>>   "rows":"100"}},
>>   "response":{"numFound":6,"start":0,"numFoundExact":true,"docs":[
>>   {
>> "price":74.99,
>> "scale(price, 0, 1)":0.034101862},
>>   {
>> "price":19.95,
>> "scale(price, 0, 1)":0.009072306},
>>   {
>> "price":11.5,
>> "scale(price, 0, 1)":0.0052296496},
>>   {
>> "price":329.95,
>> "scale(price, 0, 1)":0.15004548},
>>   {
>> "price":479.95,
>> "scale(price, 0, 1)":0.2182583},
>>   {
>> "price":649.99,
>> "scale(price, 0, 1)":0.29558435}]
>>   }}
>>
>> As you can see in the results of this query, prices are between 11.5 and
>> 649.99.
>> What if I want to scale the prices between 11.5 and 649.99?
>> Or, in other words, what is the easiest way to scale all the values of a
>> field with the min and max of the current query results?
>>
>> Right now I'm investigating what's the best way to scale the values of one
>> or more fields within Solr, but only within the documents that are in the
>> current result set.
>>
>> Hope this helps to make things clearer.
>>
>> Best regards,
>> Vincenzo
>>
>>
>>
>>
>> On Tue, May 31, 2022 at 9:27 PM Mikhail Khludnev  wrote:
>>
>> > Vincenzo,
>> > Can you elaborate what it means ' apply the scale function only to the
>> > result set (and not to
>> > the entire collection).'  ?
>> >
>> > On Tue, May 31, 2022 at 4:33 PM Vincenzo D'Amore 
>> > wrote:
>> >
>> > > Hi Mikhail,
>> > >
>> > > I'm trying to apply the scale function only to the result set (and
>> not to
>> > > the entire collection).
>> > > And I discovered that adding "query($q)" to the scale function does
>> the
>> > > trick.
>> > > In other words, adding "query($q)" forces solr to restrict the scale
>> > > function only to the result set.
>> > >
>> > > But if I add an fq to the query parameters the scale function applies
>> > only
>> > > to the q param.
>> > > For example:
>> > >
>> > >
>> > >
>> >
>> http://localhost:8983/solr/techproducts/select?q=manu_id_s:(corsair%20belkin%20canon%20viewsonic)&fq=price:[0%20TO%20200]&rows=100&fl=price,scale(sum(price,query($q)),%200,%201),manu_id_s
>> > >
>> > 

Using / searching fields with "structure"

2022-06-01 Thread Christopher Schultz

All,

Since Solr / Lucene can't define arbitrary fields in documents, I wonder 
what the recommended technique is for storing structured information in 
a document?


I'd like to store information about an entity that is specific to related 
entities (not stored in the index). This isn't my actual use-case, but 
let's take an example of a user who has different privileges in 
different locations.


If Solr / Lucene allowed me to do so, I might model it like this:

users : [
  {
"username": "chris",
"locations" : [ "denver", "chicago", "washington" ],
"location_denver_role" : "admin",
"location_chicago_role" : "staff",
"location_washington_role" : "basic"
  },
  { ... }
]

Since I can't have a field called "location_denver_role", 
"location_chicago_role", etc. (well, I can, but I have a huge number of 
locations to deal with and defining a field for each seems stupid), I 
was thinking of maybe something like this:


users : [
  {
"username": "chris",
"locations" : [ "denver", "chicago", "washington" ],
"location_roles : [ { "denver" : "admin", "chicago" : "staff", 
"washington" : "basic" ]

  },
  { ... }
]

So now I have a single field called "location_roles" but it's got 
"structure" inside of it. I could obviously search the other fields 
directly in Solr and then filter-out the records I want manually, but 
what's the best way to structure the index so that I can tell Solr I 
only care about users who are admins in denver?


Lest you think I can invert the index and use:

 "admin" : [ "denver" ],
 "staff" : [ "chicago" ],
 "basic" : [ "washington" ]

... I can't. The "role" is just a proxy for a bunch of user metadata 
that may need to grow over time, including a large range of possible 
values, so I can't just invert the index.


Thanks,
-chris


Re: Using / searching fields with "structure"

2022-06-01 Thread Walter Underwood
We implemented something like location_denver_role for matching tutors to 
subjects. There were a few thousand subjects and three kinds of scores, so each 
tutor record had about 20,000 fields. Ranking fetched the three fields for that 
subject ID to do the ranking. It wasn’t a big index, under 200k documents, but 
response times were well under 100 ms, as I remember.
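
In schema terms that kind of layout is usually a dynamic field rather than
thousands of explicit definitions. A sketch (the field name and type are
hypothetical, not the actual schema):

<dynamicField name="location_*_role" type="string" indexed="true" stored="true" docValues="true"/>

A filter such as fq=location_denver_role:admin then matches only users who
are admins in denver, without declaring each location's field up front.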

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 1, 2022, at 3:00 PM, Christopher Schultz 
>  wrote:
> 
> All,
> 
> Since Solr / Lucene can't define arbitrary fields in documents, I wonder what 
> the recommended technique is for storing structured-information in a document?
> 
> I'd like to store information about an entity and is specific to related 
> entities (not stored in the index). This isn't my actual use-case, but let's 
> take an example of a user who has different privileges in different locations.
> 
> If Solr / Lucene allowed me to do so, I might model it like this:
> 
> users : [
>  {
>"username": "chris",
>"locations" : [ "denver", "chicago", "washington" ],
>"location_denver_role" : "admin",
>"location_chicago_role" : "staff",
>"location_washington_role" : "basic"
>  },
>  { ... }
> ]
> 
> Since I can't have a field called "location_denver_role", 
> "location_chicago_role", etc. (well, I can, but I have a huge number of 
> locations to deal with and defining a field for each seems stupid), I was 
> thinking of maybe something like this:
> 
> users : [
>  {
>"username": "chris",
>"locations" : [ "denver", "chicago", "washington" ],
>"location_roles : [ { "denver" : "admin", "chicago" : "staff", 
> "washington" : "basic" ]
>  },
>  { ... }
> ]
> 
> So now I have a single field called "location_roles" but it's got "structure" 
> inside of it. I could obviously search the other fields directly in Solr and 
> then filter-out the records I want manually, but what's the best way to 
> structure the index so that I can tell Solr I only care about users who are 
> admins in denver?
> 
> Lest you think I can invert the index and use:
> 
> "admin" : [ "denver" ],
> "staff" : [ "chicago" ],
> "basic" : [ "washington" ]
> 
> ... I can't. The "role" is just a proxy for a bunch of user metadata that may 
> need to grow over time, including a large range of possible values, so I 
> can't just invert the index.
> 
> Thanks,
> -chris



Re: Create a core via SolrClient, single server

2022-06-01 Thread Shawn Heisey

On 6/1/2022 3:34 PM, Christopher Schultz wrote:
So I tried this with configSet=_default and I /did/ get a core 
created. I didn't get the same thing I got from the CLI:


This is what I get from "solr create -c test_core":


Using bin/solr to create a core does it in multiple steps.  It creates 
the core directory, and COPIES the named configset's conf directory 
(using _default if you don't specify one) to the core directory.  Then 
it calls /solr/admin/cores with "name" and "instanceDir" set to the name 
you gave it, which finds the just-copied config, creates the 
core.properties file, and adds the core to the running config.



If I use SolrClient as in your example with configSet=_default, I get:

test_core2
test_core2/core.properties
test_core2/data
test_core2/data/tlog
test_core2/data/snapshot_metadata
test_core2/data/index
test_core2/data/index/segments_1
test_core2/data/index/write.lock


The end result is the same ... except in the second case, it references 
the configset by name, which will be in the created core.properties 
file.  If you were to change the config in the configset directory and 
then reload each core, test_core would not see the changes, but 
test_core2 would.  That's because test_core has a complete copy of the 
config that is separate from the configset, and test_core2 is 
referencing the configset.


Thanks,
Shawn



Re: Create a core via SolrClient, single server

2022-06-01 Thread Shawn Heisey

On 6/1/2022 6:31 PM, Shawn Heisey wrote:


The end result is the same ... except in the second case, it 
references the configset by name, which will be in the created 
core.properties file.  If you were to change the config in the 
configset directory and then reload each core, test_core would not see 
the changes, but test_core2 would.  That's because test_core has a 
complete copy of the config that is separate from the configset, and 
test_core2 is referencing the configset. 


The configSet feature brings the shared config model from SolrCloud to 
standalone mode, with one difference -- with the configs in zookeeper, 
SolrCloud can share the same config between multiple nodes, but in 
standalone mode, the shared config is local to one Solr node.


Thanks,
Shawn



Re: Solr Cloud and /export

2022-06-01 Thread James Greene
Thanks!!!

I found the stream function earlier today and was able to get my large data
export process refactored from using the /export endpoint in a non-cloud
setup to using /stream in our new distributed setup.


On Wed, Jun 1, 2022, 3:04 PM Joel Bernstein  wrote:

> There is no configuration for this but the Stream Expression export/shuffle
> function does this automatically.
>
> https://solr.apache.org/guide/8_11/stream-source-reference.html#shuffle
>
> The "export" function name is also mapped to the "shuffle" function so you
> can use either name.
>
> This function is also basically the same functionality as calling the
> "search" function with the param qt=/export.
>
> In Solr 11 there is also the drill function:
>
> https://solr.apache.org/guide/8_11/stream-source-reference.html#drill
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, May 31, 2022 at 5:32 PM James Greene 
> wrote:
>
> > What do I have to configure or add to the request to have an /export call
> > respond with docs from all shards?  I currently only get documents from
> the
> > leader when making an /export call (solr 8.11.1).
> >
> > Cheers,
> > JAG
> >
>


Re: Param Substitution in Stream Expressions?

2022-06-01 Thread James Greene
Joel, I owe you a beer!

On Wed, Jun 1, 2022, 2:55 PM Joel Bernstein  wrote:

> Hi,
>
> You'll need to set a Java system property at startup to run macro expansion
> in Streaming Expressions.
>
> See the commit below which references the jira with the security concerns:
>
>
> https://github.com/apache/solr/commit/9edc557f4526ffbbf35daea06972eb2c595e692b
>
> The parameter setting is as follows:
>
> -DStreamingExpressionMacros=true
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jun 1, 2022 at 1:22 PM James Greene 
> wrote:
>
> > My application relies on local parameter substitution pretty heavily due
> to
> > escaping issues and being able to re-use clauses.
> >
> > Is there any way to use param substitution in a /stream expression?
> >
> > (This doesn't work):
> >
> > POST /stream {
> >   'expr': 'search(collection1,q=$test,fl="doc_id",sort="doc_id
> > asc",qt="/export")',   'test': 'contains:blah'
> > }
> >
> >
> >
> > Cheers,
> > JAG
> >
>


Re: Re: Upgrade SOLR 7.3 to 8.9, json.facet query performance drops a lot

2022-06-01 Thread YangLiu

Thank you for your reply, we found the root cause. 


It's our own fault.

At 2022-06-01 00:36:57, "Michael Gibney"  wrote:
>`json.facet` covers a lot of ground and can do a lot of different
>things under the hood. Would you be able to share more specific
>information about the kinds of `json.facet` request you're making,
>configuration of the fields in question, etc.?
>
>On Tue, May 31, 2022 at 11:54 AM slly  wrote:
>>
>> Hello everyone.
>>
>>
>> We recently upgraded our online environment to a new solr version(8.9.0) 
>> from 7.3, we found that facet query performance decreased a lot.
>>
>>
>> Solr 7.3 take about 100 milliseconds, but Solr 8.9 may take about 500 
>> milliseconds. We exclude the effects of GC and caching.
>>
>>
>> Are there any changes in Solr/Lucene that affect query performance?
>> Thanks.


Are there any significant performance differences between json.facet and stats?

2022-06-01 Thread slly
Hello everyone. 


We want to implement a statistical analysis requirement.
After viewing the Solr Ref Guide (8.11), there are two ways to meet it:


Schema is defined as follows:
  timestamp, indexed=true, docValues=true
  status, indexed=true, docValues=true
  user_count, indexed=true, docValues=true
  way, indexed=true, docValues=true


1. use json.facet



q=timestamp:[165133440 TO 165401280]&json.facet={
  agg:{
    status:terms,
    field:way,
    facet:{
      paid_amount:"sum(if(eq(status, 30),user_count,0))",
      paid_count:"sum(if(eq(status, 30),1,0))",
      deposit_amount:"sum(if(eq(status, 13), user_count,0))",
      deposit_count:"sum(if(eq(status, 13),1,0))",
      refunded_amount:"sum(if(eq(status, 11), user_count,0))",
      refunded_count:"sum(if(eq(status, 11),1,0))",
      deposit_canceled_amount:"sum(if(eq(status, 14),user_count,0))",
      deposit_canceled_count:"sum(if(eq(status, 14),1,0))",
      canceled_amount:"sum(if(eq(status, 10),user_count,0))",
      canceled_count:"sum(if(eq(status, 10),1,0))",
      charge_amount:"sum(if(eq(status, 15),original_amount,0))",
      charge_count:"sum(if(eq(status, 15),1,0))",
      charge_refund_amount:"sum(if(eq(status, 21),user_count,0))",
      charge_refund_count:"sum(if(eq(status, 21),1,0))",
      order_take_amount:"sum(if(eq(status, 16),original_amount,0))",
      order_take_count:"sum(if(eq(status, 16),1,0))",
      order_take_refund_amount:"sum(if(eq(status, 22),user_count,0))",
      order_take_refund_count:"sum(if(eq(type, 22),1,0))",
      store_pay_amount:"sum(if(eq(type, 17),user_count,0))",
      store_pay_count:"sum(if(eq(status, 17),1,0))",
      store_refund_amount:"sum(if(eq(status, 20),user_count,0))",
      store_refund_count:"sum(if(eq(status, 20),1,0))",
      store_in_amount:"sum(if(eq(status, 18),user_count,0))",
      store_in_count:"sum(if(eq(type, 18),1,0))",
      store_in_refund_amount:"sum(if(eq(type, 19),user_count,0))",
      store_in_refund_count:"sum(if(eq(status, 19),1,0))"
    }
  }
}&rows=0



2. use stats



q=timestamp:[165133440 TO 165401280]&stats=true&stats.facet=status&stats.field={!sum=true count=true}user_count&rows=0




I wonder if there is any performance difference between the two.




Thanks.