Can I insert query result into another collection of the same Solr?

2022-03-23 Thread WU, Zhiqing
Hello,
I did a query based on one collection and want to insert its result into
another collection of the same Solr. The query result has the same fields
as that of another collection. Is there a simple way to do the job?
If the query result has to be moved outside Solr before being added to
another collection of the same Solr, it would not be very efficient when
the query result is very large.
Relevant information would be welcome.
Kind regards,
Zhiqing


Re: Can I insert query result into another collection of the same Solr?

2022-03-23 Thread Susmit
Hi,
You can look at the update stream decorator in Solr's streaming API. It wraps
a search expression, and the tuples that expression emits are added to
another collection.

https://solr.apache.org/guide/8_4/stream-decorator-reference.html
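
As a rough illustration of the suggestion above (not taken from the thread), the
sketch below builds an update expression around a search and posts it to the
/stream handler using only the Python standard library. The collection names,
field list, and base URL are placeholders. It assumes the listed fields have
docValues, which the /export handler needs in order to stream a full result
set, and it wraps the update in commit so the copied documents become visible.

```python
import urllib.parse
import urllib.request


def build_update_expr(source, dest, query="*:*", fields=("id",),
                      sort="id asc", batch_size=500):
    """Build a streaming expression that copies matching docs from source to dest."""
    fl = ",".join(fields)
    # Inner stream: read matching documents from the source collection.
    search = f'search({source}, q="{query}", fl="{fl}", sort="{sort}", qt="/export")'
    # Decorator: index the emitted tuples into the destination collection.
    update = f"update({dest}, batchSize={batch_size}, {search})"
    # Outer decorator: issue a commit on the destination when the stream ends.
    return f"commit({dest}, {update})"


def run_stream(solr_url, collection, expr):
    """POST the expression to <solr_url>/<collection>/stream and return the body."""
    data = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    with urllib.request.urlopen(f"{solr_url}/{collection}/stream", data=data) as resp:
        return resp.read().decode("utf-8")


# Hypothetical usage; nothing is sent until run_stream is called:
# expr = build_update_expr("sourceCollection", "destCollection",
#                          query="type_s:book", fields=("id", "title_s"))
# print(run_stream("http://localhost:8983/solr", "sourceCollection", expr))
```

The documents never leave the cluster in this setup; the client only sends the
expression and reads back a short summary of the batches that were indexed.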



RE: Using Schema API breaks the Upload of Config Set file

2022-03-23 Thread Steffen Moldenhauer
> > > This does sound like a bug ;-(.
> 
> So I will create a bug report for this.

Tracked in JIRA as SOLR-16110, "Using Schema/Config API breaks the File-Upload
of Config Set File":
https://issues.apache.org/jira/browse/SOLR-16110



Representative filtering of very large result sets

2022-03-23 Thread Jeremy Buckley - IQ-C
We are using the collapse query parser for consolidating results based on a
field value, and are also faceting on a number of other fields.  The
collapse field and the facet fields all have docValues=true. For very large
(millions of documents) result sets, the heap usage gets a little out of
hand, and the resulting GC is problematic.  I am trying to figure out how
to reduce the number of documents that are being faceted over, and still
display facets that are "representative" of the entire result set.

Some sort of filter query seems to be the obvious answer, but what? I don't
want to accidentally exclude my most relevant results.

How can I facet over only the top N results?

Thanks for any tips.

-- 
Jeremy Buckley


Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Heller, George A III CTR (USA)
I have seen the emails about Solr not being affected by the DoS vulnerability
associated with LOG4J 2.16, but Solr failed a security scan because of it and
the bosses want it upgraded.

Can someone tell me where I can download an upgrade or patch for LOG4J and
instructions on how to implement it?

Thanks,
George


Re: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Andy Lester


> On Mar 23, 2022, at 1:36 PM, Heller, George A III CTR (USA) wrote:
>
> Can someone tell me where I can download an upgrade or patch for LOG4J and
> instructions on how to implement it?


See https://solr.apache.org/security.html

RE: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Heller, George A III CTR (USA)
Whatever you sent got removed by our email filters. Can you please resend it
as text?

Thanks,
George



Re: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Andy Lester
Go to the https://solr.apache.org/security.html URL and you will find 
instructions there on what to do.

Andy

Re: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Ishan Chattopadhyaya
Here's the issue where Log4J was upgraded. You can look at the pull request
there to find out what you need to change. After that, you can build your
own Solr binaries for your use (fix in github.com/apache/lucene-solr's
branch_8_11 and build using "ant ivy-bootstrap; cd solr; ant package" which
will generate a .tgz file).
https://issues.apache.org/jira/browse/SOLR-15843



Re: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Ishan Chattopadhyaya
And feel free to open a new JIRA for this log4j upgrade, it will get picked
up in 8.11.2 (whenever someone gets time to release it).



Re: Can I insert query result into another collection of the same Solr?

2022-03-23 Thread WU, Zhiqing
Hi Susmit,
Thanks for your reply.
Since I do not have much experience with Solr's streaming API, I can only
understand part of that page, and I do not know how to implement the
relevant parts with SolrJ.
Could you recommend some books or web pages that contain examples for the
streaming API?
Looking forward to your reply.
Kind regards,
Zhiqing



Re: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Houston Putman
Please do not create another JIRA issue. The upgrade is already committed and
is just waiting on the 8.11.2 release.

https://issues.apache.org/jira/browse/SOLR-15871

The suggestion across multiple threads in the users list has been to remove
the log4j jar, and replace it with the 2.17.1 jar, which will pass security
checks.



Re: Representative filtering of very large result sets

2022-03-23 Thread Joel Bernstein
It sounds like you are collapsing on a high cardinality field and/or
faceting on high cardinality fields. Can you describe the cardinality of
the fields so we can get an idea of how large the problem is?



Joel Bernstein
http://joelsolr.blogspot.com/




Re: Representative filtering of very large result sets

2022-03-23 Thread Jeremy Buckley - IQ-C
The number of documents in the collection is about 90 million. The
collapse field has about 30 million distinct values, so I guess that
qualifies as high cardinality.  We used to use result grouping but switched
to collapse for improved performance.

The faceting fields are more of a mix, 5-10 fields ranging from around a
dozen to around 250,000 distinct values.



Re: Representative filtering of very large result sets

2022-03-23 Thread Joel Bernstein
To collapse on 30 million distinct values is going to cause memory problems
for sure. If the heap is growing as the result set grows that means you are
likely using a newer version of Solr which collapses into a hashmap. Older
versions of Solr would collapse into an array 30 million in length which
probably would have blown up memory with even small result sets.

I think you're going to need to shard to get this to perform well. With
SolrCloud you can shard on the collapse key (
https://solr.apache.org/guide/8_7/shards-and-indexing-data-in-solrcloud.html#document-routing).
This will send all documents with the same collapse key to the same shard.
Then run the collapse query on the sharded collection.
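
A small sketch of what routing on the collapse key could look like, assuming
the collection uses the default compositeId router: the collapse key is
prefixed to the document ID with "!" at index time, so every document in a
group hashes to the same shard, and the query then collapses as usual. The
helper names and the field name group_s are made up for illustration.

```python
def routed_id(collapse_key, doc_id):
    """Prefix the ID with the collapse key so compositeId co-locates the group."""
    return f"{collapse_key}!{doc_id}"


def collapse_params(field, user_query="*:*"):
    """Query parameters that collapse results on the given field."""
    return {"q": user_query, "fq": f"{{!collapse field={field}}}"}


# Hypothetical documents: both land on the same shard because they share
# the "grp42" prefix.
docs = [
    {"id": routed_id("grp42", "doc1"), "group_s": "grp42"},
    {"id": routed_id("grp42", "doc2"), "group_s": "grp42"},
]
params = collapse_params("group_s", "title_t:solr")
```

Because each shard then only sees its own subset of collapse keys, the
per-query collapse structures shrink roughly in proportion to the number of
shards. Existing documents would need to be reindexed with the new IDs.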

Joel Bernstein
http://joelsolr.blogspot.com/




Re: Solr 8.11.1 upgrading LOG4J from 2.16 to 2.17

2022-03-23 Thread Shawn Heisey

On 3/23/2022 12:36 PM, Heller, George A III CTR (USA) wrote:
> Can someone tell me where I can download an upgrade or patch for LOG4J
> and instructions on how to implement it?


Did you try googling?  Because if I enter "log4j download" (minus the 
quotes) into Google, the first hit looks like it is exactly what you 
want.  You'll want the "binary" download, either .tar.gz or .zip format.


As for what to do with it once you download it, just find all the log4j 
jars in your Solr directory and replace them with jars from the log4j 
archive that have the same names and different version numbers.  There 
has been a fair amount of user testing and we have determined that this 
is a safe operation, as long as you don't leave some jars at a different 
version than the rest.  The log4j public API is very stable, which is 
why this is safe to do, but I have no idea how stable their internal 
APIs are.


Depending on the exact Solr version you have, you may have a jar that 
starts with "log4j-layout-template-json" ... this jar won't be in the 
log4j download.  If you have not changed Solr's logging configuration so 
that it outputs JSON formatted logs, you can safely delete this one 
jar.  If you actually need an upgraded version of that jar, you can find 
it on Maven Central.


https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-layout-template-json/2.17.2/log4j-layout-template-json-2.17.2.jar
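
The jar swap described above could be scripted along these lines. This is a
sketch under assumptions, not a tested procedure: the directory paths are
placeholders, jar names are assumed to follow the log4j-<artifact>-<x.y.z>.jar
pattern, and Solr should be stopped before anything is replaced. Jars with no
counterpart in the download (such as log4j-layout-template-json) are reported
with None and left for you to handle by hand.

```python
import re
import shutil
from pathlib import Path

# Matches e.g. log4j-api-2.16.0.jar or log4j-1.2-api-2.16.0.jar,
# but not log4j-api-2.17.1-sources.jar.
JAR_RE = re.compile(r"^(log4j-[a-z0-9.-]+?)-(\d+\.\d+\.\d+)\.jar$")


def plan_log4j_swap(solr_dir, log4j_dir):
    """Return (old_jar, new_jar) pairs; new_jar is None if the download lacks it."""
    new_jars = {}
    for jar in Path(log4j_dir).rglob("log4j-*.jar"):
        match = JAR_RE.match(jar.name)
        if match:
            new_jars[match.group(1)] = jar
    plan = []
    for old in sorted(Path(solr_dir).rglob("log4j-*.jar")):
        match = JAR_RE.match(old.name)
        if match:
            plan.append((old, new_jars.get(match.group(1))))
    return plan


def apply_swap(plan):
    """Copy each new jar next to the old one, then delete the old one."""
    for old, new in plan:
        if new is None or new.name == old.name:
            continue  # nothing to swap in, or already at the target version
        shutil.copy2(new, old.with_name(new.name))
        old.unlink()


# Hypothetical usage:
# plan = plan_log4j_swap("/opt/solr", "/tmp/apache-log4j-2.17.1-bin")
# for old, new in plan:
#     print(old, "->", new)
# apply_swap(plan)
```

Printing the plan before applying it makes it easy to check that no jar is left
at a different version than the rest.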

Thanks,
Shawn

https://lmgtfy.app/?q=log4j+download


Secure SSL connections between Solr and ZooKeeper

2022-03-23 Thread Sam Lee
According to the "Enabling SSL" section of the Apache Solr 8.11
Reference Guide [1]:

> ZooKeeper does not support encrypted communication with clients like
> Solr. There are several related JIRA tickets where SSL support is
> being planned/worked on:
> https://issues.apache.org/jira/browse/ZOOKEEPER-235
> https://issues.apache.org/jira/browse/ZOOKEEPER-236
> https://issues.apache.org/jira/browse/ZOOKEEPER-1000
> https://issues.apache.org/jira/browse/ZOOKEEPER-2120

However, that information appears to be outdated: Apache ZooKeeper has
supported encrypted communication with clients since around ZooKeeper 3.5
(the current stable ZooKeeper is version 3.6.3).

How do I configure Solr to use SSL when communicating with ZooKeeper?

In my ZooKeeper configuration (zoo.cfg), I have this:

secureClientPort=2182
#clientPort=2181  # Disabled. Allow secure connections only.

ssl.clientAuth=need
ssl.keystore.location=/opt/zookeeper/conf/zk-keystore.jks
ssl.keystore.password=123456
ssl.truststore.location=/opt/zookeeper/conf/zk-truststore.jks
ssl.truststore.password=123456

# ...

Now, what should I do on the SolrCloud side to connect to ZooKeeper
using SSL?


  [1]: https://solr.apache.org/guide/8_11/enabling-ssl.html#configure-zookeeper
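
One possible starting point, not confirmed anywhere in this thread: the
ZooKeeper client reads its TLS settings from Java system properties, and Solr
embeds that client, so passing the properties to the Solr JVM (for example
through SOLR_OPTS in solr.in.sh) and pointing ZK_HOST at the secure port is a
reasonable thing to try. The sketch below only assembles the flags. The
keystore paths and passwords are placeholders, and whether a given Solr
release ships the Netty classes this client socket needs is an assumption to
verify.

```python
def zk_client_ssl_flags(keystore, keystore_password, truststore, truststore_password):
    """Assemble the -D flags the ZooKeeper client reads for a TLS connection."""
    props = [
        ("zookeeper.client.secure", "true"),
        # TLS requires the Netty-based client socket implementation.
        ("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty"),
        ("zookeeper.ssl.keyStore.location", keystore),
        ("zookeeper.ssl.keyStore.password", keystore_password),
        ("zookeeper.ssl.trustStore.location", truststore),
        ("zookeeper.ssl.trustStore.password", truststore_password),
    ]
    return " ".join(f"-D{name}={value}" for name, value in props)


# Hypothetical paths and passwords; the output is meant for solr.in.sh.
flags = zk_client_ssl_flags(
    "/opt/solr/conf/solr-keystore.jks", "changeit",
    "/opt/solr/conf/solr-truststore.jks", "changeit",
)
print(f'SOLR_OPTS="$SOLR_OPTS {flags}"')
print('ZK_HOST="localhost:2182"')
```

With ssl.clientAuth=need on the server, the Solr keystore has to contain a
certificate that the ZooKeeper truststore accepts. Note that passwords passed
as -D flags are visible in the process list.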