Re: Cloudsolrclient.getclusterstateprovider - returns incorrect base_url [http instead of https] - Urgent pls help

2021-07-22 Thread Vincenzo D'Amore
Are you using Maven or Gradle? You should just add:


<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>8.8.2</version>
</dependency>
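
or, if you are on Gradle, the equivalent coordinate should be (untested):

implementation 'org.apache.solr:solr-solrj:8.8.2'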

Looking at the dependency tree, I see there are a lot of jars added:

[INFO] +- org.apache.solr:solr-solrj:jar:8.7.0:compile
[INFO] |  +- commons-io:commons-io:jar:2.8.0:compile
[INFO] |  +- commons-lang:commons-lang:jar:2.6:compile
[INFO] |  +- io.netty:netty-buffer:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-codec:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-common:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-handler:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-resolver:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-transport:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-transport-native-epoll:jar:4.1.50.Final:compile
[INFO] |  +- io.netty:netty-transport-native-unix-common:jar:4.1.50.Final:compile
[INFO] |  +- org.apache.commons:commons-math3:jar:3.6.1:compile
[INFO] |  +- org.apache.httpcomponents:httpclient:jar:4.5.12:compile
[INFO] |  +- org.apache.httpcomponents:httpcore:jar:4.4.13:compile
[INFO] |  +- org.apache.httpcomponents:httpmime:jar:4.5.12:compile
[INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.6.2:compile
[INFO] |  +- org.apache.zookeeper:zookeeper-jute:jar:3.6.2:compile
[INFO] |  +- org.codehaus.woodstox:stax2-api:jar:3.1.4:compile
[INFO] |  +- org.codehaus.woodstox:woodstox-core-asl:jar:4.4.1:compile
[INFO] |  +- org.eclipse.jetty:jetty-alpn-client:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty:jetty-alpn-java-client:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty:jetty-client:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty:jetty-http:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty:jetty-io:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty:jetty-util:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty.http2:http2-client:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty.http2:http2-common:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty.http2:http2-hpack:jar:9.4.27.v20200227:compile
[INFO] |  +- org.eclipse.jetty.http2:http2-http-client-transport:jar:9.4.27.v20200227:compile
[INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.30:test
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.1.7.6:compile
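
By the way, to double-check what SolrJ actually sees for base_url, something
like this should print it for every replica (untested sketch; the zk hosts are
from my earlier example, and the setClusterProperty call is only needed if the
scheme really is wrong):

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.cloud.ClusterState;

public class ClusterStateCheck {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zookeeper1:2181", "zookeeper2:2181", "zookeeper3:2181"),
                Optional.empty()).build()) {
            // Print the base_url of every replica as SolrJ sees it.
            ClusterState state = client.getClusterStateProvider().getClusterState();
            state.getCollectionsMap().forEach((name, coll) ->
                coll.getReplicas().forEach(replica ->
                    System.out.println(name + " -> " + replica.getBaseUrl())));
            // If replicas report http on an https cluster, the ref guide's fix
            // is the urlScheme cluster property (see the enabling-ssl links
            // quoted below).
            CollectionAdminRequest.setClusterProperty("urlScheme", "https")
                .process(client);
        }
    }
}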


On Thu, Jul 22, 2021 at 2:38 AM Reej Nayagam  wrote:

> I tried earlier with the zk ensemble, but when I try to get
> clusterStateProvider.getClusterState(), it throws
> "NoSuchMethodError: org.noggit.JSONParser.getFlags()",
> so I was using the solrUrl.
> I've added the jars solr-core-8.8.2, solr-solrj-8.8.2, zookeeper-3.6.3 and
> zookeeper-jute-3.6.3.
> Not sure if I need to add any additional jars; Google didn't help.
>
> *Thanks,*
> *Reej*
>
>
> On Thu, Jul 22, 2021 at 5:51 AM Vincenzo D'Amore 
> wrote:
>
> > Hi Reej, I usually instantiate a new CloudSolrClient with the ZooKeeper
> > ensemble. Well, something like this:
> >
> >    final List<String> zkServers = new ArrayList<>();
> >zkServers.add("zookeeper1:2181"); zkServers.add("zookeeper2:2181");
> > zkServers.add("zookeeper3:2181");
> >final SolrClient client = new CloudSolrClient.Builder(zkServers,
> > Optional.empty()).build();
> >
> >
> > On Wed, Jul 21, 2021 at 6:13 PM Reej Nayagam  wrote:
> >
> > > Hi All,
> > >
> > > I still face the same issue. Anyone had this issue before?
> > > I'm making the client connection as below:
> > > CloudSolrClient client = new CloudSolrClient.Builder("solrURL").build();
> > > clusterstate = client.getClusterStateProvider().getClusterState();
> > > when I check the replicas inside the cluster state, the base_url is http
> > > instead of https,
> > > but when I hit the URL in the browser,
> > > /solr/admin/collections?action=CLUSTERSTATUS, I can see the base_url as
> > > https.
> > > I'm totally confused about what's wrong. Please help. Thanks
> > >
> > > *Thanks,*
> > > *Reej*
> > >
> > >
> > > On Wed, Jul 21, 2021 at 5:16 PM Reej M  wrote:
> > >
> > > >
> > > >
> > > > > On 21 Jul 2021, at 5:07 PM, Vincenzo D'Amore 
> > > wrote:
> > > > > Hi,
> > > > It's ok, sometimes all of us just lose our cool.
> > > > By the way, we have followed the same steps as per the documentation only.
> > > > I'm trying to clear the zk data, clear everything, and recheck again if
> > > > that might help. Thanks
> > > >
> > > > > this is your version:
> > > > > https://solr.apache.org/guide/8_8/enabling-ssl.html#EnablingSSL-SolrCloud
> > > > > anyway, pay attention to clusterprop:
> > > > > https://solr.apache.org/guide/8_8/enabling-ssl.html#update-cluster-properties-for-existing-collections
> > > > >
> > > > > On Wed, Jul 21, 2021 at 11:04 AM Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> > > > >
> > > > >> Have you double-checked how SSL has been configured?
> > > > >> I think this doc could help:
> > > > >> https://solr.apache.org/guide/6_6/enabling-ssl.html#EnablingSSL-SolrCloud

Re: Solr stop doesn't cope with zombie process - should it?

2021-07-22 Thread Colvin Cowie
Ok thanks, I'll bear that in mind. I've raised
https://issues.apache.org/jira/browse/SOLR-15558 and will add a patch when
I have some free time.

On Wed, 21 Jul 2021 at 15:56, Mike Drob  wrote:

> That seems like a reasonable check to add, the only caution I would advise
> is that a lot of developers use macs for local testing so make sure that
> whatever flags you invoke are generally cross platform compatible, or
> hidden behind appropriate conditions.
>
> On Wed, Jul 21, 2021 at 5:59 AM Colvin Cowie 
> wrote:
>
> >  Hello,
> >
> > When calling solr stop on Linux, this command is used:
> > CHECK_PID=`ps auxww | awk '{print $2}' | grep -w $SOLR_PID | sort -r | tr -d ' '`
> >
> > https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/bin/solr#L871
> >
> > If Solr has stopped but remains as a zombie process, then its process entry
> > will remain in the table, so ps auxww will continue to show the PID even
> > after kill -9. That results in something like this, with 3 minutes wasted
> > waiting for a dead process to exit.
> >
> > [2021-07-21T09:15:12.365Z] Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 12622 to stop gracefully.
> > [2021-07-21T09:18:13.551Z]  [|] Solr process 12622 is still running; jstacking it now.
> > [2021-07-21T09:18:21.806Z] 12622: Unable to open socket file /proc/12622/root/tmp/.java_pid12622: target process 12622 doesn't respond within 10500ms or HotSpot VM not loaded
> > [2021-07-21T09:18:21.806Z] Solr process 12622 is still running; forcefully killing it now.
> > [2021-07-21T09:18:21.806Z] Killed process 12622
> > [2021-07-21T09:18:31.678Z] ERROR: Failed to kill previous Solr Java process 12622 ... script fails.
> >
> > But the output of ps auxww does identify Zombie processes under STAT:
> > USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
> > root     12622  1.4  0.0      0     0 pts/1 Z    10:42   0:26 [java]
> >
> > So the CHECK_PID could filter out zombies (for example, by having the awk
> > step skip rows whose STAT column is Z).
> > Obviously the bigger issue is why the process has ended up as a zombie (in
> > this case it was because of
> > https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
> > and not specifying "--init" when running Solr inside a docker container),
> > so maybe a message warning that the process is a zombie is worth having, so
> > that the user has an opportunity to do something about it.
> >
> > I guess I will raise a JIRA issue with a patch to do that unless there are
> > alternative suggestions?
> >
> > Regards,
> > Colvin
> >
>


Re: Error in Fq parsing while using multiple values in solr 8.7

2021-07-22 Thread Shawn Heisey

On 7/21/2021 9:51 PM, Satya Nand wrote:

Hi Shawn,

Thank you. I also had my suspicions about the sow parameter, but I can't figure
out why it is acting differently for analyzed and non-analyzed types.
for example, If I give this query to solr 8.7

fq=negativeattribute:(citychennai mcat43120 20mcat43120)&debug=query&fq=mcatid:(43120 26527 43015)

It parses both queries; as you can see, for the mcatid field it is working as
if sow were true.

"parsed_filter_queries": [
  "negativeattribute:citychennai mcat43120 mcat43120",
  "mcatid:43120 mcatid:26527 mcatid:43015"
]



That's very odd. I duplicated it on 8.9.0 with an index built from the
default example config:

"filter_queries":["id:(foo bar baz)"],
"parsed_filter_queries":["id:foo id:bar id:baz"],

It seems like a bug to me, but someone with a lot more experience on
the query parsers will need to chime in. I also tried it another way
with the same results:

"filter_queries":["{!edismax}id:(foo bar baz)"],
"parsed_filter_queries":["+(id:foo id:bar id:baz)"],

I can confirm from analysis checking on the id field (which is set up as
StrField like your mcatid field) that it is probably the query parser
doing the split here. I don't know why it would work differently for
StrField and TextField with the keyword tokenizer. The same thing
happens on the q parameter as well.
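
If what you actually need is exact matching on multiple values, the terms
query parser might be a usable workaround in the meantime, since it sidesteps
the whitespace splitting entirely (untested, using your mcatid field):

fq={!terms f=mcatid}43120,26527,43015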


Thanks,
Shawn



RE: Solr nodes crashing

2021-07-22 Thread Jon Morisi
I looked into this (https://solr.apache.org/guide/7_4/docvalues.html), and it
looks like I can't use docValues because my field type is solr.TextField.
Specifically:

<fieldType name="ptokens" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="|"/>
  </analyzer>
</fieldType>

I'm passing in a string of tokens separated by '|'. 

Some (made up) example data would be: 
41654165|This is a phrase|6579813|phrases are all one token|65798761|There can 
be multiple phrases or tokens per doc

 Is there a workaround?

My search would look something like:
.../select?q=ptokens: 41654165%20AND% ptokens: 65798761
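
PS: for the firstSearcher/newSearcher warming queries Mike mentioned, would it
just be a matter of adding something like this to the <query> section of
solrconfig.xml? (Untested sketch; the warming query values are made up.)

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">ptokens:41654165 AND ptokens:65798761</str><str name="rows">10</str></lst>
  </arr>
</listener>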


-Original Message-
From: Mike Drob  
Sent: Wednesday, July 21, 2021 12:36 PM
To: users@solr.apache.org
Subject: Re: Solr nodes crashing

You may want to look into enabling docValues for your fields in your schema, if
not already enabled. That often helps with memory usage during query, but
requires a reindex of your data.

There are also firstSearcher and newSearcher queries you can configure in
your Solr config; those would be able to warm your caches for you if that is
the case.

Mike

On Wed, Jul 21, 2021 at 11:06 AM Jon Morisi  wrote:

> Thanks for the help Shawn and Walter.  After increasing the open files 
> setting to 128000 and increasing the JVM-Memory to 16 GB, I was able 
> to load my documents.
>
> I now have a collection with 2.3 T rows / ~480 GB running on a 4-node 
> cluster.  I have found that complicated queries (searching for two 
> search terms in a field with "AND" for example), often timeout.  If I 
> try multiple times the query does eventually complete.  I'm assuming 
> this is a caching / warm-up issue.
>
> Is there a configuration option I can use to cache the indexes for one 
> of the columns or increase the timeout?  Any other advice to get this 
> performing quicker is appreciated.
>
> Thanks again,
> Jon
>
> -Original Message-
> From: Shawn Heisey 
> Sent: Thursday, July 1, 2021 6:48 PM
> To: users@solr.apache.org
> Subject: Re: Solr nodes crashing
>
> On 7/1/2021 4:23 PM, Jon Morisi wrote:
> > I've had an indexing job running for 24+ hours.  I'm importing 100m+
> documents.  After about 8 hours both of the replica nodes crashed but 
> the primary nodes have continued to run and index.
>
> There's a common misconception.  Java programs, including Solr, almost 
> never crash.
>
> If you've started a recent Solr version on a platform other than 
> Windows, then Solr is started with a Java option that runs a script 
> whenever an OutOfMemoryError exception is thrown by the program.  What 
> that script does is simple -- it logs a line to a logfile and then 
> kills Solr with the -9
> (kill) signal.  Note that there are a number of resource depletion 
> scenarios, other than memory, which can result in an OutOfMemoryError.
> That's why you were asked about open file and process limits.
>
> Most operating systems also have what has been named the "oom killer".
> When system memory becomes extremely tight, the OS will find programs 
> using a lot of memory and kill one of them.
>
> These two things will LOOK like a crash, but they're not really crashes.
>
> > JVM-Memory: 50.7% (497 MB used of 981.38 MB)
>
> This indicates that your max heap setting for Solr is in the ballpark 
> of 1GB.  This is extremely small, and so you're probably throwing 
> OutOfMemoryError because of heap space.  Which, on a non-Windows 
> system, will basically cause Solr to commit suicide.  It does this 
> because when OOME is thrown, program operation becomes completely 
> unpredictable, and index corruption is a very real possibility.
>
> There are precisely two ways to deal with OOME.  One is to increase 
> the size of the resource that is being depleted.  The other is to 
> change the program or the program configuration so that it doesn't 
> require as much of that resource.  Often, especially with Solr, the 
> second option is simply not possible.
>
> Most likely you're going to need to increase Solr's heap far beyond 1GB.
>   There's no way for us to come up with a recommendation for you 
> without asking you a lot of very detailed questions about your setup 
> ... and even with that, it's possible that we would give you an 
> incorrect recommendation.  I'll give you a number, and warn you that 
> it could be wrong, either way too small or way too large.  Try an 8GB 
> heap.  You have lots of memory in this system, 8GB is barely a drop in the 
> bucket.
>
> Thanks,
> Shawn
>


Re: Result set order when searching on "*" (asterisk character)

2021-07-22 Thread Steven White
I don't have any sort option configured.  The score I'm getting back is 1.0
for each hit item.

Does anyone know about Lucene's internal functionality to help me
understand what the returned order is?

Steven

On Wed, Jul 21, 2021 at 10:52 AM Vincenzo D'Amore 
wrote:

> if no sort options are configured, just try to add the score field; you'll
> see all the documents are ordered by score, which, when there are no
> clauses, is usually 1.
>
> On Wed, Jul 21, 2021 at 4:36 PM Steven White  wrote:
>
> > Hi everyone,
> >
> > When I search on "*" (asterisk character) what's the result sort order
> > based on?
> >
> > Thanks
> >
> > Steven
> >
>
>
> --
> Vincenzo D'Amore
>


Re: Result set order when searching on "*" (asterisk character)

2021-07-22 Thread Michael Gibney
No sort option configured generally defaults to score (and currently does
so even in cases such as the "*:*" case (MatchAllDocsQuery) where sort is
guaranteed to be irrelevant; see:
https://issues.apache.org/jira/browse/SOLR-14765).

But functionally speaking that doesn't really matter: in the event of a
main-sort "tie" (and in this case what you have is essentially "one big
tie") or no sort at all, the order is determined by the order of docs as
serialized in the Lucene index -- and that order is arbitrary, and can vary
across different replicas of the same "shard" of the index.

If stability is desired (and in many cases it is), you could try adding a
default `sort` param of, e.g.: `sort=score desc, id asc` (with `id` as a unique,
explicit tie-breaker). There are other options for handling this situation
and nuances that you may want to account for somehow; but they all stem
from the direct answer to your question, which is that in the event of a tie
or no sort, the order of returned results is arbitrary and unstable.
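
In SolrJ terms, that would be roughly (untested sketch):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("*:*");
q.setSort(SolrQuery.SortClause.desc("score"));
q.addSort(SolrQuery.SortClause.asc("id"));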

On Thu, Jul 22, 2021 at 11:11 AM Steven White  wrote:

> I don't have any sort option configured.  The score I'm getting back is 1.0
> for each hit item.
>
> Does anyone know about Lucene's internal functionality to help me
> understand what the returned order is?
>
> Steven
>
> On Wed, Jul 21, 2021 at 10:52 AM Vincenzo D'Amore 
> wrote:
>
> > if no sort options are configured, just try to add the score field; you'll
> > see all the documents are ordered by score, which, when there are no
> > clauses, is usually 1.
> >
> > On Wed, Jul 21, 2021 at 4:36 PM Steven White 
> wrote:
> >
> > > Hi everyone,
> > >
> > > When I search on "*" (asterisk character) what's the result sort order
> > > based on?
> > >
> > > Thanks
> > >
> > > Steven
> > >
> >
> >
> > --
> > Vincenzo D'Amore
> >
>


RE: Solr nodes crashing

2021-07-22 Thread Jon Morisi
I dug some more into a workaround and found the SortableTextField field type:
https://solr.apache.org/guide/7_4/field-types-included-with-solr.html

My max length is 3945.

Any concerns about changing my solr.TextField type to a SortableTextField type 
in order to enable docValues?  
I would then configure the maxCharsForDocValues to 4096.

Is this a bad idea, or am I on the right track?
Is there another way to enable docValues for a pipe delimited string of tokens?

-Original Message-
From: Jon Morisi  
Sent: Thursday, July 22, 2021 8:45 AM
To: users@solr.apache.org
Subject: RE: Solr nodes crashing

I looked into this (https://solr.apache.org/guide/7_4/docvalues.html), and it
looks like I can't use docValues because my field type is solr.TextField.
Specifically:

<fieldType name="ptokens" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="|"/>
  </analyzer>
</fieldType>

I'm passing in a string of tokens separated by '|'. 

Some (made up) example data would be: 
41654165|This is a phrase|6579813|phrases are all one token|65798761|There can be multiple phrases or tokens per doc

 Is there a workaround?

My search would look something like:
.../select?q=ptokens: 41654165%20AND% ptokens: 65798761


-Original Message-
From: Mike Drob 
Sent: Wednesday, July 21, 2021 12:36 PM
To: users@solr.apache.org
Subject: Re: Solr nodes crashing

You may want to look into enabling docValues for your fields in your schema, if
not already enabled. That often helps with memory usage during query, but
requires a reindex of your data.

There are also firstSearcher and newSearcher queries you can configure in
your Solr config; those would be able to warm your caches for you if that is
the case.

Mike

On Wed, Jul 21, 2021 at 11:06 AM Jon Morisi  wrote:

> Thanks for the help Shawn and Walter.  After increasing the open files 
> setting to 128000 and increasing the JVM-Memory to 16 GB, I was able 
> to load my documents.
>
> I now have a collection with 2.3 T rows / ~480 GB running on a 4-node 
> cluster.  I have found that complicated queries (searching for two 
> search terms in a field with "AND" for example), often timeout.  If I 
> try multiple times the query does eventually complete.  I'm assuming 
> this is a caching / warm-up issue.
>
> Is there a configuration option I can use to cache the indexes for one 
> of the columns or increase the timeout?  Any other advice to get this 
> performing quicker is appreciated.
>
> Thanks again,
> Jon
>
> -Original Message-
> From: Shawn Heisey 
> Sent: Thursday, July 1, 2021 6:48 PM
> To: users@solr.apache.org
> Subject: Re: Solr nodes crashing
>
> On 7/1/2021 4:23 PM, Jon Morisi wrote:
> > I've had an indexing job running for 24+ hours.  I'm importing 100m+
> documents.  After about 8 hours both of the replica nodes crashed but 
> the primary nodes have continued to run and index.
>
> There's a common misconception.  Java programs, including Solr, almost 
> never crash.
>
> If you've started a recent Solr version on a platform other than 
> Windows, then Solr is started with a Java option that runs a script 
> whenever an OutOfMemoryError exception is thrown by the program.  What 
> that script does is simple -- it logs a line to a logfile and then 
> kills Solr with the -9
> (kill) signal.  Note that there are a number of resource depletion 
> scenarios, other than memory, which can result in an OutOfMemoryError.
> That's why you were asked about open file and process limits.
>
> Most operating systems also have what has been named the "oom killer".
> When system memory becomes extremely tight, the OS will find programs 
> using a lot of memory and kill one of them.
>
> These two things will LOOK like a crash, but they're not really crashes.
>
> > JVM-Memory: 50.7% (497 MB used of 981.38 MB)
>
> This indicates that your max heap setting for Solr is in the ballpark 
> of 1GB.  This is extremely small, and so you're probably throwing 
> OutOfMemoryError because of heap space.  Which, on a non-Windows 
> system, will basically cause Solr to commit suicide.  It does this 
> because when OOME is thrown, program operation becomes completely 
> unpredictable, and index corruption is a very real possibility.
>
> There are precisely two ways to deal with OOME.  One is to increase 
> the size of the resource that is being depleted.  The other is to 
> change the program or the program configuration so that it doesn't 
> require as much of that resource.  Often, especially with Solr, the 
> second option is simply not possible.
>
> Most likely you're going to need to increase Solr's heap far beyond 1GB.
>   There's no way for us to come up with a recommendation for you 
> without asking you a lot of very detailed questions about your setup 
> ... and even with that, it's possible that we would give you an 
> incorrect recommendation.  I'll give you a number, and warn you that 
> it could be wrong, either way too small or way too large.  Try an 8GB 
> heap.  You have lots of memory in this system, 8GB is barely a drop in the 
>

Re: Solr nodes crashing

2021-07-22 Thread Michael Gibney
SortableTextField uses docValues in a very specific way, and is not a
general-purpose workaround for enabling docValues on TextFields. Possibly
of interest: https://issues.apache.org/jira/browse/SOLR-8362

That said, DocValues are relevant mainly (only?) wrt full-domain per-doc
value-access (e.g., for faceting, sorting, functions, export ...). Enabling
docValues for any field against which you're only running _searches_ is
unlikely to help.

If search latency is the main issue for you now, sharing more detail about
the queries you're running would be helpful (e.g., are you only running
searches? are you also running facets? how are you sorting? etc.). Pasting
a literal, complete search url (and any configured param defaults, if
applicable) could be helpful (fwiw, the example search you provided
earlier, ".../select?q=ptokens: 41654165%20AND% ptokens: 65798761", looks a
bit odd in several respects, and may not be being interpreted the way you
think it should be; e.g., the field spec should be immediately adjacent to
the field value, with no intervening whitespace; presumably something like
".../select?q=ptokens:41654165%20AND%20ptokens:65798761" was intended).

I note that you have a small amount of swap space being used; "small amount
used" or not, I would _strongly_ recommend disabling swap entirely
(`swapoff -a`). There are risks associated with disabling in general; but
with an index that large, you should be running with enough memory headroom
for the OS page cache that you shouldn't get anywhere near a situation
where application memory actually _needs_ swap. Also, a shot in the dark:
is there any chance you're running this index on a network filesystem?

On Thu, Jul 22, 2021 at 11:51 AM Jon Morisi  wrote:

> I dug some more into a workaround and found, the SortableTextField, field
> type:
> https://solr.apache.org/guide/7_4/field-types-included-with-solr.html
>
> My max length is 3945.
>
> Any concerns about changing my solr.TextField type to a SortableTextField
> type in order to enable docValues?
> I would then configure the maxCharsForDocValues to 4096.
>
> Is this a bad idea, or am I on the right track?
> Is there another way to enable docValues for a pipe delimited string of
> tokens?
>
> -Original Message-
> From: Jon Morisi 
> Sent: Thursday, July 22, 2021 8:45 AM
> To: users@solr.apache.org
> Subject: RE: Solr nodes crashing
>
> I looked into this (https://solr.apache.org/guide/7_4/docvalues.html),
> and it looks like I can't use docvalues because my field type is
> solr.textfield.  Specifically:
>
> <fieldType name="ptokens" class="solr.TextField" positionIncrementGap="100" multiValued="false">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="|"/>
>   </analyzer>
> </fieldType>
>
> I'm passing in a string of tokens separated by '|'.
>
> Some (made up) example data would be:
> 41654165|This is a phrase|6579813|phrases are all one token|65798761|There can be multiple phrases or tokens per doc
>
>  Is there a workaround?
>
> My search would look something like:
> .../select?q=ptokens: 41654165%20AND% ptokens: 65798761
>
>
> -Original Message-
> From: Mike Drob 
> Sent: Wednesday, July 21, 2021 12:36 PM
> To: users@solr.apache.org
> Subject: Re: Solr nodes crashing
>
> You may want to look into enabling docValues for your fields in your
> schema, if not already enabled. That often helps with memory usage during
> query, but requires a reindex of your data.
>
> There are also firstSearcher and newSearcher queries you can configure
> in your Solr config; those would be able to warm your caches for you if
> that is the case.
>
> Mike
>
> On Wed, Jul 21, 2021 at 11:06 AM Jon Morisi 
> wrote:
>
> > Thanks for the help Shawn and Walter.  After increasing the open files
> > setting to 128000 and increasing the JVM-Memory to 16 GB, I was able
> > to load my documents.
> >
> > I now have a collection with 2.3 T rows / ~480 GB running on a 4-node
> > cluster.  I have found that complicated queries (searching for two
> > search terms in a field with "AND" for example), often timeout.  If I
> > try multiple times the query does eventually complete.  I'm assuming
> > this is a caching / warm-up issue.
> >
> > Is there a configuration option I can use to cache the indexes for one
> > of the columns or increase the timeout?  Any other advice to get this
> > performing quicker is appreciated.
> >
> > Thanks again,
> > Jon
> >
> > -Original Message-
> > From: Shawn Heisey 
> > Sent: Thursday, July 1, 2021 6:48 PM
> > To: users@solr.apache.org
> > Subject: Re: Solr nodes crashing
> >
> > On 7/1/2021 4:23 PM, Jon Morisi wrote:
> > > I've had an indexing job running for 24+ hours.  I'm importing 100m+
> > documents.  After about 8 hours both of the replica nodes crashed but
> > the primary nodes have continued to run and index.
> >
> > There's a common misconception.  Java programs, including Solr, almost
> > never crash.
> >
> > If you've started a recent Solr version on a platform other than
> > Windows, then Solr is started with a Java option that runs a script
> > whenever an OutOfMemoryError exception is thrown by the program.  What
> > that script does is s

Re: Solr nodes crashing

2021-07-22 Thread Michael Gibney
ps- wrt requesting a "literal, complete search url" to aid troubleshooting:
facets, `sort`, `offset`, and `rows` params would all be of particular
interest.

On Thu, Jul 22, 2021 at 12:25 PM Michael Gibney 
wrote:

> SortableTextField uses docValues in a very specific way, and is not a
> general-purpose workaround for enabling docValues on TextFields. Possibly
> of interest: https://issues.apache.org/jira/browse/SOLR-8362
>
> That said, DocValues are relevant mainly (only?) wrt full-domain per-doc
> value-access (e.g., for faceting, sorting, functions, export ...). Enabling
> docValues for any field against which you're only running _searches_ is
> unlikely to help.
>
> If search latency is the main issue for you now, sharing more detail about
> the queries you're running would be helpful (e.g., are you only running
> searches? are you also running facets? how are you sorting? etc.). Pasting
> a literal, complete search url (and any configured param defaults, if
> applicable) could be helpful (fwiw, the example search you provided
> earlier, ".../select?q=ptokens: 41654165%20AND% ptokens: 65798761" looks a
> bit odd in several respects, and may not be being interpreted the way you
> think it should be; e.g., field spec should be immediately adjacent to
> field value, with no intervening whitespace, etc...).
>
> I note that you have a small amount of swap space being used; "small
> amount used" or not, I would _strongly_ recommend disabling swap entirely
> (`swapoff -a`). There are risks associated with disabling in general; but
> with an index that large, you should be running with enough memory headroom
> for the OS page cache that you shouldn't get anywhere near a situation
> where application memory actually _needs_ swap. Also, a shot in the dark:
> is there any chance you're running this index on a network filesystem?
>
> On Thu, Jul 22, 2021 at 11:51 AM Jon Morisi 
> wrote:
>
>> I dug some more into a workaround and found, the SortableTextField, field
>> type:
>> https://solr.apache.org/guide/7_4/field-types-included-with-solr.html
>>
>> My max length is 3945.
>>
>> Any concerns about changing my solr.TextField type to a SortableTextField
>> type in order to enable docValues?
>> I would then configure the maxCharsForDocValues to 4096.
>>
>> Is this a bad idea, or am I on the right track?
>> Is there another way to enable docValues for a pipe delimited string of
>> tokens?
>>
>> -Original Message-
>> From: Jon Morisi 
>> Sent: Thursday, July 22, 2021 8:45 AM
>> To: users@solr.apache.org
>> Subject: RE: Solr nodes crashing
>>
>> I looked into this (https://solr.apache.org/guide/7_4/docvalues.html),
>> and it looks like I can't use docvalues because my field type is
>> solr.textfield.  Specifically:
>>
>> <fieldType name="ptokens" class="solr.TextField" positionIncrementGap="100" multiValued="false">
>>   <analyzer>
>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="|"/>
>>   </analyzer>
>> </fieldType>
>>
>> I'm passing in a string of tokens separated by '|'.
>>
>> Some (made up) example data would be:
>> 41654165|This is a phrase|6579813|phrases are all one token|65798761|There can be multiple phrases or tokens per doc
>>
>>  Is there a workaround?
>>
>> My search would look something like:
>> .../select?q=ptokens: 41654165%20AND% ptokens: 65798761
>>
>>
>> -Original Message-
>> From: Mike Drob 
>> Sent: Wednesday, July 21, 2021 12:36 PM
>> To: users@solr.apache.org
>> Subject: Re: Solr nodes crashing
>>
>> You may want to look into enabling docValues for your fields in your
>> schema, if not already enabled. That often helps with memory usage during
>> query, but requires a reindex of your data.
>>
>> There are also firstSearcher and newSearcher queries you can configure
>> in your Solr config; those would be able to warm your caches for you if
>> that is the case.
>>
>> Mike
>>
>> On Wed, Jul 21, 2021 at 11:06 AM Jon Morisi 
>> wrote:
>>
>> > Thanks for the help Shawn and Walter.  After increasing the open files
>> > setting to 128000 and increasing the JVM-Memory to 16 GB, I was able
>> > to load my documents.
>> >
>> > I now have a collection with 2.3 T rows / ~480 GB running on a 4-node
>> > cluster.  I have found that complicated queries (searching for two
>> > search terms in a field with "AND" for example), often timeout.  If I
>> > try multiple times the query does eventually complete.  I'm assuming
>> > this is a caching / warm-up issue.
>> >
>> > Is there a configuration option I can use to cache the indexes for one
>> > of the columns or increase the timeout?  Any other advice to get this
>> > performing quicker is appreciated.
>> >
>> > Thanks again,
>> > Jon
>> >
>> > -Original Message-
>> > From: Shawn Heisey 
>> > Sent: Thursday, July 1, 2021 6:48 PM
>> > To: users@solr.apache.org
>> > Subject: Re: Solr nodes crashing
>> >
>> > On 7/1/2021 4:23 PM, Jon Morisi wrote:
>> > > I've had an indexing job running for 24+ hours.  I'm importing 100m+
>> > documents.  After about 8 hours both of the replica nodes crashed but
>> > the primary nodes have continue

Re: Solr nodes crashing

2021-07-22 Thread Shawn Heisey

On 7/22/2021 10:39 AM, Michael Gibney wrote:

ps- wrt requesting a "literal, complete search url" to aid troubleshooting:
facets, `sort`, `offset`, and `rows` params would all be of particular
interest.



One way to get everything we are after for the query is to add 
"echoParams=all" to the query URL and then include the full 
"responseHeader" part of the response.  That will even include 
parameters defined in solrconfig.xml.


"responseHeader":{
  "status":0,
  "QTime":55,
  "params":{
    "q":"*:*",
    "df":"_text_",
    "rows":"10",
    "echoParams":"all",
    "_":"1626974542182"}},

Thanks,
Shawn




RE: Solr nodes crashing

2021-07-22 Thread Jon Morisi
RE Shawn and Michael,
I am just looking for a way to speed it up.  Mike Drob had mentioned docvalues, 
which is why I was researching that route.

I am running my search tests from the Solr admin UI, no facets, no sorting. I am
using -Dsolr.directoryFactory=HdfsDirectoryFactory.

URL:
.../select?q=ptokens:8974561 AND ptokens:9844554 AND ptokens:8564484 AND ptokens:9846541&echoParams=all

Response once it ran (timeout on first attempt, waited 5min for re-try):
responseHeader  
zkConnected true
status  0
QTime   2411
params  
q   "ptokens:243796009 AND ptokens:410512000 AND ptokens:410604004 AND ptokens:408729009"
df  "data"
rows"10"
echoParams  "all"

dashboard info:
System load: 0.16 0.13 0.14
Physical memory: 97.7% (368.77 GB used of 377.39 GB)
Swap space: 4.7% (193.25 MB used of 4.00 GB)
File descriptor count: 0.2% (226 of 128000)
JVM-Memory: 22.7% (15.33 GB / 15.33 GB)

Thanks for looking,
Jon


-Original Message-
From: Shawn Heisey  
Sent: Thursday, July 22, 2021 11:26 AM
To: users@solr.apache.org
Subject: Re: Solr nodes crashing

On 7/22/2021 10:39 AM, Michael Gibney wrote:
> ps- wrt requesting a "literal, complete search url" to aid troubleshooting:
> facets, `sort`, `offset`, and `rows` params would all be of particular 
> interest.


One way to get everything we are after for the query is to add "echoParams=all" 
to the query URL and then include the full "responseHeader" part of the 
response.  That will even include parameters defined in solrconfig.xml.

"responseHeader":{ "status":0, "QTime":55, "params":{ "q":"*:*",
"df":"_text_", "rows":"10", "echoParams":"all", "_":"1626974542182"}}

Thanks,
Shawn



Re: Cloudsolrclient.getclusterstateprovider - returns incorrect base_url [http instead of https] - Urgent pls help

2021-07-22 Thread Reej Nayagam
Thanks Shawn. In our application I saw noggit.jar added, so after removing it
I could now instantiate with the zk IPs. Thanks for the help.

On Thu, 22 Jul 2021 at 8:54 AM, Shawn Heisey  wrote:

> On 7/21/2021 6:37 PM, Reej Nayagam wrote:
> > I tried earlier with the zk ensemble, but when I try to get
> > clusterStateProvider.getClusterState(), it throws
> > "NoSuchMethodError: org.noggit.JSONParser.getFlags()",
> > so I was using the solrUrl.
> > I've added the jars solr-core-8.8.2, solr-solrj-8.8.2, zookeeper-3.6.3 and
> > zookeeper-jute-3.6.3.
> > Not sure if I need to add any additional jars; Google didn't help.
>
> The jar dependencies you need for SolrJ are included in the Solr
> download.  You will find them in dist/solrj-lib.  Not all of those jars
> are required for every usage ... but figuring out which ones you don't
> need can be challenging, so it's better to include them all.
>
> You do not need solr-core unless you're trying to embed a complete Solr
> server into your app (without http access) ... which we strongly
> recommend NOT doing for nearly everyone.
>
> The noggit library should not be needed for SolrJ, but it IS used by
> solr-core.  You're probably getting that message because you have
> solr-core.
>
> Thanks,
> Shawn
>
-- 
*Thanks,*
*Reej*


Re: Cloudsolrclient.getclusterstateprovider - returns incorrect base_url [http instead of https] - Urgent pls help

2021-07-22 Thread Reej Nayagam
Thanks Vincenzo D'Amore & Shawn.
Ours is a legacy system using EJB, no Maven or Gradle. And now I'm able to
instantiate passing zk IPs after removing the noggit jar, as suggested.

On Thu, 22 Jul 2021 at 4:01 PM, Vincenzo D'Amore  wrote:

> Are you using Maven or Gradle? You should just add:
>
> <dependency>
>   <groupId>org.apache.solr</groupId>
>   <artifactId>solr-solrj</artifactId>
>   <version>8.8.2</version>
> </dependency>
>
> Looking at the dependency tree, I see there are a lot of jars added:
>
> [INFO] +- org.apache.solr:solr-solrj:jar:8.7.0:compile
> [INFO] |  +- commons-io:commons-io:jar:2.8.0:compile
> [INFO] |  +- commons-lang:commons-lang:jar:2.6:compile
> [INFO] |  +- io.netty:netty-buffer:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-codec:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-common:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-handler:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-resolver:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-transport:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-transport-native-epoll:jar:4.1.50.Final:compile
> [INFO] |  +- io.netty:netty-transport-native-unix-common:jar:4.1.50.Final:compile
> [INFO] |  +- org.apache.commons:commons-math3:jar:3.6.1:compile
> [INFO] |  +- org.apache.httpcomponents:httpclient:jar:4.5.12:compile
> [INFO] |  +- org.apache.httpcomponents:httpcore:jar:4.4.13:compile
> [INFO] |  +- org.apache.httpcomponents:httpmime:jar:4.5.12:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.6.2:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper-jute:jar:3.6.2:compile
> [INFO] |  +- org.codehaus.woodstox:stax2-api:jar:3.1.4:compile
> [INFO] |  +- org.codehaus.woodstox:woodstox-core-asl:jar:4.4.1:compile
> [INFO] |  +- org.eclipse.jetty:jetty-alpn-client:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty:jetty-alpn-java-client:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty:jetty-client:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty:jetty-http:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty:jetty-io:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty:jetty-util:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty.http2:http2-client:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty.http2:http2-common:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty.http2:http2-hpack:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.eclipse.jetty.http2:http2-http-client-transport:jar:9.4.27.v20200227:compile
> [INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.30:test
> [INFO] |  \- org.xerial.snappy:snappy-java:jar:1.1.7.6:compile
>
>
> On Thu, Jul 22, 2021 at 2:38 AM Reej Nayagam  wrote:
>
> > I tried earlier with the zk ensemble, but when I try to get
> > clusterStateProvider.getClusterState(), it throws
> > "NoSuchMethodError: org.noggit.JSONParser.getFlags()",
> > so I was using the solrUrl.
> > I've added the jars solr-core-8.8.2, solr-solrj-8.8.2, zookeeper-3.6.3 and
> > zookeeper-jute-3.6.3.
> > Not sure if I need to add any additional jars; Google didn't help.
> >
> > *Thanks,*
> > *Reej*
> >
> >
> > On Thu, Jul 22, 2021 at 5:51 AM Vincenzo D'Amore 
> > wrote:
> >
> > > Hi Reej, I usually instantiate a new CloudSolrClient with the ZooKeeper
> > > ensemble. Well, something like this:
> > > ensemble. Well, something like this:
> > >
> > >    final List<String> zkServers = new ArrayList<>();
> > >zkServers.add("zookeeper1:2181"); zkServers.add("zookeeper2:2181");
> > > zkServers.add("zookeeper3:2181");
> > >final SolrClient client = new CloudSolrClient.Builder(zkServers,
> > > Optional.empty()).build();
> > >
> > >
> > > On Wed, Jul 21, 2021 at 6:13 PM Reej Nayagam 
> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I still face the same issue. Anyone had this issue before?
> > > > I'm making the client connection as below:
> > > > CloudSolrClient client = new CloudSolrClient.Builder("solrURL").build();
> > > > clusterstate = client.getClusterStateProvider().getClusterState();
> > > > when I check the replicas inside the cluster state, the base_url is
> > > > http instead of https,
> > > > but when I hit the URL in the browser,
> > > > /solr/admin/collections?action=CLUSTERSTATUS, I can see the base_url as
> > > > https.
> > > > I'm totally confused about what's wrong. Please help. Thanks
> > > >
> > > > *Thanks,*
> > > > *Reej*
> > > >
> > > >
> > > > On Wed, Jul 21, 2021 at 5:16 PM Reej M  wrote:
> > > >
> > > > >
> > > > >
> > > > > > On 21 Jul 2021, at 5:07 PM, Vincenzo D'Amore  >
> > > > wrote:
> > > > > > Hi,
> > > > > It's ok, sometimes all of us just lose our cool.
> > > > > By the way, we have followed the same steps as per the documentation only.
> > > > > I'm trying to clear the zk data, clear everything, and recheck again
> > > > > if that might help. Thanks
> > > > >
> > > > > > this is your version:
> > > > > > https://solr.apache.org/guide/8_8/enabling-ssl.html#EnablingSSL-SolrCloud
> > > > > > anyway, pay attention to cl

Print Solr Responses || SOLR 7.5

2021-07-22 Thread Akreeti Agarwal
Hi All,

I am using Solr 7.5 with a master/slave architecture in my project. Just wanted
to know: is there any way we can print the responses generated by Solr queries
to some log file, i.e., as Solr queries are hit, we store the responses in a
file?
If it can be done, can it be done on a timed basis, e.g., every 5 minutes?

Please help, as I am not able to resolve an issue occurring in the prod
environment.
Issue:
Solr is returning a response containing "*", which is causing a parsing issue
on the API side:

{* "response":{"numFound":1,"start":0,"docs":[.

Regards,
Akreeti Agarwal
