Re: Streaming of Documents with text columns (_txt)

2023-05-15 Thread ufuk yılmaz
There may be an auto-generated field already in your schema then. Its name 
should be something like yourfield_str

—

> On 12 May 2023, at 15:12, Subhasis Patra  
> wrote:
> 
> I am using Solr in cloud mode with schemaless mode. I don’t want to 
> update/touch the managed schema. Is there any way I can send a copy of those 
> fields using SolrJ, so the text value will be copied to a string value? 
> 
> 
> Thanks
> Subhasis Patra
> 240-755-2601
> subhasis.pa...@e2open.com
> 
> -Original Message-
> From: ufuk yılmaz  
> Sent: Thursday, May 11, 2023 5:58 AM
> To: users@solr.apache.org
> Subject: Re: Streaming of Documents with text columns (_txt)
> 
> My solution to this kind of situation is to have a docValues-enabled 
> copyField for each text field in the schema, so I can export all of the 
> fields when necessary
> 
> -ufuk yilmaz
> 
> —
> 
>> On 11 May 2023, at 05:08, Subhasis Patra  wrote:
>> 
>> Hi All,
>> 
>> I am using CloudSolrStream to get stream data for documents in Solr. I use 
>> /export when documents have only columns of type STRING, DATE, DOUBLE, or 
>> LONG. /export is not allowed when documents have a _txt column 
>> (docValues=false). So I use the following instead. I use _txt to support 
>> case-insensitive search.
>> 
>> StreamFactory factory =
>> new StreamFactory().withCollectionZkHost(collection, zkHost);
>> StreamExpression streamExpression = StreamExpressionParser.parse(
>> "search(" + collection + ", q=\"" + filter + "\", fl=\""
>> + fieldsCommaSeparated + "\", rows=\"" + count + "\", sort=\"id asc\")");
>> 
>> This works, but it does not support memory management the way /export does. 
>> Paging through rows with the start parameter slows the process down.
>> Can anyone suggest how to achieve this?
>> 
>> 
>> Thanks
>> Subhasis Patra
>> 240-755-2601
>> subhasis.pa...@e2open.com
>> 
> 
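[For reference, a hedged sketch of the /export request shape discussed above. /export streams the full sorted result set with bounded memory, but every field named in fl and sort must have docValues enabled; the collection name and field names below are placeholders, not taken from the thread:]

```shell
# Hedged sketch of an /export request; COLLECTION and the field names are
# placeholders, and every exported/sorted field must have docValues=true.
EXPORT_PARAMS='q=*:*&sort=id asc&fl=id,title_s'
echo "/solr/COLLECTION/export?${EXPORT_PARAMS}"
# Against a live node this would be, e.g.:
# curl -G "http://localhost:8983/solr/COLLECTION/export" \
#      --data-urlencode 'q=*:*' \
#      --data-urlencode 'sort=id asc' \
#      --data-urlencode 'fl=id,title_s'
```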



RE: Streaming of Documents with text columns (_txt)

2023-05-15 Thread Subhasis Patra
Thanks for your response. I am using a dynamic schema, but I want to copy all 
_txt fields to _s fields. I know that if I add a copyField directive to the 
managed schema file it will work, but I don’t want to make that change by hand. 
Is there a way I can use a curl command to add a copy rule for all _txt columns 
to the managed schema file?

Thanks
Subhasis Patra
240-755-2601
subhasis.pa...@e2open.com
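
[A hedged sketch of what such a Schema API call might look like. It assumes a *_s dynamic string field with docValues enabled already exists in the schema, and uses placeholder host/collection names; copyField sources and destinations may both use a glob when the destination matches a dynamic field:]

```shell
# Hedged sketch: add a copyField rule for every *_txt field via the Schema
# API, without hand-editing managed-schema. COLLECTION and the host are
# placeholders; "*_s" must match an existing dynamic string field.
PAYLOAD='{"add-copy-field":{"source":"*_txt","dest":"*_s"}}'
echo "$PAYLOAD"
# Against a live node this would be, e.g.:
# curl -X POST -H 'Content-type:application/json' \
#      --data-binary "$PAYLOAD" \
#      "http://localhost:8983/solr/COLLECTION/schema"
```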




Re: Help regarding solr request timeout because of spellcheck component performance.

2023-05-15 Thread Chris Hostetter


: timeAllowed does not limit spellcheck; I have tried it.

Hmmm, that doesn't sound right -- unless you are using a really old 
version of solr, any index-based spellchecker (like Direct and WordBreak) 
should be respecting timeAllowed due to the underlying Lucene IndexReader 
enforcing it.

- What version of solr are you using?

- do you have enough query volume (and do these time outs happen often 
enough) that you can take a lot of thread dumps and identify any "hot 
spots" in the spellchecking code?

- if the problem is sporadic, do you see any "patterns" in the requests 
that cause the problem (i'm specifically wondering about long query 
strings that might be triggering the WordBreak bug i linked to before)

- have you tried using only one dictionary or the other to narrow 
down the problem?

- I know it comes from a documented example, but maxChanges=10 with 
WordBreak is excessive for most "real world" word combinations i've seen 
in practice, and exacerbates the problem in the WordBreak bug i linked to 
before.  does lowering that to something like 2 or 3 reduce this problem?
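
[To experiment with the first point, a hedged sketch of a request that sets timeAllowed alongside spellcheck; the collection name, query string, and parameter values below are illustrative placeholders, not recommendations:]

```shell
# Hedged sketch: cap total query work (which, in recent Solr versions,
# index-based spellcheckers should also respect) with timeAllowed (ms).
# COLLECTION and the misspelled query are placeholders.
SELECT_PARAMS='q=somme quary&spellcheck=true&spellcheck.count=3&timeAllowed=1000'
echo "/solr/COLLECTION/select?${SELECT_PARAMS}"
# Against a live node this would be, e.g.:
# curl -G "http://localhost:8983/solr/COLLECTION/select" \
#      --data-urlencode 'q=somme quary' \
#      --data-urlencode 'spellcheck=true' \
#      --data-urlencode 'spellcheck.count=3' \
#      --data-urlencode 'timeAllowed=1000'
```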


: 
: Following is the spellcheck configuration. Can you suggest something?
: 
: <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
:   <str name="queryAnalyzerFieldType">text_general</str>
: 
:   <lst name="spellchecker">
:     <str name="name">default</str>
:     <str name="field">text</str>
:     <str name="classname">solr.DirectSolrSpellChecker</str>
:     <str name="distanceMeasure">internal</str>
:     <float name="accuracy">0.5</float>
:     <int name="maxEdits">2</int>
:     <int name="minPrefix">1</int>
:     <int name="maxInspections">5</int>
:     <int name="minQueryLength">4</int>
:     <float name="maxQueryFrequency">0.01</float>
:   </lst>
: 
:   <lst name="spellchecker">
:     <str name="name">wordbreak</str>
:     <str name="classname">solr.WordBreakSolrSpellChecker</str>
:     <str name="field">name</str>
:     <str name="combineWords">true</str>
:     <str name="breakWords">true</str>
:     <int name="maxChanges">10</int>
:   </lst>
: </searchComponent>
: 
: 
: On Thu, 4 May 2023 at 06:34, Chris Hostetter 
: wrote:
: 
: >
: > 1) timeAllowed does limit spellcheck (at least in all the code paths i can
: > think of that may be "slow") ... have you tried it?
: >
: > 2) what is your configuration for the dictionaries you are using?
: >
: > 3) be wary of https://github.com/apache/lucene/issues/12077
: >
: >
: > : Date: Tue, 2 May 2023 00:04:27 +0530
: > : From: kumar gaurav 
: > : Reply-To: users@solr.apache.org
: > : To: solr-u...@lucene.apache.org, users@solr.apache.org
: > : Subject: Re: Help regarding solr request timeout because of spellcheck
: > : component performance.
: > :
: > : Just a reminder if someone can help here.
: > :
: > : On Mon, 24 Apr 2023 at 13:40, kumar gaurav  wrote:
: > :
: > : > ++ users@solr.apache.org
: > : >
: > : > On Mon, 24 Apr 2023 at 13:12, kumar gaurav  wrote:
: > : >
: > : >> HI Everyone
: > : >>
: > : >> I am getting a Solr socket timeout exception in the select search query
: > : >> because of bad spellcheck performance.
: > : >>
: > : >> I am using the spellcheck component in solr select request handler.
: > : >> solrconfig
: > : >>
: > : >> 
: > : >>
: > : >>   
: > : >> edismax
: > : >> true
: > : >> 1
: > : >> AND
: > : >> 100
: > : >> true
: > : >> 25
: > : >> false
: > : >> true
: > : >> true
: > : >> true
: > : >> false
: > : >> 10
: > : >> 150
: > : >> 100%
: > : >> default
: > : >> wordbreak
: > : >>   
: > : >>   
: > : >> spellcheck
: > : >>   
: > : >> 
: > : >>
: > : >>
: > : >> Do we have any time-allowed parameter for spellcheck, like the query
: > : >> timeAllowed parameter?
: > : >>
: > : >> How can I identify a query timeout caused by the spellcheck component
: > : >> process?
: > : >>
: > : >> Please help. Thanks in advance.
: > : >>
: > : >>
: > : >>
: > : >> --
: > : >> Thanks & Regards
: > : >> Kumar Gaurav
: > : >>
: > : >
: > :
: >
: > -Hoss
: > http://www.lucidworks.com/
: >
: 

-Hoss
http://www.lucidworks.com/


Re: Debug time spent in aggregating the search results

2023-05-15 Thread Chris Hostetter


Ok, my mistake -- apparently this was all heavily changed in 9.0 and I 
didn't notice until you asked about it...

https://issues.apache.org/jira/browse/SOLR-14401
https://solr.apache.org/guide/solr/9_1/deployment-guide/performance-statistics-reference.html

So going back to my previous comment...

> Metrics like "QUERY./select.distrib.requestTimes" tell you the stats on
> handling a "distributed" request -- which is when a core is responsible to
> sending out "per-shard" requests and merging the responses.
> ...

...IIUC in SolrCloud multishard deployments, 9.x's 
"QUERY./select.requestTimes" is the equivalent of 8.x's 
"QUERY./select.distrib.requestTimes", while 
"QUERY./select[shard].requestTimes" is the equivalent of 8.x's 
"QUERY./select.local.requestTimes"

But this has really just attempted to simplify the metric naming -- so 
that monitoring systems can look at the same info, regardless of whether 
there are multishard collections/requests or not -- my caveats about how 
exactly this applies to your original question are still true...

> ...
> But it doesn't *only* include the "time spent in aggregating the search
> results from shards" ... it also includes the time spent determining 
> which requests to send to which shards, and waiting for the responses to 
> those (frequently concurrent) requests"
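
[A hedged sketch of pulling these handler timing metrics from the Metrics API; the host is a placeholder, and the prefix parameter filters the response down to the /select timers discussed above:]

```shell
# Hedged sketch: fetch QUERY./select timing metrics for all cores on a node.
METRICS_PARAMS='group=core&prefix=QUERY./select'
echo "/solr/admin/metrics?${METRICS_PARAMS}"
# Against a live node this would be, e.g.:
# curl -G "http://localhost:8983/solr/admin/metrics" \
#      --data-urlencode 'group=core' \
#      --data-urlencode 'prefix=QUERY./select'
```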




-Hoss
http://www.lucidworks.com/


About Solr 9.1.1 ERROR "o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: xxx => java.io.IOException: java.util.concurrent.TimeoutException: Total timeout 600

2023-05-15 Thread Mingchun Zhao
Hello,

We have recently seen the errors below repeated when update requests were
received by Solr. Could you please advise on how to deal with this issue?
The error and warning from the Solr log are as below:


2023-05-09 10:03:05.128 ERROR
(updateExecutor-8-thread-11857-processing-test_shard1_replica_n1 core_node3
null stg-test-search--1.i.kandasearch.com:8983_solr test shard1) [c:test
s:shard1 r:

core_node3 x:test_shard1_replica_n1]
o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
SolrCmdDistributor$Req: cmd=add{,id=xxx,commitWithin=3};

node=ForwardNode: http://test.com:8983/solr/test_shard1_replica_n2/

to http://test.com:8983/solr/test_shard1_replica_n2/

=> java.io.IOException: java.util.concurrent.TimeoutException: Total
timeout 60 ms elapsed
at
org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
java.io.IOException: java.util.concurrent.TimeoutException: Total timeout
60 ms elapsed
at
org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
~[?:?]
at
org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151)
~[?:?]
at
org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:145)
~[?:?]
at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:207)
~[?:?]
at
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:200)
~[?:?]
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:169)
~[?:?]
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:100)
~[?:?]
at
org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:80)
~[?:?]
at
org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:367)
~[?:?]
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:238)
~[?:?]
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:180)
~[?:?]
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
~[metrics-core-4.1.5.jar:4.1.5]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:271)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Suppressed: java.io.IOException: java.util.concurrent.TimeoutException:
Total timeout 60 ms elapsed
at
org.eclipse.jetty.client.util.DeferredContentProvider.flush(DeferredContentProvider.java:197)
~[?:?]
at
org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.flush(OutputStreamContentProvider.java:151)
~[?:?]
at
org.eclipse.jetty.client.util.OutputStreamContentProvider$DeferredOutputStream.write(OutputStreamContentProvider.java:145)
~[?:?]
at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:207)
~[?:?]
at
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:200)
~[?:?]
at org.apache.solr.common.util.JavaBinCodec.close(JavaBinCodec.java:1286)
~[?:?]
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.marshal(JavaBinUpdateRequestCodec.java:99)
~[?:?]
at
org.apache.solr.client.solrj.impl.BinaryRequestWriter.write(BinaryRequestWriter.java:80)
~[?:?]
at
org.apache.solr.client.solrj.impl.Http2SolrClient.send(Http2SolrClient.java:367)
~[?:?]
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:238)
~[?:?]
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:180)
~[?:?]
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
~[metrics-core-4.1.5.jar:4.1.5]
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:271)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
~[?:?]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.util.concurrent.TimeoutException: Total timeout 60 ms
elapsed
at
org.eclipse.jetty.client.HttpDestination$RequestTimeouts.onExpired(

Upgrade to solr cloud 9 from solr cloud 8.10

2023-05-15 Thread Saksham Gupta
Hi team, we are planning to migrate our Solr Cloud cluster from Solr 8.10 to
Solr 9.
1. Is it okay to plan a rolling upgrade from Solr 8.10 to 9?
2. Is this a major update? If yes, what should be the upgrade procedure?


Tuning Merge Settings for Solr Cloud

2023-05-15 Thread Saksham Gupta
Hi team,
We run a Solr Cloud cluster that serves more than 4 million search requests
and updates more than 50 million documents daily.
We want to tune Solr's merge configuration to improve search and
indexing performance.

1. Do we need to perform full reindexing for a change to a merge
setting (segmentsPerTier) to take effect?
2. How do we decide the ideal merge settings for our use case?
*Current merge settings:*

5
3
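
[On the first question: merge-policy changes generally take effect for future merges after a config reload, without a full reindex, since existing segments are only rewritten as they get merged. A hedged sketch of where such settings live in solrconfig.xml; the factory class is Solr's TieredMergePolicyFactory, but the values below are purely illustrative, not a recommendation for this workload:]

```shell
# Hedged sketch: print a solrconfig.xml <mergePolicyFactory> fragment.
# Values are illustrative placeholders, not tuning advice.
cat <<'EOF'
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
EOF
```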