Streaming of Documents with text columns (_txt)

2023-05-10 Thread Subhasis Patra
Hi all, I am using CloudSolrStream to stream documents from Solr. I use /export when the documents have columns of type STRING, DATE, DOUBLE, or LONG. /export is not allowed when documents have a _txt column (docValues=false), so I use the approach below. I use _txt to support case insensitive
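
The constraint described in this message (/export only works when the requested fields have docValues) can be illustrated with a small check. This is a minimal sketch, not the poster's code: the `choose_handler` helper, the `schema` dictionary, and every field name except the `_txt` suffix are hypothetical, and falling back to /select is just one possible choice.

```python
def choose_handler(requested_fields, doc_values_by_field):
    """Pick a request handler for a streaming search.

    Returns ("/export", []) when every requested field has docValues
    enabled, otherwise ("/select", [fields without docValues]).
    Fields missing from the mapping are treated as having no docValues.
    """
    blockers = [
        field
        for field in requested_fields
        if not doc_values_by_field.get(field, False)
    ]
    handler = "/export" if not blockers else "/select"
    return handler, blockers


# Hypothetical schema summary: field name -> docValues enabled?
schema = {
    "id": True,
    "created_dt": True,
    "price_d": True,
    "title_txt": False,  # text field, docValues=false as in the message
}

handler_a, blockers_a = choose_handler(["id", "price_d"], schema)
handler_b, blockers_b = choose_handler(["id", "title_txt"], schema)

print(handler_a, blockers_a)
print(handler_b, blockers_b)
```

In practice the docValues flags would come from the collection's schema rather than a hand-written dictionary.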

Re: streaming expressions - sharding memory usage

2023-05-10 Thread Joel Bernstein
Unfortunately Solr right now doesn't have a great answer for the kind of bulk extract with scoring that you're doing. The export handler is designed for bulk extract but doesn't score. The select handler is designed for top N retrieval with scoring. I'm surprised that single shard bulk extract with

Re: streaming expressions - sharding memory usage

2023-05-10 Thread Sergio García Maroto
Thanks, Joel, for your answer. What I actually need is to return scores from different collections and run some calculations on those scores to retrieve people at the end. Let me show you a more complex sample. This is really fast with all collections on the same servers but very slow once sharding tak

Re: streaming expressions - sharding memory usage

2023-05-10 Thread Joel Bernstein
So the first thing I see is that you're doing a search using the select handler, which is required to sort by score. So in this scenario you will run into deep paging issues as you increase the number of rows. This will affect both memory and performance. A search using the export handler will impr
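
To make the contrast between the two handlers concrete, here is a minimal sketch that builds the two shapes of `search(...)` streaming expression as plain strings. The `search_expr` helper is hypothetical, and the `id` field, the `*:*` query, and the row count are placeholder assumptions; only the `person` collection name comes from this thread. The select variant sorts by score and needs an explicit `rows` limit, while the export variant sorts on a docValues field and returns no score.

```python
def search_expr(collection, q, fl, sort, qt, rows=None):
    """Build a search() streaming expression as a string.

    Only formats text; it does not contact Solr or escape quotes in q.
    """
    params = [f'q="{q}"', f'fl="{fl}"', f'sort="{sort}"', f'qt="{qt}"']
    if rows is not None:
        params.append(f'rows="{rows}"')
    return f"search({collection}, " + ", ".join(params) + ")"


# Top-N retrieval with scoring: /select, sorted by score, bounded by rows.
scored = search_expr(
    "person", "*:*", fl="id,score", sort="score desc", qt="/select", rows=1000
)

# Bulk extract without scoring: /export, sorted on a docValues field.
bulk = search_expr("person", "*:*", fl="id", sort="id asc", qt="/export")

print(scored)
print(bulk)
```

Raising `rows` in the first form is where the deep paging cost mentioned above would show up; the second form streams the full sorted result set but cannot return `score`.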

Re: streaming expressions - sharding memory usage

2023-05-10 Thread Sergio García Maroto
Sure. Let's start with the simplest stream expression. This one targets only the person collection. *Stream Expression:* search(person, q="SmartSearchS:"france [$CU] [$PRJ] [$REC] "~100)^4 OR (SmartSearchS:"france [$CU] [$PRJ] [$RECL] "~100)^3 OR (SmartSearchS:"france [$CU] [$PRJ] "~100)^2) OR (((Sm