Re: Prevent Loss of Documents after Implicit Sharding

2023-11-30 Thread Charlie Hull
I may have got this wrong, but I think it might be better to shard 
randomly, not on a value from one of your source documents, as otherwise 
certain searches will only hit some of the shards and possibly overload 
them.  This might also be the cause of the behaviour below.
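A minimal Python sketch of the random/hash routing suggested above. It assumes the shard is chosen from a hash of each document's unique key. zlib.crc32 is only an illustrative stand-in, not Solr's own hash. It also assumes the trailing P in the example in Saksham's message just marks the routing value, so the P is dropped from the field values here. Because no shard can be ruled out, every query fans out to all shards, and no matching document is skipped.

```python
import zlib

NUM_SHARDS = 8  # illustrative shard count

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Choose a shard from a hash of the unique key, ignoring field values."""
    # zlib.crc32 is a stand-in hash for this sketch, not what Solr uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def index(docs):
    shards = {i: [] for i in range(NUM_SHARDS)}
    for doc in docs:
        shards[shard_for(doc["id"])].append(doc)
    return shards

def search(shards, wanted):
    """Fan the query out to every shard and collect docs matching any wanted value."""
    return sorted(
        doc["id"]
        for shard_docs in shards.values()
        for doc in shard_docs
        if wanted & set(doc["FieldA"])
    )

# The three documents from the thread (the "P" markers dropped).
docs = [
    {"id": "Doc1", "FieldA": ["id21", "id29", "id60"]},
    {"id": "Doc2", "FieldA": ["id19", "id9", "id8"]},
    {"id": "Doc3", "FieldA": ["id101", "id29", "id108"]},
]
shards = index(docs)
print(search(shards, {"id21", "id8", "id108"}))  # ['Doc1', 'Doc2', 'Doc3']
```

The trade-off is that every query touches every shard. In return, a document's placement no longer depends on which of its values a query happens to mention.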


Charlie

On 30/11/2023 04:36, Saksham Gupta wrote:

Hi All,
Pinging again for some assistance.

On Wed, Nov 29, 2023 at 7:11 PM Saksham Gupta 
wrote:


Hi Solr Developers,

Problem Statement

We have been using SolrCloud with implicit sharding, with the collection's
data divided into 8 shards. To reduce response time, we decided to shard
the data further.

We therefore planned to split the Solr data into 56 shards. Under this
sharding strategy, one of the values of a multivalued field decides which
shard a document goes to.

But this has led to loss of documents.

How does the loss happen? Here is an example.

Consider three Solr documents:

Doc1
{
FieldA: id21, id29, id60P;
Field2: val2;
}

Doc2
{
FieldA: id19, id9, id8P;
Field2: val1;
}

Doc3
{
FieldA: id101, id29, id108P;
Field2: val4;
}

Querying Solr:

Consider the filter query fq=FieldA: id21+id8+id108

With the previous sharding, Doc1, Doc2, and Doc3 are all returned, because
the filter query matches at least one value in each document: id21 in Doc1,
id8 in Doc2, and id108 in Doc3.

With the new sharding, only Doc2 and Doc3 are returned. Doc1 is missing
because the query is routed only to the shards corresponding to the values
in the filter query (shard21, shard8, and shard108), while Doc1 is stored
on shard60.

[Diagrams omitted: "Indexing" and "Querying on this collection"]

And our query won’t even go to the shard that contains document1.
Therefore, document1 will not be returned in the results.
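The loss can be reproduced with a small Python simulation. It rests on two assumptions that the thread does not state explicitly. First, the value marked with a trailing P is the one used for routing. Second, a value idN maps to a shard named shardN, as the example implies. shard_name and route_value are hypothetical names.

```python
# Each document is routed by ONE value of the multivalued FieldA.
# Assumption: the value marked "P" in the thread is that routing value.
docs = [
    {"id": "Doc1", "FieldA": ["id21", "id29", "id60"], "route_value": "id60"},
    {"id": "Doc2", "FieldA": ["id19", "id9", "id8"], "route_value": "id8"},
    {"id": "Doc3", "FieldA": ["id101", "id29", "id108"], "route_value": "id108"},
]

def shard_name(value: str) -> str:
    # Hypothetical mapping implied by the example: "id60" -> "shard60".
    return "shard" + value[len("id"):]

def index(docs):
    """Place each document on the single shard named after its routing value."""
    shards = {}
    for doc in docs:
        shards.setdefault(shard_name(doc["route_value"]), []).append(doc)
    return shards

def search(shards, values, route_by_query=True):
    """Match docs whose FieldA contains any of `values`.

    route_by_query=True mimics the new behaviour: only the shards named after
    the query values are searched. False searches every shard.
    """
    targets = {shard_name(v) for v in values} if route_by_query else set(shards)
    return sorted(
        doc["id"]
        for name in targets
        for doc in shards.get(name, [])  # e.g. shard21 holds nothing here
        if values & set(doc["FieldA"])
    )

shards = index(docs)
query = {"id21", "id8", "id108"}
print(search(shards, query, route_by_query=True))   # ['Doc2', 'Doc3']
print(search(shards, query, route_by_query=False))  # ['Doc1', 'Doc2', 'Doc3']
```

The first call loses Doc1 because shard60 is never consulted, even though Doc1 contains id21. The second call finds all three documents, at the cost of querying every shard.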

Probable Solutions

To deal with this, we could index the same document on multiple shards, one
for each value of the field. But handling indexing and deletion when the
values of this field change would be very complicated, so such an index
could be very hard to maintain.
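To make the maintenance cost of that workaround concrete, here is a hedged Python sketch of the bookkeeping it implies. When a document's values change, its copies must be deleted from shards it no longer belongs to and added to new ones. The shard naming (idN -> shardN) and the function names are hypothetical.

```python
def shard_name(value: str) -> str:
    # Hypothetical mapping from the thread's example: "id60" -> "shard60".
    return "shard" + value[len("id"):]

def shards_for(values) -> set:
    """Every shard that must hold a copy of a document with these values."""
    return {shard_name(v) for v in values}

def update_plan(old_values, new_values) -> dict:
    """Which shard copies to delete, add, or overwrite when FieldA changes."""
    old, new = shards_for(old_values), shards_for(new_values)
    return {
        "delete_from": sorted(old - new),
        "add_to": sorted(new - old),
        "overwrite_in": sorted(old & new),
    }

# Doc1 loses id29 and gains id77 (values invented for illustration).
plan = update_plan(["id21", "id29", "id60"], ["id21", "id60", "id77"])
print(plan)
# {'delete_from': ['shard29'], 'add_to': ['shard77'], 'overwrite_in': ['shard21', 'shard60']}
```

The plan needs the document's old values, so every update must first read them back or keep them somewhere else. A missed delete leaves a stale copy that still matches queries.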

Is this the best approach, or is there a better way to achieve the goal
without losing any documents?



--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise



Re: Prevent Loss of Documents after Implicit Sharding

2023-11-30 Thread Jan Høydahl
I thought a multi-valued field was not supported as a routing field?
You'll likely need to choose a single-valued, stable property for routing,
not a field for which a single document can have several different values.

So have a look at your schema for other candidate single-valued routing fields. 
If you cannot find one, perhaps compositeId (i.e. hash-based) routing is better for 
you. Having 8 shards on compositeId, you could easily go to 16 -> 32 -> 64 by 
splitting your existing shards. But also in that case you'd need to have some 
stable single-valued ID to route on if you want more efficient queries, not 
hitting all shards every time. 
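A simplified Python sketch of why hash-based routing splits cleanly, and of how a routing prefix keeps related documents together. The sketch treats the hash space as unsigned 32-bit with equal, contiguous ranges per shard. Solr's own ranges are signed, and its compositeId router uses MurmurHash3 rather than the zlib.crc32 stand-in used here. The tenantA!docN ids are invented. The 16/16 bit split for a two-part shardKey!docId id mirrors compositeId's default behaviour.

```python
import zlib

SPACE = 1 << 32  # simplified unsigned 32-bit hash space

def h32(s: str) -> int:
    # Stand-in hash for the sketch; Solr's compositeId router uses MurmurHash3.
    return zlib.crc32(s.encode("utf-8"))

def route_hash(doc_id: str) -> int:
    """compositeId-style: 'key!id' takes its top 16 bits from key, bottom 16 from id."""
    if "!" in doc_id:
        key, rest = doc_id.split("!", 1)
        return (h32(key) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)
    return h32(doc_id)

def shard_index(hash_value: int, num_shards: int) -> int:
    """Shard i owns the contiguous range [i*SPACE/n, (i+1)*SPACE/n)."""
    return hash_value * num_shards // SPACE

# Splitting: every doc on shard i of 8 lands on shard 2i or 2i+1 of 16.
for i in range(1000):
    hv = route_hash(f"doc{i}")
    old, new = shard_index(hv, 8), shard_index(hv, 16)
    assert new in (2 * old, 2 * old + 1)

# A stable routing key keeps all of its documents on one shard, even at 64
# shards, so a query restricted to that key only needs that shard.
tenant_shards = {shard_index(route_hash(f"tenantA!doc{i}"), 64) for i in range(100)}
print(len(tenant_shards))  # 1
```

In Solr itself, a query can be limited to the shards holding such a key with the `_route_` parameter, for example `_route_=tenantA!`.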

https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html

Jan

> On 29 Nov 2023, at 14:41, Saksham Gupta wrote:
> 
> one of the values of a
> multivalued field is being used to decide the shard of the document.



Invalid JSON response with UUID field

2023-11-30 Thread Andrew Hankinson
Hi, 

I have a schema with a UUID field type configured as a unique key. 



I recently upgraded my Solr installation to 9.3 (from 7.6) and my application 
stopped working. It turns out that Solr has stopped encoding UUIDs as strings 
in the JSON response writer. 

Whereas before I would get:

"id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",

Now I get:

"id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,

Of course, UUIDs are not a valid JSON data type, so this looks like a bug to 
me? 
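Andrew's point is easy to confirm with any strict JSON parser. This Python check wraps each fragment from his message in braces. That wrapping is an assumption, since he quoted only single fields. The check shows that the quoted form parses, while the unquoted UUID raises `json.JSONDecodeError`.

```python
import json

# Fragments from the message, wrapped in braces to form complete documents.
quoted = '{"id":"76af09e3-db43-4e7e-a46f-9bf03e343db9"}'
unquoted = '{"id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae}'

def parses(text: str) -> bool:
    """Return True if `text` is valid JSON according to Python's strict parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses(quoted))    # True
print(parses(unquoted))  # False
```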

-Andrew

Re: Invalid JSON response with UUID field

2023-11-30 Thread Dmitri Maziuk

On 11/26/23 03:40, Andrew Hankinson wrote:


> I recently upgraded my Solr installation to 9.3 (from 7.6) and my application
> stopped working. It turns out that Solr has stopped encoding UUIDs as strings
> in the JSON response writer.
>
> Whereas before I would get:
>
> "id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",
>
> Now I get:
>
> "id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,
>
> Of course, UUIDs are not a valid JSON data type, so this looks like a bug to me?


UUIDs are still quoted in 8.11.2, FWIW.

Dima




Re: Invalid JSON response with UUID field

2023-11-30 Thread Mikhail Khludnev
It might have the same root cause as
https://issues.apache.org/jira/browse/SOLR-10653?filter=-3
Could you share more details about your environment: is it SolrCloud? Are
you using /get or /select? Etc.

On Fri, Dec 1, 2023 at 12:05 AM Andrew Hankinson
 wrote:

> Hi,
>
> I have a schema with a UUID field type configured as a unique key.
>
>  multiValued="false" />
>
> I recently upgraded my Solr installation to 9.3 (from 7.6) and my
> application stopped working. It turns out that Solr has stopped encoding
> UUIDs as strings in the JSON response writer.
>
> Whereas before I would get:
>
> "id":"76af09e3-db43-4e7e-a46f-9bf03e343db9",
>
> Now I get:
>
> "id":1b5230fb-a15d-4aea-8720-8e0a1c6e47ae,
>
> Of course, UUIDs are not a valid JSON data type, so this looks like a bug
> to me?
>
> -Andrew



-- 
Sincerely yours
Mikhail Khludnev


Re: Invalid JSON response with UUID field

2023-11-30 Thread Andrew Hankinson
No SolrCloud, complete wipe and reindex of the data, select handler.

> On 1 Dec 2023, at 07:54, Mikhail Khludnev  wrote:
> 
> It might have the same root cause like
> https://issues.apache.org/jira/browse/SOLR-10653?filter=-3 Could you share
> more details about your env setup: is it "SolrCloud"? is it /get or /select
> ? etc.
> 
> --
> Sincerely yours
> Mikhail Khludnev