Re: Index dependent groups of data

2021-09-16 Thread Cassandra Targett
As Shawn explained, when a TLOG replica is not the leader, it does not index the documents directly but pulls index segments from the leader. However, this operation is generally rather fast - within a second or two - since it copies the changed segments, not the full index (and 70 million docs

Re: Index dependent groups of data

2021-09-08 Thread lstusr 5u93n4
> Info you might already know: TLOG (and PULL) replicas do not index,
> unless a TLOG replica is the leader, in which case it behaves exactly
> like NRT. A PULL replica can never become leader.
>
> When you have TLOG or PULL replicas, Solr is only going to do indexing
> on the shard leaders. Whe

Re: Index dependent groups of data

2021-09-07 Thread Shawn Heisey
On 9/7/2021 3:08 PM, Shawn Heisey wrote: I don't think there's a reliable way of asking Solr to tell you when all replications are complete. You could use the replication handler (/solr/corename/replication) to gather this info and compare info from the leader index with info from the follo
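Shawn's suggestion of comparing what the replication handler reports for the leader with what it reports for the followers can be sketched as a polling loop. This is a minimal sketch under stated assumptions. The `fetch_generation` callable is hypothetical; one way to implement it might be to read the `generation` field from `/solr/<core>/replication?command=indexversion`. The sketch also assumes a TLOG follower reports the leader's generation once it has pulled the new segments. Run it once per shard, passing that shard's leader and followers. Because Shawn doubts there is a fully reliable signal, the loop keeps a timeout.

```python
import time
from typing import Callable, Iterable

# Hypothetical fetcher: returns the index generation a core currently reports.
GenerationFetcher = Callable[[str], int]


def replicas_caught_up(leader_gen: int, follower_gens: Iterable[int]) -> bool:
    """True when every follower reports at least the leader's generation."""
    return all(gen >= leader_gen for gen in follower_gens)


def wait_for_replication(
    fetch_generation: GenerationFetcher,
    leader_core: str,
    follower_cores: list[str],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll one shard until its followers reach the leader's generation.

    Returns True if the followers caught up, False if the timeout expired.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    target = fetch_generation(leader_core)  # generation right after the commit
    deadline = clock() + timeout_s
    while True:
        gens = [fetch_generation(core) for core in follower_cores]
        if replicas_caught_up(target, gens):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

Waiting on a timeout rather than indefinitely matters here: a follower that is down or recovering would otherwise block the second indexing group forever.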

Re: Index dependent groups of data

2021-09-07 Thread Shawn Heisey
On 9/7/2021 10:01 AM, lstusr 5u93n4 wrote:
> Seems like our experimentation is showing that it doesn't at least for TLOG replica types. If we bound the query to the leaders, we can get accurate results immediately after the commit. If we don't add that restriction, sometimes the results sometimes w

Re: Index dependent groups of data

2021-09-07 Thread lstusr 5u93n4
> How about doing your queries against the leader only?

This seems to work. We haven't been able to produce an instance where the primary data isn't there in the case where we bound the queries only to the leaders.

> Solr is not transactional. You are assuming ACID properties,
> but Solr does no

Re: Index dependent groups of data

2021-09-07 Thread Walter Underwood
How about doing your queries against the leader only?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Sep 7, 2021, at 9:06 AM, lstusr 5u93n4 wrote:
>
>> Is there a particular reason for using TLOG replica types?
>
> We used to use NRT replica types
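Querying only the leaders, which the thread reports as working, can be done by listing the leader cores explicitly in the `shards` request parameter. A minimal sketch, assuming a Collections API `CLUSTERSTATUS` response already parsed into a dict in which the leader replica of each shard carries `"leader": "true"`; both function names are hypothetical. Leadership can move after a failover, so the list needs refreshing, and depending on the Solr version the listed hosts may have to be allow-listed before Solr accepts them in `shards`.

```python
from urllib.parse import urlencode


def leader_core_addresses(cluster_status: dict, collection: str) -> list[str]:
    """Return 'host:port/solr/core' for the leader replica of each shard.

    Assumes the JSON shape of a Collections API CLUSTERSTATUS response, where
    the leader replica of a shard has "leader": "true".
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    addresses = []
    for shard_name in sorted(shards):
        for replica in shards[shard_name]["replicas"].values():
            if replica.get("leader") == "true":
                # base_url looks like "http://host:8983/solr"; drop the scheme to
                # match the host:port/solr/core form used in shards examples.
                host_path = replica["base_url"].split("://", 1)[-1]
                addresses.append(f"{host_path}/{replica['core']}")
    return addresses


def leader_only_query(q: str, leaders: list[str]) -> str:
    """Query string that restricts a distributed request to the given cores."""
    return urlencode({"q": q, "shards": ",".join(leaders)})
```

The trade-off is that all query load lands on the leaders, which in a TLOG collection are also the only replicas doing indexing.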

Re: Index dependent groups of data

2021-09-07 Thread Walter Underwood
> On Sep 7, 2021, at 9:01 AM, lstusr 5u93n4 wrote:
>
> Well that's kind of the crux of the issue. We're issuing a hard commit
> which (from what I've read) appears to be a synchronous operation. So. when
> the call comes back with a 200 http response code, we can be assured that
> the operation h

Re: Index dependent groups of data

2021-09-07 Thread lstusr 5u93n4
> Is there a particular reason for using TLOG replica types?

We used to use NRT replica types, but we switched to TLOG a year or two ago in order to prioritize indexing speed above all else, understanding that it might take a while for query results to be identical across replicas. This is the fi

Re: Index dependent groups of data

2021-09-07 Thread lstusr 5u93n4
> How long are you waiting between the hard commit and the query?
> Are you waiting for the commit operation to return a response before you try to
> query?

Well that's kind of the crux of the issue. We're issuing a hard commit which (from what I've read) appears to be a synchronous operation. So.
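For reference, the hard commit under discussion is an update request with `commit=true`; `waitSearcher=true` (Solr's default) asks Solr to respond only after a new searcher is open. The sketch below only builds that URL, and the base URL and collection name are placeholders. Per the other replies in this thread, non-leader TLOG replicas get the new segments afterwards by pulling from the leader, so a 200 from this request alone does not show that every replica is current.

```python
from urllib.parse import urlencode


def hard_commit_url(solr_base: str, collection: str, wait_searcher: bool = True) -> str:
    """Build an update URL that issues an explicit hard commit.

    commit=true requests a hard commit; waitSearcher=true makes the request
    block until a new searcher is open. In the thread's TLOG setup, followers
    still receive the new segments later by replicating from the leader.
    """
    params = {"commit": "true", "waitSearcher": "true" if wait_searcher else "false"}
    return f"{solr_base.rstrip('/')}/{collection}/update?{urlencode(params)}"


print(hard_commit_url("http://localhost:8983/solr", "mycollection"))
# → http://localhost:8983/solr/mycollection/update?commit=true&waitSearcher=true
```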

Re: Index dependent groups of data

2021-09-03 Thread Nick Vladiceanu
Is there a particular reason for using TLOG replica types? For such a small cluster and the scenario you’ve described it sounds more reasonable to use NRT, which will (almost) guarantee that once you write your data, it’ll be (almost) immediately available on all the nodes.

> On 3. Sep 2021,
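Switching to NRT replicas, as suggested here, is chosen at collection creation through the Collections API `CREATE` action, using `nrtReplicas` instead of `tlogReplicas` for the per-shard replica count. This sketch builds the query string for the layout described in the thread (6 shards, 2 replicas each). The helper name and collection name are hypothetical, and other `CREATE` parameters such as the config set are left out.

```python
from urllib.parse import urlencode

# Maps a replica type to the CREATE parameter that sets how many replicas of
# that type each shard gets.
REPLICA_COUNT_PARAM = {"NRT": "nrtReplicas", "TLOG": "tlogReplicas"}


def create_collection_query(name: str, num_shards: int, replicas_per_shard: int,
                            replica_type: str = "NRT") -> str:
    """Build the query string for /solr/admin/collections?action=CREATE (sketch)."""
    if replica_type not in REPLICA_COUNT_PARAM:
        raise ValueError(f"unsupported replica type: {replica_type}")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        REPLICA_COUNT_PARAM[replica_type]: replicas_per_shard,
    }
    return urlencode(params)


# Same layout as in the thread: 6 shards, 2 replicas each, but NRT instead of TLOG.
print(create_collection_query("mycollection", 6, 2, "NRT"))
# → action=CREATE&name=mycollection&numShards=6&nrtReplicas=2
```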

Re: Index dependent groups of data

2021-09-03 Thread Shawn Heisey
On 9/3/2021 9:19 AM, lstusr 5u93n4 wrote:
> What we're seeing is the following:
> - index some data
> - issue a hard commit
> - issue a query for that data
> - sometimes the query gets routed to a replica that is not yet updated, and doesn't contain the data.

How long are you waiting between the

Index dependent groups of data

2021-09-03 Thread lstusr 5u93n4
Hi All,

We have a scenario where we need to:
- index a group of data (group 1)
- index a second group of data, sometimes querying for records that were added in group 1.

We have 6 shards, each composed of two TLOG replicas.

What we're seeing is the following:
- index some data
- issue a hard