Hi Solr Group,

I am not sure whether the following is a viable use case, so I would
welcome input and any implementation recommendations.

I would like to perform joins over two sharded collections, where docs are
routed to specific shards based on a date range and the date ranges are the
same for the corresponding shards in each collection.

I understand that this means the replicas from each collection that hold
the data to be joined need to be co-located on the same Solr server.  I
have read about solutions that use ADD REPLICA to add a Collection B replica
to every Solr server, assuming Collection B has only one shard.  For my use
case I need Collection B to have multiple shards.

Collection A     Collection B     Solr Server
Shard1_2020      Shard1_2020      172.33.0.1:8983_solr
Shard2_2021      Shard2_2021      172.33.0.2:8983_solr
Shard3_2022      Shard3_2022      172.33.0.3:8983_solr

I think my question comes down to this: how do I break shards by a date
range, and do it in a way that both Collections A and B are defined by the
same date ranges?  If I could reliably break shards by date, and know the
date range of each shard, I think I could use the ADD REPLICA API to align
them.
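
To make the idea concrete, here is a rough sketch of the Collections API
calls I have in mind. It only builds the URLs and does not send them. The
collection names (CollectionA, CollectionB) and the base URL are
placeholders, and I am assuming the implicit router with named shards plus
createNodeSet=EMPTY, so that every replica is placed explicitly with
ADDREPLICA. Parameter names should be checked against the Solr version in
use.

```python
from urllib.parse import urlencode

# Placeholder base URL; shard and node names are taken from the table above.
SOLR_BASE = "http://172.33.0.1:8983/solr"
PLACEMENT = {
    "Shard1_2020": "172.33.0.1:8983_solr",
    "Shard2_2021": "172.33.0.2:8983_solr",
    "Shard3_2022": "172.33.0.3:8983_solr",
}


def collections_api_url(params):
    """Build a Collections API URL from a dict of query parameters."""
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def create_url(collection):
    """CREATE with the implicit router, named shards and no replicas yet."""
    return collections_api_url({
        "action": "CREATE",
        "name": collection,
        "router.name": "implicit",
        "shards": ",".join(PLACEMENT),
        "createNodeSet": "EMPTY",  # assumption: replicas are added by hand below
    })


def add_replica_urls(collection):
    """One ADDREPLICA call per shard, pinned to the node from the table."""
    return [
        collections_api_url({
            "action": "ADDREPLICA",
            "collection": collection,
            "shard": shard,
            "node": node,
        })
        for shard, node in PLACEMENT.items()
    ]


def plan(collections=("CollectionA", "CollectionB")):
    """All calls needed so both collections share the same shard layout."""
    urls = []
    for collection in collections:
        urls.append(create_url(collection))
        urls.extend(add_replica_urls(collection))
    return urls


if __name__ == "__main__":
    for url in plan():
        print(url)
```

Because both collections are driven from the same PLACEMENT table, shard
ShardN of A and shard ShardN of B always end up on the same node.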

I am not sure a compositeId routing approach would work, but I suspect the
implicit router may be hard to manage over time.
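
With the implicit router, the client has to decide which shard each doc
goes to. A minimal sketch of that mapping, assuming one shard per calendar
year as in the table above and a hypothetical date field named event_date:

```python
from datetime import date

# One shard per year, mirroring the table above. A new entry (and a new
# shard in BOTH collections) is needed for every new year.
SHARD_BY_YEAR = {
    2020: "Shard1_2020",
    2021: "Shard2_2021",
    2022: "Shard3_2022",
}


def shard_for_date(d):
    """Return the shard name that should hold a doc with the given date."""
    try:
        return SHARD_BY_YEAR[d.year]
    except KeyError:
        raise ValueError(f"no shard defined for year {d.year}") from None


def group_by_shard(docs, date_field="event_date"):
    """Group docs into per-shard batches using an ISO date field.

    The field name is hypothetical; each batch would then be indexed
    with the shard name as its route.
    """
    batches = {}
    for doc in docs:
        shard = shard_for_date(date.fromisoformat(doc[date_field]))
        batches.setdefault(shard, []).append(doc)
    return batches
```

The same function would have to be used when indexing into both
collections, which is the part I worry about maintaining: any doc whose
date falls outside the table is rejected until a new shard is created.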

Is an approach like this viable?  I am a bit concerned about the
maintenance burden.  Are there other ideas to support this join?

Note: I am considering this in the context of time series collections.

Matt
