Sorry, I'll add more context. The main collection is sharded, with over ten
shards and 2 replicas per shard. The from collection (fromData) has a single
shard, with one replica on each of the Solr nodes.
The query I send is a JSON query that looks like this:

{
  "filter":[{"join":{
        "query":{"lucene":{
            "query":"\"test\"",
            "df":"value_s"}},
        "from":"id",
        "to":"to_s",
        "fromIndex":"fromData"}}
    ],
  "offset":0,
  "query":"*:*",
  "limit":1,
  "params":{
    "TZ":"GMT+01:00",
    "timeAllowed":1800000},
  "fields":["id"]
}

The query works perfectly fine when sent directly to any Solr data node, but
it fails when sent through the coordinator node. Every other query that
doesn't have a join works fine, or at least I haven't found any other
problems.
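
To double-check the co-location itself, here is a minimal sketch of how the
replica placement could be compared. It assumes the usual layout of the
Collections API CLUSTERSTATUS response (cluster -> collections -> shards ->
replicas -> node_name). The collection name "main", the node names and the
helper name are hypothetical, and the sample data is made up for illustration.

```python
from collections import defaultdict


def nodes_missing_from_replica(cluster_status, to_collection, from_collection):
    """Map each node that hosts to_collection but has no replica of
    from_collection to the to_collection shards it hosts."""
    collections = cluster_status["cluster"]["collections"]

    def replicas(collection):
        # Yield (shard name, node name) for every replica of the collection.
        for shard_name, shard in collections[collection]["shards"].items():
            for replica in shard["replicas"].values():
                yield shard_name, replica["node_name"]

    from_nodes = {node for _, node in replicas(from_collection)}
    missing = defaultdict(list)
    for shard_name, node in replicas(to_collection):
        if node not in from_nodes:
            missing[node].append(shard_name)
    return {node: sorted(shards) for node, shards in missing.items()}


# Hypothetical cluster: node2 hosts main/shard2 but no fromData replica.
sample = {
    "cluster": {
        "collections": {
            "main": {
                "shards": {
                    "shard1": {"replicas": {"core_node1": {"node_name": "node1:8983_solr"}}},
                    "shard2": {"replicas": {"core_node2": {"node_name": "node2:8983_solr"}}},
                }
            },
            "fromData": {
                "shards": {
                    "shard1": {"replicas": {"core_node3": {"node_name": "node1:8983_solr"}}},
                }
            },
        }
    }
}

print(nodes_missing_from_replica(sample, "main", "fromData"))
```

The input would be the parsed JSON from
/solr/admin/collections?action=CLUSTERSTATUS. An empty result only says that
every node holding a main replica also holds a fromData replica. It does not
check replica state, and it says nothing about what the coordinator does with
the join.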

Thanks

On Tue, 3 Mar 2026 at 17:38, Mikhail Khludnev <[email protected]> wrote:

> Hello,
> I'm in doubt. Assuming you use
>
> https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
> Please confirm.
> There's no exact coordinator test for shard joins here
>
> https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/search/join/ShardToShardJoinAbstract.java#L58
> But it creates 5 nodes for 3-shard collections, and I believe it picks a
> coordinator randomly. So we may expect it to work.
> Then, the error you provided might occur at the "to" node when it doesn't
> find the expected co-located shard.
> I'm afraid we need to check shard alignment across the cluster, and the
> detailed request logs across nodes: what exactly happened at the
> coordinator and subordinate nodes.
> Regarding shard allocation: even if there's a node with shard1 of the "to"
> collection co-located with "from" shard1, nothing will stop the coordinator
> from attempting to search "to" shard1 on another node where "from" shard1
> is absent, and getting an error like this.
>
> On Tue, Mar 3, 2026 at 6:02 PM Endika Posadas <[email protected]>
> wrote:
>
> > Hi,
> >
> > We're running dedicated coordinator nodes for query performance, with
> > collections that are properly co-located across data nodes.
> >
> >
> > When sending a join query (fromIndex pointing to a co-located collection)
> > through the coordinator, we get an error:
> >
> > "error":{
> >     "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> >     "msg":"SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.",
> >     "code":400
> >   }
> >
> >
> > The same query works fine when sent directly to a data node.
> >
> > It seems like the coordinator is trying to resolve the join instead of
> > delegating it to the data nodes. Is there a workaround for this?
> >
> > Thanks
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
