Stuart Bertram created SOLR-9550:
------------------------------------
Summary: innerJoin can succeed with bad sorting
Key: SOLR-9550
URL: https://issues.apache.org/jira/browse/SOLR-9550
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: 6.1
Environment: CentOS 6.8, OpenJDK 1.8
Reporter: Stuart Bertram
The innerJoin streaming function requires both input streams to be sorted by
the fields used in the join. In some situations, a stream sorted on the wrong
field is accepted and the query succeeds, but returns incorrect results.
Example:
* Collection "UserPosts" has columns: ID, ByUserID
* Collection "User" has columns: ID, Username, Registered, …
* Streaming query {{gatherNodes(User, gatherNodes(UserPosts, walk="42 69->ID",
gather="ByUserID"), walk="node->ID", gather="ID")}} returns the IDs of users
who made posts 42 and 69, but we want the full user details
* Streaming query {{innerJoin(sort(gatherNodes(User, gatherNodes(UserPosts,
walk="42 69->ID", gather="ByUserID"), walk="node->ID", gather="ID"), by="ID
asc"), search(User,qt="/export",q="*:*",fl="ID, Username, Registered, …",
sort="ID asc"), on="node=ID")}} is meant to return the full user details. Note
that it uses {{sort(…, by="ID")}}, chosen because we're gathering the ID field,
where it should use {{sort(…, by="node")}}: gatherNodes returns tuples with the
gathered ID in the "node" field.
(Note: This example is simplified, so while there may be a better way to
perform this specific query, the concept and the underlying issue remain)
Expected result: Solr throws a (useful) exception saying that the sort orders
do not match the join (because the first stream is sorted by ID, but the join
is *node*=ID), as it does when the sort() call is omitted.
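The check asked for here could look roughly like the following Python sketch.
It is not Solr code: the helper names (parse_sort, parse_on, check_join_sorts)
are hypothetical, and it assumes sort specs of the form "field dir, …" and an
on clause of the form "left=right" or a bare field name.

```python
def parse_sort(spec):
    """Turn 'ID asc, Username desc' into ['ID', 'Username'] (field names only)."""
    return [part.split()[0] for part in spec.split(",") if part.strip()]


def parse_on(spec):
    """Turn 'node=ID' into (['node'], ['ID']).

    Assumption: a bare field name such as 'ID' means the same field on both sides.
    """
    left, right = [], []
    for part in spec.split(","):
        l, _, r = part.strip().partition("=")
        left.append(l.strip())
        right.append((r or l).strip())
    return left, right


def check_join_sorts(left_sort, right_sort, on):
    """Raise ValueError unless each stream's sort starts with its join fields."""
    left_keys, right_keys = parse_on(on)
    sides = (
        ("left", parse_sort(left_sort), left_keys),
        ("right", parse_sort(right_sort), right_keys),
    )
    for side, sort_fields, keys in sides:
        if sort_fields[: len(keys)] != keys:
            raise ValueError(
                f"{side} stream is sorted by {sort_fields} "
                f"but the join requires a sort on {keys}"
            )


# The combination from this report: first stream sorted by ID, join on node=ID.
try:
    check_join_sorts("ID asc", "ID asc", "node=ID")
except ValueError as err:
    print(err)

# Sorting the first stream by node passes the check.
check_join_sorts("node asc", "ID asc", "node=ID")
```

The sketch compares field names only. A real check would presumably also have
to compare sort direction on both sides.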
Actual result: Solr treats the streams as correctly sorted and joins every node
from the first stream to a single row from the second stream (each node is
joined to the *same* row). The returned ID and node values therefore do not
match, even though the join condition equates them.
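To illustrate why a merge-style join depends on its inputs being sorted by the
join key, here is a toy sort-merge join in Python. It is a generic textbook
sketch, not Solr's innerJoin implementation, and it does not reproduce the
exact symptom reported above. The user IDs and usernames are made up.

```python
def merge_join(left, right, left_key, right_key):
    """Toy sort-merge inner join.

    Assumes both inputs are ascending on their key and that right keys are
    unique. Nothing here verifies that assumption.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        l, r = left[i][left_key], right[j][right_key]
        if l == r:
            out.append({**left[i], **right[j]})
            i += 1
        elif l < r:
            i += 1  # no match possible for this left row, skip it
        else:
            j += 1  # advance the right side to catch up
    return out


# Hypothetical data: users sorted by ID, as the search(...) stream would be.
users = [
    {"ID": 3, "Username": "alice"},
    {"ID": 5, "Username": "bob"},
    {"ID": 7, "Username": "carol"},
]

good_left = [{"node": 3}, {"node": 7}]  # sorted by node
bad_left = [{"node": 7}, {"node": 3}]   # not sorted by node

good = merge_join(good_left, users, "node", "ID")
bad = merge_join(bad_left, users, "node", "ID")

print(len(good))  # 2: both nodes matched
print(len(bad))   # 1: node 3 is silently dropped, and no error is raised
```

The join cannot tell from the tuples alone that the input is out of order,
which is why a check on the declared sort fields matters.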
This is an easy mistake to make: I was gathering IDs and so automatically
sorted by ID, when I should have sorted by node.