terrymanu commented on issue #28182:
URL: 
https://github.com/apache/shardingsphere/issues/28182#issuecomment-3677691466

   Issue Understanding
   
     - Environment: ShardingSphere-Proxy 5.4.0. Database sharding by user_id 
(ds_${user_id % 2}); table sharding by web_id (website_${web_id % 2}, 
ntable_${web_id % 2}); website/ntable are bound tables.
     - Symptom: After running the sample DDL/DML, SELECT DISTINCT t1.id FROM 
website t1 LEFT JOIN ntable t2 ON t1.id = t2.id ORDER BY id; returns 1..8 on 
native MySQL single table, but only 1..4 via Proxy.
   
     Root Cause
   
     - Column id uses MySQL local AUTO_INCREMENT per physical shard and is not 
globally unique.
     - Sharding routes to ds_1 (all user_id = 1), and table shards split by 
web_id, producing two physical tables. Each table generates id = 1..4; after 
merge, DISTINCT collapses duplicates, leaving 1..4.
     - The right table ntable is empty; the LEFT JOIN does not filter rows, so 
the discrepancy is solely due to duplicate id values across shards.
   
     Problem Analysis
   
     - The config does not assign a distributed key to id; the Snowflake 
generator is on web_id and inserts explicitly set it, so id remains physical 
auto-increment.
     - At the logical layer, id repeats across shards; any de-duplication, 
ordering, or joins relying on id uniqueness diverge from single-database 
behavior.
     - This stems from sharding design: ShardingSphere does not merge 
auto-increment sequences into a global unique stream; consistent results 
require an explicit globally unique key or using the sharding key in 
identifiers.
   
     Conclusion
   
     - This is expected behavior from the current sharding design, not a LEFT 
JOIN bug. To align with single-database results, use a globally distributed key 
for the primary identifier (e.g., apply Snowflake to the primary key and let 
the engine generate it) or join/query using an identifier that includes the 
sharding key, avoiding reliance on per-shard auto-increment id.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to