AdamGS commented on issue #16452:
URL: https://github.com/apache/datafusion/issues/16452#issuecomment-2988819800

   Some more findings:
   1. `datafusion.optimizer.enable_dynamic_filter_pushdown` doesn't seem to 
make a difference
   2. I played around with the seeds, and the only one that seems to matter for 
reproducing the issue is `query_seed`. The failure doesn't reproduce on every 
run with that seed, but with a different value it no longer reproduces in a 
reasonable time. The query it generates is:
   ```sql
   SELECT * FROM sort_fuzz_table ORDER BY interval_month_day_nano DESC LIMIT 3
   ```
   
   I also took a deeper look at the actual data and noticed some surprising 
things that might explain the failure:
   1. The top values in the column the query sorts on are often `null`, which 
makes me think there's a sort stability issue here (the implementation of 
`check_equality_of_batches` also points in that direction).
   2. A lot of the values in the table aren't really valid. When displayed, 
many of them show up as errors converting numbers into various temporal types, 
like `Cast error: Failed to convert -6727098022243200000 to temporal for 
Date64`
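
To put the `Date64` error above in perspective, here is a rough arithmetic sketch. It assumes `Date64` holds milliseconds since the Unix epoch, and that the display path goes through a calendar type limited to roughly ±262,000 years (as I believe chrono's `NaiveDate` is). The constant `ASSUMED_MAX_YEAR_MAGNITUDE` and both helper functions are hypothetical names for illustration, not DataFusion or arrow APIs.

```rust
/// Milliseconds per day; Date64 is assumed to store milliseconds since 1970-01-01.
const MS_PER_DAY: i64 = 86_400_000;

/// ASSUMPTION: rough year limit of a chrono-style calendar type.
const ASSUMED_MAX_YEAR_MAGNITUDE: i64 = 262_000;

/// Whole days between the Unix epoch and the given Date64 value.
fn days_from_epoch(ms: i64) -> i64 {
    ms / MS_PER_DAY
}

/// Approximate Gregorian years from the epoch (146_097 days per 400 years).
fn approx_years_from_epoch(ms: i64) -> i64 {
    let days = days_from_epoch(ms) as i128;
    (days * 400 / 146_097) as i64
}

fn main() {
    // The value taken from the error message in the comment.
    let ms: i64 = -6_727_098_022_243_200_000;
    let years = approx_years_from_epoch(ms);
    println!("days from epoch:  {}", days_from_epoch(ms));
    println!("approx. years:    {}", years);
    println!(
        "outside assumed calendar range: {}",
        years.abs() > ASSUMED_MAX_YEAR_MAGNITUDE
    );
}
```

Under these assumptions the value is a whole number of days that lands hundreds of millions of years before 1970, so the value fits in an `i64` but would have no representable calendar date.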
   
   I wonder if it all comes down to an unstable sort: when running on a 
multithreaded runtime, events interleave in different ways, which results in 
different overall outcomes.
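
A minimal standalone sketch of that hypothesis, not DataFusion code: when more rows tie on the sort key (here `NULL`) than the `LIMIT` allows, which rows survive depends on the order they arrive in, even though the sort-key column of the result is identical. The `Row` type and the `top_k_desc` helper are hypothetical, and the sketch assumes `DESC` places NULLs first.

```rust
use std::cmp::Ordering;

/// (row_id, sort_key); `None` stands in for SQL NULL.
type Row = (u32, Option<i64>);

/// Compare keys for `ORDER BY key DESC`, assuming NULLs sort first.
fn cmp_desc_nulls_first(a: &Option<i64>, b: &Option<i64>) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (Some(x), Some(y)) => y.cmp(x), // descending for non-null values
    }
}

/// Emulates `ORDER BY key DESC LIMIT k` with a stable sort.
/// Ties keep their arrival order, so the result depends on that order.
fn top_k_desc(mut rows: Vec<Row>, k: usize) -> Vec<Row> {
    rows.sort_by(|a, b| cmp_desc_nulls_first(&a.1, &b.1));
    rows.truncate(k);
    rows
}

/// Extracts only the sort-key column from a result.
fn keys(rows: &[Row]) -> Vec<Option<i64>> {
    rows.iter().map(|r| r.1).collect()
}

fn main() {
    let arrival_a: Vec<Row> = vec![
        (1, None), (2, Some(5)), (3, None), (4, None), (5, None), (6, Some(9)),
    ];
    // Same rows, different arrival order (e.g. partitions finishing in another order).
    let arrival_b: Vec<Row> = arrival_a.iter().rev().cloned().collect();

    let top_a = top_k_desc(arrival_a, 3);
    let top_b = top_k_desc(arrival_b, 3);

    println!("top 3 (order A): {:?}", top_a);
    println!("top 3 (order B): {:?}", top_b);
    println!("same keys: {}", keys(&top_a) == keys(&top_b));
    println!("same rows: {}", top_a == top_b);
}
```

Both runs return three `NULL` keys, but the row ids differ. A check that compares whole batches would flag this as a mismatch, while a check that compares only the sort column would not.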
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

