akurmustafa commented on issue #15665: URL: https://github.com/apache/datafusion/issues/15665#issuecomment-2800060351
Hi @lalaorya,

I didn't reproduce the plans you generated locally, so my thoughts might be wrong or misleading. However, here are my thoughts regarding your problem:

> What "GlobalLimitExec requires a single input partition" means in this context?

`GlobalLimitExec` requires a single partition at its input to operate correctly. However, in your failed plan its input has 2 partitions (one comes from the source `ParquetExec` and the other from the source `MemoryExec`). After `UnionExec`, this results in a total of 2 output partitions. Hence, the plan is wrong and cannot be executed in `DataFusion`. I would expect `DataFusion` not to produce this plan. Also, even if your first query doesn't fail, I think it is still wrong: that plan would produce more than 40 rows once it is executed (at least this is what I expect).

> Is there a specific pattern I should follow when using LIMIT with an offset in DataFusion?

No, your query is correct. In this case, the generated plan is wrong, and it probably stems from an optimization bug.

> Are there any configuration settings or query modifications that could help resolve this issue?

I think setting `datafusion.execution.target_partitions` to `1` might help. This way your plan won't have multiple partitions and will be correct.

If you can give a full reproducer (such as the creation of the `tbl` data source), it would help to reproduce the bug.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
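
To make the partition-count argument in the comment concrete, here is a minimal toy model in Python. It is only a sketch and does not use DataFusion itself: the `Node` class and the `check` function are hypothetical helpers invented for illustration, and the rule that a union's output partition count is the sum of its inputs' counts is taken from the explanation in the comment. The `CoalescePartitionsExec` node in the second plan is an assumption on my part about how a valid plan could look (as far as I know it is the DataFusion operator that merges its input partitions into one); the original comment does not mention it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Toy stand-in for a physical plan node (not a DataFusion type)."""
    name: str
    children: List["Node"]
    source_partitions: int = 0  # only meaningful for leaf (source) nodes

    def output_partitions(self) -> int:
        if not self.children:
            return self.source_partitions
        if self.name == "UnionExec":
            # A union exposes every partition of every input.
            return sum(c.output_partitions() for c in self.children)
        if self.name in ("GlobalLimitExec", "CoalescePartitionsExec"):
            # Modeled as always emitting a single partition.
            return 1
        # Any other node is assumed to preserve its input partitioning.
        return self.children[0].output_partitions()


def check(node: Node) -> List[str]:
    """Collect violations of the single-input-partition requirement."""
    errors: List[str] = []
    for child in node.children:
        errors.extend(check(child))
    if node.name == "GlobalLimitExec":
        n = node.children[0].output_partitions()
        if n != 1:
            errors.append(
                f"GlobalLimitExec requires a single input partition, got {n}"
            )
    return errors


def make_union() -> Node:
    # One partition from each source, as described in the comment.
    parquet = Node("ParquetExec", [], source_partitions=1)
    memory = Node("MemoryExec", [], source_partitions=1)
    return Node("UnionExec", [parquet, memory])


# Shape of the failing plan: the limit sits directly on top of the union.
bad_plan = Node("GlobalLimitExec", [make_union()])

# Assumed shape of a valid plan: partitions are merged before the limit.
good_plan = Node(
    "GlobalLimitExec", [Node("CoalescePartitionsExec", [make_union()])]
)

bad_errors = check(bad_plan)
good_errors = check(good_plan)
print(bad_errors)
print(good_errors)
```

In this model the first plan is rejected because the union hands two partitions to the limit, while the second passes because the partitions are merged first. Whether the real optimizer should insert such a merge step here is exactly what a full reproducer would help to determine.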