Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-06-09 Thread via GitHub
ozankabak closed issue #14036: Un-cancellable Query when hitting many large files. URL: https://github.com/apache/datafusion/issues/14036 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-05-30 Thread via GitHub
alamb commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2921953997 There is a new proposed fix in - https://github.com/apache/datafusion/pull/16196

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-02-26 Thread via GitHub
alamb commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2685750238 @carols10cents has made a benchmark here: - https://github.com/apache/datafusion/pull/14818

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-09 Thread via GitHub
jeffreyssmith2nd commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2580216590 @berkaysynnada I'll take another stab at getting a reproducer for this that doesn't require customer data

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2579313671 We also have checkpoint tests that drop the stream after some amount of time, and after the failure, the FileStream offsets no longer increment. I think the same

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2578322453 I tried a bit today to re-create this but was not able to. What I tried was to create a highly compressed Parquet file (48MB, with 1B rows of all-repeated strings) and

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
jeffreyssmith2nd commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2577862225 > How do you cancel the query? You mean terminating the next()'s on the stream, or dropping the stream. The queries are running in the context of a gRPC request,

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2577045685 How do you cancel the query? You mean terminating the next()'s on the stream, or dropping the stream. If it is the former, the issue might be related to the RepartitionE
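The question of how the query is cancelled matters because, in async Rust, a stream is cancelled by dropping it, and a drop (or a Tokio task abort) can only take effect between calls to `poll`. The std-only sketch below illustrates that general mechanism; it is not DataFusion code, and `ScanFuture`, `drive`, and the per-poll `budget` are hypothetical names chosen for illustration. Each "file" is one unit of work. With no budget the future finishes everything inside a single `poll`, so a cancel request that arrives after that poll comes too late; with a budget it returns `Pending` periodically, giving the driver a chance to drop it early.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a scan over many files: each "file" is one unit of work.
struct ScanFuture {
    files_remaining: usize,
    /// Shared counter so the caller can see how much work ran, even after a drop.
    files_read: Arc<AtomicUsize>,
    /// Files to process per poll before returning Pending; `None` never yields.
    budget: Option<usize>,
}

impl Future for ScanFuture {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut done_this_poll = 0;
        while self.files_remaining > 0 {
            if self.budget == Some(done_this_poll) {
                // Budget exhausted: ask to be polled again, then hand control back.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            self.files_remaining -= 1;
            self.files_read.fetch_add(1, Ordering::Relaxed);
            done_this_poll += 1;
        }
        Poll::Ready(())
    }
}

/// Waker that does nothing; `drive` re-polls in a loop anyway.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it completes, or until a cancel request arrives after
/// `cancel_after_polls` polls. Cancelling means dropping the future, which can
/// only happen between polls. Returns true if the future ran to completion.
fn drive<F: Future<Output = ()> + Unpin>(mut fut: F, cancel_after_polls: usize) -> bool {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        if polls == cancel_after_polls {
            drop(fut); // the "cancel": no further polls will happen
            return false;
        }
        polls += 1;
        if Pin::new(&mut fut).poll(&mut cx).is_ready() {
            return true;
        }
    }
}

fn main() {
    for budget in [None, Some(10)] {
        let files_read = Arc::new(AtomicUsize::new(0));
        let scan = ScanFuture { files_remaining: 100, files_read: Arc::clone(&files_read), budget };
        // Request cancellation right after the first poll returns.
        let completed = drive(scan, 1);
        println!(
            "budget={:?} completed={} files_read={}",
            budget,
            completed,
            files_read.load(Ordering::Relaxed)
        );
    }
}
```

Running `main` should print `budget=None completed=true files_read=100` followed by `budget=Some(10) completed=false files_read=10`. The sketch only shows why work that never returns `Pending` cannot be interrupted by dropping; the thread excerpts here do not establish that this is the cause of the behaviour in this issue.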

[I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-07 Thread via GitHub
jeffreyssmith2nd opened a new issue, #14036: URL: https://github.com/apache/datafusion/issues/14036 ### Describe the bug **TL;DR: Reading many large Parquet files can prevent a query from being cancelled.** We have a customer that is running a query similar to the following