Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918429534 yep, it should be merged after every point is clear, to reduce review burden -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113294832 ## datafusion/datasource/src/file_format.rs: ## @@ -120,7 +121,26 @@ pub trait FileFormatFactory: Sync + Send + GetExt + fmt::Debug { &self,

[PR] Reduce size of `Expr` struct [datafusion]

2025-05-28 Thread via GitHub
hendrikmakait opened a new pull request, #16207: URL: https://github.com/apache/datafusion/pull/16207 ## Which issue does this PR close? - Closes #16199. ## What changes are included in this PR? * Add a test for the size of `Expr` * Change `Expr::WindowFunction

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada commented on PR #16166: URL: https://github.com/apache/datafusion/pull/16166#issuecomment-2918419765 > Thank you for this contribution @berkaysynnada and @mertak-synnada > > I am a little confused about the new structure and exactly what problem is being solved with this

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113286434 ## datafusion/common/src/config.rs: ## @@ -1612,42 +1623,241 @@ impl TableOptions { }; e.0.set(key, value) } +} -/// Initializes

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112282534 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +imp

Re: [PR] chore: manual "git bisect" to try and determine when CI failures started [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove closed pull request #1804: chore: manual "git bisect" to try and determine when CI failures started URL: https://github.com/apache/datafusion-comet/pull/1804

Re: [PR] Chore: Moved strings expressions to separate file [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1792: URL: https://github.com/apache/datafusion-comet/pull/1792

Re: [I] union all +aggregate function in the recursive cte results an infinite loop [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer commented on issue #1131: URL: https://github.com/apache/datafusion-python/issues/1131#issuecomment-2916920418 Ok to close this issue?

Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on code in PR #16157: URL: https://github.com/apache/datafusion/pull/16157#discussion_r2112300128 ## docs/source/user-guide/sql/ddl.md: ## @@ -91,6 +93,23 @@ STORED AS PARQUET LOCATION '/mnt/nyctaxi/tripdata.parquet'; ``` +:::{note} Review Comment: >

Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]

2025-05-28 Thread via GitHub
xudong963 merged PR #16157: URL: https://github.com/apache/datafusion/pull/16157

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916952966 🤔 testing on my machine your adapted version of the code still just keeps on running. ctrl-c does nothing. The only change I've made is to replace `tokio::test` with `tokio::ma

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-28 Thread via GitHub
kazantsev-maksim commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112325535 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with Ada

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112282032 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [I] Spike: evaluate if cuDF can be used with datafusion-python [datafusion-python]

2025-05-28 Thread via GitHub
paleolimbot commented on issue #936: URL: https://github.com/apache/datafusion-python/issues/936#issuecomment-2916913205 Just two things I was involved in that may be useful here: - `cudf::from_arrow()`: https://github.com/rapidsai/cudf/blob/2789fa83d943649b982493d68bbba852f848d82c/c

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2916890212 @alamb, how about starting test next week?

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
rluvaton commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916913507 I still think there is a bug here: For this test (when running on main): ```scala test("debug datafusion native filter") { val schema = StructType( Seq(

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2916919380 Sorry for late, I'll check tomorrow

[I] Interuptable queries in jupyter notebooks [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer opened a new issue, #1136: URL: https://github.com/apache/datafusion-python/issues/1136 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** As a user, if I have written a query that takes a long time, I want to be able t

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2916950224 > Sorry for late, I'll check tomorrow (feel free to directly invite me to review by the button, then I'll notice more) I'm not able to request reviews. I think only commiters

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-28 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2916986710 > [@alamb](https://github.com/alamb), how about starting test next week? I think that would be a great idea. Thanks @xudong963

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2917010803 Just tested on Linux. With `USE_TASK = false` I see this ``` Running query; will time out after 5 seconds InfiniteStream::poll_next 1 times InfiniteStream::po

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2917020530 BTW the SpawnService is what should be used now: https://github.com/apache/arrow-rs-object-store/pull/332 Sadly, the docs are broken for the current version of object_store so I

Re: [PR] Propagate .execute() calls immediately in `RepartitionExec` [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16093: URL: https://github.com/apache/datafusion/pull/16093

Re: [I] Interuptable queries in jupyter notebooks [datafusion-python]

2025-05-28 Thread via GitHub
kylebarron commented on issue #1136: URL: https://github.com/apache/datafusion-python/issues/1136#issuecomment-2917052106 See https://pyo3.rs/v0.25.0/faq.html#ctrl-c-doesnt-do-anything-while-my-rust-code-is-executing and https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#method.ch

Re: [I] RepartitionExec not immediately propagating `.execute()` calls to children [datafusion]

2025-05-28 Thread via GitHub
alamb closed issue #16088: RepartitionExec not immediately propagating `.execute()` calls to children URL: https://github.com/apache/datafusion/issues/16088

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2917056155 Thank you @kosiew. Clearly what we have now needs work but I think I'd like to defer cleaning this up until some other folks try to implement more things with these APIs

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916340347 Thanks @rluvaton but this PR does not appear to help with the CI issue (the tests are still failing - see https://github.com/apache/datafusion-comet/actions/runs/15278587551/j

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2916186751 Another solution is using CoalescePartitionsExec to wrapper: ```rust diff --git a/datafusion/physical-plan/src/coalesce_partitions.rs b/datafusion/physical-plan/src/

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
rluvaton commented on code in PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#discussion_r2111820738 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2406,19 +2406,19 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916217497 Hi @alamb , i believe we also can do the clickbench benchmark for this PR. But i am not confident about the result since it seems we will always add some overhead to aggrega

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
rluvaton commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916409442 I thought everything that came from JVM is reusing buffers

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-05-28 Thread via GitHub
Copilot commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2112167277 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,6 +225,23 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +if let DataTy

Re: [PR] Change default SQL mapping for `VARCAHR` from `Utf8` to `Utf8View` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2916703395 I'll plan to merge this tomorrow unless I hear anything different

Re: [PR] fix: Re-enable Spark 4 tests on Linux [datafusion-comet]

2025-05-28 Thread via GitHub
comphead commented on code in PR #1806: URL: https://github.com/apache/datafusion-comet/pull/1806#discussion_r2112158414 ## .github/workflows/pr_build_linux_spark4.yml: ## @@ -50,10 +49,53 @@ jobs: java_version: [17] test-target: [java] spark-version:

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2112147587 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-05-28 Thread via GitHub
comphead commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2112176250 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,6 +225,23 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +if let DataT

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16191: URL: https://github.com/apache/datafusion/pull/16191#discussion_r2112173381 ## datafusion/execution/src/disk_manager.rs: ## @@ -91,6 +177,11 @@ pub struct DiskManager { } impl DiskManager { +/// Creates a builder for [DiskManager] +

[PR] perf: Only add CopyExec if source of `ScanExec` is `native_comet` [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove opened a new pull request, #1808: URL: https://github.com/apache/datafusion-comet/pull/1808 ## Which issue does this PR close? N/A ## Rationale for this change Avoid adding unnecessary copies. Thanks to @rluvaton for noticing this issue in https

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-05-28 Thread via GitHub
comphead commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r211218 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,6 +225,23 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +if let DataT

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2916219267 Hi @alamb , i believe we also can do the clickbench benchmark for this PR. But i am not confident about the result since it seems we will always add some overhead to aggregate. T

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-28 Thread via GitHub
onlyjackfrost commented on code in PR #16181: URL: https://github.com/apache/datafusion/pull/16181#discussion_r2112068788 ## datafusion/datasource/src/source.rs: ## @@ -58,8 +58,61 @@ use datafusion_physical_plan::filter_pushdown::{ /// Requires `Debug` to assist debugging ///

Re: [PR] doc: add diagram to describe how DataSource, FileSource, and DataSourceExec are related [datafusion]

2025-05-28 Thread via GitHub
onlyjackfrost commented on PR #16181: URL: https://github.com/apache/datafusion/pull/16181#issuecomment-2916577526 @alamb @xudong963 I've updated the diagram based on the feedback. Thank you @xudong963 =D

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916774947 @pepijnve It works for me, the change code is here: ```rust tokio = { workspace = true, features = ["macros", "signal"]} ``` ```rust use arrow::arra

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-05-28 Thread via GitHub
comphead commented on code in PR #16203: URL: https://github.com/apache/datafusion/pull/16203#discussion_r2112201561 ## datafusion/functions-nested/src/extract.rs: ## @@ -225,6 +225,23 @@ where return Ok(Arc::new(NullArray::new(array.len(; } +if let DataT

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916779875 Interesting, this seems to give me an example that we can use in datafusion-cli to support quick cancellation!

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917076809 Thank you @liamzwbao -- this looks good to me. I'll start some benchmarks on this PR and as long as that looks good this PR looks nice to me Thanks again

Re: [PR] Propagate .execute() calls immediately in `RepartitionExec` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16093: URL: https://github.com/apache/datafusion/pull/16093#issuecomment-2917050839 Looks all good to me, so let's go! 🚀

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917078781 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2917083133 > I'm not able to request reviews. I think only commiters can do that and I'm not a commiter (yet). I think you will need to do the gitbox thing with your apache account (when i

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112798304 ## datafusion/expr/src/logical_plan/tree_node.rs: ## @@ -400,6 +403,8 @@ impl LogicalPlan { mut f: F, ) -> Result { match self { +

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917657542 > The results are a little inconsistent. __scalar_sq_2."avg(e3.salary)", __scalar_sq_2.dept_id are not valid fields in the above context. Ideally, all the field in e1, e2 and e

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917689696 So here are my thoughts (this plan is to split the work in smaller PRs) while avoid breaking things as much as possible: 1. we introduce 3 optimizors, declared in the order b

Re: [I] Spark-compatible CAST operation [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on issue #11201: URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2917688776 The comet implementation already has a `PhysicalExpr` for cast. I was thinking if we could make it datafusion compatible(perhaps it already is) and while making physical exp

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich commented on code in PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809#discussion_r2112818183 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -321,8 +321,6 @@ public void init() throws Throwable { } long[]

[PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich opened a new pull request, #1809: URL: https://github.com/apache/datafusion-comet/pull/1809 ## Which issue does this PR close? Partially address #1542. ## Rationale for this change ## What changes are included in this PR? We valid

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich commented on code in PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809#discussion_r2112818597 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -613,7 +611,10 @@ public void close() throws IOException { @SuppressWarn

[I] perf: only check Parquet type once in NativeBatchReader [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich opened a new issue, #1810: URL: https://github.com/apache/datafusion-comet/issues/1810 NativeBatchReader calls `checkParquetType` on all of the columns on every invocation of `loadNextBatch`. I tried moving it up to `init` but some Spark SQL tests expect the exceptions that this

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917717164 or the easiest way is to have a large feature branch :thinking:

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917741777 cc @alamb

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917721651 > Beware that this error is thrown after the planning stage has completed, and it is expected because the current limitation of subquery decorrelation. Oh I was under the i

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917730231 true, i've just realized it. Looks like a feature branch for us to work on is the way then?

Re: [PR] fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra merged PR #1785: URL: https://github.com/apache/datafusion-comet/pull/1785

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
irenjj commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917853644 It looks like @duongcongtoai addressed the depth issue in #16016. Maybe this PR can be merged with #16016 to better verify the depth-related problem?

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-28 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112941516 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -287,6 +287,105 @@ pub enum LogicalPlan { Unnest(Unnest), /// A variadic query (e.g. "Recursive CTEs")

[I] Excessive Arc-clone in HashJoinStream with StringView on build-side [datafusion]

2025-05-28 Thread via GitHub
ctsk opened a new issue, #16206: URL: https://github.com/apache/datafusion/issues/16206 ### Describe the bug An unfortunate pattern in the hash join implementation leads to excessive Arc-cloning: Assume the build-side carries a string-view column as a payload. Let N be the number of

Re: [PR] fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on PR #1785: URL: https://github.com/apache/datafusion-comet/pull/1785#issuecomment-2917762700 @andygrove @mbutrovich

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2917789948 > I still think there is a bug here: > > For this test (when running on main): > > ```scala > test("debug datafusion native filter") { > val schema = Struc

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112886589 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809#issuecomment-2917796361 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1809?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: translate missing or corrupt file exceptions in NativeUtil, fall back native scans if asked to ignore [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2917808380 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1765?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Spark : Fix AQE Tests [datafusion-comet]

2025-05-28 Thread via GitHub
coderfender opened a new pull request, #1811: URL: https://github.com/apache/datafusion-comet/pull/1811 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these chang

Re: [PR] fix: translate missing or corrupt file exceptions in NativeUtil, fall back native scans if asked to ignore [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2917912411 @mbutrovich looks like this is causing ci failures.

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra merged PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809

Re: [PR] Spark : Fix AQE Tests [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1811: URL: https://github.com/apache/datafusion-comet/pull/1811#issuecomment-2917941265 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1811?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
