Re: [I] [Epic] A collection of FFI related tasks [datafusion]

2025-03-27 Thread via GitHub
alamb commented on issue #15283: URL: https://github.com/apache/datafusion/issues/15283#issuecomment-2742866129 > [#14842](https://github.com/apache/datafusion/issues/14842) the Ffi TableProvider doesn't seem to work anymore Added -- This is an automated message from the Apache Git

Re: [I] [EPIC] Add Decimal support [datafusion]

2025-03-27 Thread via GitHub
alamb commented on issue #3523: URL: https://github.com/apache/datafusion/issues/3523#issuecomment-2759323169 Thanks @lostmygithubaccount -- I made a ticket to fix that: - https://github.com/apache/datafusion/issues/15464

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15409: URL: https://github.com/apache/datafusion/pull/15409#issuecomment-2759400185 @2010YOUY01 and @alan910127 -- do you have time to review this PR, as you filed / expressed interest in https://github.com/apache/datafusion/issues/15088#top

Re: [I] Analysis to support`SortPreservingMerge` --> `ProgressiveEval` [datafusion]

2025-03-27 Thread via GitHub
alamb commented on issue #15191: URL: https://github.com/apache/datafusion/issues/15191#issuecomment-2759403638 > > * Specifically [feat: Add `ProgressiveEval` operator  #10490](https://github.com/apache/datafusion/pull/10490) > > Hi [@alamb](https://github.com/alamb) [@wiedld](https:

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759361675 > My main concern is that regardless of what phase (optimizer or runtime) the pushdown happens if what we push down is a custom PhysicalExpr we're going to have trouble making that pla

[I] Update ClickBench queries to avoid `to_timestamp_seconds` [datafusion]

2025-03-27 Thread via GitHub
alamb opened a new issue, #15465: URL: https://github.com/apache/datafusion/issues/15465 ### Is your feature request related to a problem or challenge? For some reason the DataFusion version of the ClickBench queries use the `to_timestamp_seconds` function: https://github.com/apac

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2759391428 Also I will redouble my efforts to try and focus to get filter pushdown in by default

Re: [PR] Triggering extended tests through PR comment [datafusion]

2025-03-27 Thread via GitHub
danila-b commented on PR #15101: URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2742784619 > Is this ready for review or is there something outstanding for it to be still in draft? It should be ready for a review, rebased on the latest main and tested it a bit more

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-27 Thread via GitHub
alan910127 commented on code in PR #15110: URL: https://github.com/apache/datafusion/pull/15110#discussion_r2011082531 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> { right: Expr, right_schema:

Re: [PR] refactor: move `CteWorkTable`, `default_table_source` a bunch of files out of core [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15316: URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2744189466 > I thought it mattered because `datasource` has a dependency on `catalog` but on a second look it is only `Session`. Any plans on moving `Session` to `execution` ? I don't kno

Re: [PR] fix: check if handle has been initialized before closing [datafusion-comet]

2025-03-27 Thread via GitHub
andygrove commented on code in PR #1554: URL: https://github.com/apache/datafusion-comet/pull/1554#discussion_r2003770078 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -459,7 +459,9 @@ public void close() throws IOException { importer = nu

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15039: URL: https://github.com/apache/datafusion/pull/15039#issuecomment-2746381811 Thanks again!

Re: [PR] Draft: Use take-in kernel in repartitioning [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15392: URL: https://github.com/apache/datafusion/pull/15392#issuecomment-2759510327 I tried to run the clickbench queries using bench.sh and I got an error like this: ``` Q1: SELECT COUNT(DISTINCT "HitColor"), COUNT(DISTINCT "BrowserCountry"), COUNT(DISTINCT

Re: [PR] Migrate subtrait tests to insta [datafusion]

2025-03-27 Thread via GitHub
blaginin commented on code in PR #15444: URL: https://github.com/apache/datafusion/pull/15444#discussion_r2017622164 ## datafusion/substrait/tests/cases/emit_kind_tests.rs: ## @@ -109,18 +117,28 @@ mod tests { let df = ctx.sql("SELECT a + 1, b + 2 FROM data").await?;

Re: [PR] Migrate subtrait tests to insta [datafusion]

2025-03-27 Thread via GitHub
blaginin commented on PR #15444: URL: https://github.com/apache/datafusion/pull/15444#issuecomment-2759517667 > Closes https://github.com/apache/datafusion/issues/15398. If you plan to do more PRs, consider changing it to "related", so the ticket won't get auto-closed

[PR] (WIP) Start working on upgrading to arrow 55 [datafusion]

2025-03-27 Thread via GitHub
alamb opened a new pull request, #15466: URL: https://github.com/apache/datafusion/pull/15466 I want to be able to test stuff from arrow-rs (specifically filter pushdown) using datafusion benchmarks, so the first step to get main going This will also be a step testing how the upgrade

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017541733 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is us

Re: [I] Update ClickBench queries to avoid `to_timestamp_seconds` [datafusion]

2025-03-27 Thread via GitHub
alamb commented on issue #15465: URL: https://github.com/apache/datafusion/issues/15465#issuecomment-2759430803 > > However that function does timestamp validation and potentially slows down queries and prevents other optimizations (for example what  [@adriangb](https://github.com/adriangb) 

Re: [PR] Mysql: Add support for := operator [datafusion-sqlparser-rs]

2025-03-27 Thread via GitHub
iffyio merged PR #1779: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1779

Re: [PR] chore(deps): bump clap from 4.5.32 to 4.5.34 [datafusion]

2025-03-27 Thread via GitHub
comphead merged PR #15452: URL: https://github.com/apache/datafusion/pull/15452

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2016457111 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [I] Weekly Plan (Andrew Lamb) March 3, 2025 [datafusion]

2025-03-27 Thread via GitHub
mkarbo commented on issue #14978: URL: https://github.com/apache/datafusion/issues/14978#issuecomment-2758445838 Are you no longer looking at #14595 @alamb?

Re: [PR] Re-Add CodeCov [datafusion]

2025-03-27 Thread via GitHub
blaginin commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2758447609 Thank you! The feedback I saw in that discussion: - You asked if sqllogictests contribute to the coverage. Yes, here's the [example](https://app.codecov.io/gh/blaginin/datafus

Re: [I] Migrate subtrait tests to `insta` [datafusion]

2025-03-27 Thread via GitHub
blaginin commented on issue #15398: URL: https://github.com/apache/datafusion/issues/15398#issuecomment-2758609569 Hey, @qstommyshu > I assume I should change all the test cases under the subtrait folder for this issue, NOT just the ones in [consumer_integration.rs](https://github.c

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2759160812 Run extended tests

Re: [PR] Docs: Formatting and Added Extra resources [datafusion]

2025-03-27 Thread via GitHub
2SpaceMasterRace commented on PR #15450: URL: https://github.com/apache/datafusion/pull/15450#issuecomment-2759166790 @alamb @berkaysynnada let me know if this is good enough!

Re: [PR] Add short circuit [datafusion]

2025-03-27 Thread via GitHub
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2759162126 Some tests failed, so let me take a look at what exactly is going on.

Re: [PR] Improve spill performance: Disable re-validation of spilled files [datafusion]

2025-03-27 Thread via GitHub
alamb commented on code in PR #15454: URL: https://github.com/apache/datafusion/pull/15454#discussion_r2017481711 ## Cargo.toml: ## @@ -87,7 +87,7 @@ ahash = { version = "0.8", default-features = false, features = [ "runtime-rng", ] } apache-avro = { version = "0.17", de

Re: [I] [DISCUSS] Consider Vendoring Certain Dependencies [datafusion]

2025-03-27 Thread via GitHub
ozankabak commented on issue #15360: URL: https://github.com/apache/datafusion/issues/15360#issuecomment-2759173968 IMO dependency creep is the most important issue here. For the others, we can run experiments to see what happens to compilation times and debug mode binary sizes. I don't thi

Re: [PR] Migrate subtrait tests to insta [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15444: URL: https://github.com/apache/datafusion/pull/15444#issuecomment-2759206571 > And there are many more functions like this, I can't simply change them as they accepts dynamically generated `plan_str` and `plan.schema()`. Changing them into `assert_snapshot!` wi

Re: [I] [Bug] datafusion-cli may fail to read csv files [datafusion]

2025-03-27 Thread via GitHub
alamb commented on issue #15456: URL: https://github.com/apache/datafusion/issues/15456#issuecomment-2759119415 The problem can be reproduced without the tpchdbgen CLI: Step 1: download [part.zip](https://github.com/user-attachments/files/19492616/part.zip) Step 2: unzip ```shel

Re: [PR] Draft: Use take-in kernel in repartitioning [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15392: URL: https://github.com/apache/datafusion/pull/15392#issuecomment-2759221389 I am firing up the benchmarks

Re: [PR] Triggering extended tests through PR comment: `Run extended tests` [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15101: URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2759156604 Thanks again @danila-b!

Re: [PR] Docs: Formatting and Added Extra resources [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15450: URL: https://github.com/apache/datafusion/pull/15450#issuecomment-2759288089 BTW @oznur-synnada I wonder if you have time to update the page with other recent blog content 🤔

Re: [PR] Change default `EXPLAIN` format in `datafusion-cli` to `tree` format [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15427: URL: https://github.com/apache/datafusion/pull/15427#issuecomment-2759296025 Run extended tests

[PR] Add documentation for `Run extended tests` command [datafusion]

2025-03-27 Thread via GitHub
alamb opened a new pull request, #15463: URL: https://github.com/apache/datafusion/pull/15463 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/14319 - Follow on to https://github.com/apache/datafusion/pull/15101 ## Rationale

Re: [PR] Improve spill performance: Disable re-validation of spilled files [datafusion]

2025-03-27 Thread via GitHub
alamb commented on PR #15454: URL: https://github.com/apache/datafusion/pull/15454#issuecomment-2759177065 Thank you @zebsme

Re: [PR] fix: aggregation corner case [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on PR #15457: URL: https://github.com/apache/datafusion/pull/15457#issuecomment-2759900781 But why don't we have a "value" column in the logical plan schema?

Re: [PR] fix: aggregation corner case [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on PR #15457: URL: https://github.com/apache/datafusion/pull/15457#issuecomment-2759897290 Looks correct ``` query TT explain with test AS (SELECT i as needle FROM generate_series(1, 10) t(i)) select count(*) from test WHERE 1 = 1; logical_plan

Re: [PR] refactor: Move `Memtable` to catalog [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15459: URL: https://github.com/apache/datafusion/pull/15459#discussion_r2017757243 ## datafusion/catalog/src/memory/table.rs: ## @@ -0,0 +1,377 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] added explaination for Schema and DFSchema to documentation [datafusion]

2025-03-27 Thread via GitHub
Jiashu-Hu commented on code in PR #15329: URL: https://github.com/apache/datafusion/pull/15329#discussion_r2006264327 ## docs/source/library-user-guide/working-with-exprs.md: ## @@ -50,6 +50,29 @@ As another example, the SQL expression `a + b * c` would be represented as an `E

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
suibianwanwank commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2760078487 > Another concern with a dynamic physicalexpr: more lock contention. Presumably every time it's evaluated (for each row?) we need to acquire a lock to read from the TopK heap.

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2736411496 I think this is just part of the picture. To fully match DuckDB we'd have to do something like the rewrite proposed in https://github.com/apache/datafusion/issues/15177#issuecomment

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-27 Thread via GitHub
zhuqi-lucas commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2760109748 Updated the latest clickbench for the current main compare the default mapping varchar to utf8view: ```rust python3 ./compare.py /tmp/main /tmp/Utf8ViewDefault ┏

Re: [PR] Migrate tests to insta [datafusion]

2025-03-27 Thread via GitHub
blaginin commented on code in PR #15288: URL: https://github.com/apache/datafusion/pull/15288#discussion_r2004160683 ## datafusion/core/tests/parquet/custom_reader.rs: ## @@ -96,17 +97,15 @@ async fn route_data_access_ops_to_parquet_file_reader_factory() { let task_ctx = s

[PR] Support `Accumulator` for avg duration [datafusion]

2025-03-27 Thread via GitHub
shruti2522 opened a new pull request, #15468: URL: https://github.com/apache/datafusion/pull/15468 ## Which issue does this PR close? - Closes #15458 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] fix: aggregation corner case [datafusion]

2025-03-27 Thread via GitHub
chenkovsky commented on PR #15457: URL: https://github.com/apache/datafusion/pull/15457#issuecomment-2759908753 > but why don't we have "value" column in logical plan schema It's also an alternative solution, but I think that count(*) actually doesn't depend on any column on input log

Re: [PR] Migrate subtrait tests to insta, part1 [datafusion]

2025-03-27 Thread via GitHub
qstommyshu commented on PR #15444: URL: https://github.com/apache/datafusion/pull/15444#issuecomment-2759953861 > In general I suggest we get the basic easy to port tests done and then work on the others as a follow on PR Got it, thanks @alamb and @blaginin for the help. I th

Re: [PR] Add documentation for `Run extended tests` command [datafusion]

2025-03-27 Thread via GitHub
comphead commented on code in PR #15463: URL: https://github.com/apache/datafusion/pull/15463#discussion_r2017751751 ## docs/source/contributor-guide/testing.md: ## @@ -79,8 +79,15 @@ than the standard test suite and add important test coverage such as that the code works when

Re: [PR] feat: make parquet native scan schema case insensitive [datafusion-comet]

2025-03-27 Thread via GitHub
wForget commented on code in PR #1575: URL: https://github.com/apache/datafusion-comet/pull/1575#discussion_r2017798341 ## native/core/src/parquet/parquet_exec.rs: ## @@ -110,6 +110,7 @@ fn get_options(session_timezone: &str) -> (TableParquetOptions, SparkParquetOpti let m

Re: [I] [EPIC] Add Decimal support [datafusion]

2025-03-27 Thread via GitHub
niebayes commented on issue #3523: URL: https://github.com/apache/datafusion/issues/3523#issuecomment-2760010836 This https://github.com/apache/arrow-rs/issues/7343 might help write tests involving decimal data type.

Re: [PR] Add `downcast_to_source` method for `DataSourceExec` [datafusion]

2025-03-27 Thread via GitHub
mertak-synnada commented on code in PR #15416: URL: https://github.com/apache/datafusion/pull/15416#discussion_r2011761950 ## datafusion/datasource/src/source.rs: ## @@ -230,4 +231,17 @@ impl DataSourceExec { Boundedness::Bounded, ) } + +/// Downca

Re: [PR] minor: Allow to run TPCH bench for a specific query [datafusion]

2025-03-27 Thread via GitHub
comphead commented on PR #15467: URL: https://github.com/apache/datafusion/pull/15467#issuecomment-2760120806 > Thank you @comphead , good improvement, may be a follow-up for other benchmark also support pass the specific query. Thanks @zhuqi-lucas I was meaning to add for in mem benc

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-27 Thread via GitHub
Kontinuation commented on code in PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2017871489 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -498,81 +502,63 @@ impl ShuffleRepartitioner { Ok(()) } -/// Writes buff

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-27 Thread via GitHub
Kontinuation commented on code in PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2017870588 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -422,27 +432,29 @@ impl ShuffleRepartitioner { .collect::>>()?;

Re: [PR] Add `FileScanConfigBuilder` [datafusion]

2025-03-27 Thread via GitHub
mertak-synnada commented on code in PR #15352: URL: https://github.com/apache/datafusion/pull/15352#discussion_r2015906250 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -645,6 +875,7 @@ impl FileScanConfig { // TODO: This function should be moved into DataSource

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
YjyJeff commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2015964312 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +737,159 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

Re: [PR] Triggering extended tests through PR comment: `Run extended tests` [datafusion]

2025-03-27 Thread via GitHub
danila-b commented on code in PR #15101: URL: https://github.com/apache/datafusion/pull/15101#discussion_r2016062967 ## .github/workflows/pr_comment_commands.yml: ## @@ -0,0 +1,89 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agr

Re: [PR] feat: pushdown filter for native_iceberg_compat [datafusion-comet]

2025-03-27 Thread via GitHub
wForget commented on code in PR #1566: URL: https://github.com/apache/datafusion-comet/pull/1566#discussion_r2016053542 ## spark/src/main/scala/org/apache/comet/parquet/SourceFilterSerde.scala: ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Triggering extended tests through PR comment: `Run extended tests` [datafusion]

2025-03-27 Thread via GitHub
danila-b commented on code in PR #15101: URL: https://github.com/apache/datafusion/pull/15101#discussion_r2016063441 ## .github/workflows/extended.yml: ## @@ -127,4 +145,44 @@ jobs: cargo test --features backtrace --profile release-nonlto --test sqllogictests -- --in

Re: [PR] Triggering extended tests through PR comment: `Run extended tests` [datafusion]

2025-03-27 Thread via GitHub
danila-b commented on code in PR #15101: URL: https://github.com/apache/datafusion/pull/15101#discussion_r2016064025 ## .github/workflows/pr_comment_commands.yml: ## @@ -0,0 +1,89 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agr

Re: [PR] Triggering extended tests through PR comment: `Run extended tests` [datafusion]

2025-03-27 Thread via GitHub
danila-b commented on code in PR #15101: URL: https://github.com/apache/datafusion/pull/15101#discussion_r2016067578 ## .github/workflows/pr_comment_commands.yml: ## @@ -0,0 +1,89 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agr

[I] `native_datafusion/native_iceberg_compat` scans case sensitive [datafusion-comet]

2025-03-27 Thread via GitHub
wForget opened a new issue, #1574: URL: https://github.com/apache/datafusion-comet/issues/1574 ### Describe the bug Currently `native_datafusion/native_iceberg_compat` scans are case-sensitive, which may be inconsistent with vanilla spark. test case: ``` test("test

Re: [PR] refactor: use TypeSignature::Coercible for crypto functions [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on PR #14826: URL: https://github.com/apache/datafusion/pull/14826#issuecomment-2757360811 Thanks @Chen-Yuan-Lai

Re: [PR] refactor: use TypeSignature::Coercible for crypto functions [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 merged PR #14826: URL: https://github.com/apache/datafusion/pull/14826

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-27 Thread via GitHub
dentiny commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2757316969 Thank you for the care ❤️ Let me sync with @Chen-Yuan-Lai offline

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016070941 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] Triggering extended tests through PR comment: `Run extended tests` [datafusion]

2025-03-27 Thread via GitHub
danila-b commented on PR #15101: URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2757347599 > I have completed testing in my fork and I agree this works ❤️ - thank you so much @danila-b (see testing PR here [alamb#33](https://github.com/alamb/datafusion/pull/33)) >

[PR] Update the copyright year [datafusion]

2025-03-27 Thread via GitHub
omkenge opened a new pull request, #15453: URL: https://github.com/apache/datafusion/pull/15453 (no comment)

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016102166 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [I] Emit warning with attached `Diagnostic` when doing `= NULL` [datafusion]

2025-03-27 Thread via GitHub
eliaperantoni commented on issue #14434: URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2757036518 > Quick update: I was a bit busy with schoolwork last week, but I’ll try to fix this ticket this week. Thanks for your patience! No worries at all! <3

Re: [I] Discussion: handling comparison of intervals [datafusion]

2025-03-27 Thread via GitHub
emilk commented on issue #8468: URL: https://github.com/apache/datafusion/issues/8468#issuecomment-2757155123 I would be interested in a fix for this @ozankabak! 🙏

Re: [PR] Support computing statistics for FileGroup [datafusion]

2025-03-27 Thread via GitHub
jayzhan211 commented on code in PR #15432: URL: https://github.com/apache/datafusion/pull/15432#discussion_r2016078687 ## datafusion/core/src/datasource/statistics.rs: ## @@ -145,7 +147,142 @@ pub async fn get_statistics_with_limit( Ok((result_files, statistics)) } -fn a

Re: [PR] fix: Making shuffle files generated in native shuffle mode reclaimable [datafusion-comet]

2025-03-27 Thread via GitHub
Kontinuation commented on code in PR #1568: URL: https://github.com/apache/datafusion-comet/pull/1568#discussion_r2016905005 ## spark/src/test/scala/org/apache/comet/exec/CometNativeShuffleSuite.scala: ## @@ -201,6 +204,17 @@ class CometNativeShuffleSuite extends CometTestBase w

Re: [PR] Clean up hash_join's ExecutionPlan::execute [datafusion]

2025-03-27 Thread via GitHub
comphead commented on PR #15418: URL: https://github.com/apache/datafusion/pull/15418#issuecomment-2758423217 > Thank you @ctsk. I believe we can merge this as is. However I'd like to raise one thing that comes to mind whenever I look this code. I'm not very comfortable with adding a `Coale

Re: [PR] chore: move `optimize_subquery_sort` into optimizer [datafusion]

2025-03-27 Thread via GitHub
irenjj commented on code in PR #15441: URL: https://github.com/apache/datafusion/pull/15441#discussion_r2016945349 ## datafusion/optimizer/src/eliminate_sort.rs: ## @@ -0,0 +1,78 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [I] Attach `Diagnostic` to "wrong number of arguments" error [datafusion]

2025-03-27 Thread via GitHub
eliaperantoni commented on issue #14432: URL: https://github.com/apache/datafusion/issues/14432#issuecomment-2757082084 @Chen-Yuan-Lai I only filed the ticket, but it's @jsai28 working on that feature :) You're definitely right though: these two tickets do have some overlap and we should ma

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-27 Thread via GitHub
andygrove commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2006442098 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,131 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-20 +aut

[I] Feature is not implemeneted: Unsupported cast with list of structs [datafusion]

2025-03-27 Thread via GitHub
liamphmurphy opened a new issue, #15338: URL: https://github.com/apache/datafusion/issues/15338 ### Describe the bug This bug for me originated when encountering schema evolutions on Delta tables using the `delta-rs` library. Whenever a schema evolution occurred on my table that cont

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-27 Thread via GitHub
milenkovicm commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2743866878 Ah no, sorry for the misunderstanding, please do not cancel this PR. What I meant is that, in case of this type of error, the ballista job should be cancelled.

Re: [PR] fix: check if handle has been initialized before closing [datafusion-comet]

2025-03-27 Thread via GitHub
parthchandra commented on PR #1554: URL: https://github.com/apache/datafusion-comet/pull/1554#issuecomment-2741056298 > > I just wanted to see in what condition the NativeBatchReader can be called after close has been called. > > The scenario I encountered was not NativeBatchReader c

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2758981967 Thank you all for the feedback. @alamb I think the main thing to figure out is if this should happen during an optimizer pass and what gets pushed down is an `Arc` or if we c

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017347609 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic fitlers from TopK operators is

[I] Plan joins with PartitionMode::Auto by default [datafusion]

2025-03-27 Thread via GitHub
Dandandan opened a new issue, #15349: URL: https://github.com/apache/datafusion/issues/15349 ### Is your feature request related to a problem or challenge? Currently, PartitionMode::Partitioned is the default when statistics collection is not used. This leads to suboptimal plans when

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
alamb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2012937304 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1847,6 +1848,28 @@ mod tests { writer.close().unwrap(); } +fn write_file_null(

Re: [PR] feat: introduce hadoop mini cluster to test native scan on hdfs [datafusion-comet]

2025-03-27 Thread via GitHub
wForget commented on code in PR #1556: URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2006772930 ## spark/src/test/scala/org/apache/comet/WithHdfsCluster.scala: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[PR] Fix no effect metrics bug in ParquetSource [datafusion]

2025-03-27 Thread via GitHub
XiangpengHao opened a new pull request, #15460: URL: https://github.com/apache/datafusion/pull/15460 ## Which issue does this PR close? - Closes #. ## Rationale for this change Currently, setting metrics on the Parquet source has no effect; this PR fixes that.

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-27 Thread via GitHub
rkrishn7 commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2759030135 Thanks @alamb - is this ready to get merged? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] fix: Making shuffle files generated in native shuffle mode reclaimable [datafusion-comet]

2025-03-27 Thread via GitHub
mbutrovich commented on code in PR #1568: URL: https://github.com/apache/datafusion-comet/pull/1568#discussion_r2016848830 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala: ## @@ -232,14 +222,17 @@ object CometShuffleExchangeExe

Re: [PR] feat: make parquet native scan schema case insensitive [datafusion-comet]

2025-03-27 Thread via GitHub
codecov-commenter commented on PR #1575: URL: https://github.com/apache/datafusion-comet/pull/1575#issuecomment-2758515840 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1575?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add `FileScanConfigBuilder` [datafusion]

2025-03-27 Thread via GitHub
blaginin commented on code in PR #15352: URL: https://github.com/apache/datafusion/pull/15352#discussion_r2016857051 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -174,6 +175,219 @@ pub struct FileScanConfig { pub batch_size: Option, } +#[derive(Clone)] +pub st

[PR] refactor: Move `Memtable` to catalog [datafusion]

2025-03-27 Thread via GitHub
logan-keede opened a new pull request, #15459: URL: https://github.com/apache/datafusion/pull/15459 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] [DISCUSS] Consider Vendoring Certain Dependencies [datafusion]

2025-03-27 Thread via GitHub
Omega359 commented on issue #15360: URL: https://github.com/apache/datafusion/issues/15360#issuecomment-2758703831 Pretty sure the linker/LLVM filters out unused code anyway, at least if LTO is on, which it is for release builds. It might help decrease build time marginally but I really wou

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-27 Thread via GitHub
comphead commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2017244901 ## spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala: ## @@ -490,8 +490,7 @@ object CometScanExec extends DataTypeSupport { // TODO a

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-27 Thread via GitHub
suibianwanwank commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2017284381 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -644,10 +738,122 @@ impl RecordBatchStore { } } +/// Pushdown of dynamic filters from TopK operat

Re: [PR] feat: enable iceberg compat tests, more tests for complex types [datafusion-comet]

2025-03-27 Thread via GitHub
parthchandra commented on code in PR #1550: URL: https://github.com/apache/datafusion-comet/pull/1550#discussion_r2017071919 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2722,7 +2721,11 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSe

Re: [I] [DISCUSS] Consider Vendoring Certain Dependencies [datafusion]

2025-03-27 Thread via GitHub
ozankabak commented on issue #15360: URL: https://github.com/apache/datafusion/issues/15360#issuecomment-2758955432 Right, but we spend a lot of time on CI and debugging without LTO. Also, dependency creep increases the number of "moving" parts and makes maintenance harder. Therefore I thin

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-27 Thread via GitHub
mbutrovich commented on code in PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#discussion_r2017297638 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -498,81 +502,63 @@ impl ShuffleRepartitioner { Ok(()) } -/// Writes buffer
