[PR] Snowflake: CREATE DYNAMIC TABLE [datafusion-sqlparser-rs]

2025-07-20 Thread via GitHub
yoavcloud opened a new pull request, #1960: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1960 Added support for `CREATE DYNAMIC TABLE` in Snowflake: 1. Extended `CreateTableBuilder` with new options 2. Removed the `CreateTableBuilder::validate_schema_info` function and r

[I] Missing data when inserting into MemTable [datafusion]

2025-07-20 Thread via GitHub
ableegoldman opened a new issue, #16836: URL: https://github.com/apache/datafusion/issues/16836 ### Describe the bug Hi, I've recently started using DataFusion and have run into an issue trying to copy some results into a local cache implemented using the MemTable. Here is the code:

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-20 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218214013 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -125,6 +133,156 @@ impl SpillManager { self.spill_record_batch_and_finish(&batches, reque

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-20 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] Add benchmark for ByteViewGroupValueBuilder [datafusion]

2025-07-20 Thread via GitHub
zhuqi-lucas commented on PR #16826: URL: https://github.com/apache/datafusion/pull/16826#issuecomment-3095204512 > LGTM, thanks. > > BTW, for end-to-end aggregate queries -- is the hot spot only this `vectorized_append()` in this benchmark? Should we add other functions that also tak

Re: [I] Treat truncated parquet stats as inexact [datafusion]

2025-07-20 Thread via GitHub
nssalian commented on issue #15976: URL: https://github.com/apache/datafusion/issues/15976#issuecomment-3095182816 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [I] Regression: `DataFrameWriteOptions::with_single_file_output` produces a directory [datafusion]

2025-07-20 Thread via GitHub
nssalian commented on issue #13323: URL: https://github.com/apache/datafusion/issues/13323#issuecomment-3095182187 @alamb, is this issue still valid?

Re: [PR] feat: enhance support for Decimal128 and Decimal256 [datafusion]

2025-07-20 Thread via GitHub
2010YOUY01 commented on code in PR #16831: URL: https://github.com/apache/datafusion/pull/16831#discussion_r2218150407 ## datafusion/common/src/scalar/mod.rs: ## @@ -1382,6 +1382,12 @@ impl ScalarValue { DataType::Float16 => ScalarValue::Float16(Some(f16::from_f32(

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-07-20 Thread via GitHub
zhuqi-lucas commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-3095164017 @nssalian This ticket is invalid; I need to rework this: https://github.com/apache/datafusion/pull/14954

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-07-20 Thread via GitHub
nssalian commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-3095137374 @alamb, checking if my understanding of the issue is accurate since there are a few threads here. This is my first issue on the project, so I want to clarify prior to proceedin

[PR] Fix integration tests not running [datafusion]

2025-07-20 Thread via GitHub
kosiew opened a new pull request, #16835: URL: https://github.com/apache/datafusion/pull/16835 ## Which issue does this PR close? - Closes #16801 ## Rationale for this change - Provides a unified and maintainable test suite for schema adapter functionality by consolidati

[PR] Simplify try cast expr evaluation [datafusion]

2025-07-20 Thread via GitHub
lewiszlw opened a new pull request, #16834: URL: https://github.com/apache/datafusion/pull/16834 ## Which issue does this PR close? - Closes #. ## Rationale for this change Simplify code. ## What changes are included in this PR? reuse `Column

Re: [PR] Update README.md - add Sqawk to users list [datafusion-sqlparser-rs]

2025-07-20 Thread via GitHub
github-actions[bot] closed pull request #1838: Update README.md - add Sqawk to users list URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1838

Re: [PR] fix: allow arbitrary operators with ANY and ALL on Postgres [datafusion-sqlparser-rs]

2025-07-20 Thread via GitHub
github-actions[bot] commented on PR #1842: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1842#issuecomment-3095041837 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

[I] datafusion seems to be single threaded regardless of the number of cores [datafusion]

2025-07-20 Thread via GitHub
djouallah opened a new issue, #16833: URL: https://github.com/apache/datafusion/issues/16833 ### Describe the bug was doing some testing and noticed that datafusion doesn't seem to be using all cores in my notebook runtime ### To Reproduce here is a simplified code

Re: [PR] Refactor binary.rs tests into modular submodules under `binary/tests` [datafusion]

2025-07-20 Thread via GitHub
kosiew commented on PR #16782: URL: https://github.com/apache/datafusion/pull/16782#issuecomment-3094956551 thanks @comphead for your review.

Re: [I] RFC: What table provider features would be helpful in an example? [datafusion]

2025-07-20 Thread via GitHub
corasaurus-hex commented on issue #16821: URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3094934112 I have a problem I'd love to solve but I'm not exactly sure how to go about it. My issue is I need to do a join across a time axis, where an event in the past has a corre

Re: [PR] pipe column orderings into pruning predicate creation [datafusion]

2025-07-20 Thread via GitHub
etseidl commented on PR #15821: URL: https://github.com/apache/datafusion/pull/15821#issuecomment-3094849970 I'll be able to revisit this early in the week. It's been a back burner issue for me lately because of the slow progress on parquet format changes. It seems the sort order keeps popp

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-07-20 Thread via GitHub
nssalian commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-3094813498 take

[I] Enhance support for types in ScalarValue [datafusion]

2025-07-20 Thread via GitHub
theirix opened a new issue, #16832: URL: https://github.com/apache/datafusion/issues/16832 ### Is your feature request related to a problem or challenge? Some types in `ScalarValue` are not yet supported in helper functions. ### Describe the solution you'd like Introduce

[PR] feat: enhance support for Decimal128 and Decimal256 [datafusion]

2025-07-20 Thread via GitHub
theirix opened a new pull request, #16831: URL: https://github.com/apache/datafusion/pull/16831 ## Which issue does this PR close? - Closes #. ## Rationale for this change Enhancing support for `ScalarValue::Decimal128` and `ScalarValue::Decimal256` ## What changes

[PR] fix(docs): Update broken links to `TableProvider` docs [datafusion]

2025-07-20 Thread via GitHub
jcsherin opened a new pull request, #16830: URL: https://github.com/apache/datafusion/pull/16830 ## Which issue does this PR close? - Closes #. ## Rationale for this change Fixes broken links to the `TableProvider` trait in the Rust and website docs. I found them whi

Re: [PR] Add hooks to `SchemaAdapter` to add custom column generators [datafusion]

2025-07-20 Thread via GitHub
adriangb closed pull request #15261: Add hooks to `SchemaAdapter` to add custom column generators URL: https://github.com/apache/datafusion/pull/15261

Re: [PR] Add hooks to `SchemaAdapter` to add custom column generators [datafusion]

2025-07-20 Thread via GitHub
adriangb commented on PR #15261: URL: https://github.com/apache/datafusion/pull/15261#issuecomment-3094659803 I'm proposing we replace SchemaAdapter in https://github.com/apache/datafusion/issues/16800 so I don't plan to work on this PR anymore

Re: [PR] pipe column orderings into pruning predicate creation [datafusion]

2025-07-20 Thread via GitHub
adriangb commented on PR #15821: URL: https://github.com/apache/datafusion/pull/15821#issuecomment-3094659385 @etseidl I'm sorry we haven't made any progress here. I see there are a lot of merge conflicts but we do now finally have all of the building blocks in place since we evaluate predi

Re: [PR] cache generation of dictionary keys and null arrays for ScalarValue [datafusion]

2025-07-20 Thread via GitHub
adriangb merged PR #16789: URL: https://github.com/apache/datafusion/pull/16789

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2025-07-20 Thread via GitHub
comphead commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-3094637555 Hey @anovv thanks for reporting it, what does `cargo build --timings` show?

Re: [PR] feat: Optimize `collect_left_input` processing [datafusion]

2025-07-20 Thread via GitHub
ctsk commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3094628896 The duplicate evaluation of the keys on the left side is no big deal, because those expressions are simple column selections. This is done when lowering a logical plan to a physical pla

Re: [I] update rust edition to 2024 [datafusion-ballista]

2025-07-20 Thread via GitHub
milenkovicm commented on issue #1271: URL: https://github.com/apache/datafusion-ballista/issues/1271#issuecomment-3094456313 We should track the Minimum Supported Rust Version (MSRV) of datafusion as well:
```
# Minimum Supported Rust Version (MSRV)
rust-version = "1.85.1"
```

Re: [PR] Chore: Refactor QueryPlanSerde, move math exprs in separate file [datafusion-comet]

2025-07-20 Thread via GitHub
kazantsev-maksim closed pull request #2027: Chore: Refactor QueryPlanSerde, move math exprs in separate file URL: https://github.com/apache/datafusion-comet/pull/2027

Re: [PR] Chore: Improve array contains test coverage [datafusion-comet]

2025-07-20 Thread via GitHub
kazantsev-maksim commented on code in PR #2030: URL: https://github.com/apache/datafusion-comet/pull/2030#discussion_r2217759307 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -218,7 +218,7 @@ class CometArrayExpressionSuite extends CometTestBase

Re: [PR] Chore: Improve array contains test coverage [datafusion-comet]

2025-07-20 Thread via GitHub
kazantsev-maksim commented on code in PR #2030: URL: https://github.com/apache/datafusion-comet/pull/2030#discussion_r2217758957 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -232,6 +232,78 @@ class CometArrayExpressionSuite extends CometTestBas

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-20 Thread via GitHub
milenkovicm commented on PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#issuecomment-3094397765 Thanks @Huy1Ng! As python binding has its problems I wonder if we should keep just basic python CI active

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-20 Thread via GitHub
Huy1Ng commented on code in PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#discussion_r2217684707 ## .github/actions/setup-builder/action.yaml: ## @@ -37,3 +37,5 @@ runs: rustup toolchain install ${{ inputs.rust-version }} Review Comment: I

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-20 Thread via GitHub
Huy1Ng commented on PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#issuecomment-3094376920 This should be ready for Rust. I will tackle the Python workflow next. This is going to be split into 3 small PRs.

Re: [PR] feat(spark): implement Spark datetime function last_day [datafusion]

2025-07-20 Thread via GitHub
Standing-Man commented on PR #16828: URL: https://github.com/apache/datafusion/pull/16828#issuecomment-3094263626 Hi @alamb, I’ve added the `last_day` function. However, running `cargo test --test sqllogictests -- spark` produces some errors. I’m looking into it, but please let me know if y

[PR] Add support for the DROP privilege [datafusion-sqlparser-rs]

2025-07-20 Thread via GitHub
yoavcloud opened a new pull request, #1959: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1959 (no comment)

[PR] feat(spark): implement Spark math function mod/pmod [datafusion]

2025-07-20 Thread via GitHub
chenkovsky opened a new pull request, #16829: URL: https://github.com/apache/datafusion/pull/16829 ## Which issue does this PR close? ## Rationale for this change ## What changes are included in this PR? implement mod/pmod udf ## Are these changes tested?

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-20 Thread via GitHub
Huy1Ng commented on code in PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#discussion_r2217686513 ## .github/workflows/rust.yml: ## @@ -17,275 +17,151 @@ name: Rust +concurrency: + group: ${{ github.repository }}-${{ github.workflow }} + cancel-in

Re: [PR] improve rust workflows without cache [datafusion-ballista]

2025-07-20 Thread via GitHub
Huy1Ng commented on code in PR #1275: URL: https://github.com/apache/datafusion-ballista/pull/1275#discussion_r2217685871 ## .github/workflows/rust.yml: ## @@ -17,275 +17,151 @@ name: Rust +concurrency: + group: ${{ github.repository }}-${{ github.workflow }} + cancel-in

[PR] feat(spark): implement Spark datetime function last_day [datafusion]

2025-07-20 Thread via GitHub
Standing-Man opened a new pull request, #16828: URL: https://github.com/apache/datafusion/pull/16828 ## Which issue does this PR close? - Closes #16774. ## Rationale for this change ## What changes are included in this PR? Implement Spark da

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-07-20 Thread via GitHub
findepi commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3093779036 > I re-read the comments on this PR and I wonder if you tried implementing the solution suggested by @ozankabak in [#16625 (comment)](https://github.com/apache/datafusion

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-07-20 Thread via GitHub
findepi commented on code in PR #16625: URL: https://github.com/apache/datafusion/pull/16625#discussion_r2217667332 ## datafusion/ffi/src/udaf/mod.rs: ## @@ -561,6 +561,7 @@ impl AggregateUDFImpl for ForeignAggregateUDF { pub enum FFI_AggregateOrderSensitivity { Insensitiv