Re: [I] [DISCUSS] Switch to `tree` explain by default [datafusion]

2025-03-21 Thread via GitHub
Standing-Man commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2744833718 I agree to use the tree explanation format as the default display, as it is more concise. -- This is an automated message from the Apache Git Service. To respond to the m

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-21 Thread via GitHub
qazxcdswe123 commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2744966415 > [@alamb](https://github.com/alamb) What's the reason to cast numeric columns to i64/u64 and not to smallest compatible type? Why not to try something like this: >

Re: [I] Unsupported OS/arch [datafusion-comet]

2025-03-21 Thread via GitHub
jinwenjie123 commented on issue #1552: URL: https://github.com/apache/datafusion-comet/issues/1552#issuecomment-2744707975 Hi @andygrove , Thanks for your previous suggestions. However, when I use the precompiled JARs, I encounter a `GLIBC_2.27 not found` error. I modif

Re: [PR] Fix empty aggregation function count() in Substrait [datafusion]

2025-03-21 Thread via GitHub
jayzhan211 merged PR #15345: URL: https://github.com/apache/datafusion/pull/15345

Re: [PR] Improved error for expand wildcard rule [datafusion]

2025-03-21 Thread via GitHub
jayzhan211 merged PR #15287: URL: https://github.com/apache/datafusion/pull/15287

Re: [PR] Improved error for expand wildcard rule [datafusion]

2025-03-21 Thread via GitHub
jayzhan211 commented on PR #15287: URL: https://github.com/apache/datafusion/pull/15287#issuecomment-2745020209 Thanks @alamb @Jiashu-Hu

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-03-21 Thread via GitHub
goldmedal commented on code in PR #14837: URL: https://github.com/apache/datafusion/pull/14837#discussion_r2008683504 ## datafusion/physical-expr/src/async_scalar_function.rs: ## @@ -0,0 +1,227 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Migrate datasource tests to insta [datafusion]

2025-03-21 Thread via GitHub
shruti2522 commented on PR #15258: URL: https://github.com/apache/datafusion/pull/15258#issuecomment-2745098806 I have migrated all the tests for `datasource` to `insta`, except these two, I’m still working on these and will open a separate PR for it. 1. `physical_plan::json::tests::test

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-21 Thread via GitHub
Kontinuation commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2745079065 > interleave_record_batch is much slower running Q18, this problem is still under investigation. This problem happens on macOS 15.3.1 with Apple M1 Pro chip. I suspec

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-21 Thread via GitHub
Shreyaskr1409 commented on PR #15313: URL: https://github.com/apache/datafusion/pull/15313#issuecomment-2745018228 @alamb @blaginin I have made all necessary changes. I don't think there is much to add anymore.

[I] Empty aggregation functions coming from substrait [datafusion]

2025-03-21 Thread via GitHub
gabotechs opened a new issue, #15344: URL: https://github.com/apache/datafusion/issues/15344 ### Describe the bug Providing empty aggregation functions in a Substrait plan results in an invalid logical plan that fails in the physical transformation step with the following error: `

[PR] Fix empty aggregation function count() in Substrait [datafusion]

2025-03-21 Thread via GitHub
gabotechs opened a new pull request, #15345: URL: https://github.com/apache/datafusion/pull/15345 ## Which issue does this PR close? - Closes #15344 ## Rationale for this change Fixing empty `count()` aggregation functions provided through Substrait ## What changes

Re: [PR] fix: Unconditionally wrap UNION BY NAME input nodes w/ `Projection` [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on PR #15242: URL: https://github.com/apache/datafusion/pull/15242#issuecomment-2743119338 @rkrishn7 - what do you think of my above comment for the cause of the issue? I personally think it should be fixed before this PR is approved, not after.

Re: [PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-21 Thread via GitHub
zhuqi-lucas commented on code in PR #15348: URL: https://github.com/apache/datafusion/pull/15348#discussion_r200748 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -281,6 +281,33 @@ impl CursorArray for GenericByteArray { } } +impl CursorArray for StringViewA

Re: [PR] 1065/enhancement/add ctx to `__init__.py` [datafusion-python]

2025-03-21 Thread via GitHub
timsaucer commented on PR #1072: URL: https://github.com/apache/datafusion-python/pull/1072#issuecomment-2743226539 My question was more about ergonomics - now that we have the global context I suspect most users don't even need to import at all. In your use case, do you find you still nee

Re: [PR] Fix empty aggregation function count() in Substrait [datafusion]

2025-03-21 Thread via GitHub
geoffreyclaude commented on code in PR #15345: URL: https://github.com/apache/datafusion/pull/15345#discussion_r2007056372 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1975,6 +1975,13 @@ pub async fn from_substrait_agg_func( let args = from_substrait_func_

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-21 Thread via GitHub
ctsk commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2007064434 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1067,35 +1067,53 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::execut

Re: [I] [Epic] A collection of FFI related tasks [datafusion]

2025-03-21 Thread via GitHub
ion-elgreco commented on issue #15283: URL: https://github.com/apache/datafusion/issues/15283#issuecomment-2742655163 https://github.com/apache/datafusion/issues/14842 the Ffi TableProvider doesn't seem to work anymore

[I] Merge operation involving map field fails [datafusion]

2025-03-21 Thread via GitHub
chsgray opened a new issue, #15351: URL: https://github.com/apache/datafusion/issues/15351 ### Describe the bug Using the `delta-rs` Python binding, a `DeltaError` was raised when attempting a `DeltaTable.merge` operation. Library-assigned map field names appear to be mutually incomp

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-21 Thread via GitHub
xudong963 commented on code in PR #15296: URL: https://github.com/apache/datafusion/pull/15296#discussion_r2007748334 ## datafusion/expr-common/src/statistics.rs: ## @@ -203,6 +203,138 @@ impl Distribution { }; Ok(dt) } + +/// Merges two distributions

Re: [PR] chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove commented on code in PR #1559: URL: https://github.com/apache/datafusion-comet/pull/1559#discussion_r2007754499 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQueryTestSuite.scala: ## @@ -213,7 +221,7 @@ class CometTPCDSQueryTestSuite extends QueryTest with TP

Re: [PR] chore: Reimplement ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-03-21 Thread via GitHub
mbutrovich commented on PR #1511: URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2743317406 Dumb question: is `take_record_batch` an option here (it looks like interleave can actually coalesce a vector or `RecordBatch`. What would the performance implication be of `

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-21 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#discussion_r2007250544 ## src/parser/mod.rs: ## @@ -11145,17 +11145,16 @@ impl<'a> Parser<'a> { } /// Parse a `SET ROLE` statement. Expects SET to be cons

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-21 Thread via GitHub
konjac commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2007565173 ## src/dataframe.rs: ## @@ -771,3 +871,82 @@ fn record_batch_into_schema( RecordBatch::try_new(schema, data_arrays) } + +/// This is a helper function

Re: [PR] Fix empty aggregation function count() in Substrait [datafusion]

2025-03-21 Thread via GitHub
jayzhan211 commented on code in PR #15345: URL: https://github.com/apache/datafusion/pull/15345#discussion_r2007611748 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1975,6 +1975,13 @@ pub async fn from_substrait_agg_func( let args = from_substrait_func_args

Re: [I] [DISCUSS] Switch to `tree` explain by default [datafusion]

2025-03-21 Thread via GitHub
berkaysynnada commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2743413455 > I think this is _very_ important. We found many bugs using the current, detailed plan print-outs. Let's make the tree mode default for the CLI experience (and tests veri

Re: [PR] Fix empty aggregation function count() in Substrait [datafusion]

2025-03-21 Thread via GitHub
gabotechs commented on code in PR #15345: URL: https://github.com/apache/datafusion/pull/15345#discussion_r2007641902 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1975,6 +1975,13 @@ pub async fn from_substrait_agg_func( let args = from_substrait_func_args(

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on code in PR #14837: URL: https://github.com/apache/datafusion/pull/14837#discussion_r2007678119 ## datafusion/physical-expr/src/async_scalar_function.rs: ## @@ -0,0 +1,227 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-21 Thread via GitHub
XiangpengHao commented on PR #61: URL: https://github.com/apache/datafusion-site/pull/61#issuecomment-2743495500 Thank you all for the comments! I'll incorporate them soon!

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-21 Thread via GitHub
jsai28 commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2743516652 @eliaperantoni Awesome! Do you think there is enough here to make it into a GSoC project? Or do you think this could be one part of a project that includes adding diagnostic info

Re: [I] DeltaLake integration not working (Python) (FFI Table providers not working) [datafusion-python]

2025-03-21 Thread via GitHub
timsaucer commented on issue #1077: URL: https://github.com/apache/datafusion-python/issues/1077#issuecomment-2742974702 Thanks. I suspect the culprit was a change made in the ffi signature which is when we also added the ability to get the major version for compatibility checking. I am su

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-21 Thread via GitHub
adriangb commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2007326239 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1067,35 +1067,53 @@ impl ExecutionPlan for SortExec { ) -> Result { trace!("Start SortExec::ex

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-21 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2743016984 > > I ran this against Q23, results look promising! Elapsed 3.173 seconds with `datafusion.optimizer.enable_dynamic_filter_pushdown = true` vs. 4.696 with `false`. Both with predica

Re: [PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-21 Thread via GitHub
2010YOUY01 commented on PR #15348: URL: https://github.com/apache/datafusion/pull/15348#issuecomment-2743150385 Thank you for the work on better Utf8View support. I tried one sort benchmark with sort-preserving merging on a single `Utf8View` column, but it gets slower: Reproducer

Re: [PR] Add regexp_extract func [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on PR #14282: URL: https://github.com/apache/datafusion/pull/14282#issuecomment-2743165457 @SKY-ALIN - are you able to look into the above review comments?

Re: [PR] fix: add an "expr_planners" method to SessionState [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on code in PR #15119: URL: https://github.com/apache/datafusion/pull/15119#discussion_r2007427604 ## datafusion/core/src/execution/session_state.rs: ## @@ -1950,6 +1955,16 @@ mod tests { use super::{SessionContextProvider, SessionStateBuilder}; use c

Re: [I] Change mapping of SQL `VARCHAR` from `Utf8` to `Utf8View` [datafusion]

2025-03-21 Thread via GitHub
zhuqi-lucas commented on issue #15096: URL: https://github.com/apache/datafusion/issues/15096#issuecomment-2742815386 New task: Perf: Support Utf8View datatype single column comparisons with optimized cursor implementations for SortPreservingMergeStream

[PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-21 Thread via GitHub
zhuqi-lucas opened a new pull request, #15348: URL: https://github.com/apache/datafusion/pull/15348 …servingMergeStream ## Which issue does this PR close? - Closes partof [#15096](https://github.com/apache/datafusion/issues/15096) ## Rationale for this change Support U

Re: [PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on code in PR #15348: URL: https://github.com/apache/datafusion/pull/15348#discussion_r2007390113 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -281,6 +281,33 @@ impl CursorArray for GenericByteArray { } } +impl CursorArray for StringViewArra

Re: [PR] #5483 [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on PR #15307: URL: https://github.com/apache/datafusion/pull/15307#issuecomment-2743108401 - `cargo fmt` - [prettier](https://datafusion.apache.org/contributor-guide/howtos.html#how-to-format-md-document) - revert the hawkeye github action changes (separate PR)

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-21 Thread via GitHub
berkaysynnada commented on code in PR #15341: URL: https://github.com/apache/datafusion/pull/15341#discussion_r2007773712 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -855,21 +855,14 @@ pub fn binary_numeric_coercion( (UInt64, _) | (_, UInt64) => Some(UI

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-21 Thread via GitHub
berkaysynnada commented on code in PR #15341: URL: https://github.com/apache/datafusion/pull/15341#discussion_r2007781920 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -855,21 +855,14 @@ pub fn binary_numeric_coercion( (UInt64, _) | (_, UInt64) => Some(UI

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on code in PR #15341: URL: https://github.com/apache/datafusion/pull/15341#discussion_r2007786253 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -855,21 +855,14 @@ pub fn binary_numeric_coercion( (UInt64, _) | (_, UInt64) => Some(UInt64)

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-21 Thread via GitHub
xudong963 commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743665328 > We can only merge two statistical objects in certain special circumstances. For example, if we have a statistical object that tracks sample averages along with counts, we can mer

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-21 Thread via GitHub
kazuyukitanimura commented on PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#issuecomment-2744310746 Merged thanks @andygrove @viirya

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-21 Thread via GitHub
kazuyukitanimura merged PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561

Re: [PR] chore: Prepare for update to DataFusion 47 [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove commented on code in PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#discussion_r2008245382 ## native/core/src/parquet/schema_adapter.rs: ## @@ -216,57 +216,6 @@ impl SchemaMapper for SchemaMapping { let record_batch = RecordBatch::try_new_

Re: [PR] chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [datafusion-comet]

2025-03-21 Thread via GitHub
kazuyukitanimura commented on PR #1559: URL: https://github.com/apache/datafusion-comet/pull/1559#issuecomment-2744308713 Merged, thanks @andygrove

Re: [I] Improve memory pool configuration code, documentation, and tests [datafusion-comet]

2025-03-21 Thread via GitHub
kazuyukitanimura closed issue #1560: Improve memory pool configuration code, documentation, and tests URL: https://github.com/apache/datafusion-comet/issues/1560

[PR] chore: Prepare for update to DataFusion 47 [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove opened a new pull request, #1563: URL: https://github.com/apache/datafusion-comet/pull/1563 ## Which issue does this PR close? N/A ## Rationale for this change The next DataFusion release will be here in a few weeks. We should start getting read

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-03-21 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2008444729 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: Support serde for FileScanConfig `batch_size` [datafusion]

2025-03-21 Thread via GitHub
alamb merged PR #15335: URL: https://github.com/apache/datafusion/pull/15335

Re: [PR] feat: introduce hadoop mini cluster to test native scan on hdfs [datafusion-comet]

2025-03-21 Thread via GitHub
parthchandra commented on code in PR #1556: URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2008459160 ## pom.xml: ## @@ -447,6 +448,13 @@ under the License. 5.1.0 + +org.apache.hadoop +hadoop-client-minicluster Rev

Re: [I] How can Comet be enabled by default without needing to configure memory? [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove commented on issue #1558: URL: https://github.com/apache/datafusion-comet/issues/1558#issuecomment-2743701803 > When enabling off-heap memory, we will use unified memory manager, does that mean the amount of memory will not be doubled? Our current implementation automatical

Re: [PR] refactor: move `CteWorkTable`, `default_table_source` a bunch of files out of core [datafusion]

2025-03-21 Thread via GitHub
logan-keede commented on PR #15316: URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2744214468 > So are you happy with this PR as is now? Does it make more sense? Yes, but I think we should consider moving In-memory format's providers from `catalog` to `datasource` a

Re: [PR] updatted github action by change version tag to sha hashes [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on PR #15315: URL: https://github.com/apache/datafusion/pull/15315#issuecomment-2741203586 Well that is unfortunate. I wonder if the apache regex is correct - the one in the error message is not, should be '.*\/.*@[a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]+'

Re: [PR] chore: Prepare for DataFusion 47.0.0 [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove commented on code in PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#discussion_r2008287770 ## native/core/src/parquet/schema_adapter.rs: ## @@ -216,57 +216,6 @@ impl SchemaMapper for SchemaMapping { let record_batch = RecordBatch::try_new_

Re: [I] multiply overflow in stats.rs [datafusion]

2025-03-21 Thread via GitHub
Speculative commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-271420 I'm reproducing this multiplication overflow panic as well. # Details `datafusion` version 46.0.0 Trying to execute [Join Order Benchmark query 16B](https:/

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-21 Thread via GitHub
mvzink commented on code in PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#discussion_r2008324600 ## src/parser/mod.rs: ## @@ -11299,41 +11304,26 @@ impl<'a> Parser<'a> { } if self.dialect.supports_comma_separated_set_assignments(

Re: [I] [DISCUSS] Switch to `tree` explain by default [datafusion]

2025-03-21 Thread via GitHub
ozankabak commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2743148343 > Most tests that use the indent mode should continue to use the indent mode as it contains more information. In other words the tests should not be updated to use the tree mo

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-21 Thread via GitHub
eliaperantoni commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2743707574 > [@eliaperantoni](https://github.com/eliaperantoni?rgh-link-date=2025-03-21T14%3A20%3A58.000Z) Awesome! Do you think there is enough here to make it into a GSoC project?

Re: [PR] Minor: Keep debug symbols for `release-nonlto` build [datafusion]

2025-03-21 Thread via GitHub
comphead commented on PR #15350: URL: https://github.com/apache/datafusion/pull/15350#issuecomment-2743813022 > I personally just added a 'profiling' profile that inherits release-nonlto and set debug to true. thats another option yes, `release-nonlto` is a release-like used for runn

Re: [PR] Re-Add CodeCov [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2743113926 I got a [lot of pushback](https://discord.com/channels/885562378132000778/1166447479609376850/1306995025887887411) on code coverage when I brought it up, fyi. [Here is a report](ht

Re: [I] Dependency conflict with rquest due to async-compression and xz2 linking to lzma [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on issue #15342: URL: https://github.com/apache/datafusion/issues/15342#issuecomment-2743809710 Would forcing df to use 0.4.19 (`async-compression = { version = "=0.4.19" }`) resolve this issue?

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-21 Thread via GitHub
ozankabak commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743724228 > I confused the merge and mix, after reviewing the information, "Merge" suggests combining datasets that maintain their original properties, but what's implemented is actually clo

[PR] Documentation: Plan custom expressions [datafusion]

2025-03-21 Thread via GitHub
Jiashu-Hu opened a new pull request, #15353: URL: https://github.com/apache/datafusion/pull/15353 ## Which issue does this PR close? - [Closes #15267](https://github.com/apache/datafusion/issues/15267) . ## Rationale for this change This PR adds documentation

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-03-21 Thread via GitHub
waynexia commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2744554741 Ahh, I reproduced the same result. And I also observed a regression on q0: | Query | Before (ms) | After (ms) | |---|-|| | Q0| 1407.04

Re: [PR] feat: simplify regex wildcard pattern [datafusion]

2025-03-21 Thread via GitHub
waynexia commented on PR #15299: URL: https://github.com/apache/datafusion/pull/15299#issuecomment-2744557761 Thank you for reviewing @jayzhan211 @alamb :heart:

Re: [PR] feat: simplify regex wildcard pattern [datafusion]

2025-03-21 Thread via GitHub
waynexia merged PR #15299: URL: https://github.com/apache/datafusion/pull/15299

Re: [PR] chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [datafusion-comet]

2025-03-21 Thread via GitHub
kazuyukitanimura merged PR #1559: URL: https://github.com/apache/datafusion-comet/pull/1559

Re: [I] Add support for S3 Object Store to scheduler/executor binaries [datafusion-ballista]

2025-03-21 Thread via GitHub
milenkovicm commented on issue #1205: URL: https://github.com/apache/datafusion-ballista/issues/1205#issuecomment-2744173043 sure @westhide I was thinking something along https://github.com/apache/datafusion/blob/7c902def35601a003f91744ba2829eb451e792d7/datafusion-cli/src/object_storage.rs

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on code in PR #15341: URL: https://github.com/apache/datafusion/pull/15341#discussion_r2007630685 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -855,21 +855,14 @@ pub fn binary_numeric_coercion( (UInt64, _) | (_, UInt64) => Some(UInt64)

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-21 Thread via GitHub
konjac commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2007678505 ## src/dataframe.rs: ## @@ -111,56 +116,151 @@ impl PyDataFrame { } fn __repr__(&self, py: Python) -> PyDataFusionResult { -let df = self.

Re: [PR] chore(deps): bump tokio from 1.43.0 to 1.44.1 [datafusion]

2025-03-21 Thread via GitHub
comphead merged PR #15347: URL: https://github.com/apache/datafusion/pull/15347

Re: [PR] chore: Prepare for DataFusion 47.0.0 [datafusion-comet]

2025-03-21 Thread via GitHub
codecov-commenter commented on PR #1563: URL: https://github.com/apache/datafusion-comet/pull/1563#issuecomment-2744428387 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1563?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Use `any` instead of `for_each` [datafusion]

2025-03-21 Thread via GitHub
xudong963 merged PR #15289: URL: https://github.com/apache/datafusion/pull/15289

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-21 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#discussion_r2007966129 ## src/parser/mod.rs: ## @@ -11299,41 +11304,26 @@ impl<'a> Parser<'a> { } if self.dialect.supports_comma_separated_set_ass

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-21 Thread via GitHub
Dandandan commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2007092631 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,420 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccu

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-21 Thread via GitHub
viirya commented on code in PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2007971975 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -63,9 +64,28 @@ class CometExecIterator( }.toArray private val plan = { val c

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-21 Thread via GitHub
mvzink commented on code in PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#discussion_r2007913601 ## src/parser/mod.rs: ## @@ -11299,41 +11304,26 @@ impl<'a> Parser<'a> { } if self.dialect.supports_comma_separated_set_assignments(

Re: [PR] chore(deps): bump tempfile from 3.18.0 to 3.19.1 [datafusion]

2025-03-21 Thread via GitHub
comphead merged PR #15346: URL: https://github.com/apache/datafusion/pull/15346

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-21 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#discussion_r2007956403 ## src/parser/mod.rs: ## @@ -11299,41 +11304,26 @@ impl<'a> Parser<'a> { } if self.dialect.supports_comma_separated_set_ass

Re: [PR] added explaination for Schema and DFSchema to documentation [datafusion]

2025-03-21 Thread via GitHub
Jiashu-Hu commented on code in PR #15329: URL: https://github.com/apache/datafusion/pull/15329#discussion_r2007966257 ## docs/source/library-user-guide/working-with-exprs.md: ## @@ -50,6 +50,25 @@ As another example, the SQL expression `a + b * c` would be represented as an `E

Re: [PR] Minor: Keep debug symbols for `release-nonlto` build [datafusion]

2025-03-21 Thread via GitHub
Omega359 commented on PR #15350: URL: https://github.com/apache/datafusion/pull/15350#issuecomment-2743159788 I personally just added a 'profiling' profile that inherits release-nonlto and set debug to true.

Re: [PR] WIP: Add `FileScanConfigBuilder` and switch some cases [datafusion]

2025-03-21 Thread via GitHub
blaginin commented on code in PR #15352: URL: https://github.com/apache/datafusion/pull/15352#discussion_r2007989845 ## datafusion-examples/examples/parquet_index.rs: ## @@ -244,9 +244,10 @@ impl TableProvider for IndexTableProvider { let source = Arc::new(

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove commented on code in PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2008018651 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -63,9 +64,28 @@ class CometExecIterator( }.toArray private val plan = { va

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-21 Thread via GitHub
eliaperantoni commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2743128078 @jsai28 Precisely. I agree with everything you said :)

Re: [PR] chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [datafusion-comet]

2025-03-21 Thread via GitHub
andygrove commented on code in PR #1559: URL: https://github.com/apache/datafusion-comet/pull/1559#discussion_r2008004761 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQueryTestSuite.scala: ## @@ -213,7 +221,7 @@ class CometTPCDSQueryTestSuite extends QueryTest with TP

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-21 Thread via GitHub
alamb commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2744176846 > Maybe some blog posts / documentation / tutorials on how `Diagnostic` works? Mind if I come up with a draft proposal and send to you (via email or something else?) ? That

Re: [PR] Minor: Keep debug symbols for `release-nonlto` build [datafusion]

2025-03-21 Thread via GitHub
comphead merged PR #15350: URL: https://github.com/apache/datafusion/pull/15350

Re: [I] Reduce number of tokio blocking threads in SortExec spill [datafusion]

2025-03-21 Thread via GitHub
andygrove commented on issue #15323: URL: https://github.com/apache/datafusion/issues/15323#issuecomment-2743795290 > Do you see too many threads when writing the spill files or when reading? This is when reading, during the merge operation. > In merge phase, each spill file wil

Re: [PR] chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [datafusion-comet]

2025-03-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1559: URL: https://github.com/apache/datafusion-comet/pull/1559#discussion_r2007980127 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQueryTestSuite.scala: ## @@ -213,7 +221,7 @@ class CometTPCDSQueryTestSuite extends QueryTest w

[PR] WIP: Add `FileScanConfigBuilder` and switch some cases [datafusion]

2025-03-21 Thread via GitHub
blaginin opened a new pull request, #15352: URL: https://github.com/apache/datafusion/pull/15352 Related to https://github.com/apache/datafusion/pull/14685#issuecomment-2667684649 ## Rationale for this change `FileScanConfig` now violates single responsibility from SOLID. It se

Re: [PR] Use `any` instead of `for_each` [datafusion]

2025-03-21 Thread via GitHub
xudong963 commented on PR #15289: URL: https://github.com/apache/datafusion/pull/15289#issuecomment-2743953225 Thanks for your review @berkaysynnada @alamb

Re: [PR] Added tests with are writing into parquet files in memory for issue #… [datafusion]

2025-03-21 Thread via GitHub
XiangpengHao commented on code in PR #15325: URL: https://github.com/apache/datafusion/pull/15325#discussion_r2008190901 ## datafusion/wasmtest/src/lib.rs: ## @@ -182,4 +182,29 @@ mod test { let task_ctx = ctx.task_ctx(); let _ = collect(physical_plan, task_ctx
