Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
xudong963 commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948528835 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -373,9 +373,10 @@ pub fn ensure_sorting( return adjust_window_sort_removal(requiremen

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
xudong963 commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948526926 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -1943,6 +1943,30 @@ async fn test_remove_unnecessary_spm1() -> Result<()> { Ok(()) }

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
xudong963 commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948524638 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -1943,6 +1943,30 @@ async fn test_remove_unnecessary_spm1() -> Result<()> { Ok(()) }

[I] Official Docker Image is not found [datafusion-ballista]

2025-02-09 Thread via GitHub
Noah-FetchRewards opened a new issue, #1178: URL: https://github.com/apache/datafusion-ballista/issues/1178 Referencing https://datafusion.apache.org/ballista/user-guide/deployment/kubernetes.html When running `docker pull ghcr.io/apache/datafusion-ballista-standalone:0.12.0-rc4`

[PR] Implement predicate pruning for not like expressions [datafusion]

2025-02-09 Thread via GitHub
UBarney opened a new pull request, #14567: URL: https://github.com/apache/datafusion/pull/14567 ## Which issue does this PR close? - Closes #14053 . ## Rationale for this change ## What changes are included in this PR? If both col_min and col_max match

[I] Improve GroupOrderingPartial performance [datafusion]

2025-02-09 Thread via GitHub
ctsk opened a new issue, #14565: URL: https://github.com/apache/datafusion/issues/14565 The current implementation of GroupOrderingPartial updates its internal state by converting any incoming batch into the row format, and then traversing that format to determine if the sort key changed.

Re: [I] Improve GroupOrderingPartial performance [datafusion]

2025-02-09 Thread via GitHub
ctsk commented on issue #14565: URL: https://github.com/apache/datafusion/issues/14565#issuecomment-2646608996 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

[PR] Drop RowConverter from GroupOrderingPartial [datafusion]

2025-02-09 Thread via GitHub
ctsk opened a new pull request, #14566: URL: https://github.com/apache/datafusion/pull/14566 ## Which issue does this PR close? - Closes #14565. ## Rationale for this change Faster is better? ## What changes are included in this PR? #

Re: [I] Implement nested join optimization [datafusion]

2025-02-09 Thread via GitHub
clflushopt commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2646640247 @alamb quick question: what's considered higher priority here, a better join ordering approach (potentially like DuckDB's) or picking up the couple of tickets left in the EPIC

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2646820171 > > Hmm, my immediate thought here is that if we cannot guarantee correctly coerced inputs during the first pass, then it's probably not worth attempting to coerce in the bui

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2646822350 > That could potentially unlock removing Expr::Wildcard. It's not really an expression (just like Expr::Alias isn't https://github.com/apache/datafusion/issues/1468 and Sort

Re: [PR] feat: metadata columns [datafusion]

2025-02-09 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2646833158 > when I save a table to csv, it will also save rowid into csv. no system will do like this. My problem is with this statement. I don't think there's a universal definition a

Re: [I] Implement physical plan for EXISTS subquery [datafusion]

2025-02-09 Thread via GitHub
logan-keede commented on issue #123: URL: https://github.com/apache/datafusion/issues/123#issuecomment-2646559348 > In case anyone is curious -- we support correlated versions of these queries (via a join) but if there is no correlation (not super useful) we do not > > ❯ create ta

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1948299765 ## datafusion/common/src/utils/mod.rs: ## @@ -602,26 +602,46 @@ pub fn base_type(data_type: &DataType) -> DataType { /// /// let data_type = DataType::List(A

Re: [PR] feat: Add implicit casting to `TypeSignature::String` [datafusion]

2025-02-09 Thread via GitHub
github-actions[bot] closed pull request #13404: feat: Add implicit casting to `TypeSignature::String` URL: https://github.com/apache/datafusion/pull/13404

Re: [I] Question: `to_char(date, timstamp format)` [datafusion]

2025-02-09 Thread via GitHub
xudong963 commented on issue #14536: URL: https://github.com/apache/datafusion/issues/14536#issuecomment-2646775055 > [@xudong963](https://github.com/xudong963) maybe we should make the error message better? Yes, this is a good point

Re: [I] Current Ballista release is broken? [datafusion-ballista]

2025-02-09 Thread via GitHub
Noah-FetchRewards commented on issue #1179: URL: https://github.com/apache/datafusion-ballista/issues/1179#issuecomment-2646854039 Another error: ./dev/build-set-env.sh: line 21: cargo: command not found [+] Building 0.0s (0/0)

[I] Current Ballista release is broken? [datafusion-ballista]

2025-02-09 Thread via GitHub
Noah-FetchRewards opened a new issue, #1179: URL: https://github.com/apache/datafusion-ballista/issues/1179 I'm running the build image command for Kubernetes using the documentation at: https://datafusion.apache.org/ballista/user-guide/deployment/kubernetes.html Using: `git clone

Re: [PR] feat: metadata columns [datafusion]

2025-02-09 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2646850170 > > when I save a table to csv, it will also save rowid into csv. no system will do like this. > > My problem is with this statement. I don't think there's a universal defi

Re: [I] Current Ballista release is broken? [datafusion-ballista]

2025-02-09 Thread via GitHub
Noah-FetchRewards commented on issue #1179: URL: https://github.com/apache/datafusion-ballista/issues/1179#issuecomment-2646856304 Just realized that the new docker pull command is: docker pull ghcr.io/apache/arrow-ballista-standalone:43.0.0-rc2 Found this shoveling

[PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
xudong963 opened a new pull request, #14568: URL: https://github.com/apache/datafusion/pull/14568 ## Which issue does this PR close? - Closes #. ## Rationale for this change It's clear that `fetch` is lost after removing SPM ## What changes are in

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
xudong963 closed pull request #14568: Fix: limit is missing after removing SPM URL: https://github.com/apache/datafusion/pull/14568

Re: [I] Construct source plan schema with correct nullability during `INSERT` planning. [datafusion]

2025-02-09 Thread via GitHub
zhuqi-lucas commented on issue #14550: URL: https://github.com/apache/datafusion/issues/14550#issuecomment-2646894154 take

[PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
xudong963 opened a new pull request, #14569: URL: https://github.com/apache/datafusion/pull/14569 ## Rationale for this change It's clear that `fetch` is lost after removing SPM ## What changes are included in this PR? If the SPM has a fetch, we won't remove it.
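
The rule described in this PR summary can be sketched on a toy plan type. This is a minimal illustration only: `Plan`, `output_partitions`, and `remove_unnecessary_spm` are hypothetical names rather than DataFusion's API, and the single-partition condition for removing a merge is an assumption about when the merge is considered unnecessary.

```rust
/// Toy stand-in for a physical plan node (hypothetical, not DataFusion's `ExecutionPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf scan producing the given number of partitions.
    Scan { partitions: usize },
    /// Merges sorted partitions into one; `fetch` is an optional row limit.
    SortPreservingMerge { fetch: Option<usize>, input: Box<Plan> },
}

impl Plan {
    /// Number of partitions this node outputs.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::SortPreservingMerge { .. } => 1,
        }
    }
}

/// Drop a merge over a single-partition input, but only when it carries no `fetch`.
/// A merge with a `fetch` also acts as a limit, so removing it would drop the limit.
fn remove_unnecessary_spm(plan: Plan) -> Plan {
    match plan {
        // Assumption: the merge is redundant when its input already has one partition.
        Plan::SortPreservingMerge { fetch: None, input } if input.output_partitions() == 1 => {
            remove_unnecessary_spm(*input)
        }
        // Otherwise keep the merge (and its fetch) and recurse into the input.
        Plan::SortPreservingMerge { fetch, input } => Plan::SortPreservingMerge {
            fetch,
            input: Box::new(remove_unnecessary_spm(*input)),
        },
        other => other,
    }
}
```

With this sketch, a merge with `fetch: None` over a one-partition scan collapses to the scan, while the same merge with `fetch: Some(10)` is returned unchanged.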

[PR] Remove useless test util [datafusion]

2025-02-09 Thread via GitHub
xudong963 opened a new pull request, #14570: URL: https://github.com/apache/datafusion/pull/14570 Just code 🧹

Re: [I] Document PREPARE statements [datafusion]

2025-02-09 Thread via GitHub
dhegberg commented on issue #13570: URL: https://github.com/apache/datafusion/issues/13570#issuecomment-2647102910 I've started to put together some basic documentation for prepared statements, but when generating examples I'm struggling to use named parameters. Using `SessionContext:

Re: [I] Extended tests are failing on main [datafusion]

2025-02-09 Thread via GitHub
ozankabak commented on issue #14549: URL: https://github.com/apache/datafusion/issues/14549#issuecomment-2647105941 Extended tests are still failing

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
zhuqi-lucas commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948494619 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -1943,6 +1943,30 @@ async fn test_remove_unnecessary_spm1() -> Result<()> { Ok(())

[PR] bug: improve schema checking for instert into cases [datafusion]

2025-02-09 Thread via GitHub
zhuqi-lucas opened a new pull request, #14572: URL: https://github.com/apache/datafusion/pull/14572 ## Which issue does this PR close? Describe the bug In, https://github.com/apache/datafusion/issues/14394, it was reported that while attempting to implement a DataSink different sch

Re: [PR] Fix: limit is missing after removing SPM [datafusion]

2025-02-09 Thread via GitHub
zhuqi-lucas commented on code in PR #14569: URL: https://github.com/apache/datafusion/pull/14569#discussion_r1948492572 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -1943,6 +1943,30 @@ async fn test_remove_unnecessary_spm1() -> Result<()> { Ok(())

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-09 Thread via GitHub
Weijun-H commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646145904 > Impressive work! I got a suggestion and a high-level question: > > ### Suggestion > I think to justify this change, we have to make sure: > > * No performance regre

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1948051074 ## datafusion/expr-common/src/signature.rs: ## @@ -431,6 +463,35 @@ impl TypeSignature { } } +fn get_possible_types_from_signature_classes( Review Comm

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-09 Thread via GitHub
berkaysynnada commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646154037 > * No performance regression (benchmarks already showed) > * Reduce memory footprint, for queries which batch can accumulate in `RepartitionExec` (as the origin issue said)

Re: [PR] chore: generate change log for 44.0.0 [datafusion-ballista]

2025-02-09 Thread via GitHub
milenkovicm commented on PR #1173: URL: https://github.com/apache/datafusion-ballista/pull/1173#issuecomment-2646611454 When you get a chance, it would be great if we could release ver.44 @andygrove. Please merge #1171, #1175 before the ver.44 release. Thanks a lot for your help!

Re: [I] Support accessing a map with non-literal key [datafusion]

2025-02-09 Thread via GitHub
Lordworms commented on issue #14552: URL: https://github.com/apache/datafusion/issues/14552#issuecomment-2646572272 take

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-09 Thread via GitHub
rkrishn7 commented on code in PR #14538: URL: https://github.com/apache/datafusion/pull/14538#discussion_r1948419349 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -0,0 +1,264 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] feat: Implement UNION ALL BY NAME [datafusion]

2025-02-09 Thread via GitHub
rkrishn7 commented on code in PR #14538: URL: https://github.com/apache/datafusion/pull/14538#discussion_r1948419729 ## datafusion/sqllogictest/test_files/union_by_name.slt: ## @@ -0,0 +1,264 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review Comment: Mo

Re: [PR] feat: metadata columns [datafusion]

2025-02-09 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2646922625 > > when I save a table to csv, it will also save rowid into csv. no system will do like this. > > My problem is with this statement. I don't think there's a universal defi

Re: [PR] Enable custom dialects to support `MATCH() AGAINST()` [datafusion-sqlparser-rs]

2025-02-09 Thread via GitHub
iffyio merged PR #1719: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1719

Re: [PR] feat: replace simple with complex UDF implementation [datafusion-python]

2025-02-09 Thread via GitHub
milenkovicm closed pull request #1003: feat: replace simple with complex UDF implementation URL: https://github.com/apache/datafusion-python/pull/1003

Re: [PR] Implement faster join traversal [datafusion]

2025-02-09 Thread via GitHub
Dandandan commented on PR #14539: URL: https://github.com/apache/datafusion/pull/14539#issuecomment-2646565220 seeing some regressions in imdb benchmark

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-09 Thread via GitHub
goldmedal commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2646780333 > Since `exprlist_to_fields` is called in the builder, it seems that wildcard expansion still hasn't been delayed. > I see. I think we can revert the change https://gith

Re: [PR] feat: metadata columns [datafusion]

2025-02-09 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2646781160 > > if a system/metadata column is not selected in project, after project, in spark system/metadata column's still a valid system/metadata column, but it's not that in #14362 >

Re: [I] Question: `to_char(date, timstamp format)` [datafusion]

2025-02-09 Thread via GitHub
matthewmturner commented on issue #14536: URL: https://github.com/apache/datafusion/issues/14536#issuecomment-2646783166 I had the impression (although perhaps it is dated) that datafusion sought to be compatible with postgres to the extent reasonable. Assuming that's still the case is ther

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1948299517 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,13 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd,

Re: [PR] feat: metadata columns [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2646732851 > if a system/metadata column is not selected in project, after project, in spark system/metadata column's still a valid system/metadata column, but it's not that in https://githu

Re: [PR] Remove useless test util [datafusion]

2025-02-09 Thread via GitHub
xudong963 merged PR #14570: URL: https://github.com/apache/datafusion/pull/14570

Re: [PR] minor: Move file compression [datafusion]

2025-02-09 Thread via GitHub
alamb commented on code in PR #14555: URL: https://github.com/apache/datafusion/pull/14555#discussion_r1948063270 ## datafusion/core/Cargo.toml: ## @@ -43,7 +43,7 @@ array_expressions = ["nested_expressions"] # Used to enable the avro format avro = ["apache-avro", "num-traits"

Re: [I] Why does `PruningPredicate` reference a `row_count` for each column? [datafusion]

2025-02-09 Thread via GitHub
alamb closed issue #13836: Why does `PruningPredicate` reference a `row_count` for each column? URL: https://github.com/apache/datafusion/issues/13836

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1948058156 ## datafusion/expr-common/src/signature.rs: ## @@ -460,6 +521,44 @@ fn get_data_types(native_type: &NativeType) -> Vec { } } +#[derive(Debug, Clone, Eq

Re: [PR] Make it easier to create a ScalarValure representing typed null (#14548) [datafusion]

2025-02-09 Thread via GitHub
alamb commented on code in PR #14558: URL: https://github.com/apache/datafusion/pull/14558#discussion_r1948061227 ## datafusion/common/src/scalar/mod.rs: ## @@ -974,6 +974,129 @@ impl ScalarValue { ) } +/// Create a Null instance of ScalarValue for this datat

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2646195392 > but it would also be useful to be able to use the wildcard in the return_type function as well. Using TypeSignatureClass:Timestamp can represent wildcard of timestamp. De

[I] Make it easier to use rust DataFusion UDFs in datafusion-python [datafusion-python]

2025-02-09 Thread via GitHub
timsaucer opened a new issue, #1017: URL: https://github.com/apache/datafusion-python/issues/1017 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Suppose someone wants to build a library that is usable by both rust and python

Re: [PR] Implement faster join traversal [datafusion]

2025-02-09 Thread via GitHub
Dandandan commented on PR #14539: URL: https://github.com/apache/datafusion/pull/14539#issuecomment-2646347055 This is ready for review

Re: [I] Emit warning with attached `Diagnostic` when doing `= NULL` [datafusion]

2025-02-09 Thread via GitHub
ugoa commented on issue #14434: URL: https://github.com/apache/datafusion/issues/14434#issuecomment-2646349335 take

[I] Expose user defined functions in the FFI [datafusion]

2025-02-09 Thread via GitHub
timsaucer opened a new issue, #14562: URL: https://github.com/apache/datafusion/issues/14562 ### Is your feature request related to a problem or challenge? By adding the foreign function interface to this project, we exposed a host of features to end users. It allows for easy integrat

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-09 Thread via GitHub
2010YOUY01 commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646218873 > Hi @2010YOUY01. I'd like to thank you firstly for this investigation. I actually expect higher memory consumption—especially in systems where the upstream part of the Repartitio

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-09 Thread via GitHub
alamb commented on PR #14521: URL: https://github.com/apache/datafusion/pull/14521#issuecomment-2646172628 I merged up from main to resolve a conflict with this branch

Re: [PR] Update proto to support to/from json with an extension codec [datafusion]

2025-02-09 Thread via GitHub
alamb commented on code in PR #14561: URL: https://github.com/apache/datafusion/pull/14561#discussion_r1948062976 ## datafusion/proto/src/bytes/mod.rs: ## @@ -199,11 +199,7 @@ pub fn logical_plan_to_bytes(plan: &LogicalPlan) -> Result { #[cfg(feature = "json")] pub fn logical

Re: [I] Update proto to support to/from json with an extension codec [datafusion]

2025-02-09 Thread via GitHub
alamb closed issue #14560: Update proto to support to/from json with an extension codec URL: https://github.com/apache/datafusion/issues/14560

Re: [PR] Update proto to support to/from json with an extension codec [datafusion]

2025-02-09 Thread via GitHub
alamb merged PR #14561: URL: https://github.com/apache/datafusion/pull/14561

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1948062578 ## datafusion/expr-common/src/signature.rs: ## @@ -209,14 +210,13 @@ impl TypeSignature { #[derive(Debug, Clone, Eq, PartialEq, PartialOrd, Hash)] pub enum Ty

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-09 Thread via GitHub
alamb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2646174704 I merged this PR up from main locally and ran the tests again to be sure and everything looks good. Thanks @adriangb ❤️

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-09 Thread via GitHub
berkaysynnada commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646294361 > The point is: I think there should be no memory-bloat issue in TPCH/clickbench queries caused by `RepartitionExec`, just wondering do you have any bad query can reproduce the

Re: [PR] Implement faster join traversal [datafusion]

2025-02-09 Thread via GitHub
Dandandan commented on PR #14539: URL: https://github.com/apache/datafusion/pull/14539#issuecomment-2646327023 Update after implementing emitting in batch size: in memory performs about the same as before, but tpch_10 regressed compared to earlier implementation (so seems more having to wit

[PR] Enable custom dialects to support `MATCH() AGAINST()` [datafusion-sqlparser-rs]

2025-02-09 Thread via GitHub
joocer opened a new pull request, #1719: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1719 The logic for if a dialect supports `MATCH() AGAINST()` was coded into the parser to look for specific dialects rather than the dialects attesting they support the syntax. This meant th
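
The design change described in this PR summary, where each dialect attests to its own capability instead of the parser checking for specific dialects, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Dialect` trait, the method name `supports_match_against`, and the three dialect structs are hypothetical stand-ins, not the sqlparser-rs API.

```rust
/// Hypothetical, simplified dialect trait (not the real sqlparser-rs `Dialect`).
trait Dialect {
    /// Whether this dialect accepts `MATCH (...) AGAINST (...)`.
    /// Defaults to `false`, so a dialect must opt in explicitly.
    fn supports_match_against(&self) -> bool {
        false
    }
}

/// Stand-in for a built-in dialect that supports the syntax.
struct MySqlLike;
/// Stand-in for a user-defined dialect that opts in.
struct CustomDialect;
/// Stand-in for a dialect that does not support the syntax.
struct PlainDialect;

impl Dialect for MySqlLike {
    fn supports_match_against(&self) -> bool {
        true
    }
}

impl Dialect for CustomDialect {
    fn supports_match_against(&self) -> bool {
        true
    }
}

impl Dialect for PlainDialect {}

/// The parser asks the dialect instead of comparing against a hard-coded list,
/// so custom dialects can enable the syntax without parser changes.
fn parser_accepts_match_against(dialect: &dyn Dialect) -> bool {
    dialect.supports_match_against()
}
```

In this sketch, `parser_accepts_match_against(&CustomDialect)` is true even though the parser has no knowledge of `CustomDialect`, which is the point of moving the check into the dialect.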

Re: [PR] chore(deps): bump nix from 0.28.0 to 0.29.0 [datafusion]

2025-02-09 Thread via GitHub
alamb merged PR #14559: URL: https://github.com/apache/datafusion/pull/14559

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-09 Thread via GitHub
alamb commented on PR #14544: URL: https://github.com/apache/datafusion/pull/14544#issuecomment-2646174275 Close/reopen to start CI

Re: [PR] Test all examples from library-user-guide & user-guide docs [datafusion]

2025-02-09 Thread via GitHub
alamb closed pull request #14544: Test all examples from library-user-guide & user-guide docs URL: https://github.com/apache/datafusion/pull/14544

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-09 Thread via GitHub
alamb merged PR #14295: URL: https://github.com/apache/datafusion/pull/14295

Re: [PR] Add Common Subexpression Elimination for `PhysicalExpr` trees [datafusion]

2025-02-09 Thread via GitHub
peter-toth commented on PR #13046: URL: https://github.com/apache/datafusion/pull/13046#issuecomment-2646358282 @andygrove , I've updated the PR from `main`. Please note that the new `EliminateCommonPhysicalSubexprs` rule is not part of the default `PhysicalOptimizer` as the rule has no u

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-09 Thread via GitHub
Weijun-H commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646358808 I believe we cannot use the customized channel `DistributionReceiver` currently, as `OnDemandRepartitionExec` can prevent channels from filling up endlessly. Additionally, I noticed

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-09 Thread via GitHub
2010YOUY01 commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2646119135 Impressive work! I got a suggestion and a high-level question: ### Suggestion I think to justify this change, we have to make sure: - No performance regression (benchma

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1948037292 ## datafusion/expr-common/src/signature.rs: ## @@ -431,6 +463,35 @@ impl TypeSignature { } } +fn get_possible_types_from_signature_classes( +signatu

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-09 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1948036125 ## datafusion/common/src/types/builtin.rs: ## @@ -47,3 +49,11 @@ singleton!(LOGICAL_FLOAT64, logical_float64, Float64); singleton!(LOGICAL_DATE, logical_date,

[PR] feat: `INSERT INTO` support [datafusion-ballista]

2025-02-09 Thread via GitHub
milenkovicm opened a new pull request, #1177: URL: https://github.com/apache/datafusion-ballista/pull/1177 # Which issue does this PR close? Depends on: #1176 Closes #1164. # Rationale for this change Add support for insert into `INSERT INTO` # What changes ar

[PR] Benchmark showcasing with_column and with_column_renamed function performance [datafusion]

2025-02-09 Thread via GitHub
Omega359 opened a new pull request, #14564: URL: https://github.com/apache/datafusion/pull/14564 ... or lack thereof ## Which issue does this PR close? Part of #14563 ## Rationale for this change Benchmark showcasing the issue. ## What changes are included

Re: [I] `UnwrapCastInComparison` produces incorrect results [datafusion]

2025-02-09 Thread via GitHub
Spaarsh commented on issue #14303: URL: https://github.com/apache/datafusion/issues/14303#issuecomment-2646433642 I think this comment here (though erroneous) explains that this is the expected behavior of the optimizer: https://github.com/apache/datafusion/blob/9c12919786be0cfce5c4817101a

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-02-09 Thread via GitHub
Spaarsh commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2646416918 I began contributing to this repo only a month ago, so please pardon any errors on my side, but I just wanted to suggest something. Since we're planning on having a sepa

Re: [PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]

2025-02-09 Thread via GitHub
Omega359 commented on PR #14367: URL: https://github.com/apache/datafusion/pull/14367#issuecomment-2646418656 While I can see opportunities for improvements in the code I think they are relatively minor and this PR is suitable for inclusion in DF. Thanks! @alamb

[I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-09 Thread via GitHub
Omega359 opened a new issue, #14563: URL: https://github.com/apache/datafusion/issues/14563 ### Describe the bug DataFrame functions `.with_column` and `.with_column_renamed` (and possibly others) are slow. One can really see this in DataFrames with many columns where a .with_c

Re: [PR] wip: proto to physical plan conversion [datafusion]

2025-02-09 Thread via GitHub
jatin510 commented on code in PR #14530: URL: https://github.com/apache/datafusion/pull/14530#discussion_r1948219182 ## datafusion/proto/proto/datafusion.proto: ## @@ -798,6 +799,19 @@ message UnnestExecNode { UnnestOptions options = 5; } +message MemoryExecNode { Review

Re: [I] Make it easier to use rust DataFusion UDFs in datafusion-python [datafusion-python]

2025-02-09 Thread via GitHub
Spaarsh commented on issue #1017: URL: https://github.com/apache/datafusion-python/issues/1017#issuecomment-2646548490 @timsaucer so if I understood this correctly, we are trying to either create an ```FFI_UDF``` that is capable of ingesting different types of UDF or, we create an FFI for

Re: [I] Make it easier to use rust DataFusion UDFs in datafusion-python [datafusion-python]

2025-02-09 Thread via GitHub
timsaucer commented on issue #1017: URL: https://github.com/apache/datafusion-python/issues/1017#issuecomment-2646550034 @Spaarsh I’m thinking the latter. I should have a draft PR for the scalar variant in the next few days to demonstrate.

Re: [PR] start refactoring process by setting up base + init [datafusion]

2025-02-09 Thread via GitHub
logan-keede commented on PR #14306: URL: https://github.com/apache/datafusion/pull/14306#issuecomment-2646553608 @Rachelint this is just a reminder. Please disregard if this isn't needed.

Re: [I] Make it easier to use rust DataFusion UDFs in datafusion-python [datafusion-python]

2025-02-09 Thread via GitHub
Spaarsh commented on issue #1017: URL: https://github.com/apache/datafusion-python/issues/1017#issuecomment-2646557713 @timsaucer okay! If no one is doing this already, I will try and understand how it can be done for Aggregator or Window UDFs. Though I am unclear of the approach, I am goi