[PR] chore(deps): bump rand_distr from 0.4.3 to 0.5.1 [datafusion]

2025-02-21 Thread via GitHub
dependabot[bot] opened a new pull request, #14807: URL: https://github.com/apache/datafusion/pull/14807 Bumps [rand_distr](https://github.com/rust-random/rand_distr) from 0.4.3 to 0.5.1. Release notes. Sourced from https://github.com/rust-random/rand_distr/releases: rand_distr's r

Re: [PR] chore(deps): update rand_distr requirement from 0.4.3 to 0.5.0 [datafusion]

2025-02-21 Thread via GitHub
dependabot[bot] closed pull request #14334: chore(deps): update rand_distr requirement from 0.4.3 to 0.5.0 URL: https://github.com/apache/datafusion/pull/14334 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] chore(deps): update rand_distr requirement from 0.4.3 to 0.5.0 [datafusion]

2025-02-21 Thread via GitHub
dependabot[bot] commented on PR #14334: URL: https://github.com/apache/datafusion/pull/14334#issuecomment-2673890329 Superseded by #14807.

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2673896595 @timsaucer I believe improving Python bindings in Apache DataFusion would be a great step forward in making it more accessible to data engineers and analysts. Expan

Re: [PR] chore(deps): bump testcontainers from 0.23.2 to 0.23.3 [datafusion]

2025-02-21 Thread via GitHub
jonahgao merged PR #14787: URL: https://github.com/apache/datafusion/pull/14787

Re: [PR] chore: Fix test not to litter in repository [datafusion]

2025-02-21 Thread via GitHub
findepi merged PR #14795: URL: https://github.com/apache/datafusion/pull/14795

Re: [I] Create more user friendly aliases from `col` [datafusion-python]

2025-02-21 Thread via GitHub
sidshehria commented on issue #754: URL: https://github.com/apache/datafusion-python/issues/754#issuecomment-2673992155 **Problem Statement:** - When using DataFusion in Python, column names often include a default alias like `?table?` . - This can make the column names less u

[PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-21 Thread via GitHub
berkaysynnada opened a new pull request, #14813: URL: https://github.com/apache/datafusion/pull/14813 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/pull/14271#discussion_r1937319791. ## Rationale for this change https://

Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2674736651 > [@kazuyukitanimura](https://github.com/kazuyukitanimura) I am not sure but I think the slowness comes from CometExchange that is executed after the join with BuildLeft

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#issuecomment-2674739703 > @andygrove Thanks for opening this PR! I have one question though. > > I also tried to apply the same build side selection logic but found that with multi executors,

Re: [PR] Update Community Events in concepts-readings-events.md [datafusion]

2025-02-21 Thread via GitHub
oznur-synnada commented on PR #14629: URL: https://github.com/apache/datafusion/pull/14629#issuecomment-2674796742 I'd definitely be willing to do so, but I'm going to need someone to help me list what we need to mention and what details they should include.

Re: [PR] Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression. [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 commented on PR #14119: URL: https://github.com/apache/datafusion/pull/14119#issuecomment-2674797113 > The challenge at the moment is that PruningPredicate can't refer directly to the function implementations I see. This might be the tradeoff

Re: [PR] Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression. [datafusion]

2025-02-21 Thread via GitHub
adriangb commented on PR #14119: URL: https://github.com/apache/datafusion/pull/14119#issuecomment-2674803103 Yes that makes sense, but presumably that should happen once per pattern not once per row and be quite fast, especially for the common case of a prefix search, so I'd guess it's neg
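
For readers following the `starts_with` to LIKE discussion, the rewrite being debated can be illustrated roughly as below. This is a minimal sketch and not the code from PR #14119: the helper name `prefix_to_like_pattern` is hypothetical, and it assumes `\` is the LIKE escape character. It only shows the once-per-pattern work of turning a literal prefix into a pattern.

```rust
/// Hypothetical helper (not from the PR): build a LIKE pattern that matches
/// every string starting with `prefix`. Assumes `\` is the LIKE escape character.
fn prefix_to_like_pattern(prefix: &str) -> String {
    let mut pattern = String::with_capacity(prefix.len() + 1);
    for ch in prefix.chars() {
        // Escape the LIKE wildcards and the escape character itself,
        // so that they are matched literally.
        if matches!(ch, '%' | '_' | '\\') {
            pattern.push('\\');
        }
        pattern.push(ch);
    }
    // A trailing `%` matches any suffix, which gives prefix semantics.
    pattern.push('%');
    pattern
}
```

The escaping step matters because a prefix such as `50%` would otherwise match far more than the literal text; the conversion runs once per pattern, not once per row.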

Re: [PR] Bump MSRV to 1.82, toolchain to 1.85 [datafusion]

2025-02-21 Thread via GitHub
comphead merged PR #14811: URL: https://github.com/apache/datafusion/pull/14811

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-21 Thread via GitHub
comphead commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1965696512 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-21 Thread via GitHub
comphead commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1965698780 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified

Re: [PR] Feat/ffi scalar udf [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on PR #1033: URL: https://github.com/apache/datafusion-python/pull/1033#issuecomment-2675401682 Putting into draft because we need the upstream DataFusion repository to release version 46 before this can be enabled.

[PR] Feat/ffi scalar udf [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer opened a new pull request, #1033: URL: https://github.com/apache/datafusion-python/pull/1033 # Which issue does this PR close? This addresses part of #1017 - the scalar UDFs # Rationale for this change This change enables users who have written DataFusion scala

Re: [I] Add some DataFrame method(s) to combine two inputs where the schema can be different [datafusion]

2025-02-21 Thread via GitHub
Omega359 commented on issue #12650: URL: https://github.com/apache/datafusion/issues/12650#issuecomment-2675405695 This should be much easier to implement now that https://github.com/apache/datafusion/issues/14508 has landed

Re: [PR] Window Functions Order Conservation -- Follow-up On Set Monotonicity [datafusion]

2025-02-21 Thread via GitHub
berkaysynnada commented on code in PR #14813: URL: https://github.com/apache/datafusion/pull/14813#discussion_r1965740381 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -222,208 +227,6 @@ async fn test_remove_unnecessary_sort5() -> Result<()> { Ok(())

Re: [I] Comet native shuffle reader [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #1125: URL: https://github.com/apache/datafusion-comet/issues/1125#issuecomment-2675433947 I am closing this issue for now because I no longer believe it to be a priority. We can reopen if needed.

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove closed issue #708: Implement native version of ColumnarToRow URL: https://github.com/apache/datafusion-comet/issues/708

[I] date_part is calculating results incorrectly for intervals [datafusion]

2025-02-21 Thread via GitHub
Omega359 opened a new issue, #14817: URL: https://github.com/apache/datafusion/issues/14817 ### Describe the bug Splitting out from https://github.com/apache/datafusion/issues/14738#issuecomment-2666570269: ```sql SELECT date_part('seconds', interval '1 hour'); -- re

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion]

2025-02-21 Thread via GitHub
Omega359 commented on issue #14771: URL: https://github.com/apache/datafusion/issues/14771#issuecomment-2675418811 I would posit that the behaviour should in general mirror postgresql unless there is a good reason not to.
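
As background on why `-2147483648 % -1` is reported as an overflow: in two's-complement 32-bit arithmetic the matching division `i32::MIN / -1` is not representable, so Rust's `checked_rem` reports the case as an overflow even though the mathematical remainder is 0. The sketch below is only an illustration of the two standard-library behaviours; the function name `rem_no_overflow` is hypothetical and this is not DataFusion's implementation.

```rust
/// Hypothetical helper: a remainder that never overflows.
/// Returns `None` for a zero divisor and the mathematical remainder
/// (0) for `i32::MIN % -1`.
fn rem_no_overflow(lhs: i32, rhs: i32) -> Option<i32> {
    if rhs == 0 {
        // Remainder by zero is undefined, so signal it to the caller.
        return None;
    }
    // `wrapping_rem` yields 0 for `i32::MIN % -1` instead of overflowing.
    Some(lhs.wrapping_rem(rhs))
}

/// The strict variant: `checked_rem` returns `None` both for a zero
/// divisor and for the `i32::MIN % -1` overflow case.
fn rem_strict(lhs: i32, rhs: i32) -> Option<i32> {
    lhs.checked_rem(rhs)
}
```

Which of the two behaviours is appropriate is exactly the design question raised in the issue; the sketch takes no position on it.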

Re: [I] Comet native shuffle reader [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove closed issue #1125: Comet native shuffle reader URL: https://github.com/apache/datafusion-comet/issues/1125

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966105015 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor licen

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
kevinjqliu commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966110882 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor lice

Re: [I] date_part is calculating results incorrectly for intervals [datafusion]

2025-02-21 Thread via GitHub
Omega359 commented on issue #14817: URL: https://github.com/apache/datafusion/issues/14817#issuecomment-2675449565 It seems that duckdb is also following the interval rules like pg for date_part - https://duckdb.org/docs/sql/data_types/interval.html ```sql D SELECT datepart('second

[PR] chore: fix clippy after rust 1.85 update [datafusion-ballista]

2025-02-21 Thread via GitHub
milenkovicm opened a new pull request, #1188: URL: https://github.com/apache/datafusion-ballista/pull/1188 # Which issue does this PR close? Closes #. # Rationale for this change Fix clippy after rust 1.85 update # What changes are included in this PR? # A

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1966118342 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSpa

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1966124188 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -31,14 +32,29 @@ import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec,

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1966123689 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -48,7 +64,7 @@ object RewriteJoin extends JoinSelectionHelper { def rewrite(plan

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1966123856 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -31,14 +32,29 @@ import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec,

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-02-21 Thread via GitHub
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1966133439 ## spark/src/main/scala/org/apache/comet/GenerateDocs.scala: ## @@ -69,7 +69,8 @@ object GenerateDocs { w.write("|-|-|-|\n".getBytes) for

Re: [PR] test: Register Spark-compatible expressions with a DataFusion context [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on PR #1432: URL: https://github.com/apache/datafusion-comet/pull/1432#issuecomment-2675473580 Thanks @viczsaurav looks like the format checks are failing

Re: [PR] feat: Support IntegralDivide function [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1428: URL: https://github.com/apache/datafusion-comet/pull/1428#discussion_r1966140038 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2641,4 +2641,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] fix: enable full decimal to decimal support [datafusion-comet]

2025-02-21 Thread via GitHub
himadripal commented on code in PR #1385: URL: https://github.com/apache/datafusion-comet/pull/1385#discussion_r1966132652 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1126,27 +1129,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlan

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675258479 @timsaucer Thanks for the clarity! I understand the explanation on the DataFrame API, lazy mode of evaluation, and Pandas/Polars integration better. I will re

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-02-21 Thread via GitHub
duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2675276468 From what I see in the current code, the struct `PullUpCorrelatedExpr` is applied to scalar subqueries as well as predicate subqueries. For that paper, I'll try my best,

Re: [PR] perf: Update RewriteJoin logic to choose optimal build side [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on code in PR #1424: URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1965987283 ## spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala: ## @@ -48,7 +64,7 @@ object RewriteJoin extends JoinSelectionHelper { def rewri

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#issuecomment-2675282815 Per the vote we have 3 PMCs with +1: https://lists.apache.org/thread/1nvpzpdkxjz17kmlg4wlty7pt5y6jvh4 I am moving this to ready, but I will need a PMC to do the final s

Re: [PR] chore: Update protobuf to 3.25.5 [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on PR #1434: URL: https://github.com/apache/datafusion-comet/pull/1434#issuecomment-2675202150 Merged thanks @andygrove

Re: [I] Support `UNNEST` as table function (UDTF) [datafusion]

2025-02-21 Thread via GitHub
waynexia commented on issue #14801: URL: https://github.com/apache/datafusion/issues/14801#issuecomment-2675201056 Thank you @jonahgao 🚀

Re: [PR] chore: Update protobuf to 3.25.5 [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura merged PR #1434: URL: https://github.com/apache/datafusion-comet/pull/1434

Re: [PR] chore: Update guava to 33.2.1-jre [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura merged PR #1435: URL: https://github.com/apache/datafusion-comet/pull/1435

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2675237730 For the high level abstractions, I believe these are already met. The DataFrame API is available and widely used (in fact, it's the only way I personally use it). The [co

[I] [Discussion] Efficient Row Selection for Multi-Engine Support [datafusion]

2025-02-21 Thread via GitHub
Arpit-Bandejiya opened a new issue, #14816: URL: https://github.com/apache/datafusion/issues/14816 Background: We have a use case where data is stored in multiple engines/formats and Parquet is the primary format containing all the data. While text queries are handled by inverted index format

[PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
lovasoa opened a new pull request, #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738 (no comment)

Re: [PR] Implemented `simplify` for the `starts_with` function to convert it into a LIKE expression. [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 commented on PR #14119: URL: https://github.com/apache/datafusion/pull/14119#issuecomment-2674769799 I wonder if converting `starts_with` to `like` adds overhead. https://github.com/apache/arrow-rs/blob/a0c3186c55ac8ed3f6b8a15d1305548fd6305ebb/arrow-string/src/predicate.rs#L

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1965956601 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1966030422 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2265,6 +2265,35 @@ select array_sort([]); [] +# test with null arguments +# expected error: +#

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-21 Thread via GitHub
comphead commented on code in PR #14769: URL: https://github.com/apache/datafusion/pull/14769#discussion_r1965693790 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1926,6 +1930,71 @@ impl DataFrame { plan, }) } + +/// Fill null values in specified

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
kevinjqliu commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966017216 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor lice

Re: [PR] Add user documentation for the FFI approach [datafusion-python]

2025-02-21 Thread via GitHub
kevinjqliu commented on code in PR #1031: URL: https://github.com/apache/datafusion-python/pull/1031#discussion_r1966029158 ## docs/source/contributor-guide/ffi.rst: ## @@ -0,0 +1,212 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor lice

Re: [PR] [WIP] Store spans for Value expressions [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
lovasoa commented on PR #1738: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1738#issuecomment-2675136484 feel free to edit the code directly without asking me first :)

Re: [PR] chore: Update guava to 33.2.1-jre [datafusion-comet]

2025-02-21 Thread via GitHub
kazuyukitanimura commented on PR #1435: URL: https://github.com/apache/datafusion-comet/pull/1435#issuecomment-2675203415 Merged thanks @andygrove

[PR] Cancellation benchmark [datafusion]

2025-02-21 Thread via GitHub
carols10cents opened a new pull request, #14818: URL: https://github.com/apache/datafusion/pull/14818 ## Which issue does this PR close? Connects to #14036 (does not close it). ## Rationale for this change The behavior observed in #14036 was hard to reproduce and quantify

[PR] Improve benchmark docs [datafusion]

2025-02-21 Thread via GitHub
carols10cents opened a new pull request, #14820: URL: https://github.com/apache/datafusion/pull/14820 ## Which issue does this PR close? - Closes #14819. ## Rationale for this change I added a new benchmark in #14818. There wasn't documentation on how to add a new benchm

Re: [PR] Cancellation benchmark [datafusion]

2025-02-21 Thread via GitHub
carols10cents commented on PR #14818: URL: https://github.com/apache/datafusion/pull/14818#issuecomment-2675566618 Let's see if this works too /benchmark

Re: [I] Add documentation for why FFI is needed [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer closed issue #1027: Add documentation for why FFI is needed URL: https://github.com/apache/datafusion-python/issues/1027

Re: [I] Upgrade to Rust 1.85 [datafusion]

2025-02-21 Thread via GitHub
comphead closed issue #14808: Upgrade to Rust 1.85 URL: https://github.com/apache/datafusion/issues/14808

Re: [PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-21 Thread via GitHub
alan910127 commented on code in PR #14737: URL: https://github.com/apache/datafusion/pull/14737#discussion_r1965962598 ## datafusion/functions-nested/src/sort.rs: ## @@ -143,6 +169,13 @@ pub fn array_sort_inner(args: &[ArrayRef]) -> Result { return exec_err!("array_sor

[I] Shuffle spilled_bytes metric is incorrect [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove opened a new issue, #1437: URL: https://github.com/apache/datafusion-comet/issues/1437 ### Describe the bug In `ShuffleWriterExec`, we are writing incorrect data for `spilled_bytes`. We are adding the size of the current memory reservation rather than the number of bytes wr

Re: [I] Implement native version of ColumnarToRow [datafusion-comet]

2025-02-21 Thread via GitHub
andygrove commented on issue #708: URL: https://github.com/apache/datafusion-comet/issues/708#issuecomment-2675431383 I am closing this issue for now because I believe that we determined that this is no longer a priority. We can reopen the issue if this changes.

Re: [PR] fix: type checking [datafusion-python]

2025-02-21 Thread via GitHub
timsaucer commented on code in PR #993: URL: https://github.com/apache/datafusion-python/pull/993#discussion_r1966217074 ## python/datafusion/context.py: ## @@ -783,7 +783,9 @@ def register_parquet( file_extension, skip_metadata, schema, -

Re: [PR] [wip] replace TypeSignature::String with TypeSignature::Coercible [datafusion]

2025-02-21 Thread via GitHub
zjregee commented on PR #14812: URL: https://github.com/apache/datafusion/pull/14812#issuecomment-2674530942 Hi, @jayzhan211. I encountered some problems when trying to discard `TypeSignature::String`. After a simple direct replacement, the original test failed. The specific test fa

Re: [PR] fix: normalize column names in table constraints [datafusion]

2025-02-21 Thread via GitHub
alamb commented on code in PR #14794: URL: https://github.com/apache/datafusion/pull/14794#discussion_r1965402509 ## datafusion/sqllogictest/test_files/ddl.slt: ## @@ -828,3 +828,39 @@ drop table table_with_pk; statement ok set datafusion.catalog.information_schema = false;

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
alamb commented on PR #14699: URL: https://github.com/apache/datafusion/pull/14699#issuecomment-2674449689 FYI @clflushopt as I think this may be related to this as well - https://github.com/apache/datafusion/pull/14735

Re: [PR] StatisticsV2: initial statistics framework redesign [datafusion]

2025-02-21 Thread via GitHub
alamb commented on code in PR #14699: URL: https://github.com/apache/datafusion/pull/14699#discussion_r1965417121 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -1119,11 +1180,11 @@ fn next_value_helper(value: ScalarValue) -> ScalarValue { match value {

Re: [PR] support simple/cross lateral joins [datafusion]

2025-02-21 Thread via GitHub
peter-toth commented on code in PR #14595: URL: https://github.com/apache/datafusion/pull/14595#discussion_r1965506306 ## datafusion/optimizer/src/decorrelate_lateral_join.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: Improve datafusion-cli memory usage and considering reserve mem… [datafusion]

2025-02-21 Thread via GitHub
Dandandan commented on code in PR #14766: URL: https://github.com/apache/datafusion/pull/14766#discussion_r1965517356 ## datafusion-cli/src/exec.rs: ## @@ -247,10 +253,29 @@ pub(super) async fn exec_and_print( let stream = execute_stream(physical_plan, task_ctx.clon

Re: [PR] [wip] replace TypeSignature::String with TypeSignature::Coercible [datafusion]

2025-02-21 Thread via GitHub
jayzhan211 commented on PR #14812: URL: https://github.com/apache/datafusion/pull/14812#issuecomment-2674648296 https://github.com/apache/datafusion/blob/9ca09cf8f769a3f0a64dbc87ec84eb6fe08b36f6/datafusion/functions/src/string/starts_with.rs#L98-L138 The problem seems like you need to

Re: [PR] Allow `FileSource`-specific repartitioning [datafusion]

2025-02-21 Thread via GitHub
alamb commented on code in PR #14754: URL: https://github.com/apache/datafusion/pull/14754#discussion_r1965394266 ## datafusion/core/src/datasource/physical_plan/avro.rs: ## @@ -255,10 +255,15 @@ impl FileSource for AvroSource { fn file_type(&self) -> &str { "avro"

[PR] [wip] replace TypeSignature::String with TypeSignature::Coercible [datafusion]

2025-02-21 Thread via GitHub
zjregee opened a new pull request, #14812: URL: https://github.com/apache/datafusion/pull/14812 ## Which issue does this PR close? - Closes #14759. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? ## Are there any user-facing ch

Re: [PR] feat: use edition 2024 [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
iffyio commented on PR #1736: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1736#issuecomment-2674478094 Oh good point, yeah it does look like MSRV is 1.85 for 2024 edition, did a quick test ``` Caused by: feature `edition2024` is required The package requi

Re: [PR] Allow `FileSource`-specific repartitioning [datafusion]

2025-02-21 Thread via GitHub
alamb commented on PR #14754: URL: https://github.com/apache/datafusion/pull/14754#issuecomment-2674419198 Thanks again @AdamGS and @xudong963

Re: [PR] refactor: move `DataSource` to `datafusion-datasource` [datafusion]

2025-02-21 Thread via GitHub
alamb commented on code in PR #14671: URL: https://github.com/apache/datafusion/pull/14671#discussion_r1965352142 ## datafusion/physical-plan/src/test.rs: ## @@ -49,8 +49,15 @@ use futures::{Future, FutureExt}; pub mod exec; +/// `TestMemoryExec` is a mock equivalent to [`M

Re: [PR] feat: adjust create and drop trigger for mysql dialect [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
invm commented on code in PR #1734: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1734#discussion_r1965349013 ## src/parser/mod.rs: ## @@ -4970,14 +4970,19 @@ impl<'a> Parser<'a> { /// DROP TRIGGER [ IF EXISTS ] name ON table_name [ CASCADE | RESTRICT ]

Re: [PR] feat: adjust create and drop trigger for mysql dialect [datafusion-sqlparser-rs]

2025-02-21 Thread via GitHub
invm commented on code in PR #1734: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1734#discussion_r1965350368 ## src/parser/mod.rs: ## @@ -5061,20 +5066,19 @@ impl<'a> Parser<'a> { } pub fn parse_trigger_period(&mut self) -> Result { -Ok( -

Re: [PR] refactor: move `DataSource` to `datafusion-datasource` [datafusion]

2025-02-21 Thread via GitHub
alamb merged PR #14671: URL: https://github.com/apache/datafusion/pull/14671

Re: [PR] Fix Clippy 1.85 warnings [datafusion]

2025-02-21 Thread via GitHub
alamb commented on PR #14800: URL: https://github.com/apache/datafusion/pull/14800#issuecomment-2674346745 Thanks @mbrobbel and @Dandandan I filed a ticket to track the work to update to latest rust - https://github.com/apache/datafusion/pull/1557

Re: [PR] refactor: move `DataSource` to `datafusion-datasource` [datafusion]

2025-02-21 Thread via GitHub
alamb commented on PR #14671: URL: https://github.com/apache/datafusion/pull/14671#issuecomment-2674345092 Thanks @logan-keede and @mertak-synnada -- this looks great and thanks (again) for driving the plan forward

Re: [PR] Fix Clippy 1.85 warnings [datafusion]

2025-02-21 Thread via GitHub
alamb merged PR #14800: URL: https://github.com/apache/datafusion/pull/14800

Re: [PR] feat: Improve datafusion-cli memory usage and considering reserve mem… [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas commented on code in PR #14766: URL: https://github.com/apache/datafusion/pull/14766#discussion_r1965369704 ## datafusion-cli/src/exec.rs: ## @@ -247,10 +253,29 @@ pub(super) async fn exec_and_print( let stream = execute_stream(physical_plan, task_ctx.cl

Re: [I] Support `UNNEST` as table function (UDTF) [datafusion]

2025-02-21 Thread via GitHub
jonahgao commented on issue #14801: URL: https://github.com/apache/datafusion/issues/14801#issuecomment-2674055482 That's a great proposal; I plan to try it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] chore: Benchmark deps cleanup [datafusion]

2025-02-21 Thread via GitHub
findepi merged PR #14793: URL: https://github.com/apache/datafusion/pull/14793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafu

Re: [PR] Fix Clippy 1.85 warnings [datafusion]

2025-02-21 Thread via GitHub
mbrobbel commented on PR #14800: URL: https://github.com/apache/datafusion/pull/14800#issuecomment-2674366889 > [Update to rust 1.58 #1557](https://github.com/apache/datafusion/pull/1557) I think you meant to link https://github.com/apache/datafusion/issues/14808. -- This is an auto

[PR] Bump MSRV to 1.82, toolchain to 1.85 [datafusion]

2025-02-21 Thread via GitHub
mbrobbel opened a new pull request, #14811: URL: https://github.com/apache/datafusion/pull/14811 ## Which issue does this PR close? - Closes #14808. ## Rationale for this change Rust `1.85.0` is now the latest stable release. ## What changes are included in this PR

Re: [PR] Fix CI fail for extended test (by freeing up more disk space in CI runner) [datafusion]

2025-02-21 Thread via GitHub
alamb commented on PR #14745: URL: https://github.com/apache/datafusion/pull/14745#issuecomment-2674389846 FWIW I double checked and this commit did indeed get the CI passing again https://github.com/apache/datafusion/actions/runs/13450199907/job/37583206394 -- This is an automated

Re: [PR] Add DataFrame fill_null [datafusion]

2025-02-21 Thread via GitHub
comphead commented on PR #14769: URL: https://github.com/apache/datafusion/pull/14769#issuecomment-2674922087 > Thanks @kosiew looks great some minor comments > > I'm surprised though we dont have a documentation with DataFrame API its documented in https://docs.rs/datafusion/l

[I] Datafusion-cli, when the max rows setting inf, we are missing the unlimited case for bounded streaming. [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas opened a new issue, #14814: URL: https://github.com/apache/datafusion/issues/14814 ### Describe the bug https://github.com/apache/datafusion/pull/14766 After above improvement, we improved the datafusion-cli memory usage and memory reservation, but we forgot one cas

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-02-21 Thread via GitHub
djanderson commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1965798308 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] Chore/Add additional FFI unit tests [datafusion]

2025-02-21 Thread via GitHub
timsaucer merged PR #14802: URL: https://github.com/apache/datafusion/pull/14802 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] fix: we are missing the unlimited case for bounded streaming when usi… [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas commented on PR #14815: URL: https://github.com/apache/datafusion/pull/14815#issuecomment-2674993678 cc @alamb @2010YOUY01 Found one bug, please help review, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] Google Summer of Code - Ideas and Coordination [datafusion-python]

2025-02-21 Thread via GitHub
sidshehria commented on issue #1032: URL: https://github.com/apache/datafusion-python/issues/1032#issuecomment-2674996952 @timsaucer Yes, kind of some solutions I have in my mind Kindly review them, **1. Higher-Level Abstractions:** - Introduce a DataFrame-like API that feels m

Re: [I] Further improve datafusion-cli memory usage if we setting huge number for maxrow size. [datafusion]

2025-02-21 Thread via GitHub
zhuqi-lucas commented on issue #14810: URL: https://github.com/apache/datafusion/issues/14810#issuecomment-2674966485 Thank you @alamb for the great idea. Besides this improvement, I also found a bug for unlimited cases which we are missing for the buffer. Filed a ticket now: h

Re: [PR] chore(deps): bump serde from 1.0.217 to 1.0.218 [datafusion]

2025-02-21 Thread via GitHub
jonahgao merged PR #14788: URL: https://github.com/apache/datafusion/pull/14788 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Fix failing UNION [ALL] BY NAME tests due to sanity checker [datafusion]

2025-02-21 Thread via GitHub
alamb commented on issue #14806: URL: https://github.com/apache/datafusion/issues/14806#issuecomment-2674280412 Thanks @rkrishn7 FYI @wiedld we may hit similar things in Influx (as we use unions heavily) -- This is an automated message from the Apache Git Service. To respond to the

[I] Upgrade to Rust 1.85 [datafusion]

2025-02-21 Thread via GitHub
alamb opened a new issue, #14808: URL: https://github.com/apache/datafusion/issues/14808 ### Is your feature request related to a problem or challenge? Rust 1.85 is released: https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html Currently DataFusion uses the version of rust spe
