Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-22 Thread via GitHub
xudong963 commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2743593315 > Do you know any use cases where this method would be especially useful? If so, maybe we can study one of those cases in more detail. That could help us understand the real need a

Re: [I] multiply overflow in stats.rs [datafusion]

2025-03-22 Thread via GitHub
LindaSummer commented on issue #13775: URL: https://github.com/apache/datafusion/issues/13775#issuecomment-2746049186 Hi @Speculative , Thanks very much for your investigation! ❤ I'm sorry that in last month due to my personal circumstances, I didn't have enough time to follow

Re: [I] Reduce number of tokio blocking threads in SortExec spill [datafusion]

2025-03-22 Thread via GitHub
alamb commented on issue #15323: URL: https://github.com/apache/datafusion/issues/15323#issuecomment-2744217807 Makes sense -- with 183 spill files, we probably would need to merge in stages. For example, starting with 183 spill files: 1. run 10 jobs, each merging about 10 files into
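
The comment above is cut off, so the exact plan it proposes is not visible here. As a rough illustration of the staged-merge idea only, the following sketch counts how many files remain after each stage, assuming every merge job combines at most a fixed number of files into one output file. The function name `merge_stages` and the parameter `fan_in` are hypothetical and are not part of DataFusion.

```rust
/// Number of files remaining after each merge stage.
///
/// Assumption (not from the thread): every merge job combines at most
/// `fan_in` files into a single output file, and stages repeat until one
/// file is left. `merge_stages` and `fan_in` are illustrative names only.
fn merge_stages(mut files: usize, fan_in: usize) -> Vec<usize> {
    assert!(fan_in >= 2, "a merge job must combine at least two files");
    let mut stages = Vec::new();
    while files > 1 {
        // Ceiling division: a final partial group still needs its own job.
        files = (files + fan_in - 1) / fan_in;
        stages.push(files);
    }
    stages
}
```

Under these assumptions, `merge_stages(183, 10)` yields 19, then 2, then 1 file, so 183 spill files would need three stages with at most 10 inputs open per job.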

Re: [PR] Improve performance of `first_value` by implementing special `GroupsAccumulator` [datafusion]

2025-03-22 Thread via GitHub
Dandandan commented on code in PR #15266: URL: https://github.com/apache/datafusion/pull/15266#discussion_r2008942439 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -179,6 +292,420 @@ impl AggregateUDFImpl for FirstValue { } } +struct FirstPrimitiveGroupsAccu

Re: [PR] Always use `PartitionMode::Auto` in planner [datafusion]

2025-03-22 Thread via GitHub
ozankabak commented on code in PR #15339: URL: https://github.com/apache/datafusion/pull/15339#discussion_r2008738870 ## datafusion/sqllogictest/test_files/explain_tree.slt: ## @@ -345,63 +345,68 @@ FROM physical_plan 01)┌───┐ -02)│CoalesceBat

Re: [PR] Only unnest source for `EmptyRelation` [datafusion]

2025-03-22 Thread via GitHub
goldmedal commented on code in PR #15159: URL: https://github.com/apache/datafusion/pull/15159#discussion_r2008701304 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -633,6 +633,18 @@ fn roundtrip_statement_with_dialect() -> Result<()> { parser_dialect: Box::new

Re: [PR] Triggering extended tests through PR comment [datafusion]

2025-03-22 Thread via GitHub
Omega359 commented on PR #15101: URL: https://github.com/apache/datafusion/pull/15101#issuecomment-2743066275 @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2008764438 ## src/dataframe.rs: ## @@ -771,3 +871,82 @@ fn record_batch_into_schema( RecordBatch::try_new(schema, data_arrays) } + +/// This is a helper funct

Re: [PR] Support Avg distinct [datafusion]

2025-03-22 Thread via GitHub
qazxcdswe123 commented on code in PR #15356: URL: https://github.com/apache/datafusion/pull/15356#discussion_r2008792246 ## datafusion/functions-aggregate-common/src/aggregate/avg_distinct/decimal.rs: ## @@ -0,0 +1,133 @@ +// Licensed to the Apache Software Foundation (ASF) unde

[I] Add "end to end parquet reading test" for WASM [datafusion]

2025-03-22 Thread via GitHub
alamb opened a new issue, #15357: URL: https://github.com/apache/datafusion/issues/15357 I think the current code tests the re-exported Parquet functionalities, not touching the DataFusion-related code. Ideally, we should test the end-to-end Parquet reading process. The process rough

Re: [I] Add "end to end parquet reading test" for WASM [datafusion]

2025-03-22 Thread via GitHub
jsai28 commented on issue #15357: URL: https://github.com/apache/datafusion/issues/15357#issuecomment-2745315413 take

Re: [I] Improve Spill Performance: `mmap` the spill files [datafusion]

2025-03-22 Thread via GitHub
alamb commented on issue #15321: URL: https://github.com/apache/datafusion/issues/15321#issuecomment-2745343112 BTW I am not sure the code is really in a great position to do this one yet -- it might help to wait for @2010YOUY01 (or help him) to pull some of the spilling code into its own

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-22 Thread via GitHub
Omega359 commented on PR #15341: URL: https://github.com/apache/datafusion/pull/15341#issuecomment-2745343678 > However, I am struggling to understand the implications of this change to a user. Like for example, if we were going to add a note about this in the upgrade / release notes, what

Re: [I] Support `merge` for `Distribution` [datafusion]

2025-03-22 Thread via GitHub
xudong963 closed issue #15290: Support `merge` for `Distribution` URL: https://github.com/apache/datafusion/issues/15290

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-22 Thread via GitHub
xudong963 closed pull request #15296: feat: support merge for `Distribution` URL: https://github.com/apache/datafusion/pull/15296

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074#issuecomment-2745345231 Let me know when it's ready for review

Re: [PR] feat: support merge for `Distribution` [datafusion]

2025-03-22 Thread via GitHub
xudong963 commented on PR #15296: URL: https://github.com/apache/datafusion/pull/15296#issuecomment-2745345742 Thanks for your suggestions!! @alamb @ozankabak @berkaysynnada and @kosiew I'll continue to do such work after the `Migrate to Distribution from Precision` work is done. I t

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074#issuecomment-2745345097 I do love this PR. I hadn't looked at it since it's in draft, but I fully endorse.

Re: [I] DeltaLake integration not working (Python) (FFI Table providers not working) [datafusion-python]

2025-03-22 Thread via GitHub
riziles closed issue #1077: DeltaLake integration not working (Python) (FFI Table providers not working) URL: https://github.com/apache/datafusion-python/issues/1077

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-22 Thread via GitHub
westhide commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2743805128 > I believe whole job should be cancelled Yes, working on it.

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-22 Thread via GitHub
XiangpengHao commented on PR #61: URL: https://github.com/apache/datafusion-site/pull/61#issuecomment-2745401638 Thank you @kevinjqliu

Re: [PR] build(deps): bump datafusion-substrait from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] closed pull request #1048: build(deps): bump datafusion-substrait from 45.0.0 to 46.0.0 URL: https://github.com/apache/datafusion-python/pull/1048

Re: [PR] build(deps): bump datafusion from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] commented on PR #1051: URL: https://github.com/apache/datafusion-python/pull/1051#issuecomment-2745403955 Looks like datafusion is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump datafusion-proto from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] commented on PR #1049: URL: https://github.com/apache/datafusion-python/pull/1049#issuecomment-2745403935 Looks like datafusion-proto is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump datafusion-ffi from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] closed pull request #1050: build(deps): bump datafusion-ffi from 45.0.0 to 46.0.0 URL: https://github.com/apache/datafusion-python/pull/1050

Re: [PR] build(deps): bump datafusion-proto from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] closed pull request #1049: build(deps): bump datafusion-proto from 45.0.0 to 46.0.0 URL: https://github.com/apache/datafusion-python/pull/1049

Re: [PR] build(deps): bump uuid from 1.13.1 to 1.16.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] commented on PR #1068: URL: https://github.com/apache/datafusion-python/pull/1068#issuecomment-2745403994 Looks like uuid is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump datafusion-substrait from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] commented on PR #1048: URL: https://github.com/apache/datafusion-python/pull/1048#issuecomment-2745403927 Looks like datafusion-substrait is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump arrow from 54.2.0 to 54.2.1 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] closed pull request #1038: build(deps): bump arrow from 54.2.0 to 54.2.1 URL: https://github.com/apache/datafusion-python/pull/1038

Re: [PR] feat: Update DataFusion dependency to 46 [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer merged PR #1079: URL: https://github.com/apache/datafusion-python/pull/1079

Re: [PR] Parse `SUBSTR` as alias for `SUBSTRING` [datafusion-sqlparser-rs]

2025-03-22 Thread via GitHub
iffyio merged PR #1769: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1769

[PR] feat: Update DataFusion dependency to 46 [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer opened a new pull request, #1079: URL: https://github.com/apache/datafusion-python/pull/1079 # Which issue does this PR close? None # Rationale for this change In preparing for the next release, update the datafusion dependency. # What changes are includ

Re: [PR] Documentation: Plan custom expressions [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15353: URL: https://github.com/apache/datafusion/pull/15353#discussion_r2008792240 ## docs/source/library-user-guide/adding-udfs.md: ## @@ -1160,6 +1160,89 @@ async fn main() -> Result<()> { // +---+ ``` +## Custom Expression Planning + +DataFu

Re: [I] DeltaLake integration not working (Python) (FFI Table providers not working) [datafusion-python]

2025-03-22 Thread via GitHub
riziles commented on issue #1077: URL: https://github.com/apache/datafusion-python/issues/1077#issuecomment-2745324305 @timsaucer , I upgraded deltalake to 0.25.4 and datafusion to 45.2.0, and the example above now works fine. Thank you!

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2008804385 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15341: URL: https://github.com/apache/datafusion/pull/15341#discussion_r2008804758 ## datafusion/optimizer/tests/optimizer_integration.rs: ## @@ -267,8 +267,8 @@ fn push_down_filter_groupby_expr_contains_alias() { let sql = "SELECT * FROM (SEL

Re: [PR] fix type coercion for uint/int's [datafusion]

2025-03-22 Thread via GitHub
Omega359 commented on code in PR #15341: URL: https://github.com/apache/datafusion/pull/15341#discussion_r2008808511 ## datafusion/optimizer/tests/optimizer_integration.rs: ## @@ -267,8 +267,8 @@ fn push_down_filter_groupby_expr_contains_alias() { let sql = "SELECT * FROM (

Re: [I] Extended tests failing on main [datafusion]

2025-03-22 Thread via GitHub
alamb commented on issue #15359: URL: https://github.com/apache/datafusion/issues/15359#issuecomment-2745343956 Instructions to update the expected results are in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/README.md#running-tests-sqlite

[I] [DISCUSS] In-house implementation of certain dependency package. [datafusion]

2025-03-22 Thread via GitHub
logan-keede opened a new issue, #15360: URL: https://github.com/apache/datafusion/issues/15360 ### Is your feature request related to a problem or challenge? DataFusion has many dependencies that leads to increased binary size and compilation while saving us the trouble of maintaining

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-03-22 Thread via GitHub
deanm commented on PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074#issuecomment-2745349544 I'll mark it ready and just do a second PR for more methods.

Re: [PR] Support Avg distinct [datafusion]

2025-03-22 Thread via GitHub
qazxcdswe123 commented on code in PR #15356: URL: https://github.com/apache/datafusion/pull/15356#discussion_r2008777574 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -6686,3 +6688,35 @@ SELECT a, median(b), arrow_typeof(median(b)) FROM group_median_all_nulls GROUP

[I] Extended tests failing on main [datafusion]

2025-03-22 Thread via GitHub
alamb opened a new issue, #15359: URL: https://github.com/apache/datafusion/issues/15359 ### Describe the bug example https://github.com/apache/datafusion/actions/runs/14005503215/job/39218865563 ``` External error: query is expected to fail with error: (regex)

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-22 Thread via GitHub
alamb commented on PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#issuecomment-2745343279 Thank you all for keeping this train (of PRs) rolling. Very impressive I must say

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074#issuecomment-2745373049 At a high level I like some of the aspects of a namespace. It would especially be nice to clean up our documentation using them in `functions`. I haven't thought through what

Re: [I] [DISCUSS] In-house implementation of certain dependencies. [datafusion]

2025-03-22 Thread via GitHub
ozankabak commented on issue #15360: URL: https://github.com/apache/datafusion/issues/15360#issuecomment-2745373075 If we can write a script to approximately estimate "how much" of an external crate we are using, we can create a starter list of candidates for this

Re: [PR] build(deps): bump datafusion-ffi from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] commented on PR #1050: URL: https://github.com/apache/datafusion-python/pull/1050#issuecomment-2745404001 Looks like datafusion-ffi is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump datafusion from 45.0.0 to 46.0.0 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] closed pull request #1051: build(deps): bump datafusion from 45.0.0 to 46.0.0 URL: https://github.com/apache/datafusion-python/pull/1051

Re: [PR] build(deps): bump arrow from 54.2.0 to 54.2.1 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] commented on PR #1038: URL: https://github.com/apache/datafusion-python/pull/1038#issuecomment-2745403943 Looks like arrow is up-to-date now, so this is no longer needed.

Re: [PR] simplify `array_has` UDF to `InList` expr when haystack is constant [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15354: URL: https://github.com/apache/datafusion/pull/15354#discussion_r2008786953 ## datafusion/functions-nested/src/array_has.rs: ## @@ -121,6 +123,43 @@ impl ScalarUDFImpl for ArrayHas { Ok(DataType::Boolean) } +fn simplify(

[PR] Support Avg distinct [datafusion]

2025-03-22 Thread via GitHub
qazxcdswe123 opened a new pull request, #15356: URL: https://github.com/apache/datafusion/pull/15356 ## Which issue does this PR close? - Closes #2408 ## Rationale for this change Related: https://github.com/apache/datafusion/pull/15099 ## What changes are included

Re: [PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-03-22 Thread via GitHub
deanm commented on PR #1074: URL: https://github.com/apache/datafusion-python/pull/1074#issuecomment-2745350211 @timsaucer did you have any thoughts on the namespaces because going from this to that would wind up being a breaking change.

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-22 Thread via GitHub
XiangpengHao commented on PR #61: URL: https://github.com/apache/datafusion-site/pull/61#issuecomment-2745351963 Thank you @alamb @comphead @Omega359 @kevinjqliu I believe all comments have been addressed!

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-22 Thread via GitHub
kevinjqliu commented on code in PR #61: URL: https://github.com/apache/datafusion-site/pull/61#discussion_r2008823867 ## content/blog/2025-03-21-parquet-pushdown.md: ## @@ -0,0 +1,312 @@ +--- +layout: post +title: Efficient Filter Pushdown in Parquet +date: 2025-03-21 +author: X

[PR] build(deps): bump mimalloc from 0.1.43 to 0.1.44 [datafusion-python]

2025-03-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1080: URL: https://github.com/apache/datafusion-python/pull/1080 Bumps [mimalloc](https://github.com/purpleprotocol/mimalloc_rust) from 0.1.43 to 0.1.44. Release notes Sourced from https://github.com/purpleprotocol/mimalloc_rust/releases"

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-22 Thread via GitHub
westhide commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2745126578 > ah no, sorry for misunderstanding please do not cancel this PR. what I meant, in case of this type of error, ballista job should be cancelled. Hello @milenkovicm, Ca

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-22 Thread via GitHub
Shreyaskr1409 commented on PR #15313: URL: https://github.com/apache/datafusion/pull/15313#issuecomment-2745028266 @alamb @blaginin I have resolved all the conversations. Please look into it. Also my bad for few unnecessary commits here and there, I am still getting acquainted with the wo

[I] Ballista client keep blocking when prepare_task_definition or prepare_multi_task_definition fail [datafusion-ballista]

2025-03-22 Thread via GitHub
westhide opened a new issue, #1214: URL: https://github.com/apache/datafusion-ballista/issues/1214 **Describe the bug** A clear and concise description of what the bug is. Ballista client keep blocking when prepare_task_definition or prepare_multi_task_definition fail **To Repro

Re: [I] [DISCUSS] Switch to `tree` explain by default [datafusion]

2025-03-22 Thread via GitHub
alamb commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2745139051 I wonder if we could just change the default format for `datafusion-cli` (it is a config setting) 🤔 Downstream projects could also then "opt-in" if they wanted nicer default expl

Re: [PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-22 Thread via GitHub
zhuqi-lucas commented on PR #15348: URL: https://github.com/apache/datafusion/pull/15348#issuecomment-2745186535 Updated the result for short string sort which will benefit a lot from StringView type, add Q 11 for sort: ```rust -const SORT_QUERIES: [&'static str; 10] = [ +

Re: [PR] Improved error for expand wildcard rule [datafusion]

2025-03-22 Thread via GitHub
ozankabak commented on PR #15287: URL: https://github.com/apache/datafusion/pull/15287#issuecomment-2745187508 This PR seems to have broken `main`. @alamb -- Extended tests break very frequently these days. We should prioritize completing the work on running them before merge.

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-22 Thread via GitHub
alamb commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2008719590 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,134 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-20 +author:

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-22 Thread via GitHub
alamb commented on PR #63: URL: https://github.com/apache/datafusion-site/pull/63#issuecomment-2745149502 Released site: https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0/ 👍

Re: [I] DeltaLake integration not working (Python) (FFI Table providers not working) [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on issue #1077: URL: https://github.com/apache/datafusion-python/issues/1077#issuecomment-2745248393 Which version of datafusion-python are you using? If you have 45.2.0 can you please try downgrading to 44.0.0? If that solves it, I know what the problem is.

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2008766096 ## src/dataframe.rs: ## @@ -111,56 +116,151 @@ impl PyDataFrame { } fn __repr__(&self, py: Python) -> PyDataFusionResult { -let df = se

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2008764218 ## src/dataframe.rs: ## @@ -771,3 +871,82 @@ fn record_batch_into_schema( RecordBatch::try_new(schema, data_arrays) } + +/// This is a helper funct

[I] Improve html table rendering formatting [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer opened a new issue, #1078: URL: https://github.com/apache/datafusion-python/issues/1078 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This is a follow on to https://github.com/apache/datafusion-python/pull/1036

[PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-22 Thread via GitHub
2010YOUY01 opened a new pull request, #15355: URL: https://github.com/apache/datafusion/pull/15355 ## Which issue does this PR close? Related to https://github.com/apache/datafusion/pull/14975 ## Rationale for this change ### What's the inefficiency Let's w

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2008764621 ## src/dataframe.rs: ## @@ -70,6 +72,9 @@ impl PyTableProvider { PyTable::new(table_provider) } } +const MAX_TABLE_BYTES_TO_DISPLAY: usize =

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer commented on code in PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036#discussion_r2008766157 ## src/dataframe.rs: ## @@ -111,56 +116,151 @@ impl PyDataFrame { } fn __repr__(&self, py: Python) -> PyDataFusionResult { -let df = se

Re: [PR] Enforce JOIN plan to require condition [datafusion]

2025-03-22 Thread via GitHub
goldmedal commented on code in PR #15334: URL: https://github.com/apache/datafusion/pull/15334#discussion_r2008714660 ## datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part: ## @@ -65,7 +65,7 @@ logical_plan 07)Projection: customer.c_phone, customer.c_acctbal

Re: [I] [DISCUSS] Switch to `tree` explain by default [datafusion]

2025-03-22 Thread via GitHub
ozankabak commented on issue #15343: URL: https://github.com/apache/datafusion/issues/15343#issuecomment-2745139974 > I wonder if we could just change the default format for `datafusion-cli` (it is a config setting) 🤔 Downstream projects could also then "opt-in" if they wanted nicer default

Re: [I] `avg(distinct)` support [datafusion]

2025-03-22 Thread via GitHub
qazxcdswe123 commented on issue #2408: URL: https://github.com/apache/datafusion/issues/2408#issuecomment-2745155964 take

Re: [PR] Added tests with are writing into parquet files in memory for issue #… [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15325: URL: https://github.com/apache/datafusion/pull/15325#discussion_r2008783615 ## datafusion/wasmtest/src/lib.rs: ## @@ -182,4 +182,29 @@ mod test { let task_ctx = ctx.task_ctx(); let _ = collect(physical_plan, task_ctx).await

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-22 Thread via GitHub
alamb commented on PR #15313: URL: https://github.com/apache/datafusion/pull/15313#issuecomment-2745294370 > @alamb @blaginin I have resolved all the conversations. Please look into it. Also my bad for few unnecessary commits here and there, I am still getting acquainted with the workflow.

Re: [PR] Added tests with are writing into parquet files in memory for issue #… [datafusion]

2025-03-22 Thread via GitHub
alamb commented on PR #15325: URL: https://github.com/apache/datafusion/pull/15325#issuecomment-2745294019 Thanks again @pranavJibhakate

Re: [PR] Improve collection during repr and repr_html [datafusion-python]

2025-03-22 Thread via GitHub
timsaucer merged PR #1036: URL: https://github.com/apache/datafusion-python/pull/1036

Re: [PR] Added tests with are writing into parquet files in memory for issue #… [datafusion]

2025-03-22 Thread via GitHub
alamb merged PR #15325: URL: https://github.com/apache/datafusion/pull/15325

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-22 Thread via GitHub
alamb commented on PR #15313: URL: https://github.com/apache/datafusion/pull/15313#issuecomment-2745294680 i took a look at the most recent commits and it looks good to me. Thanks again @Shreyaskr1409 !

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-22 Thread via GitHub
alamb merged PR #15313: URL: https://github.com/apache/datafusion/pull/15313

Re: [PR] Perf: Support Utf8View datatype single column comparisons for SortPre… [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15348: URL: https://github.com/apache/datafusion/pull/15348#discussion_r2008785345 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -281,6 +281,33 @@ impl CursorArray for GenericByteArray { } } +impl CursorArray for StringViewArray {

[I] limit max disk usage for spilling queries [datafusion]

2025-03-22 Thread via GitHub
alamb opened a new issue, #15358: URL: https://github.com/apache/datafusion/issues/15358 ### Is your feature request related to a problem or challenge? Breaking rationale from https://github.com/apache/datafusion/pull/14975#issue-2890626662 into its own ticket: For memory-

Re: [I] Add test coverage for wasm32 + parquet build [datafusion]

2025-03-22 Thread via GitHub
alamb closed issue #15158: Add test coverage for wasm32 + parquet build URL: https://github.com/apache/datafusion/issues/15158

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-22 Thread via GitHub
alamb commented on PR #15355: URL: https://github.com/apache/datafusion/pull/15355#issuecomment-2745303386 - I filed https://github.com/apache/datafusion/issues/15358 to track the feature request, and linked this ticket

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-03-22 Thread via GitHub
alamb commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740983293 BTW combined with @adriangb's PR here - https://github.com/apache/datafusion/pull/15301 It will likely go crazy fast 🚀

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-22 Thread via GitHub
alamb commented on code in PR #15355: URL: https://github.com/apache/datafusion/pull/15355#discussion_r2008788363 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -230,9 +219,14 @@ struct ExternalSorter { /// if `Self::in_mem_batches` are sorted in_mem_batches_sort

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-22 Thread via GitHub
alamb commented on PR #15355: URL: https://github.com/apache/datafusion/pull/15355#issuecomment-2745309065 > There are two extra operators that can be changed to this new interface (Aggregate and SortMergeJoin), they're planned to be included in this PR. I plan to do it after getting some

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-22 Thread via GitHub
omar commented on code in PR #61: URL: https://github.com/apache/datafusion-site/pull/61#discussion_r2008990238 ## content/blog/2025-03-21-parquet-pushdown.md: ## @@ -0,0 +1,312 @@ +--- +layout: post +title: Efficient Filter Pushdown in Parquet +date: 2025-03-21 +author: Xiangpe

Re: [PR] fix: Redundant files spilled during external sort + introduce `SpillManager` [datafusion]

2025-03-22 Thread via GitHub
2010YOUY01 commented on code in PR #15355: URL: https://github.com/apache/datafusion/pull/15355#discussion_r2009004733 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -230,9 +219,14 @@ struct ExternalSorter { /// if `Self::in_mem_batches` are sorted in_mem_batches

Re: [PR] Fix array_has_all and array_has_any with empty array [datafusion]

2025-03-22 Thread via GitHub
LuQQiu commented on PR #15039: URL: https://github.com/apache/datafusion/pull/15039#issuecomment-2741622696 @alamb @Weijun-H Thanks for the review!

Re: [PR] Fix parquet pruning blog post hyperlink [datafusion-site]

2025-03-22 Thread via GitHub
alamb merged PR #62: URL: https://github.com/apache/datafusion-site/pull/62

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-22 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2745948813 Nice! Fwiw another edge case I found recently that's probably worth testing is a List where the Struct evolves. I ended up solving it by updating list_coersion but curious if yo

[PR] Format `Date32` to string given timestamp specifiers [datafusion]

2025-03-22 Thread via GitHub
friendlymatthew opened a new pull request, #15361: URL: https://github.com/apache/datafusion/pull/15361 Closes https://github.com/apache/datafusion/issues/14536 ## Rationale for this change Datafusion currently errs when attempting to format a date using time-related specifiers

Re: [PR] Parse Postgres's LOCK TABLE statement [datafusion-sqlparser-rs]

2025-03-22 Thread via GitHub
github-actions[bot] commented on PR #1614: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1614#issuecomment-2745969727 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or