Re: [PR] fix: add missing columns into list directly [datafusion]

2025-01-25 Thread via GitHub
lichuang commented on PR #14180: URL: https://github.com/apache/datafusion/pull/14180#issuecomment-2613832659 > > @jonahgao in [#10234 (comment)](https://github.com/apache/datafusion/pull/10234#issuecomment-2087760241) comment: > > > I think that we should handle ORDER BY similarly to HA

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-25 Thread via GitHub
adriangb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2613936275 I think this is great Andrew. For what it's worth if this were packaged up in some installable way (even if it had to be from git, etc.) I'm sure we'd be super happy to can our cust

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2613937768 > We could handle such nulls handling in `ScalarFunctionExpr::evaluate` Most SQL functions are "pure" in the sense that if any of their inputs are null they produce output
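To make the "pure" null semantics concrete, here is a minimal sketch. It assumes plain `Option<i64>` slices stand in for Arrow arrays: a NULL in any input row yields NULL for that row. `propagate_nulls2` is a hypothetical helper, not a DataFusion API.

```rust
/// Apply `f` row by row over two nullable columns, producing NULL (`None`)
/// for any row where either input is NULL, i.e. "pure" SQL semantics.
/// Hypothetical helper for illustration; not a DataFusion API.
pub fn propagate_nulls2<F>(a: &[Option<i64>], b: &[Option<i64>], f: F) -> Vec<Option<i64>>
where
    F: Fn(i64, i64) -> i64,
{
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            // Both inputs present: evaluate the function.
            (Some(x), Some(y)) => Some(f(*x, *y)),
            // Any NULL input yields a NULL output without calling `f`.
            _ => None,
        })
        .collect()
}
```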

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613937019 > u64+i64 combination I mean the comparison_op(u64,i64), coalesce(u64,i64) and union(u64,i64) cases that use `binary_numeric_coercion` and are cast to Decimal128. Mathematics operat
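The choice of `Decimal(20, 0)` in this PR's title follows from digit counts: `u64::MAX` (18446744073709551615) has 20 decimal digits and `i64::MIN` has 19, so precision 20 with scale 0 is the smallest decimal type that holds every value of both. A small sketch checking this with plain Rust integers (the helper names are hypothetical):

```rust
/// Number of base-10 digits in the magnitude of `n` (0 has one digit).
pub fn decimal_digits(n: i128) -> u32 {
    let mut m = n.unsigned_abs();
    let mut digits = 1;
    while m >= 10 {
        m /= 10;
        digits += 1;
    }
    digits
}

/// Largest magnitude representable by a Decimal(precision, 0) value.
pub fn max_decimal_magnitude(precision: u32) -> u128 {
    10u128.pow(precision) - 1
}

/// True if every u64 and every i64 value fits in Decimal(precision, 0).
pub fn fits_u64_and_i64(precision: u32) -> bool {
    let max = max_decimal_magnitude(precision);
    max >= u64::MAX as u128 && max >= i64::MIN.unsigned_abs() as u128
}
```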

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-25 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2613936434 > But once such logic is written somewhere, there is no reason for it not to be part of datafusion project, for the benefit of all consumers. I think such logic should belong to d

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2613940212 Maybe we need yet another trait implementation ```rust trait ScalarUDFImpl { fn handle_nulls(&self, args: ScalarFunctionArgs) -> Result> { // most of the cas
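A self-contained sketch of a default null-handling hook that individual functions can override. The trait, the `Scalar` enum, and the function names below are hypothetical simplifications, not DataFusion's `ScalarUDFImpl` or `ScalarFunctionArgs`:

```rust
/// Simplified scalar value; `Null` models SQL NULL.
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Null,
    Int(i64),
}

/// Hypothetical trait with an overridable null-handling hook.
/// Not DataFusion's `ScalarUDFImpl`; all names here are assumptions.
pub trait NullAwareUdf {
    /// The function body. Called only if `handle_nulls` returns `None`.
    fn invoke(&self, args: &[Scalar]) -> Scalar;

    /// Default for "pure" functions: any NULL argument yields NULL.
    /// Returning `None` means "no shortcut, call `invoke`".
    fn handle_nulls(&self, args: &[Scalar]) -> Option<Scalar> {
        if args.contains(&Scalar::Null) {
            Some(Scalar::Null)
        } else {
            None
        }
    }

    /// Apply null handling first, then fall back to the function body.
    fn evaluate(&self, args: &[Scalar]) -> Scalar {
        self.handle_nulls(args).unwrap_or_else(|| self.invoke(args))
    }
}

/// A pure function: integer addition over all arguments.
pub struct AddInts;

impl NullAwareUdf for AddInts {
    fn invoke(&self, args: &[Scalar]) -> Scalar {
        let sum: i64 = args
            .iter()
            .map(|a| match a {
                Scalar::Int(v) => *v,
                Scalar::Null => unreachable!("NULLs are handled before invoke"),
            })
            .sum();
        Scalar::Int(sum)
    }
}

/// A non-pure, COALESCE-like function that must see NULL arguments,
/// so it overrides the default hook.
pub struct CoalesceFn;

impl NullAwareUdf for CoalesceFn {
    fn invoke(&self, args: &[Scalar]) -> Scalar {
        args.iter()
            .find(|a| **a != Scalar::Null)
            .cloned()
            .unwrap_or(Scalar::Null)
    }

    fn handle_nulls(&self, _args: &[Scalar]) -> Option<Scalar> {
        None
    }
}
```

This keeps the common "pure" case free for implementers while letting functions such as COALESCE opt out.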

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-25 Thread via GitHub
shehabgamin commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929529897 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -2133,4 +2133,77 @@ mod test { assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), pla

Re: [PR] Extract useful methods from sqllogictest bin [datafusion]

2025-01-25 Thread via GitHub
xudong963 commented on PR #14267: URL: https://github.com/apache/datafusion/pull/14267#issuecomment-2613941841 > Looks good to me -- thanks @xudong963 thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] feat: implement `invoke_with_args` for `struct` and `named_struct` [datafusion]

2025-01-25 Thread via GitHub
pepijnve commented on code in PR #14276: URL: https://github.com/apache/datafusion/pull/14276#discussion_r1929519836 ## datafusion/functions/src/core/named_struct.rs: ## @@ -203,12 +137,19 @@ impl ScalarUDFImpl for NamedStructFunc { } -fn invoke_batch(

Re: [PR] Support within group syntax for existing aggregate functions [datafusion]

2025-01-25 Thread via GitHub
Garamda commented on code in PR #13511: URL: https://github.com/apache/datafusion/pull/13511#discussion_r1929539094 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -77,36 +77,38 @@ SELECT approx_distinct(c9) count_c9, approx_distinct(cast(c9 as varchar)) count_ #

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-01-25 Thread via GitHub
ozankabak commented on PR #14273: URL: https://github.com/apache/datafusion/pull/14273#issuecomment-2613842458 This change makes sense to me. However, I *fully* agree with @alamb on avoiding being trigger happy on partial changes to coercion behavior. Let's follow the 4-step process laid ou

Re: [PR] Feature: Monotonic Sets [datafusion]

2025-01-25 Thread via GitHub
2010YOUY01 commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1929505898 ## datafusion/expr/src/udaf.rs: ## @@ -635,6 +655,14 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { fn documentation(&self) -> Option<&Documentation>

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613874994 For the scalar case, like `SELECT a > -1`, consider any `SELECT a > b` where b is constant. I think we could optimize it since we know the value. If the scalar is negative, it can
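One way to read the scalar case: a UInt64 column can never be negative, so `a > b` with a negative constant `b` is true for every non-null row, and a non-negative `b` fits in `u64` without widening. A minimal sketch of such a rewrite, using hypothetical names rather than DataFusion's simplifier API:

```rust
/// Outcome of rewriting `col > literal` where `col` is UInt64 and the
/// literal is a signed i64. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub enum GtRewrite {
    /// Every non-null u64 value is greater than the (negative) literal.
    AlwaysTrueIfNotNull,
    /// Compare directly against the literal as a u64, with no widening.
    CompareU64(u64),
}

pub fn rewrite_u64_gt_i64_literal(lit: i64) -> GtRewrite {
    if lit < 0 {
        GtRewrite::AlwaysTrueIfNotNull
    } else {
        // A non-negative i64 always converts to u64 losslessly.
        GtRewrite::CompareU64(lit as u64)
    }
}

/// Evaluate the rewritten predicate for one (possibly NULL) row.
pub fn eval_gt(rewrite: &GtRewrite, value: Option<u64>) -> Option<bool> {
    match (rewrite, value) {
        // NULL > x is NULL.
        (_, None) => None,
        (GtRewrite::AlwaysTrueIfNotNull, Some(_)) => Some(true),
        (GtRewrite::CompareU64(lit), Some(v)) => Some(v > *lit),
    }
}
```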

Re: [I] Alternative approaches to "fan-out" style RepartitionExec [datafusion]

2025-01-25 Thread via GitHub
alamb commented on issue #14287: URL: https://github.com/apache/datafusion/issues/14287#issuecomment-2613926246 BTW my suggestion for a first step would be to get some example query / test case that shows where the current algorithm doesn't work very well. Then we can evaluate potential sol

Re: [I] Alternative approaches to "fan-out" style RepartitionExec [datafusion]

2025-01-25 Thread via GitHub
alamb commented on issue #14287: URL: https://github.com/apache/datafusion/issues/14287#issuecomment-2613926046 Thanks @westonpace for filing this -- I agree there are likely some improvements in this area that would be beneficial. I believe @crepererum spent quite a bit of time on th

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613928742 > Another question I would like to know is whether the u64+i64 combination is common in DataFusion? And whether we can avoid this at all. I guess u64 that is larger than i64::max is un

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613928105 BTW I filed a ticket to track resolving this thread (so that it doesn't get lost before we release 45.0.0): - https://github.com/apache/datafusion/issues/14291

[I] Potential performance regression with comparisions to scalar values [datafusion]

2025-01-25 Thread via GitHub
alamb opened a new issue, #14291: URL: https://github.com/apache/datafusion/issues/14291 ### Describe the bug There is concern that changes in this PR will cause a regression (as it may convert some numbers to Decimal128 rather than to more efficient Integer types) - https://github.com/apache/dat

[PR] Revert "fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers (#14223)" [datafusion]

2025-01-25 Thread via GitHub
alamb opened a new pull request, #14292: URL: https://github.com/apache/datafusion/pull/14292 Draft PR to test the performance implications of #14223. ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/14291 ## Rationale fo

Re: [PR] Support specific `GroupsAccumulator` for `median` [datafusion]

2025-01-25 Thread via GitHub
Rachelint commented on PR #13681: URL: https://github.com/apache/datafusion/pull/13681#issuecomment-2613929079 I think this PR is ready now. Q6 in h2o: - `result in main` ``` Q6: SELECT id4, id5, MEDIAN(v3) AS median_v3, STDDEV(v3) AS sd_v3 FROM x GROUP BY id4, id5; Que
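For readers unfamiliar with the idea, a groups accumulator keeps the state for all groups in one object, indexed by group id, instead of allocating one accumulator per group. A toy median version, loosely modeled on that idea; the struct and method signatures are simplified assumptions, not DataFusion's `GroupsAccumulator` trait:

```rust
/// Toy per-group median accumulator: one state slot per group index.
/// Simplified and hypothetical; not DataFusion's API.
#[derive(Default)]
pub struct MedianGroups {
    // values[g] holds every value seen so far for group g.
    values: Vec<Vec<f64>>,
}

impl MedianGroups {
    /// Add a batch where `group_indices[i]` is the group of `batch[i]`.
    pub fn update_batch(&mut self, batch: &[f64], group_indices: &[usize], total_num_groups: usize) {
        if self.values.len() < total_num_groups {
            self.values.resize_with(total_num_groups, Vec::new);
        }
        for (&v, &g) in batch.iter().zip(group_indices) {
            self.values[g].push(v);
        }
    }

    /// Median of each group; `None` for a group that received no values.
    pub fn evaluate(&mut self) -> Vec<Option<f64>> {
        self.values
            .iter_mut()
            .map(|vals| {
                if vals.is_empty() {
                    return None;
                }
                vals.sort_by(|a, b| a.total_cmp(b));
                let mid = vals.len() / 2;
                Some(if vals.len() % 2 == 0 {
                    // Even count: average the two middle values.
                    (vals[mid - 1] + vals[mid]) / 2.0
                } else {
                    vals[mid]
                })
            })
            .collect()
    }
}
```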

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
Omega359 commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929541738 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
Omega359 commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929540823 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
Omega359 commented on PR #14282: URL: https://github.com/apache/datafusion/pull/14282#issuecomment-2613964912 Thanks for your contribution! I've left some comments for your review.

Re: [I] [Regression] Panic when handling Decimal128 overflow [datafusion]

2025-01-25 Thread via GitHub
waynexia closed issue #14124: [Regression] Panic when handling Decimal128 overflow URL: https://github.com/apache/datafusion/issues/14124

Re: [PR] fix: add support for Decimal128 and Decimal256 types in interval arithmetic [datafusion]

2025-01-25 Thread via GitHub
waynexia commented on PR #14126: URL: https://github.com/apache/datafusion/pull/14126#issuecomment-2613965595 Thank you @comphead and @alamb for reviewing!

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
Omega359 commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929541028 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] fix: add support for Decimal128 and Decimal256 types in interval arithmetic [datafusion]

2025-01-25 Thread via GitHub
waynexia merged PR #14126: URL: https://github.com/apache/datafusion/pull/14126

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
Omega359 commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929541426 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14270: URL: https://github.com/apache/datafusion/pull/14270#issuecomment-2613966867 > I think adding tests for comparison operations will probably expose the possible issue with the linked PR. (Or give us peace of mind) It is a good idea -- I added tests showi

Re: [I] Build time regression [datafusion]

2025-01-25 Thread via GitHub
waynexia commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2613967071 Looking forward to the ongoing refactors! >Also, can you make sure there is no bias in the measurement? If you build in reverse order and run cargo clean between each ste

Re: [PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-25 Thread via GitHub
alamb commented on code in PR #14270: URL: https://github.com/apache/datafusion/pull/14270#discussion_r1929542389 ## datafusion/sqllogictest/test_files/operator.slt: ## @@ -110,5 +226,139 @@ from numeric_types; Int8 Int16 Int32 Int64 UInt8 UInt16 UInt32 UInt64 Float32 Flo

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2613968183 I ran benchmarks with / without this change and did not see any noticeable performance difference. See details here - https://github.com/apache/datafusion/pull/14292 I also cre

Re: [I] Potential performance regression with comparisions to scalar values [datafusion]

2025-01-25 Thread via GitHub
alamb commented on issue #14291: URL: https://github.com/apache/datafusion/issues/14291#issuecomment-2613968303 I ran benchmarks with / without this change and did not see any noticeable performance difference. See details here - https://github.com/apache/datafusion/pull/14292 I als

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2613969732 > I think this is great Andrew. For what it's worth if this were packaged up in some installable way (even if it had to be from git, etc.) I'm sure we'd be super happy to can our custo

Re: [I] Bug: applying multiple times `EnforceDistribution` generates invalid plan [datafusion]

2025-01-25 Thread via GitHub
xudong963 commented on issue #14150: URL: https://github.com/apache/datafusion/issues/14150#issuecomment-2613843375 There is a fix in #14207; looking forward to your feedback.

Re: [I] Question on: `visit_expressions_mut` for alias expr [datafusion-sqlparser-rs]

2025-01-25 Thread via GitHub
docteurklein commented on issue #1475: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1475#issuecomment-2613924635 here is an example to change ASC to DESC in some order by: ```rust struct VisitOrderBy { reorder: Vec, } impl VisitorMut for VisitOrde

Re: [I] Jan 18, 2025: This week(s) in DataFusion [datafusion]

2025-01-25 Thread via GitHub
alamb commented on issue #14179: URL: https://github.com/apache/datafusion/issues/14179#issuecomment-2613925385 If anyone else is interested in helping improve build times, @waynexia is starting to organize a project: - https://github.com/apache/datafusion/issues/14256

Re: [I] Build time regression [datafusion]

2025-01-25 Thread via GitHub
alamb commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2613925243 100%, making build times better would be really appreciated

Re: [I] Sort out tests in `aggregate.slt` [datafusion]

2025-01-25 Thread via GitHub
logan-keede commented on issue #13723: URL: https://github.com/apache/datafusion/issues/13723#issuecomment-2613841926 take

Re: [PR] feat: implement `invoke_with_args` for `struct` and `named_struct` [datafusion]

2025-01-25 Thread via GitHub
lichuang commented on code in PR #14276: URL: https://github.com/apache/datafusion/pull/14276#discussion_r1929516369 ## datafusion/functions/src/core/named_struct.rs: ## @@ -203,12 +137,19 @@ impl ScalarUDFImpl for NamedStructFunc { } -fn invoke_batch(

Re: [I] Querying Parquet file specifically with a predicate returns invalid data error but works in other situations [datafusion]

2025-01-25 Thread via GitHub
senyosimpson commented on issue #14281: URL: https://github.com/apache/datafusion/issues/14281#issuecomment-2613908545 Confirmed that the following works now. ```rust let mut parquet_options = TableParquetOptions::new(); parquet_options .set("enable_page_index", "false")

Re: [PR] Extract useful methods from sqllogictest bin [datafusion]

2025-01-25 Thread via GitHub
alamb merged PR #14267: URL: https://github.com/apache/datafusion/pull/14267

Re: [PR] Deprecate max statistics size properly [datafusion]

2025-01-25 Thread via GitHub
alamb merged PR #14188: URL: https://github.com/apache/datafusion/pull/14188

Re: [PR] move projection pushdown optimization logic to ExecutionPlan trait [datafusion]

2025-01-25 Thread via GitHub
berkaysynnada commented on PR #14235: URL: https://github.com/apache/datafusion/pull/14235#issuecomment-2614058166 I am merging this once the CI passes once more.

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
SKY-ALIN commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929596970 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
SKY-ALIN commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929596852 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

[PR] Move All Physical Optimizer Tests to core/tests and Remove functions-aggregate Dependency [datafusion]

2025-01-25 Thread via GitHub
berkaysynnada opened a new pull request, #14298: URL: https://github.com/apache/datafusion/pull/14298 ## Which issue does this PR close? Closes #14243. ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] build: re-enable upload-test-reports for macos-13 runner [datafusion-comet]

2025-01-25 Thread via GitHub
viirya commented on PR #1335: URL: https://github.com/apache/datafusion-comet/pull/1335#issuecomment-2614071748 Thanks @kazuyukitanimura

Re: [PR] build: re-enable upload-test-reports for macos-13 runner [datafusion-comet]

2025-01-25 Thread via GitHub
viirya merged PR #1335: URL: https://github.com/apache/datafusion-comet/pull/1335

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
SKY-ALIN commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929597131 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add regexp_extract func [datafusion]

2025-01-25 Thread via GitHub
SKY-ALIN commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1929596877 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] move projection pushdown optimization logic to ExecutionPlan trait [datafusion]

2025-01-25 Thread via GitHub
berkaysynnada merged PR #14235: URL: https://github.com/apache/datafusion/pull/14235

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-25 Thread via GitHub
timsaucer commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1929608071 ## datafusion/ffitest/src/async_provider.rs: ## @@ -0,0 +1,272 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] chore: improve tests for array expressions [datafusion-comet]

2025-01-25 Thread via GitHub
codecov-commenter commented on PR #1339: URL: https://github.com/apache/datafusion-comet/pull/1339#issuecomment-2614134541 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1339?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] generate_series UDTF only supports integers [datafusion]

2025-01-25 Thread via GitHub
gokselk commented on issue #14209: URL: https://github.com/apache/datafusion/issues/14209#issuecomment-2614003903 take

Re: [I] Build time regression [datafusion]

2025-01-25 Thread via GitHub
waynexia commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2614019259 I located the biggest jump at https://github.com/apache/datafusion/pull/11681/, commit 3438b355, but I can't tell the reason. I went through it and it's just a normal log

Re: [PR] move projection pushdown optimization logic to ExecutionPlan trait [datafusion]

2025-01-25 Thread via GitHub
berkaysynnada commented on code in PR #14235: URL: https://github.com/apache/datafusion/pull/14235#discussion_r1929564102 ## datafusion/sqllogictest/test_files/explain.slt: ## @@ -43,10 +43,11 @@ logical_plan 02)--Filter: aggregate_test_100.c2 > Int8(10) 03)TableScan: aggr

Re: [PR] Fix regression in CASE expression [datafusion]

2025-01-25 Thread via GitHub
andygrove commented on PR #14283: URL: https://github.com/apache/datafusion/pull/14283#issuecomment-2614024359 > How do you handle the type mismatch issue? Does Comet have other type-handling logic to find the correct types for the datafusion physical plan? We map Spark types to Arrow types

[PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-01-25 Thread via GitHub
adriangb opened a new pull request, #14295: URL: https://github.com/apache/datafusion/pull/14295 (no comment)

Re: [PR] move projection pushdown optimization logic to ExecutionPlan trait [datafusion]

2025-01-25 Thread via GitHub
buraksenn commented on code in PR #14235: URL: https://github.com/apache/datafusion/pull/14235#discussion_r1929572633 ## datafusion/sqllogictest/test_files/explain.slt: ## @@ -43,10 +43,11 @@ logical_plan 02)--Filter: aggregate_test_100.c2 > Int8(10) 03)TableScan: aggregat

Re: [I] RFC: Should we remove pyarrow feature from datafusion core [datafusion]

2025-01-25 Thread via GitHub
timsaucer commented on issue #14197: URL: https://github.com/apache/datafusion/issues/14197#issuecomment-2614025064 Great, once I have `datafusion-python` updated, I'll put up this PR

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-01-25 Thread via GitHub
adriangb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2614026191 I want to point out that this works because of how the RecordBatch is generated: https://github.com/apache/datafusion/blob/20544bcccd83e0de36e2944ad2b99615ad3bb41d/datafusion/physic

Re: [PR] Fix regression in CASE expression [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on PR #14283: URL: https://github.com/apache/datafusion/pull/14283#issuecomment-2614154584 The function `coerce_types` is used exclusively within function handling. For case expressions, `coerce_types` is not utilized. Instead, the function `get_coerce_type_for_case_exp

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2614158196 > So instead of SELECT array_slice(1.5, NULL, NULL) returning an error for an unsupported type in the first argument, it will return NULL This is because the signature for `

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1929640010 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -186,6 +186,56 @@ impl PhysicalExpr for ScalarFunctionExpr { .map(|e| e.evaluate(batch))

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-01-25 Thread via GitHub
berkaysynnada commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2613992380 This week I couldn't spare time to review this fix, sorry @xudong963. That will be one of my priorities in the next week.

Re: [I] Deprecate `datafusion.execution.parquet.max_statistics_size` config option [datafusion]

2025-01-25 Thread via GitHub
alamb closed issue #14172: Deprecate `datafusion.execution.parquet.max_statistics_size` config option URL: https://github.com/apache/datafusion/issues/14172

Re: [PR] Deprecate max statistics size properly [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14188: URL: https://github.com/apache/datafusion/pull/14188#issuecomment-2613994473 Thanks again @logan-keede

Re: [PR] BigQuery: Support trailing commas in column definitions list [datafusion-sqlparser-rs]

2025-01-25 Thread via GitHub
alamb merged PR #1682: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1682

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-25 Thread via GitHub
jonahgao commented on PR #14223: URL: https://github.com/apache/datafusion/pull/14223#issuecomment-2614016503 `UInt64 > -1` is less common than `UInt64 > 1`. In my opinion, ensuring that comparisons between unsigned and signed columns are always available is more important.
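To illustrate why a widened common type keeps unsigned/signed comparisons correct: comparing through `i128` (the storage width of Arrow's Decimal128) is exact, while naively casting the signed side to `u64` wraps negative values. A minimal sketch with hypothetical helper names:

```rust
use std::cmp::Ordering;

/// Exact comparison of a u64 with an i64: both values fit in i128,
/// so widening loses nothing. Illustrative helper, not a DataFusion API.
pub fn cmp_u64_i64(a: u64, b: i64) -> Ordering {
    (a as i128).cmp(&(b as i128))
}

/// Naive comparison that casts the signed value to u64. Negative values
/// wrap around (-1 becomes u64::MAX), so results can be wrong.
pub fn naive_gt(a: u64, b: i64) -> bool {
    a > b as u64
}
```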

Re: [I] Alternative approaches to "fan-out" style RepartitionExec [datafusion]

2025-01-25 Thread via GitHub
berkaysynnada commented on issue #14287: URL: https://github.com/apache/datafusion/issues/14287#issuecomment-2614018742 We have designed a poll-based repartition mechanism that polls its input whenever any of the output partitions are polled. This approach deviates from the round-robin patt
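A toy model of the difference, assuming batches are identified by integers and that `poll_order` records which output partition asked for the next batch (a faster consumer polls more often). This is an illustration only, not DataFusion's `RepartitionExec`:

```rust
/// Round-robin: batch i always goes to partition i % n, regardless of
/// which consumer is ready.
pub fn round_robin(num_batches: usize, n: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); n];
    for b in 0..num_batches {
        out[b % n].push(b);
    }
    out
}

/// Poll-based (on demand): each time an output partition polls, it receives
/// the next input batch. `poll_order[i]` is the partition that polled i-th.
pub fn on_demand(num_batches: usize, n: usize, poll_order: &[usize]) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); n];
    for (b, &p) in (0..num_batches).zip(poll_order) {
        out[p].push(b);
    }
    out
}
```

With a poll order skewed toward partition 0, the on-demand scheme hands it more batches, whereas round-robin splits them evenly regardless of consumer speed.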

Re: [PR] add tests to check precision loss fix [datafusion]

2025-01-25 Thread via GitHub
himadripal commented on PR #14284: URL: https://github.com/apache/datafusion/pull/14284#issuecomment-2614028861 The fix for #13492 is in arrow-rs, and this is a test to confirm the fix.

[I] User Defined Coercion Rules [datafusion]

2025-01-25 Thread via GitHub
alamb opened a new issue, #14296: URL: https://github.com/apache/datafusion/issues/14296 ### Is your feature request related to a problem or challenge? Coercion is (TODO find definition) At the moment DataFusion has one set of built-in coercion rules. However, with a single set
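A sketch of what pluggable coercion rules could look like, assuming a simplified `Ty` enum in place of Arrow's `DataType`. The `CoercionRules` trait and both rule sets are hypothetical; they only show two systems choosing different common types for the same UInt64/Int64 pair:

```rust
/// Simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Int64,
    UInt64,
    Float64,
    Decimal(u8, i8),
}

/// Hypothetical pluggable coercion rules; not an existing DataFusion trait.
pub trait CoercionRules {
    /// Common type for comparing `l` with `r`, or `None` if not comparable.
    fn comparison_coercion(&self, l: Ty, r: Ty) -> Option<Ty>;
}

/// One rule set: widen a UInt64/Int64 mix to Decimal(20, 0).
pub struct DecimalWidening;

/// An alternative rule set: use Float64 for a UInt64/Int64 mix.
pub struct FloatWidening;

impl CoercionRules for DecimalWidening {
    fn comparison_coercion(&self, l: Ty, r: Ty) -> Option<Ty> {
        match (l, r) {
            _ if l == r => Some(l),
            (Ty::UInt64, Ty::Int64) | (Ty::Int64, Ty::UInt64) => Some(Ty::Decimal(20, 0)),
            _ => None,
        }
    }
}

impl CoercionRules for FloatWidening {
    fn comparison_coercion(&self, l: Ty, r: Ty) -> Option<Ty> {
        match (l, r) {
            _ if l == r => Some(l),
            (Ty::UInt64, Ty::Int64) | (Ty::Int64, Ty::UInt64) => Some(Ty::Float64),
            _ => None,
        }
    }
}

/// A planner could dispatch to whichever rule set was configured.
pub fn coerce_with(rules: &dyn CoercionRules, l: Ty, r: Ty) -> Option<Ty> {
    rules.comparison_coercion(l, r)
}
```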

Re: [PR] make AnalysisContext aware of empty sets to represent certainly false bounds [datafusion]

2025-01-25 Thread via GitHub
buraksenn commented on code in PR #14279: URL: https://github.com/apache/datafusion/pull/14279#discussion_r1929594804 ## datafusion/physical-expr/src/analysis.rs: ## @@ -344,6 +366,41 @@ mod tests { } } +#[test] +fn test_analyze_empty_set_boundary_exprs()

Re: [PR] make AnalysisContext aware of empty sets to represent certainly false bounds [datafusion]

2025-01-25 Thread via GitHub
buraksenn commented on code in PR #14279: URL: https://github.com/apache/datafusion/pull/14279#discussion_r1929594954 ## datafusion/physical-expr/src/analysis.rs: ## @@ -179,7 +179,17 @@ pub fn analyze( expr.as_any() .downcast_ref::()

Re: [PR] make AnalysisContext aware of empty sets to represent certainly false bounds [datafusion]

2025-01-25 Thread via GitHub
buraksenn commented on code in PR #14279: URL: https://github.com/apache/datafusion/pull/14279#discussion_r1929595006 ## datafusion/physical-expr/src/analysis.rs: ## @@ -235,16 +256,25 @@ fn shrink_boundaries( fn calculate_selectivity( target_boundaries: &[ExprBoundaries],
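A toy model of how an empty set can represent a certainly-false bound: intervals are closed integer ranges, `None` is the empty set, and selectivity is the surviving fraction of the initial range under an assumed uniform distribution. Names and types are hypothetical, not DataFusion's `Interval` or `ExprBoundaries`:

```rust
/// Closed integer interval [lo, hi]; `None` is the empty set, i.e. a
/// predicate that is certainly false. Toy model only.
pub type Bounds = Option<(i64, i64)>;

/// Intersect two intervals; returns `None` when they do not overlap.
pub fn intersect(a: Bounds, b: Bounds) -> Bounds {
    let ((alo, ahi), (blo, bhi)) = (a?, b?);
    let (lo, hi) = (alo.max(blo), ahi.min(bhi));
    if lo <= hi {
        Some((lo, hi))
    } else {
        None
    }
}

/// Number of integers in the interval (0 for the empty set).
fn cardinality(b: Bounds) -> u128 {
    match b {
        Some((lo, hi)) => (hi as i128 - lo as i128 + 1) as u128,
        None => 0,
    }
}

/// Fraction of `initial` that survives, assuming `target` is a subset of
/// `initial` and values are uniformly distributed. Empty target gives 0.
pub fn selectivity(initial: Bounds, target: Bounds) -> f64 {
    let total = cardinality(initial);
    if total == 0 {
        return 0.0;
    }
    cardinality(target) as f64 / total as f64
}
```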

Re: [PR] fix: add missing columns into list directly [datafusion]

2025-01-25 Thread via GitHub
jonahgao commented on PR #14180: URL: https://github.com/apache/datafusion/pull/14180#issuecomment-2613997693 > @jonahgao `select_to_plan` only works with SQL API, but sometimes people use `DataFrame` API directly, where `test_distinct_sort_by_unprojected` is this case, so only check in `se

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-25 Thread via GitHub
jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2614042678 I made a couple of changes to this PR in the second commit. Previously, I was getting an optimizer error under certain scenarios, for example, ``` > select array_slice([1,2,3],

Re: [I] Consider using upstream arrow-avro reader [datafusion]

2025-01-25 Thread via GitHub
getChan commented on issue #14097: URL: https://github.com/apache/datafusion/issues/14097#issuecomment-2614046188 I'm waiting for the arrow-avro PR below as it might include changes to the public API. https://github.com/apache/arrow-rs/pull/6965 I will resume related work once this PR

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-25 Thread via GitHub
ion-elgreco commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2614047765 @alamb ah good point, I missed that! Definitely good to add, will have a better look at where these payload streams are collected

Re: [I] Document how to use rust UDF extensions of datafusion-python [datafusion-python]

2025-01-25 Thread via GitHub
timsaucer closed issue #792: Document how to use rust UDF extensions of datafusion-python URL: https://github.com/apache/datafusion-python/issues/792

[PR] Add nulls checks to generated pruning predicates [datafusion]

2025-01-25 Thread via GitHub
adriangb opened a new pull request, #14297: URL: https://github.com/apache/datafusion/pull/14297 Currently pruning predicates may return `NULL` to indicate "this container should be included", thus using `NULL` as a *truthy* value. That is quite confusing, as explained in the various commen
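
The NULL-as-truthy convention described above can be illustrated with a minimal Rust sketch. It uses plain `Option<bool>` to model SQL three-valued logic rather than DataFusion's actual pruning API. The `ContainerStats` struct, the `x <= bound` filter, and both function names are hypothetical and are not taken from PR #14297.

```rust
/// Hypothetical per-container statistics (e.g. for one Parquet row group):
/// the minimum value of column `x`, or `None` when it is unknown (SQL NULL).
struct ContainerStats {
    min: Option<i64>,
}

/// Pruning predicate for an assumed filter `x <= bound`: a container can
/// hold matching rows only if `min <= bound`. Under SQL three-valued logic
/// an unknown statistic makes the whole predicate NULL (`None`), and the
/// caller must remember that NULL means "keep this container".
fn predicate_three_valued(stats: &ContainerStats, bound: i64) -> Option<bool> {
    stats.min.map(|min| min <= bound)
}

/// The same predicate with an explicit null check, mirroring the SQL
/// expression `min IS NULL OR min <= bound`. It never evaluates to NULL,
/// so only a definite `false` prunes a container.
fn predicate_with_null_check(stats: &ContainerStats, bound: i64) -> bool {
    match stats.min {
        None => true, // `min IS NULL` is true, so the OR is true
        Some(min) => min <= bound,
    }
}

fn main() {
    let bound = 10;
    let containers = [
        ContainerStats { min: Some(3) },  // may contain x <= 10: keep
        ContainerStats { min: Some(42) }, // cannot contain x <= 10: prune
        ContainerStats { min: None },     // statistic unknown: must keep
    ];
    for stats in &containers {
        let raw = predicate_three_valued(stats, bound);
        let checked = predicate_with_null_check(stats, bound);
        // The null-checked predicate makes "NULL means keep" explicit.
        assert_eq!(checked, raw.unwrap_or(true));
        println!("min={:?} raw={:?} keep={}", stats.min, raw, checked);
    }
}
```

With the explicit null check, the caller can treat the result as a plain boolean instead of special-casing NULL. The truncated message does not show the exact expressions the PR generates, so this is only an approximation of the idea.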

Re: [PR] Add nulls checks to generated pruning predicates [datafusion]

2025-01-25 Thread via GitHub
adriangb commented on code in PR #14297: URL: https://github.com/apache/datafusion/pull/14297#discussion_r1929591473 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -819,16 +813,24 @@ impl RequiredColumns { /// statistics column, while keeping track that a reference

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-25 Thread via GitHub
timsaucer commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1929605183 ## datafusion/ffi/src/lib.rs: ## @@ -26,5 +26,14 @@ pub mod session_config; pub mod table_provider; pub mod table_source; +/// Returns the major version of t

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-25 Thread via GitHub
timsaucer commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1929608323 ## datafusion/ffitest/src/async_provider.rs: ## @@ -0,0 +1,272 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] refactor aggregate [datafusion]

2025-01-25 Thread via GitHub
logan-keede commented on PR #14301: URL: https://github.com/apache/datafusion/pull/14301#issuecomment-2614108205 cc @Rachelint @Omega359

[PR] bench: add array_agg benchmark [datafusion]

2025-01-25 Thread via GitHub
rluvaton opened a new pull request, #14302: URL: https://github.com/apache/datafusion/pull/14302 ## Which issue does this PR close? N/A ## Rationale for this change So we can see the improvement in #14299 ## What changes are included in this PR? added benchm

[PR] refactor aggregate [datafusion]

2025-01-25 Thread via GitHub
logan-keede opened a new pull request, #14301: URL: https://github.com/apache/datafusion/pull/14301 ## Which issue does this PR close? Closes #13723 ## Rationale for this change Better Readability and Navigation. ## What changes are included in this PR

Re: [PR] Perform hashing in CollectLeft HashJoin in parallel [datafusion]

2025-01-25 Thread via GitHub
ctsk commented on PR #14234: URL: https://github.com/apache/datafusion/pull/14234#issuecomment-2614112816 I plan to test this again with a larger TPCH scale factor, and compare collectLeft (parallel hashing) vs collectLeft (main branch) vs repartition joins - On SF=1, collectLeft alrea

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-25 Thread via GitHub
jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2614165389 > This is because the signature for extract doesn't handle type checking correctly, it uses variadic_any, not because of the introduced new trait method Oh, great then I don't

Re: [PR] Feat: Add support for `array_min`, `array_max`, `sort_array`, `array_zip` & `array_union` [datafusion-comet]

2025-01-25 Thread via GitHub
dharanad commented on PR #1227: URL: https://github.com/apache/datafusion-comet/pull/1227#issuecomment-2614220161 @andygrove Sure, I will break them into multiple PRs

Re: [I] Update ClickBench benchmarks with DataFusion `45.0.0` (When Published) [datafusion]

2025-01-25 Thread via GitHub
Rachelint commented on issue #14246: URL: https://github.com/apache/datafusion/issues/14246#issuecomment-2614196240 #13681 is ready for review

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-25 Thread via GitHub
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1929652858 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4247,8 +4247,10 @@ logical_plan physical_plan 01)CoalesceBatchesExec: target_batch_size=3, fetch=2 0

Re: [I] User Defined Coercion Rules [datafusion]

2025-01-25 Thread via GitHub
jayzhan211 commented on issue #14296: URL: https://github.com/apache/datafusion/issues/14296#issuecomment-2614252405 We have type coercion in logical plan now, consider the case where we want to separate logical types and physical types, should we add another type coercion layer in physical

[PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-01-25 Thread via GitHub
adriangb opened a new pull request, #14294: URL: https://github.com/apache/datafusion/pull/14294 Closes #13836

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-01-25 Thread via GitHub
adriangb closed pull request #14294: use a single row_count column during predicate pruning instead of one per column URL: https://github.com/apache/datafusion/pull/14294

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-25 Thread via GitHub
alamb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2614037782 > Based on the previous discussions, and draft PRs, I ended up with this Object store wrapper to spawn the io tasks in a different handle: https://github.com/delta-io/delta-rs/blob/mai
