Re: [PR] feat: Add `array_max` function support [datafusion]

2025-02-06 Thread via GitHub
findepi commented on code in PR #14470: URL: https://github.com/apache/datafusion/pull/14470#discussion_r1944280367 ## datafusion/functions-nested/src/max.rs: ## @@ -0,0 +1,152 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [I] DataSink::write_all given invalid RecordBatchStream [datafusion]

2025-02-06 Thread via GitHub
jonahgao commented on issue #14394: URL: https://github.com/apache/datafusion/issues/14394#issuecomment-2639143085 I am concerned whether union will also lead to similar behavior when the nullability of the two inputs is different. -- This is an automated message from the Apache Git Serv

Re: [I] Build time regression [datafusion]

2025-02-06 Thread via GitHub
waynexia commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2639241508 Thanks for those updates! I was trying to replace type parameters (`F` for `Fn`) inside `TreeNode` with `Box` this holiday. Let's see if it works. (it should reduce some

Re: [PR] Add suppport for Show Objects statement for the Snowflake parser [datafusion-sqlparser-rs]

2025-02-06 Thread via GitHub
DanCodedThis commented on code in PR #1702: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1702#discussion_r1944541733 ## src/dialect/snowflake.rs: ## @@ -182,6 +183,15 @@ impl Dialect for SnowflakeDialect { return Some(parse_file_staging_command(kw, p

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
mertak-synnada commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2639555749 @alamb The FileType enum was forgotten, I deleted it completely and returned just a &str for file_type() Changed the ParquetSource initialization, now `new()` is only ac

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
mertak-synnada commented on code in PR #14411: URL: https://github.com/apache/datafusion/pull/14411#discussion_r1944579206 ## datafusion/physical-plan/src/repartition/on_demand_repartition.rs: ## @@ -0,0 +1,1320 @@ +// Licensed to the Apache Software Foundation (ASF) under one +

[PR] Support bounds evaluation for temporal data types [datafusion]

2025-02-06 Thread via GitHub
ch-sc opened a new pull request, #14523: URL: https://github.com/apache/datafusion/pull/14523 ## Which issue does this PR close? As discussed in [14237](https://github.com/apache/datafusion/issues/14237) temporal data types should be supported in bounds evaluation. - Cl

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2639597492 > @alamb The FileType enum was forgotten, I deleted it completely and returned just a &str for file_type() Great - custom source creators will just implement some `MyCoolForm

Re: [I] Querying Parquet file specifically with a predicate returns invalid data error but works in other situations [datafusion]

2025-02-06 Thread via GitHub
alamb closed issue #14281: Querying Parquet file specifically with a predicate returns invalid data error but works in other situations URL: https://github.com/apache/datafusion/issues/14281

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2639603508 > @alamb, can you take a final look to verify all is good? Thanks. Doing now -- I am figuring out the pattern for migration where we have code that looks for `ParquetExec` in the

Re: [PR] Add suppport for Show Objects statement for the Snowflake parser [datafusion-sqlparser-rs]

2025-02-06 Thread via GitHub
DanCodedThis commented on code in PR #1702: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1702#discussion_r1944408128 ## src/dialect/snowflake.rs: ## @@ -182,6 +183,15 @@ impl Dialect for SnowflakeDialect { return Some(parse_file_staging_command(kw, p

Re: [I] Create UNION plan node with correct schema [datafusion]

2025-02-06 Thread via GitHub
findepi commented on issue #14380: URL: https://github.com/apache/datafusion/issues/14380#issuecomment-2639110011 You have a good point about the wildcards > is there benefit to producing a plan that has a correct schema but may have non-coerced input plans? I don't think there

Re: [I] Proper NULL handling in array functions [datafusion]

2025-02-06 Thread via GitHub
alan910127 commented on issue #14451: URL: https://github.com/apache/datafusion/issues/14451#issuecomment-2639208217 Hello @jkosh44, what do you think about this? `array_resize` should return `NULL` only if the second argument (new length) is `NULL`. If the third argument (new eleme

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
findepi commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1944355108 ## datafusion/core/src/schema_equivalence.rs: ## @@ -0,0 +1,84 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
findepi commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1944353348 ## datafusion/core/src/schema_equivalence.rs: ## @@ -0,0 +1,84 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
findepi commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1944351361 ## datafusion/core/src/schema_equivalence.rs: ## @@ -0,0 +1,84 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] refactor: switch BooleanBufferBuilder to NullBufferBuilder in MaybeNullBufferBuilder [datafusion]

2025-02-06 Thread via GitHub
Chen-Yuan-Lai commented on PR #14504: URL: https://github.com/apache/datafusion/pull/14504#issuecomment-2639231104 @alamb I found that `b.capacity()` (`BooleanBufferBuilder`) returns bits, so `allocated_size()` (`NullBufferBuilder`) should return bits, not bytes https://docs.rs/arrow-buff

Re: [I] `array_slice` can't correctly handle NULL parameters or some edge cases [datafusion]

2025-02-06 Thread via GitHub
jonahgao closed issue #10548: `array_slice` can't correctly handle NULL parameters or some edge cases URL: https://github.com/apache/datafusion/issues/10548

Re: [PR] bug: Fix edge cases in array_slice [datafusion]

2025-02-06 Thread via GitHub
jonahgao merged PR #14489: URL: https://github.com/apache/datafusion/pull/14489

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2025-02-06 Thread via GitHub
tobixdev commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2639093257 > I think getting `Scalar` from `DataType` or `ArrayRef` makes sense, so the restriction of this doesn't seem sound to me. What I meant is that we should not be able to d

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2025-02-06 Thread via GitHub
jayzhan-synnada commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2639133561 Having a true "logical" Scalar sounds like a good idea, since either the current `ScalarValue` or `Scalar` are actually still tightly coupled with `DataType`, if we can

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2639134313 Having a true "logical" Scalar sounds like a good idea, since either the current `ScalarValue` or `Scalar` are actually still tightly coupled with `DataType`, if we can bring

Re: [I] Column for primary key not found in schema if constraint column in uppercase [datafusion]

2025-02-06 Thread via GitHub
jonahgao commented on issue #14340: URL: https://github.com/apache/datafusion/issues/14340#issuecomment-2639136909 I think we should also apply normalization to constraint columns. Using `to_lowercase` may not work when normalization is disabled.

[I] Nullable doesn't work when create memory table [datafusion]

2025-02-06 Thread via GitHub
xudong963 opened a new issue, #14522: URL: https://github.com/apache/datafusion/issues/14522 ### Describe the bug Nullable doesn't work when create memory table. ### To Reproduce ``` DataFusion CLI v44.0.0 > CREATE or replace TABLE table_with_pk ( sn IN

Re: [I] Support registering views [datafusion-python]

2025-02-06 Thread via GitHub
kosiew commented on issue #1004: URL: https://github.com/apache/datafusion-python/issues/1004#issuecomment-2639373525 take

[PR] View [datafusion-python]

2025-02-06 Thread via GitHub
kosiew opened a new pull request, #1016: URL: https://github.com/apache/datafusion-python/pull/1016 # Which issue does this PR close? Closes #1004. # Rationale for this change Currently, Datafusion supports views via ViewTable, allowing logical plans to be r

Re: [PR] Implement snowflake ALTER SESSION [datafusion-sqlparser-rs]

2025-02-06 Thread via GitHub
osipovartem closed pull request #1711: Implement snowflake ALTER SESSION URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1711

[PR] Implement SnowFlake ALTER SESSION [datafusion-sqlparser-rs]

2025-02-06 Thread via GitHub
osipovartem opened a new pull request, #1712: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1712 Closes https://github.com/apache/datafusion-sqlparser-rs/issues/1710 https://docs.snowflake.com/en/sql-reference/sql/alter-session

[PR] Implement snowflake ALTER SESSION [datafusion-sqlparser-rs]

2025-02-06 Thread via GitHub
osipovartem opened a new pull request, #1711: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1711 Closes https://github.com/apache/datafusion-sqlparser-rs/issues/1710 https://docs.snowflake.com/en/sql-reference/sql/alter-session

Re: [PR] Feat: Add fetch to CoalescePartitionsExec [datafusion]

2025-02-06 Thread via GitHub
berkaysynnada merged PR #14499: URL: https://github.com/apache/datafusion/pull/14499

Re: [I] Add CoalescePartitionsExec fetch (limit) support [datafusion]

2025-02-06 Thread via GitHub
berkaysynnada closed issue #14446: Add CoalescePartitionsExec fetch (limit) support URL: https://github.com/apache/datafusion/issues/14446

Re: [PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]

2025-02-06 Thread via GitHub
Omega359 commented on code in PR #14367: URL: https://github.com/apache/datafusion/pull/14367#discussion_r1945223855 ## datafusion/functions/src/hash/xxhash.rs: ## @@ -0,0 +1,476 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]

2025-02-06 Thread via GitHub
Spaarsh commented on code in PR #14367: URL: https://github.com/apache/datafusion/pull/14367#discussion_r1945244130 ## datafusion/functions/src/hash/xxhash.rs: ## @@ -0,0 +1,476 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat: Add `array_max` function support [datafusion]

2025-02-06 Thread via GitHub
amladik commented on code in PR #14470: URL: https://github.com/apache/datafusion/pull/14470#discussion_r1945270542 ## datafusion/functions-nested/src/max.rs: ## @@ -0,0 +1,173 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] refactor: switch BooleanBufferBuilder to NullBufferBuilder in MaybeNullBufferBuilder [datafusion]

2025-02-06 Thread via GitHub
alamb commented on code in PR #14504: URL: https://github.com/apache/datafusion/pull/14504#discussion_r1945294378 ## datafusion/physical-plan/src/aggregates/group_values/null_builder.rs: ## @@ -15,120 +15,78 @@ // specific language governing permissions and limitations // unde

Re: [PR] refactor: switch BooleanBufferBuilder to NullBufferBuilder in MaybeNullBufferBuilder [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14504: URL: https://github.com/apache/datafusion/pull/14504#issuecomment-2640789173 I also merged this PR up to main to rerun CI

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945298335 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2640796514 @adriangb -- this PR now seems to have some conflicts that need to be resolved prior to merge. Marking it as a draft as we sort them out

Re: [PR] 14044/enhancement/add xxhash algorithms in expression api [datafusion]

2025-02-06 Thread via GitHub
Spaarsh commented on code in PR #14367: URL: https://github.com/apache/datafusion/pull/14367#discussion_r1945245299 ## datafusion/functions/src/hash/xxhash.rs: ## @@ -0,0 +1,476 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [I] Add example to spark-expr crate [datafusion-comet]

2025-02-06 Thread via GitHub
viczsaurav commented on issue #1365: URL: https://github.com/apache/datafusion-comet/issues/1365#issuecomment-2640645653 Can we use the following approach? Where would this example reside? 1. Initialize DataFusion Context 2. Define Custom Expression - Implement a Scalar Fun

Re: [PR] feat: [wip] experimental fuzz testing in test suite [datafusion-comet]

2025-02-06 Thread via GitHub
codecov-commenter commented on PR #1374: URL: https://github.com/apache/datafusion-comet/pull/1374#issuecomment-2640781808 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1374?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945292475 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945294642 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945347391 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945297144 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] functions: Remove NullHandling from scalar funcs [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on PR #14531: URL: https://github.com/apache/datafusion/pull/14531#issuecomment-2640875858 Sorry for the spam of PRs related to this issue. It turns out, IMO, that the fix to the null input issue was improving the function signature and the `NullHandling` enum did not help

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2640874379 I was able to work around the issue pretty easily by keeping the first row count we find 😄: https://github.com/apache/datafusion/pull/14295/commits/b9a5ccb57a62abcac84ffa88ae6ea59b6

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
berkaysynnada commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640985434 > Maybe I am missing something, but the benchmark numbers reported above don't really show much of an improvement this might be a silly question but, did you set the conf

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
Lordworms commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945417517 ## datafusion/sqllogictest/test_files/identifiers.slt: ## @@ -90,16 +90,16 @@ drop table case_insensitive_test statement ok CREATE TABLE test("Column1" string

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
comphead commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1945425102 ## datafusion/core/src/physical_planner.rs: ## @@ -689,7 +693,7 @@ impl DefaultPhysicalPlanner { if physical_field.data_type() != logica

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2641004858 @alamb conflicts resolved and your test was added and fixed

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
comphead commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1945427486 ## datafusion/core/src/schema_equivalence.rs: ## @@ -0,0 +1,84 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945327347 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945349575 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Apply take_function_args [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14525: URL: https://github.com/apache/datafusion/pull/14525#issuecomment-2640906182 > @findepi Is there a good way to use `take_function_args()` outside of the `functions` crate? I think you would have to move it somewhere like `datafusion_common`

Re: [PR] Improve error messages to include the function name. [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14511: URL: https://github.com/apache/datafusion/pull/14511#issuecomment-2640907113 Along with this PR from @Lordworms DataFusion error messages are getting downright friendly! - https://github.com/apache/datafusion/pull/14521

Re: [PR] bug: Remove array_slice two arg variant [datafusion]

2025-02-06 Thread via GitHub
alamb commented on code in PR #14527: URL: https://github.com/apache/datafusion/pull/14527#discussion_r1945314693 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -1850,15 +1850,11 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 0, [] []

Re: [PR] Replacing `SessionState` with `Session` and progress towards moving `FileFormatFactory` out of `datasource` [datafusion]

2025-02-06 Thread via GitHub
alamb commented on code in PR #14517: URL: https://github.com/apache/datafusion/pull/14517#discussion_r1945307778 ## datafusion/catalog-listing/Cargo.toml: ## @@ -28,6 +28,7 @@ rust-version.workspace = true version.workspace = true [dependencies] +apache-avro = { version = "

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-02-06 Thread via GitHub
andygrove commented on PR #14227: URL: https://github.com/apache/datafusion/pull/14227#issuecomment-2640814818 > @andygrove are we happy that dict_id is no longer needed in DataFusion? Yes, I think so. We have proven that we no longer need it in Comet, at least. Thanks for the ping.

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945315824 ## docs/source/user-guide/configs.md: ## @@ -48,7 +48,7 @@ Comet provides the following configuration settings. | spark.comet.exec.hashJoin.enabled |

Re: [PR] bug: Remove array_slice two arg variant [datafusion]

2025-02-06 Thread via GitHub
alamb merged PR #14527: URL: https://github.com/apache/datafusion/pull/14527

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945320310 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640829028 > This is still in somewhat early stages, and there is work to do. But it might be good to get feedback early on from the community as the performance of this code is somewhat sensitiv

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
alamb commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945317697 ## datafusion/sqllogictest/test_files/errors.slt: ## @@ -161,3 +161,13 @@ create table records (timestamp timestamp, value float) as values ( '2021-01-01 00:00

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945324008 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] Add nulls checks to generated pruning predicates [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14297: URL: https://github.com/apache/datafusion/pull/14297#issuecomment-2640833744 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945309452 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945318951 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] Replacing `SessionState` with `Session` and progress towards moving `FileFormatFactory` out of `datasource` [datafusion]

2025-02-06 Thread via GitHub
logan-keede commented on PR #14517: URL: https://github.com/apache/datafusion/pull/14517#issuecomment-2640830389 > Unfortunately, I think this PR had major conflicts with > > * [Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc #14224](

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14521: URL: https://github.com/apache/datafusion/pull/14521#issuecomment-2640835368 Amazing work! It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in https://github.com/apache/datafusi

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640838240 @Weijun-H did [some benchmarks](https://github.com/synnada-ai/datafusion-upstream/pull/60) a while back and the approach seemed promising in TPCH/SF50. @mertak-synnada will

Re: [PR] fix: Mark cast from float/double to decimal as incompatible [datafusion-comet]

2025-02-06 Thread via GitHub
andygrove commented on PR #1372: URL: https://github.com/apache/datafusion-comet/pull/1372#issuecomment-2640840371 I don't understand the following test failure: ``` 2025-02-06T17:32:38.2264489Z - final decimal avg *** FAILED *** (17 milliseconds) 2025-02-06T17:32:38.2265038Z

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945330070 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14521: URL: https://github.com/apache/datafusion/pull/14521#issuecomment-2641075978 > It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in https://github.com/apache/datafusion/pull/13664 while we

[PR] fix: rest api `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` [datafusion-ballista]

2025-02-06 Thread via GitHub
milenkovicm opened a new pull request, #1175: URL: https://github.com/apache/datafusion-ballista/pull/1175 # Which issue does this PR close? Closes #1174. # Rationale for this change Rest api reports correct number of registered executors in case of `TaskSchedulingPolic

[PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jkosh44 opened a new pull request, #14532: URL: https://github.com/apache/datafusion/pull/14532 This commit allows for more expressive array function signatures. Previously, `ArrayFunctionSignature` was an enum of potential argument combinations and orders. For many array functions, none of

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14227: URL: https://github.com/apache/datafusion/pull/14227#issuecomment-2641110054 > > @andygrove are we happy that dict_id is no longer needed in DataFusion? > > Yes, I think so. We have proven that we no longer need it in Comet, at least. Thanks for the ping.

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944643137 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -584,23 +541,36 @@ fn get_valid_types( match target_type_class {

Re: [I] `UnwrapCastInComparison` produces incorrect results [datafusion]

2025-02-06 Thread via GitHub
Spaarsh commented on issue #14303: URL: https://github.com/apache/datafusion/issues/14303#issuecomment-2639831523 @alamb thanks! In order to test if that PR is causing this issue, I made a separate branch and reverted the commit (e0f9f65) that merged that change into main. The incorrect resu

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2639923034 > Otherwise things are going well -- I am making good progress Great! The change you suggested makes sense. Since it is a very small change, let's add to this PR directly and

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2639925962 Something else I noticed is the import paths for ParquetExec are gone (so I got a lot of compile errors about missing ParquetExec): ``` error[E0432]: unresolved import `dataf

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2639932918 > Great! The change you suggested makes sense. Since it is a very small change, we'll shortly add it to this PR and so migration smoother to others as well. To be clear, I think

Re: [PR] Script and documentation for regenerating sqlite test files [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14290: URL: https://github.com/apache/datafusion/pull/14290#issuecomment-2640529650 Thanks again @Omega359 -- let's get this one in

Re: [I] stddev_pop not compatible with Spark in some cases [datafusion-comet]

2025-02-06 Thread via GitHub
andygrove commented on issue #1375: URL: https://github.com/apache/datafusion-comet/issues/1375#issuecomment-2640555226 Actually, it looks like the code for checking answers allowing for tolerance may not be correct: The expected vs actual are: ``` 1.2417965031048596E38

Re: [PR] Fix config_namespace macro symbol usage [datafusion]

2025-02-06 Thread via GitHub
davisp commented on PR #14520: URL: https://github.com/apache/datafusion/pull/14520#issuecomment-2640559128 @alamb Thanks for the tip! I've moved the test to the macro_hygiene module.

[PR] wip: proto to physical plan conversion [datafusion]

2025-02-06 Thread via GitHub
jatin510 opened a new pull request, #14530: URL: https://github.com/apache/datafusion/pull/14530 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2640573974 > I got enough of our code to compile and tests running that i think this PR is ok to merge. Thank you @mertak-synnada @ozankabak and @berkaysynnada -- this is pretty epic G

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
ozankabak merged PR #14224: URL: https://github.com/apache/datafusion/pull/14224

Re: [I] [DISCUSS] Single Source `ExecutionPlan` Across All `TableProviders` [datafusion]

2025-02-06 Thread via GitHub
ozankabak closed issue #13838: [DISCUSS] Single Source `ExecutionPlan` Across All `TableProviders` URL: https://github.com/apache/datafusion/issues/13838

Re: [PR] Support Limit pushdown for `MemoryExec` [datafusion]

2025-02-06 Thread via GitHub
ozankabak closed pull request #14502: Support Limit pushdown for `MemoryExec` URL: https://github.com/apache/datafusion/pull/14502

Re: [I] Implement `fetch` limit for MemoryExec [datafusion]

2025-02-06 Thread via GitHub
ozankabak closed issue #14337: Implement `fetch` limit for MemoryExec URL: https://github.com/apache/datafusion/issues/14337

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2640579066 I just merged the PR -- I will check back in half an hour to see if there are any problems

Re: [I] `UnwrapCastInComparison` produces incorrect results [datafusion]

2025-02-06 Thread via GitHub
Spaarsh commented on issue #14303: URL: https://github.com/apache/datafusion/issues/14303#issuecomment-2640583140 I think I have found out the main problem here. I added a few debugging statements to print the DataTypes as the Optimizer code is running, here is what I found: ``` > wit

Re: [PR] fix: order by expr rewrite fix [datafusion]

2025-02-06 Thread via GitHub
comphead merged PR #14486: URL: https://github.com/apache/datafusion/pull/14486

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-06 Thread via GitHub
Omega359 commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2639739608 I looked over the code, it's clean and well written. I can't speak for the actual functionality here as it's not in my area of expertise. You need to regenerate the docs again thoug

Re: [I] [Epic] Split datasources out from `datafusion` crate (`datafusion/core`) [datafusion]

2025-02-06 Thread via GitHub
logan-keede commented on issue #1: URL: https://github.com/apache/datafusion/issues/1#issuecomment-2639738923 > [@alamb](https://github.com/alamb) Can you point me at whatever tool you used to generate those compile timing graphs? Those look like something I'd absolutely adopt in a

Re: [I] Test DataFusion 45.0.0 with Sail [datafusion]

2025-02-06 Thread via GitHub
alamb commented on issue #14408: URL: https://github.com/apache/datafusion/issues/14408#issuecomment-2639747760 > Not sure if this is the right file to be looking at, but it's where the error comes from (`Physical input schema should be the same as the one converted from logical input schem

Re: [PR] Draft: coercible signature [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1944760066 ## datafusion/expr-common/src/signature.rs: ## @@ -455,6 +461,46 @@ fn get_data_types(native_type: &NativeType) -> Vec { } } +#[derive(Debug, Clone)] +

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-06 Thread via GitHub
xudong963 commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2640010194 > ## Probably Not: Correlated Subqueries > For this project: > > * [[EPIC] More Subquery support  #5483](https://github.com/apache/datafusion/issues/5483) > >
