Re: [PR] chore: remove partition_keys from (Bounded)WindowAggExec [datafusion]

2025-02-06 Thread via GitHub
berkaysynnada commented on PR #14526: URL: https://github.com/apache/datafusion/pull/14526#issuecomment-2642173706 Thank you @irenjj. I'd like to take this ASAP. Can you ping me when it is ready? -- This is an automated message from the Apache Git Service. To respond to the message, pleas

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
Lordworms commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1946094731 ## datafusion/common/src/column.rs: ## @@ -299,6 +301,23 @@ impl Column { .flat_map(|s| s.columns()) .collect(), }) +

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-02-06 Thread via GitHub
berkaysynnada commented on PR #14523: URL: https://github.com/apache/datafusion/pull/14523#issuecomment-2642167448 Thank you @ch-sc for working on this. When you need a review, I can do that if you ping me.

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
wForget commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1946083252 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,160 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] Always use `StringViewArray` as output of `substr` [datafusion]

2025-02-06 Thread via GitHub
Kev1n8 commented on PR #14498: URL: https://github.com/apache/datafusion/pull/14498#issuecomment-2642144319 > Could you resolve the conflict? Sure. Done.

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
eliaperantoni commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1946069829 ## datafusion/common/src/column.rs: ## @@ -299,6 +301,23 @@ impl Column { .flat_map(|s| s.columns()) .collect(), }

Re: [I] External sorting not working for (maybe only for string columns??) [datafusion]

2025-02-06 Thread via GitHub
xuchen-plus commented on issue #12136: URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2642135559 > Not sure why the sorted batches' memory is over 2.6x than the batches before sort. Some findings so far: 1. My test was reading a parquet file with mostly string

Re: [PR] Minor: deprecate unused index mod [datafusion]

2025-02-06 Thread via GitHub
zhuqi-lucas commented on code in PR #14534: URL: https://github.com/apache/datafusion/pull/14534#discussion_r1946069187 ## datafusion/physical-plan/src/sorts/mod.rs: ## @@ -19,12 +19,9 @@ mod builder; mod cursor; -mod index; mod merge; pub mod partial_sort; pub mod sort;

Re: [PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-06 Thread via GitHub
wForget commented on code in PR #1379: URL: https://github.com/apache/datafusion-comet/pull/1379#discussion_r1946064632 ## spark/src/main/scala/org/apache/spark/Plugins.scala: ## @@ -90,13 +96,19 @@ class CometDriverPlugin extends DriverPlugin with Logging with ShimCometDriverP

[PR] feat: override executor overhead memory only when comet unified memory manager is disabled [datafusion-comet]

2025-02-06 Thread via GitHub
wForget opened a new pull request, #1379: URL: https://github.com/apache/datafusion-comet/pull/1379 ## Which issue does this PR close? Closes #1378. ## Rationale for this change ## What changes are included in this PR? ## How are these chang

Re: [PR] Minor: remove unused index mod [datafusion]

2025-02-06 Thread via GitHub
zhuqi-lucas commented on code in PR #14534: URL: https://github.com/apache/datafusion/pull/14534#discussion_r1946040045 ## datafusion/physical-plan/src/sorts/mod.rs: ## @@ -19,12 +19,9 @@ mod builder; mod cursor; -mod index; mod merge; pub mod partial_sort; pub mod sort;

Re: [PR] Minor: remove unused index mod [datafusion]

2025-02-06 Thread via GitHub
zhuqi-lucas commented on code in PR #14534: URL: https://github.com/apache/datafusion/pull/14534#discussion_r1946033818 ## datafusion/physical-plan/src/sorts/mod.rs: ## @@ -19,12 +19,9 @@ mod builder; mod cursor; -mod index; mod merge; pub mod partial_sort; pub mod sort;

[PR] polish MemoryStream related code [datafusion]

2025-02-06 Thread via GitHub
zjregee opened a new pull request, #14537: URL: https://github.com/apache/datafusion/pull/14537 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/14502. ## Rationale for this change ## What changes are included in this PR? Polish MemoryStr

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2642015078 @2010YOUY01 -- fantastic, added the project idea to the list.

Re: [PR] Minor: remove unused index mod [datafusion]

2025-02-06 Thread via GitHub
2010YOUY01 commented on code in PR #14534: URL: https://github.com/apache/datafusion/pull/14534#discussion_r1945987503 ## datafusion/physical-plan/src/sorts/mod.rs: ## @@ -19,12 +19,9 @@ mod builder; mod cursor; -mod index; mod merge; pub mod partial_sort; pub mod sort;

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2641969506 > See this, [#13819 (comment)](https://github.com/apache/datafusion/issues/13819#issuecomment-2552554818). We only convert an inner fixed-size list to a regular list when the functio

Re: [PR] Always use `StringViewArray` as output of `substr` [datafusion]

2025-02-06 Thread via GitHub
2010YOUY01 commented on PR #14498: URL: https://github.com/apache/datafusion/pull/14498#issuecomment-2641979566 Thank you, it looks good to me. Could you resolve the conflict?

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2641979167 > So it sounds like we need to include in the array signature whether or not the function might change the size of the list and use that information. yes

[I] Question: `to_char(date, timstamp format)` [datafusion]

2025-02-06 Thread via GitHub
xudong963 opened a new issue, #14536: URL: https://github.com/apache/datafusion/issues/14536 It seems that we don't support ``` > select to_char('2023-09-04'::date, '%Y-%m-%dT%H:%M:%S%.3f'); Execution error: Cast error: Format error ``` I want to ensure if this is an unsupport

Re: [I] Project Ideas for GSoC 2025 (Google Summer of Code) [datafusion]

2025-02-06 Thread via GitHub
2010YOUY01 commented on issue #14478: URL: https://github.com/apache/datafusion/issues/14478#issuecomment-2641935740 > > Besides, there are many well-defined tasks in our SQL fuzzer [#11030](https://github.com/apache/datafusion/issues/11030) and I'm interested to mentor. I'll open an issue

Re: [I] Implement nested join optimization [datafusion]

2025-02-06 Thread via GitHub
clflushopt commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2641934903 take

[I] Rewrite `datafusion-sqlancer` in Rust [datafusion]

2025-02-06 Thread via GitHub
2010YOUY01 opened a new issue, #14535: URL: https://github.com/apache/datafusion/issues/14535 ### Is your feature request related to a problem or challenge? This is a project idea for GSoC 2025 https://github.com/apache/datafusion/issues/14478 `datafusion-sqlancer` is a SQL level

Re: [I] Implement nested join optimization [datafusion]

2025-02-06 Thread via GitHub
clflushopt commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2641928732 Hi, I've been doing some reading on the side and I am interested in taking a stab at this if the issue is still open and no one is working on it.

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
Weijun-H commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2641916193 I updated the latest benchmark results. It seems the `OnDemandRepartition` improved performance on `clickbench_partitioned` and large datasets like `tpch_sf50`. For `tpch_sf1` and `

[PR] Minor: remove unused index mod [datafusion]

2025-02-06 Thread via GitHub
zhuqi-lucas opened a new pull request, #14534: URL: https://github.com/apache/datafusion/pull/14534 ## Which issue does this PR close? This is a minor change to remove an unused file and mod. ## Are there any user-facing changes? no

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1945920988 ## datafusion/expr-common/src/signature.rs: ## @@ -455,6 +461,46 @@ fn get_data_types(native_type: &NativeType) -> Vec { } } +#[derive(Debug, Clone)] +

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-02-06 Thread via GitHub
Garamda commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2641894768 Thank you @Omega359 for your comment. `./dev/update_function_docs.sh` changes the order of function parameters in docs. This might differ from what users would naturally expect

Re: [I] Attach `Diagnostic` to syntax errors [datafusion]

2025-02-06 Thread via GitHub
irenjj commented on issue #14437: URL: https://github.com/apache/datafusion/issues/14437#issuecomment-2641868892 take

Re: [I] Attach `Diagnostic` to "more than one column in subquery" error [datafusion]

2025-02-06 Thread via GitHub
irenjj commented on issue #14438: URL: https://github.com/apache/datafusion/issues/14438#issuecomment-2641868741 take

Re: [PR] Support Limit pushdown for `MemoryExec` [datafusion]

2025-02-06 Thread via GitHub
zjregee commented on PR #14502: URL: https://github.com/apache/datafusion/pull/14502#issuecomment-2641852047 It seems that the suggestions mentioned here are still feasible. I will use a follow on PR to add this.

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-06 Thread via GitHub
wForget commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1945908388 ## native/core/src/parquet/parquet_support.rs: ## @@ -1861,6 +1863,42 @@ fn trim_end(s: &str) -> &str { } } +// Default object store which is local fil

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1945901967 ## datafusion/expr-common/src/signature.rs: ## @@ -286,6 +261,34 @@ impl Display for ArrayFunctionSignature { } } +#[derive(Debug, Clone, PartialEq, Eq, Pa

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on code in PR #14532: URL: https://github.com/apache/datafusion/pull/14532#discussion_r1945901504 ## datafusion/expr-common/src/signature.rs: ## @@ -227,25 +226,11 @@ impl Display for TypeSignatureClass { #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Has

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2641836978 See this, https://github.com/apache/datafusion/issues/13819#issuecomment-2552554818. We only convert an inner fixed-size list to a regular list when the function performs a mut

Re: [PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on PR #14532: URL: https://github.com/apache/datafusion/pull/14532#issuecomment-2641819030 This is failing CI for the following reason. Previously, in `get_valid_types()` functions with the array signature `ArrayAndIndexes` and `Array` would convert top level `Fixed

Re: [PR] feat: metadata columns [datafusion]

2025-02-06 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2641808013 > > would you mind to add some tests about stopping system column propagation? I haven't seen them on your branch? > > have you seen these? > > https://github.com/ap

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-06 Thread via GitHub
andygrove commented on code in PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#discussion_r1945884758 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -125,6 +125,26 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-06 Thread via GitHub
codecov-commenter commented on PR #1377: URL: https://github.com/apache/datafusion-comet/pull/1377#issuecomment-2641802060 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1377?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-06 Thread via GitHub
codecov-commenter commented on PR #1376: URL: https://github.com/apache/datafusion-comet/pull/1376#issuecomment-2641780247 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1376?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: metadata columns [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2641766867 > would you mind to add some tests about stopping system column propagation? I haven't seen them on your branch? have you seen these? https://github.com/apache/datafusion/blo

Re: [PR] feat: metadata columns [datafusion]

2025-02-06 Thread via GitHub
chenkovsky commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2641751804 > @chenkovsky the main difference between the two approaches is how to transmit the information on which columns are system columns and which aren't. The approach in this PR does

Re: [PR] POC: Eliminate unnecessary group by keys (q35 in clickbench 1.35x faster) [datafusion]

2025-02-06 Thread via GitHub
github-actions[bot] closed pull request #13617: POC: Eliminate unnecessary group by keys (q35 in clickbench 1.35x faster) URL: https://github.com/apache/datafusion/pull/13617

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1945857598 ## datafusion/functions/src/string/ascii.rs: ## @@ -61,7 +63,13 @@ impl Default for AsciiFunc { impl AsciiFunc { pub fn new() -> Self { Self { -

[PR] chore: Adding an optional `hdfs` crate [datafusion-comet]

2025-02-06 Thread via GitHub
comphead opened a new pull request, #1377: URL: https://github.com/apache/datafusion-comet/pull/1377 ## Which issue does this PR close? Closes #1368 . ## Rationale for this change ## What changes are included in this PR? ## How are these cha

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1945854733 ## datafusion/functions/src/string/repeat.rs: ## @@ -65,10 +65,17 @@ impl Default for RepeatFunc { impl RepeatFunc { pub fn new() -> Self { Self

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14440: URL: https://github.com/apache/datafusion/pull/14440#discussion_r1945854931 ## datafusion/functions/src/string/repeat.rs: ## @@ -65,10 +65,17 @@ impl Default for RepeatFunc { impl RepeatFunc { pub fn new() -> Self { Self

Re: [PR] Enable Dataframe to be converted into views which can be used in register_table [datafusion-python]

2025-02-06 Thread via GitHub
kosiew commented on code in PR #1016: URL: https://github.com/apache/datafusion-python/pull/1016#discussion_r1945827785 ## python/tests/test_view.py: ## @@ -0,0 +1,34 @@ +from datafusion import SessionContext, col, literal + + +def test_register_filtered_dataframe(): +ctx =

Re: [PR] fix: Mark cast from float/double to decimal as incompatible [datafusion-comet]

2025-02-06 Thread via GitHub
codecov-commenter commented on PR #1372: URL: https://github.com/apache/datafusion-comet/pull/1372#issuecomment-2641640511 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1372?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Apply take_function_args [datafusion]

2025-02-06 Thread via GitHub
lgingerich commented on PR #14525: URL: https://github.com/apache/datafusion/pull/14525#issuecomment-2641605649 > I think you would have to move it somewhere like `datafusion_common` Sounds good, I should get to that this weekend and then I'll apply `take_function_args()` in other cra

Re: [PR] fix: Mark cast from float/double to decimal as incompatible [datafusion-comet]

2025-02-06 Thread via GitHub
andygrove commented on code in PR #1372: URL: https://github.com/apache/datafusion-comet/pull/1372#discussion_r1945726731 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -867,10 +885,11 @@ class CometAggregateSuite extends CometTestBase with Adapt

Re: [PR] bug: Fix NULL handling in array_slice, introduce `NullHandling` enum to `Signature` [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2641475104 @jkosh44 It sounds like Nulls can be handled by signature at all? Sounds great

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
jayzhan211 commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1945670750 ## datafusion/core/src/physical_planner.rs: ## @@ -689,7 +693,7 @@ impl DefaultPhysicalPlanner { if physical_field.data_type() != logi

Re: [PR] feat: [wip] experimental fuzz testing in test suite [datafusion-comet]

2025-02-06 Thread via GitHub
andygrove closed pull request #1374: feat: [wip] experimental fuzz testing in test suite URL: https://github.com/apache/datafusion-comet/pull/1374

[PR] fix: disable checking for uint_8 and uint_16 if complex type readers are enabled [datafusion-comet]

2025-02-06 Thread via GitHub
parthchandra opened a new pull request, #1376: URL: https://github.com/apache/datafusion-comet/pull/1376 ## Which issue does this PR close? Partly addresses test failures caused by https://github.com/apache/datafusion-comet/issues/1348 ## Rationale for this change As the

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
Lordworms commented on PR #14521: URL: https://github.com/apache/datafusion/pull/14521#issuecomment-2641333721 > threshold of `0.5` is reasonable. Maybe a table with multiple columns that are increasingly further apart by eyeball (`timesamp`, `timeamp`, `timp`, `ts`, `tokens`, `amp`, `foo`?

[I] `array_has` UDF performance is slow for smaller number of needles [datafusion]

2025-02-06 Thread via GitHub
cetra3 opened a new issue, #14533: URL: https://github.com/apache/datafusion/issues/14533 ### Describe the bug When using `array_has` the performance is quite slow when there is a single needle or a small number of needles to check for. ### To Reproduce Here's an example:

Re: [PR] Replacing `SessionState` with `Session` and progress towards moving `FileFormatFactory` out of `datasource` [datafusion]

2025-02-06 Thread via GitHub
logan-keede commented on PR #14517: URL: https://github.com/apache/datafusion/pull/14517#issuecomment-2641260027 https://github.com/apache/datafusion/blame/bab0f54daa99830339ca19c1b4e3489278ad/datafusion/core/src/datasource/physical_plan/file_scan_config.rs#L605 this downcast intr

Re: [PR] feat: metadata columns [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14057: URL: https://github.com/apache/datafusion/pull/14057#issuecomment-2641172445 @chenkovsky the main difference between the two approaches is how to transmit the information on which columns are system columns and which aren't. The approach in this PR does it e

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945533851 ## datafusion/sqllogictest/test_files/errors.slt: ## @@ -161,3 +161,13 @@ create table records (timestamp timestamp, value float) as values ( '2021-01-01 00

Re: [PR] functions: Remove NullHandling from scalar funcs [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on PR #14531: URL: https://github.com/apache/datafusion/pull/14531#issuecomment-2641147504 I also have a PoC for how we'd make the array signatures more expressive and more easily fix the null input errors: https://github.com/apache/datafusion/pull/14532

Re: [I] Proper NULL handling in array functions [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on issue #14451: URL: https://github.com/apache/datafusion/issues/14451#issuecomment-264182 > @alan910127 I have a very rough proposal, that might help you with this issue OK I put this together here: https://github.com/apache/datafusion/pull/14532, if you're in

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14227: URL: https://github.com/apache/datafusion/pull/14227#issuecomment-2641110054 > > @andygrove are we happy that dict_id is no longer needed in DataFusion? > > Yes, I think so. We have proven that we no longer need it in Comet, at least. Thanks for the ping.

[PR] function: Allow more expressive array signatures [datafusion]

2025-02-06 Thread via GitHub
jkosh44 opened a new pull request, #14532: URL: https://github.com/apache/datafusion/pull/14532 This commit allows for more expressive array function signatures. Previously, `ArrayFunctionSignature` was an enum of potential argument combinations and orders. For many array functions, none of

[PR] fix: rest api `/api/executors` does not show executors if `TaskSchedulingPolicy::PullStaged` [datafusion-ballista]

2025-02-06 Thread via GitHub
milenkovicm opened a new pull request, #1175: URL: https://github.com/apache/datafusion-ballista/pull/1175 # Which issue does this PR close? Closes #1174. # Rationale for this change Rest api reports correct number of registered executors in case of `TaskSchedulingPolic

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14521: URL: https://github.com/apache/datafusion/pull/14521#issuecomment-2641075978 > It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in https://github.com/apache/datafusion/pull/13664 while we

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
comphead commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1945427486 ## datafusion/core/src/schema_equivalence.rs: ## @@ -0,0 +1,84 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2641004858 @alamb conflicts resolved and your test was added and fixed

Re: [PR] Relax physical schema validation [datafusion]

2025-02-06 Thread via GitHub
comphead commented on code in PR #14519: URL: https://github.com/apache/datafusion/pull/14519#discussion_r1945425102 ## datafusion/core/src/physical_planner.rs: ## @@ -689,7 +693,7 @@ impl DefaultPhysicalPlanner { if physical_field.data_type() != logica

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
berkaysynnada commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640985434 > Maybe I am missing something, but the benchmark numbers reported above don't really show much of an improvement this might be a silly question but, did you set the conf

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
Lordworms commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945417517 ## datafusion/sqllogictest/test_files/identifiers.slt: ## @@ -90,16 +90,16 @@ drop table case_insensitive_test statement ok CREATE TABLE test("Column1" string

Re: [I] Test DataFusion 45.0.0 with Sail [datafusion]

2025-02-06 Thread via GitHub
findepi commented on issue #14408: URL: https://github.com/apache/datafusion/issues/14408#issuecomment-2640948296 > [@findepi](https://github.com/findepi) do you mean we should relax the check to ignore nullable / non nullable annotations? -- I think that would probably be ok too. ye

Re: [PR] functions: Remove NullHandling from scalar funcs [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on PR #14531: URL: https://github.com/apache/datafusion/pull/14531#issuecomment-2640948421 @jayzhan211 I'm curious if you think this is a good idea or not.

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
Lordworms commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945390440 ## datafusion/sqllogictest/test_files/errors.slt: ## @@ -161,3 +161,13 @@ create table records (timestamp timestamp, value float) as values ( '2021-01-01 0

Re: [PR] Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2640909974 > This is a terrific idea -- we will be happy to write up a blog post that will serve a dual purpose: (1) help people with upgrading, and (2) brag about this epic PR 🙂 Y

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
comphead commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945378334 ## datafusion/sqllogictest/test_files/errors.slt: ## @@ -161,3 +161,13 @@ create table records (timestamp timestamp, value float) as values ( '2021-01-01 00

Re: [PR] Improve error messages to include the function name. [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14511: URL: https://github.com/apache/datafusion/pull/14511#issuecomment-2640907113 Along with this PR from @Lordworms DataFusion error messages are getting downright friendly! - https://github.com/apache/datafusion/pull/14521

Re: [PR] Apply take_function_args [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14525: URL: https://github.com/apache/datafusion/pull/14525#issuecomment-2640906182 > @findepi Is there a good way to use `take_function_args()` outside of the `functions` crate? I think you would have to move it somewhere like `datafusion_common`

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945327347 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945349575 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] functions: Remove NullHandling from scalar funcs [datafusion]

2025-02-06 Thread via GitHub
jkosh44 commented on PR #14531: URL: https://github.com/apache/datafusion/pull/14531#issuecomment-2640875858 Sorry for the spam of PRs related to this issue. It turns out, IMO, that the fix to the null input issue was improving the function signature and the `NullHandling` enum did not help

Re: [PR] use a single row_count column during predicate pruning instead of one per column [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14295: URL: https://github.com/apache/datafusion/pull/14295#issuecomment-2640874379 I was able to work around the issue pretty easily by keeping the first row count we find 😄: https://github.com/apache/datafusion/pull/14295/commits/b9a5ccb57a62abcac84ffa88ae6ea59b6

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945297144 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945347391 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945330070 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] fix: Mark cast from float/double to decimal as incompatible [datafusion-comet]

2025-02-06 Thread via GitHub
andygrove commented on PR #1372: URL: https://github.com/apache/datafusion-comet/pull/1372#issuecomment-2640840371 I don't understand the following test failure: ``` 2025-02-06T17:32:38.2264489Z - final decimal avg *** FAILED *** (17 milliseconds) 2025-02-06T17:32:38.2265038Z

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
ozankabak commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640838240 @Weijun-H did [some benchmarks](https://github.com/synnada-ai/datafusion-upstream/pull/60) a while back and the approach seemed promising in TPCH/SF50. @mertak-synnada will

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
adriangb commented on PR #14521: URL: https://github.com/apache/datafusion/pull/14521#issuecomment-2640835368 Amazing work! It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in https://github.com/apache/datafusi

Re: [PR] Add nulls checks to generated pruning predicates [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14297: URL: https://github.com/apache/datafusion/pull/14297#issuecomment-2640833744 Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look -- This is an automated message from the A

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945324008 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] Replacing `SessionState` with `Session` and progress towards moving `FileFormatFactory` out of `datasource` [datafusion]

2025-02-06 Thread via GitHub
logan-keede commented on PR #14517: URL: https://github.com/apache/datafusion/pull/14517#issuecomment-2640830389 > Unfortunately, I think this PR had major conflicts with > > * [Introduce unified `DataSourceExec` for provided datasources, remove `ParquetExec`, `CsvExec`, etc  #14224](

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945320310 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-06 Thread via GitHub
alamb commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2640829028 > This is still in somewhat early stages, and there is work to do. But it might be good to get feedback early on from the community as the performance of this code is somewhat sensitiv

Re: [PR] feat: add hint for missing fields [datafusion]

2025-02-06 Thread via GitHub
alamb commented on code in PR #14521: URL: https://github.com/apache/datafusion/pull/14521#discussion_r1945317697 ## datafusion/sqllogictest/test_files/errors.slt: ## @@ -161,3 +161,13 @@ create table records (timestamp timestamp, value float) as values ( '2021-01-01 00:00

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945318951 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] bug: Remove array_slice two arg variant [datafusion]

2025-02-06 Thread via GitHub
alamb merged PR #14527: URL: https://github.com/apache/datafusion/pull/14527 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
kazuyukitanimura commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945315824 ## docs/source/user-guide/configs.md: ## @@ -48,7 +48,7 @@ Comet provides the following configuration settings. | spark.comet.exec.hashJoin.enabled |

Re: [PR] bug: Remove array_slice two arg variant [datafusion]

2025-02-06 Thread via GitHub
alamb commented on code in PR #14527: URL: https://github.com/apache/datafusion/pull/14527#discussion_r1945314693 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -1850,15 +1850,11 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 0, [] []

Re: [PR] feat: Add fair unified memory pool [datafusion-comet]

2025-02-06 Thread via GitHub
comphead commented on code in PR #1369: URL: https://github.com/apache/datafusion-comet/pull/1369#discussion_r1945309452 ## native/core/src/execution/fair_memory_pool.rs: ## @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Remove use of deprecated dict_id in datafusion-proto (#14173) [datafusion]

2025-02-06 Thread via GitHub
andygrove commented on PR #14227: URL: https://github.com/apache/datafusion/pull/14227#issuecomment-2640814818 > @andygrove are we happy that dict_id is no longer needed in DataFusion? Yes, I think so. We have proven that we no longer need it in Comet, at least. Thanks for the ping.
