Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2025-01-08 Thread via GitHub
peter-toth commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2577048321 This is a very interesting issue. I was trying to repro the results of the above experiment, but got matching results with rayon and tokio. Maybe rayon is slightly faster

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-08 Thread via GitHub
dependabot[bot] commented on PR #13964: URL: https://github.com/apache/datafusion/pull/13964#issuecomment-2577104520 A newer version of petgraph exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1906682942 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -540,6 +557,33 @@ impl LexRequirement { .collect(), ) } + +///

Re: [PR] Fix error on `array_distinct` when input is empty #13810 [datafusion]

2025-01-08 Thread via GitHub
cht42 commented on PR #14034: URL: https://github.com/apache/datafusion/pull/14034#issuecomment-2577023771 added back the early check on empty arrays and added a test case -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] Custom scalar to sql overrides support for DuckDB Unparser dialect [datafusion]

2025-01-08 Thread via GitHub
sgrebnov commented on PR #13915: URL: https://github.com/apache/datafusion/pull/13915#issuecomment-2577037902 @goldmedal - thank you, updated implementation to define `with_custom_scalar_overrides` on `Dialect` trait level. Moving forward we will be able to add support/implementation for

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2577045685 How do you cancel the query? Do you mean terminating the next() calls on the stream, or dropping the stream? If it is the former, the issue might be related to the RepartitionE

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
nuno-faria commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1906820538 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -985,6 +985,77 @@ impl OptimizerRule for PushDownFilter { }

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906813655 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream {

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
nuno-faria commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1906812252 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -985,6 +985,77 @@ impl OptimizerRule for PushDownFilter { }

Re: [I] Move CPU Bound Tasks off Tokio Threadpool [datafusion]

2025-01-08 Thread via GitHub
tustvold commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2577190762 > Where I do see difference is --concurrency 8+: Once the concurrency exceeds the pool size starvation is inevitable, and isn't the issue people have been running into.

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-08 Thread via GitHub
crepererum commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1906929864 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream { r

[I] Optimize `date_part` Minute by avoiding unnecessary computation [datafusion]

2025-01-08 Thread via GitHub
jayzhan211 opened a new issue, #14043: URL: https://github.com/apache/datafusion/issues/14043 ### Is your feature request related to a problem or challenge? Open an issue to track the status of this optimization Related https://github.com/apache/datafusion/issues/13449 htt

Re: [I] Optimize `date_part` Minute by avoiding unnecessary computation [datafusion]

2025-01-08 Thread via GitHub
jayzhan211 commented on issue #14043: URL: https://github.com/apache/datafusion/issues/14043#issuecomment-2577513036 I think this is a good first issue for getting familiar with optimization and benchmarking code.

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2025-01-08 Thread via GitHub
wackywendell commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2578203665 After looking more closely, it looks like `datafusion-substrait` has a `protoc` feature: https://github.com/apache/datafusion/blob/7af6aa9e51fde2bbf7a12c4096e999041ef6f

Re: [PR] Refactor into `LexOrdering::collapse`, `LexRequirement::collapse` avoid clone [datafusion]

2025-01-08 Thread via GitHub
alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1907236855 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -540,6 +557,33 @@ impl LexRequirement { .collect(), ) } + +/// Construc

Re: [PR] Encapsulate fields of `EquivalenceGroup` [datafusion]

2025-01-08 Thread via GitHub
alamb merged PR #14039: URL: https://github.com/apache/datafusion/pull/14039

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on code in PR #13996: URL: https://github.com/apache/datafusion/pull/13996#discussion_r1907326957 ## benchmarks/bench.sh: ## @@ -541,6 +589,125 @@ run_imdb() { $CARGO_COMMAND --bin imdb -- benchmark datafusion --iterations 5 --path "${IMDB_DIR}" --prefer_ha

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-08 Thread via GitHub
mbutrovich commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2577908417 > > > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promotion optio

Re: [I] Bad CPU type in executable failure in loading Native library in Apple Silicon with M4 Pro [datafusion-comet]

2025-01-08 Thread via GitHub
mbutrovich commented on issue #1188: URL: https://github.com/apache/datafusion-comet/issues/1188#issuecomment-2577910817 If you're manually installing protobuf anyway, can you try the version supplied by Homebrew? That's what I'm using on my Apple Silicon machine.

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #13964: URL: https://github.com/apache/datafusion/pull/13964#issuecomment-2577915587 Update is here: https://github.com/apache/datafusion/pull/14045

Re: [PR] fix: yield when the next file is ready to open to prevent CPU starvation [datafusion]

2025-01-08 Thread via GitHub
jeffreyssmith2nd commented on code in PR #14028: URL: https://github.com/apache/datafusion/pull/14028#discussion_r1907311925 ## datafusion/core/src/datasource/physical_plan/file_stream.rs: ## @@ -478,7 +478,12 @@ impl FileStream {

Re: [PR] Fix error on `array_distinct` when input is empty #13810 [datafusion]

2025-01-08 Thread via GitHub
comphead merged PR #14034: URL: https://github.com/apache/datafusion/pull/14034

Re: [PR] Simplify error handling in case.rs (#13990) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on code in PR #14033: URL: https://github.com/apache/datafusion/pull/14033#discussion_r1907660263 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -369,11 +366,8 @@ impl CaseExpr { // evaluate when expression let when_value = self.when

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578417153 > 🤔 something seems to be broken now I feel bad that this broke after my suggestion -- here is a proposal targeting this branch to fix it: - https://github.com/timvw/datafusio

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
timvw commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578428357 > > 🤔 something seems to be broken now > > I feel bad that this broke after my suggestion -- here is a proposal targeting this branch to fix it: > > * [Fix inferring logic

[PR] docs(ci): use up-to-date protoc with docs.rs [datafusion]

2025-01-08 Thread via GitHub
wackywendell opened a new pull request, #14048: URL: https://github.com/apache/datafusion/pull/14048 ## Which issue does this PR close? Closes #13853. ## Rationale for this change This uses the same basic solution as in `substrait-rs`: https://github.com/substrait

Re: [PR] [comet-parquet-exec] Fix regressions in DisableAQECometShuffleSuite [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove merged PR #1237: URL: https://github.com/apache/datafusion-comet/pull/1237

Re: [I] Missing "INFO" log level [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1127: URL: https://github.com/apache/datafusion-comet/issues/1127#issuecomment-2578652346 Here is my repro. ## Comet 0.3.0 ``` scala> spark.read.parquet("/mnt/bigdata/tpch/sf100/lineitem.parquet").createTempView("lineitem") 25/01/08 13:56:35

[PR] ignore: just testing something [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove opened a new pull request, #1239: URL: https://github.com/apache/datafusion-comet/pull/1239 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] chore: Improve shuffle configuration [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on code in PR #1207: URL: https://github.com/apache/datafusion-comet/pull/1207#discussion_r1908004404 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -242,17 +241,17 @@ object CometConf extends ShimCometConf { .booleanConf .c

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1908068498 ## python/datafusion/dataframe.py: ## @@ -620,17 +679,34 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2579022385 Can you confirm if this is related to columnar shuffle by disabling it? This looks to me like an NaN normalizing issue (clearly the Rust NaNs are being counted as not equa

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1908071361 ## python/datafusion/dataframe.py: ## @@ -35,6 +35,65 @@ from datafusion._internal import DataFrame as DataFrameInternal from datafusion.expr import Expr, So

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on code in PR #981: URL: https://github.com/apache/datafusion-python/pull/981#discussion_r1908071116 ## python/datafusion/dataframe.py: ## @@ -620,17 +679,34 @@ def write_csv(self, path: str | pathlib.Path, with_header: bool = False) -> None def write_parq

Re: [PR] feat(substrait): add support for insert roundtrip in append mode [datafusion]

2025-01-08 Thread via GitHub
github-actions[bot] closed pull request #13118: feat(substrait): add support for insert roundtrip in append mode URL: https://github.com/apache/datafusion/pull/13118

Re: [PR] Clickhouse SQL generation for datatypes. [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
github-actions[bot] closed pull request #1482: Clickhouse SQL generation for datatypes. URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1482

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jayzhan211 commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579134148 > > I think we should fix it on the display/formatting side. For example, we still cannot distinguish: > > ``` > > DataFusion CLI v44.0.0 > > > > > select array[], arr

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jatin510 commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579141610 Why haven’t we been displaying null values as NULL so far? What was the original reasoning or intention behind this decision?

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
berkaysynnada commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2579313671 We also have checkpoint tests that drop the stream after some amount of time, and after the failure, FileStream offsets do not increment any further. I think the same

Re: [PR] Default to ZSTD compression when writing Parquet [datafusion-python]

2025-01-08 Thread via GitHub
kosiew commented on PR #981: URL: https://github.com/apache/datafusion-python/pull/981#issuecomment-2579331381 Does anyone know how to fix this error: ``` ruff check --output-format=github python/ ruff format --check python/ shell: /usr/bin/bash -e {0} env: py

Re: [I] Initcap behaves differently in Spark and in DataFusion (also Comet) [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on issue #1052: URL: https://github.com/apache/datafusion-comet/issues/1052#issuecomment-2579333075 The fix should be already included in https://github.com/apache/datafusion/commits/branch-44/

Re: [PR] show a mismatch for initcap between Spark and DataFusion [datafusion-comet]

2025-01-08 Thread via GitHub
kazuyukitanimura commented on PR #1051: URL: https://github.com/apache/datafusion-comet/pull/1051#issuecomment-2579336663 #1052 should be already fixed with the DF44 release. Would you like to rebase and re-trigger this test? @Blizzara

Re: [PR] Fix MySQL parsing of GRANT, REVOKE, and CREATE VIEW [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
yoavcloud commented on code in PR #1538: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1538#discussion_r1908272974 ## src/parser/mod.rs: ## @@ -11808,23 +11899,32 @@ impl<'a> Parser<'a> { } } +pub fn parse_grantee_name(&mut self) -> Result { +

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jatin510 commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579130276 > I think we should fix it on the display/formatting side. For example, we still cannot distinguish: > > ``` > DataFusion CLI v44.0.0 > > > select array[], array[nul

Re: [PR] 'array_repeat' if the repeat count value is 0, return NULL instead of empty array [datafusion]

2025-01-08 Thread via GitHub
jonahgao commented on PR #14046: URL: https://github.com/apache/datafusion/pull/14046#issuecomment-2579269061 > Why haven’t we been displaying `null` values as `NULL` so far? What was the original reasoning or intention behind this decision? I guess it's to follow PostgreSQL CLI. Post

Re: [I] Optimize repartitioning logic in ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1235: URL: https://github.com/apache/datafusion-comet/issues/1235#issuecomment-2578168997 Related: https://github.com/apache/arrow-rs/issues/6692

[PR] Feat: Support `map`, `map_keys` & `maps_values` [datafusion-comet]

2025-01-08 Thread via GitHub
dharanad opened a new pull request, #1236: URL: https://github.com/apache/datafusion-comet/pull/1236 ## Which issue does this PR close? Part of #1044 ## Rationale for this change ## What changes are included in this PR? ## How are these chan

Re: [I] Delegate to native code for cast_is_supported [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove closed issue #1171: Delegate to native code for cast_is_supported URL: https://github.com/apache/datafusion-comet/issues/1171

Re: [I] Delegate to native code for cast_is_supported [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1171: URL: https://github.com/apache/datafusion-comet/issues/1171#issuecomment-2578028749 The native code mentioned above is now specific to parquet-to-spark conversion, so closing this issue

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
alamb commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1907385995 ## datafusion/optimizer/Cargo.toml: ## @@ -43,6 +43,7 @@ arrow = { workspace = true } chrono = { workspace = true } datafusion-common = { workspace = true, default

Re: [I] Comet native shuffle reader [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on issue #1125: URL: https://github.com/apache/datafusion-comet/issues/1125#issuecomment-2578032442 Now that we have native decompression and decoding, I wonder how important it will be to have a fully native shuffle reader.

Re: [PR] build(deps): bump protobuf version to 3.21.12 [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove commented on PR #1234: URL: https://github.com/apache/datafusion-comet/pull/1234#issuecomment-2578025818 Thanks @wForget

Re: [PR] build(deps): bump protobuf version to 3.21.12 [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove merged PR #1234: URL: https://github.com/apache/datafusion-comet/pull/1234

Re: [I] Bad CPU type in executable failure in loading Native library in Apple Silicon with M4 Pro [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove closed issue #1188: Bad CPU type in executable failure in loading Native library in Apple Silicon with M4 Pro URL: https://github.com/apache/datafusion-comet/issues/1188

[PR] Minor: Document output schema of LogicalPlan::Aggregate and LogicalPl… [datafusion]

2025-01-08 Thread via GitHub
alamb opened a new pull request, #14047: URL: https://github.com/apache/datafusion/pull/14047 …an::Window ## Which issue does this PR close? Closes #. ## Rationale for this change While reviewing https://github.com/apache/datafusion/pull/14026 from @nuno-faria

Re: [PR] Parse Postgres's LOCK TABLE statement [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio commented on code in PR #1614: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1614#discussion_r1907585091 ## src/ast/mod.rs: ## @@ -7278,16 +7279,126 @@ impl fmt::Display for SearchModifier { } } +/// A `LOCK TABLE ..` statement. MySQL and Postgres v

Re: [PR] feat: support enable_url_table config [datafusion-python]

2025-01-08 Thread via GitHub
timsaucer commented on PR #980: URL: https://github.com/apache/datafusion-python/pull/980#issuecomment-2578313618 Very nice. Thank you. I've kicked off CI and will merge if all goes through.

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
nuno-faria commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1907597367 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -985,6 +985,77 @@ impl OptimizerRule for PushDownFilter { }

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2578322453 I tried a bit today to re-create this but was not able to What I tried was to create a highly compressed parquet file (48MB that has 1B rows with all repeated strings) and

Re: [PR] feat(optimizer): Enable filter pushdown on window functions [datafusion]

2025-01-08 Thread via GitHub
nuno-faria commented on code in PR #14026: URL: https://github.com/apache/datafusion/pull/14026#discussion_r1907601841 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -1442,6 +1513,227 @@ mod tests { assert_optimized_plan_eq(plan, expected) } +/// veri

Re: [I] Bad CPU type in executable failure in loading Native library in Apple Silicon with M4 Pro [datafusion-comet]

2025-01-08 Thread via GitHub
wForget commented on issue #1188: URL: https://github.com/apache/datafusion-comet/issues/1188#issuecomment-2577598827 `protoc-3.19.6-osx-aarch_64.exe` seems to be just a copy of `protoc-3.19.6-osx-x86_64.exe`, so it may not work on arm64 macos. ![image](https://github.com/user-attac

Re: [PR] Fix bug in `nth_value` when `ignoreNulls` is true and no nulls in values [datafusion]

2025-01-08 Thread via GitHub
Blizzara commented on code in PR #14042: URL: https://github.com/apache/datafusion/pull/14042#discussion_r1907148647 ## datafusion/functions-window/src/nth_value.rs: ## @@ -360,9 +360,10 @@ impl PartitionEvaluator for NthValueEvaluator { })

[PR] build(deps): bump protobuf version to 3.21.12 [datafusion-comet]

2025-01-08 Thread via GitHub
wForget opened a new pull request, #1234: URL: https://github.com/apache/datafusion-comet/pull/1234 ## Which issue does this PR close? Closes #1188. ## Rationale for this change ## What changes are included in this PR? bump protobuf version

[I] Implement xxhash algorithms as part of the expression API [datafusion]

2025-01-08 Thread via GitHub
HectorPascual opened a new issue, #14044: URL: https://github.com/apache/datafusion/issues/14044 ### Is your feature request related to a problem or challenge? I am currently in need of using datafusion SQL query engine (through delta-rs merge operation) to hash with a specific hashin

Re: [I] parquet RowGroup pruning for `Dictionary(Decimal)` type incorrect [datafusion]

2025-01-08 Thread via GitHub
lichuang commented on issue #13821: URL: https://github.com/apache/datafusion/issues/13821#issuecomment-2577698922 After digging into this bug, here is the report: `cast(1 as decimal(4, 1))` yields the value `Decimal128(Some(10),4,1)`, and the `eq` operator is translated into two operations: * c

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-08 Thread via GitHub
dependabot[bot] commented on PR #13964: URL: https://github.com/apache/datafusion/pull/13964#issuecomment-2577795548 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-08 Thread via GitHub
alamb closed pull request #13964: Update petgraph requirement from 0.6.2 to 0.7.0 URL: https://github.com/apache/datafusion/pull/13964

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #13964: URL: https://github.com/apache/datafusion/pull/13964#issuecomment-2577795431 Actually, I'll close this PR down and hopefully dependabot will update to the latest version automatically

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.0 [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #13964: URL: https://github.com/apache/datafusion/pull/13964#issuecomment-2577793360 @dependabot recreate

[PR] Update petgraph requirement from 0.6.2 to 0.7.1 [datafusion]

2025-01-08 Thread via GitHub
dependabot[bot] opened a new pull request, #14045: URL: https://github.com/apache/datafusion/pull/14045 Updates the requirements on [petgraph](https://github.com/petgraph/petgraph) to permit the latest version. Changelog Sourced from https://github.com/petgraph/petgraph/blob/master

Re: [I] Add `union_extract` function [datafusion]

2025-01-08 Thread via GitHub
tobixdev commented on issue #11081: URL: https://github.com/apache/datafusion/issues/11081#issuecomment-2577796009 Hi @gstvg! I have a question regarding this issue. We are currently working on a prototype that would require "ergonomic" handling of unions. I understand that you have

Re: [PR] Use partial aggregation schema for spilling to avoid column mismatch in GroupedHashAggregateStream [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #13995: URL: https://github.com/apache/datafusion/pull/13995#issuecomment-2577780189 ❤️

Re: [PR] chore: deprecate `ValuesExec` in favour of `MemoryExec` [datafusion]

2025-01-08 Thread via GitHub
alamb commented on code in PR #14032: URL: https://github.com/apache/datafusion/pull/14032#discussion_r1907263612 ## datafusion/core/src/physical_planner.rs: ## @@ -466,7 +465,8 @@ impl DefaultPhysicalPlanner { .collect::>>>() })

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
byte-sourcerer commented on PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#issuecomment-2577703594 @iffyio Thank you for your time. I have revised the PR according to your comments and hope you can review it.

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
byte-sourcerer commented on code in PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#discussion_r1907194351 ## src/parser/mod.rs: ## @@ -8857,6 +8857,23 @@ impl<'a> Parser<'a> { } } +pub fn parse_table_object( +&mut self, +

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
wugeer commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1907192976 ## tests/sqlparser_common.rs: ## @@ -5374,10 +5396,49 @@ fn parse_interval_all() { verified_only_select("SELECT INTERVAL '1' MINUTE TO SECOND");

Re: [PR] Encapsulate fields of `EquivalenceGroup` [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14039: URL: https://github.com/apache/datafusion/pull/14039#issuecomment-2577755462 Thank you for the review @berkaysynnada

Re: [PR] Encapsulate fields of `OrderingEquivalenceClass` (make field non pub) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14037: URL: https://github.com/apache/datafusion/pull/14037#issuecomment-2577809845 > Thank you @alamb. It clearly improves the usability, LGTM. No need to keep waiting if you have no more commits Thank you for the review @berkaysynnada

Re: [I] Un-cancellable Query when hitting many large files. [datafusion]

2025-01-08 Thread via GitHub
jeffreyssmith2nd commented on issue #14036: URL: https://github.com/apache/datafusion/issues/14036#issuecomment-2577862225 > How do you cancel the query? You mean terminating the next()'s on the stream, or dropping the stream. The queries are running in the context of a gRPC request,

Re: [PR] Add H2O.ai Database-like Ops benchmark to dfbench (groupby support) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #13996: URL: https://github.com/apache/datafusion/pull/13996#issuecomment-2577933761 I also think we should maybe consider supporting fewer of these combinations (in follow on PRs) -- for example I am not sure how much value the parquet versions of the h2o tests a

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2578342385 > We could just add that directive to datafusion/substrait/Cargo.toml and see if it fixes it in the next version? Any other ideas? This sounds like a great idea to me -- tha

Re: [PR] Replace `ReferentialAction` enum in `DROP` statements [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio merged PR #1648: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1648

Re: [PR] Add support for MS-SQL BEGIN/END TRY/CATCH [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio merged PR #1649: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1649

Re: [PR] Encapsulate fields of `OrderingEquivalenceClass` (make field non pub) [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14037: URL: https://github.com/apache/datafusion/pull/14037#issuecomment-2578373024 Since I think this PR is unobjectionable I am merging it in -- I am happy to address any other comments as follow on PRs

Re: [PR] Encapsulate fields of `OrderingEquivalenceClass` (make field non pub) [datafusion]

2025-01-08 Thread via GitHub
alamb merged PR #14037: URL: https://github.com/apache/datafusion/pull/14037

Re: [PR] Improve performance of `find_in_set` function [datafusion]

2025-01-08 Thread via GitHub
tlm365 commented on PR #14020: URL: https://github.com/apache/datafusion/pull/14020#issuecomment-2577664206 @comphead Thanks for reviewing, > I think it is a good PR the way it is. One thing comes to my mind which is probably relevant for other string functions as well. > > So we

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
wugeer commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1907191172 ## src/parser/mod.rs: ## @@ -2353,14 +2355,30 @@ impl<'a> Parser<'a> { }; Ok(DateTimeField::Week(week_day))

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
wugeer commented on code in PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630#discussion_r1907192614 ## tests/sqlparser_common.rs: ## @@ -5374,10 +5396,49 @@ fn parse_interval_all() { verified_only_select("SELECT INTERVAL '1' MINUTE TO SECOND");

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
byte-sourcerer commented on code in PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#discussion_r1907194081 ## src/ast/dml.rs: ## @@ -470,8 +470,7 @@ pub struct Insert { /// INTO - optional keyword pub into: bool, /// TABLE -#[cfg_at

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
byte-sourcerer commented on code in PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#discussion_r1907194705 ## src/parser/mod.rs: ## @@ -8857,6 +8857,23 @@ impl<'a> Parser<'a> { } } +pub fn parse_table_object( +&mut self, +

Re: [PR] build(deps): bump protobuf version to 3.21.12 [datafusion-comet]

2025-01-08 Thread via GitHub
codecov-commenter commented on PR #1234: URL: https://github.com/apache/datafusion-comet/pull/1234#issuecomment-2577747978 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1234?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Support pluralized time units [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio merged PR #1630: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1630

Re: [PR] Update petgraph requirement from 0.6.2 to 0.7.1 [datafusion]

2025-01-08 Thread via GitHub
alamb merged PR #14045: URL: https://github.com/apache/datafusion/pull/14045

[I] Optimize repartitioning logic in ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-01-08 Thread via GitHub
andygrove opened a new issue, #1235: URL: https://github.com/apache/datafusion-comet/issues/1235 ### What is the problem the feature request solves? In ShuffleWriterExec, we copy data from each input RecordBatch into array builders for each partition and once a partition reaches the s

Re: [I] datafusion-substrait API docs on docs.rs are broken [datafusion]

2025-01-08 Thread via GitHub
wackywendell commented on issue #13853: URL: https://github.com/apache/datafusion/issues/13853#issuecomment-2578118272 Ah! We just ran into this in https://github.com/substrait-io/substrait-validator/issues/355, and like `substrait-rs`, added a `protoc` feature for using `protobuf-src` to g

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
timvw commented on code in PR #14021: URL: https://github.com/apache/datafusion/pull/14021#discussion_r1907472694 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -114,19 +114,22 @@ impl ListingTableConfig { } } -fn infer_file_extension(path: &str) -

Re: [I] Implement xxhash algorithms as part of the expression API [datafusion]

2025-01-08 Thread via GitHub
alamb commented on issue #14044: URL: https://github.com/apache/datafusion/issues/14044#issuecomment-2578127820 You could also implement this as a user defined function perhaps

Re: [PR] feat: support `INSERT INTO [TABLE] FUNCTION` of Clickhouse [datafusion-sqlparser-rs]

2025-01-08 Thread via GitHub
iffyio commented on code in PR #1633: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1633#discussion_r1907559764 ## src/ast/mod.rs: ## @@ -7766,6 +7766,27 @@ impl fmt::Display for RenameTable { } } +/// table object for insertion +#[derive(Debug, Clone, Par

Re: [PR] [comet-parquet-exec] Move type conversion logic for ParquetExec out of Cast expression. [datafusion-comet]

2025-01-08 Thread via GitHub
parthchandra commented on PR #1229: URL: https://github.com/apache/datafusion-comet/pull/1229#issuecomment-2578261581 > > > > Finally, can we include two more things (either in spark_parquet_options or in some parquet_conversion_context struct) which has the conversion and type promotion o

Re: [PR] Fix: ensure that compression type is also taken into consideration during ListingTableConfig infer_options [datafusion]

2025-01-08 Thread via GitHub
alamb commented on PR #14021: URL: https://github.com/apache/datafusion/pull/14021#issuecomment-2578285930 🤔 something seems to be broken now
