Re: [PR] Remove inline table scan analyzer rule [datafusion]

2025-03-20 Thread via GitHub
alamb commented on PR #15201: URL: https://github.com/apache/datafusion/pull/15201#issuecomment-2738300722 @aditanase perhaps you can make a PR with whatever tests were breaking in your fork so we can make sure they work here -- This is an automated message from the Apache Git Service. T

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2004351049 ## docs/source/user-guide/tuning.md: ## @@ -141,30 +191,22 @@ It must be set before the Spark context is created. You can enable or disable Co at ru

Re: [PR] feat: Support serde for JsonSource PhysicalPlan [datafusion]

2025-03-20 Thread via GitHub
milenkovicm commented on code in PR #15311: URL: https://github.com/apache/datafusion/pull/15311#discussion_r2005232310 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -247,6 +247,15 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode { .with_file_compres

Re: [I] Make `DiskManagerBuilder` to construct DiskManagers [datafusion]

2025-03-20 Thread via GitHub
Standing-Man commented on issue #15319: URL: https://github.com/apache/datafusion/issues/15319#issuecomment-2739850033 Hi @alamb, If the PR has been closed, should the issue still remain open?

Re: [PR] feat: instrument spawned tasks with current tracing span when `tracing` feature is enabled [datafusion]

2025-03-20 Thread via GitHub
geoffreyclaude commented on PR #14547: URL: https://github.com/apache/datafusion/pull/14547#issuecomment-2739859432 @alamb and @goldmedal thanks for the review! > From my perspective this PR now closes that issue's stated request: > > > I would like to make it easy for people to

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-03-20 Thread via GitHub
2010YOUY01 closed pull request #14975: feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries URL: https://github.com/apache/datafusion/pull/14975

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-03-20 Thread via GitHub
2010YOUY01 commented on PR #14975: URL: https://github.com/apache/datafusion/pull/14975#issuecomment-2739748374 @alamb Thanks for the review, I agree there are two things we can improve: 1. Create a new struct for spilling related utilities, instead of putting it inside `DiskManager` 2

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-03-20 Thread via GitHub
alamb commented on PR #14975: URL: https://github.com/apache/datafusion/pull/14975#issuecomment-2739766003 > I plan to address point 1 and the above-mentioned limitation in another PR, and after that rework this PR to add disk limit feature. So closed this PR for now. Thank you -- I

Re: [PR] Support Duration in min/max agg functions [datafusion]

2025-03-20 Thread via GitHub
alamb commented on PR #15310: URL: https://github.com/apache/datafusion/pull/15310#issuecomment-2739772489 Will do -- thanks @svranesevic I normally try and wait for about 24 hours before merging

Re: [PR] Support Duration in min/max agg functions [datafusion]

2025-03-20 Thread via GitHub
alamb merged PR #15310: URL: https://github.com/apache/datafusion/pull/15310

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-03-20 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2739119123 @kosiew any chance you can try running the test case in https://github.com/apache/datafusion/issues/14757? It's a real world example of schema evolution that I hope can be solve

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-03-20 Thread via GitHub
2010YOUY01 commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2739672614 > Thank you @waynexia, I'm planning to check it out at most tomorrow. > > I have a question in advance before reviewing -- have you been considering to implement groups accu

[PR] chore(deps): bump blake3 from 1.6.1 to 1.7.0 [datafusion]

2025-03-20 Thread via GitHub
dependabot[bot] opened a new pull request, #15331: URL: https://github.com/apache/datafusion/pull/15331 Bumps [blake3](https://github.com/BLAKE3-team/BLAKE3) from 1.6.1 to 1.7.0. Release notes Sourced from https://github.com/BLAKE3-team/BLAKE3/releases blake3's releases. 1

[PR] chore(deps): bump indexmap from 2.7.1 to 2.8.0 [datafusion]

2025-03-20 Thread via GitHub
dependabot[bot] opened a new pull request, #15333: URL: https://github.com/apache/datafusion/pull/15333 Bumps [indexmap](https://github.com/indexmap-rs/indexmap) from 2.7.1 to 2.8.0. Changelog Sourced from https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md indexmap's

Re: [PR] Saner handling of nulls inside arrays [datafusion]

2025-03-20 Thread via GitHub
joroKr21 commented on PR #15149: URL: https://github.com/apache/datafusion/pull/15149#issuecomment-2739679149 > I recommend following whatever DuckDB (or postgres do) -- there is not much value in DataFusion having different semantics from other systems * DuckDB doesn't have union for

Re: [I] Failed optimizations with Int64 type [datafusion]

2025-03-20 Thread via GitHub
aectaan commented on issue #15291: URL: https://github.com/apache/datafusion/issues/15291#issuecomment-2739694259 Ok, probably it's related to `Analyzer`. After disabling it optimisations are ok

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-03-20 Thread via GitHub
alamb commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2004518126 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-03-20 Thread via GitHub
korowa commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2739393073 Thank you @waynexia, I'm planning to check it out at most tomorrow. I have a question in advance before reviewing -- have you been considering to implement groups accumulator fo

Re: [I] Native scan panic with native_iceberg_compat on hdfs [datafusion-comet]

2025-03-20 Thread via GitHub
comphead commented on issue #1553: URL: https://github.com/apache/datafusion-comet/issues/1553#issuecomment-2737306709 interesting why the panic is nonunwinding, by default the `panic` on Rust for release should be unwinding

Re: [I] Allow UDFs to return custom `Diagnostic` [datafusion]

2025-03-20 Thread via GitHub
eliaperantoni commented on issue #15276: URL: https://github.com/apache/datafusion/issues/15276#issuecomment-2739463296 Hey @jsai28, that looks good! I have some points that I'd like to hear your opinion on: 1. I think `FnCallSpans.args` should have exactly as many elements as the ar

Re: [I] [EPIC] A collection of tickets for improving sorting larger than memory datasets / spilling sorts [datafusion]

2025-03-20 Thread via GitHub
xudong963 commented on issue #15271: URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2739420835 @alamb Thank you for summarizing, I'm also interested in this topic and may have more time to join the game in May, but I will keep an eye on the progress.

Re: [PR] Add doc for the `statistics_from_parquet_meta_calc method` [datafusion]

2025-03-20 Thread via GitHub
xudong963 commented on code in PR #15330: URL: https://github.com/apache/datafusion/pull/15330#discussion_r2005013876 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -797,10 +797,34 @@ pub async fn fetch_statistics( statistics_from_parquet_meta_calc(&metadata, ta

[PR] Add doc for the `statistics_from_parquet_meta_calc method` [datafusion]

2025-03-20 Thread via GitHub
xudong963 opened a new pull request, #15330: URL: https://github.com/apache/datafusion/pull/15330 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/pull/15289 ## Rationale for this change I'm refactoring the method `statistics_from

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-20 Thread via GitHub
andygrove commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2004547775 ## docs/source/user-guide/tuning.md: ## @@ -17,18 +17,96 @@ specific language governing permissions and limitations under the License. --> -# Tuning Guid

Re: [PR] chore(deps): Update sqlparser to 0.55.0 [datafusion]

2025-03-20 Thread via GitHub
PokIsemaine commented on code in PR #15183: URL: https://github.com/apache/datafusion/pull/15183#discussion_r2005632036 ## datafusion/sql/src/planner.rs: ## @@ -560,11 +558,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { SQLDataType::SmallInt(_) | SQLDataType::

Re: [PR] chore(deps): bump quote from 1.0.38 to 1.0.40 [datafusion]

2025-03-20 Thread via GitHub
xudong963 merged PR #15332: URL: https://github.com/apache/datafusion/pull/15332

Re: [PR] fix: make register_object_store use same session_env as file scan [datafusion-comet]

2025-03-20 Thread via GitHub
wForget commented on code in PR #1555: URL: https://github.com/apache/datafusion-comet/pull/1555#discussion_r2005775054 ## native/core/src/parquet/mod.rs: ## @@ -641,6 +640,8 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat session_timezo

Re: [PR] feat: Support serde for JsonSource PhysicalPlan [datafusion]

2025-03-20 Thread via GitHub
westhide commented on code in PR #15311: URL: https://github.com/apache/datafusion/pull/15311#discussion_r2005735108 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -247,6 +247,15 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode { .with_file_compressio

[PR] Prep for 0.1.0rc2 [datafusion-ray]

2025-03-20 Thread via GitHub
robtandy opened a new pull request, #86: URL: https://github.com/apache/datafusion-ray/pull/86 This PR is long but it does not affect the core functionality of DataFusion for Ray, and does not differ from `0.1.0rc1` which has been extensively used by me in benchmarking from `test.pypi`.

Re: [PR] Prep for 0.1.0rc2 [datafusion-ray]

2025-03-20 Thread via GitHub
robtandy commented on PR #86: URL: https://github.com/apache/datafusion-ray/pull/86#issuecomment-2740336109 @andygrove Here is the PR I mentioned to you that I would submit with benchmarking code and results. I have some good graphs of the results, but I'll submit them in a subseque

Re: [I] Add most functions to the Expr class so that they're chainable. [datafusion-python]

2025-03-20 Thread via GitHub
deanm commented on issue #1064: URL: https://github.com/apache/datafusion-python/issues/1064#issuecomment-2740490227 Is this something a PR would be accepted for or no?

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-20 Thread via GitHub
alan910127 commented on code in PR #15110: URL: https://github.com/apache/datafusion/pull/15110#discussion_r2005666997 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> { right: Expr, right_schema:

Re: [PR] perf: unwrap cast for comparing ints =/!= strings [datafusion]

2025-03-20 Thread via GitHub
alan910127 commented on code in PR #15110: URL: https://github.com/apache/datafusion/pull/15110#discussion_r2005677168 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -290,19 +290,72 @@ impl<'a> TypeCoercionRewriter<'a> { right: Expr, right_schema:

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-03-20 Thread via GitHub
Dandandan commented on code in PR #15324: URL: https://github.com/apache/datafusion/pull/15324#discussion_r2005787338 ## datafusion/functions-aggregate/src/count.rs: ## @@ -752,10 +761,245 @@ impl Accumulator for DistinctCountAccumulator { } } +/// GroupsAccumulator for

Re: [PR] fix: check if handle has been initialized before closing [datafusion-comet]

2025-03-20 Thread via GitHub
wForget commented on PR #1554: URL: https://github.com/apache/datafusion-comet/pull/1554#issuecomment-2740681759 > I just wanted to see in what condition the NativeBatchReader can be called after close has been called. The scenario I encountered was not NativeBatchReader called afte

Re: [PR] feat: Support serde for FileScanConfig `batch_size` [datafusion]

2025-03-20 Thread via GitHub
westhide commented on code in PR #15335: URL: https://github.com/apache/datafusion/pull/15335#discussion_r2005989730 ## datafusion/proto/proto/datafusion.proto: ## @@ -997,6 +997,7 @@ message FileScanExecConf { reserved 10; datafusion_common.Constraints constraints = 11;

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-03-20 Thread via GitHub
alamb commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740980237 > Thanks for checking [@alamb](https://github.com/alamb) ! > > I think a large portion is spent in the hash join (repartitioning the right side input) - I think because it r

Re: [PR] refactor: move `CteWorkTable`, `default_table_source` a bunch of files out of core [datafusion]

2025-03-20 Thread via GitHub
alamb commented on PR #15316: URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2741010424 > where does Memtable belong datasource or catalog? it is TableProvider implementation so I thought It was going to be in catalog, but I'm not so sure anymore as it has dependency on d

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-20 Thread via GitHub
alamb commented on PR #15313: URL: https://github.com/apache/datafusion/pull/15313#issuecomment-2740902341 FYI @blaginin

Re: [PR] refactor: move `CteWorkTable`, `default_table_source` a bunch of files out of core [datafusion]

2025-03-20 Thread via GitHub
logan-keede commented on PR #15316: URL: https://github.com/apache/datafusion/pull/15316#issuecomment-2740991793 where does Memtable belong datasource or catalog? it is TableProvider implementation so I thought It was going to be in catalog, but I'm not so sure anymore as it has dependency

Re: [PR] [WIP] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-20 Thread via GitHub
andygrove commented on code in PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2005998677 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1334,26 +1334,46 @@ object CometSparkSessionExtensions extends Logging {

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-20 Thread via GitHub
kevinjqliu commented on code in PR #60: URL: https://github.com/apache/datafusion-site/pull/60#discussion_r2006136201 ## content/blog/2025-03-20-parquet-pruning.md: ## @@ -0,0 +1,118 @@ +--- +layout: post +title: Parquet Pruning in DataFusion: Read Only What Matters +date: 2025-

Re: [PR] include some BinaryOperator from sqlparser [datafusion]

2025-03-20 Thread via GitHub
waynexia commented on PR #15327: URL: https://github.com/apache/datafusion/pull/15327#issuecomment-2741876016 >It might also be a good idea to include some documentation in the operators themselves that DataFusion doesn't have default implementations Added in [5828cba](https://github

Re: [I] [Rust] [datafusion] Allow integration in non libc environments [datafusion]

2025-03-20 Thread via GitHub
alamb closed issue #102: [Rust] [datafusion] Allow integration in non libc environments URL: https://github.com/apache/datafusion/issues/102

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-20 Thread via GitHub
andygrove commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2006506263 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,130 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-20 +aut

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-03-20 Thread via GitHub
kosiew commented on code in PR #15301: URL: https://github.com/apache/datafusion/pull/15301#discussion_r2006732123 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -163,26 +187,32 @@ impl TopK { // TODO make this algorithmically better?: // Idea: filter out r

Re: [PR] feat: introduce hadoop mini cluster to test native scan on hdfs [datafusion-comet]

2025-03-20 Thread via GitHub
wForget commented on code in PR #1556: URL: https://github.com/apache/datafusion-comet/pull/1556#discussion_r2006750847 ## pom.xml: ## @@ -447,6 +448,13 @@ under the License. 5.1.0 + +org.apache.hadoop +hadoop-client-minicluster Review C

[PR] Comet 0.7.0 [datafusion-site]

2025-03-20 Thread via GitHub
andygrove opened a new pull request, #63: URL: https://github.com/apache/datafusion-site/pull/63 (no comment)

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-03-20 Thread via GitHub
Dandandan commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2741469672 I traced this down to an issue in the planner, which uses `PartitionMode::Auto` iff stats are collected (`datafusion.execution.collect_statistics`) We can however still use

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-20 Thread via GitHub
comphead commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2006367866 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,130 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-20 +auth

[PR] 1075/enhancement/Make col class with __getattr__ [datafusion-python]

2025-03-20 Thread via GitHub
deanm opened a new pull request, #1076: URL: https://github.com/apache/datafusion-python/pull/1076 # Which issue does this PR close? Closes #1075 # Rationale for this change To improve ergonomics of the API by providing a quicker way of accessing columns using the __ge

Re: [I] Snowflake COPY INTO fails to parse with a semicolon [datafusion-sqlparser-rs]

2025-03-20 Thread via GitHub
tv42 commented on issue #1519: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1519#issuecomment-2741566749 This was fixed in sqlparser v0.55.0, likely https://github.com/apache/datafusion-sqlparser-rs/pull/1669

Re: [I] type coercion for arthmetic/binary ops fails for some unsigned/signed mappings [datafusion]

2025-03-20 Thread via GitHub
Omega359 commented on issue #15340: URL: https://github.com/apache/datafusion/issues/15340#issuecomment-2741609413 take

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-20 Thread via GitHub
kazuyukitanimura commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2006435819 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,131 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-20 Thread via GitHub
andygrove commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2006440492 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,131 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-20 +aut

Re: [PR] Comet 0.7.0 [datafusion-site]

2025-03-20 Thread via GitHub
andygrove commented on code in PR #63: URL: https://github.com/apache/datafusion-site/pull/63#discussion_r2006442098 ## content/blog/2025-03-20-datafusion-comet-0.7.0.md: ## @@ -0,0 +1,131 @@ +--- +layout: post +title: Apache DataFusion Comet 0.7.0 Release +date: 2025-03-20 +aut

[PR] fix type coercion for uint/int's [datafusion]

2025-03-20 Thread via GitHub
Omega359 opened a new pull request, #15341: URL: https://github.com/apache/datafusion/pull/15341 ## Which issue does this PR close? - Closes #15340 ## Rationale for this change Better handle type coercion when unsigned numerics are involved ## What changes are included

Re: [PR] added explaination for Schema and DFSchema to documentation [datafusion]

2025-03-20 Thread via GitHub
comphead commented on code in PR #15329: URL: https://github.com/apache/datafusion/pull/15329#discussion_r2006103754 ## docs/source/library-user-guide/working-with-exprs.md: ## @@ -50,6 +50,29 @@ As another example, the SQL expression `a + b * c` would be represented as an `E

Re: [I] Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) [datafusion]

2025-03-20 Thread via GitHub
adriangb commented on issue #7955: URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2741188852 I have a PR up for doing something similar for TopK sorts (`ORDER BY col LIMIT 10`) in https://github.com/apache/datafusion/pull/15301. I think we should be able to re-use that w

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-20 Thread via GitHub
kevinjqliu commented on PR #60: URL: https://github.com/apache/datafusion-site/pull/60#issuecomment-2741193664 > The diagram below illustrates the [Parquet reading pipeline](https://docs.rs/datafusion/46.0.0/datafusion/datasource/physical_plan/parquet/source/struct.ParquetSource.html%60%60%6

[PR] 1064/enhancement/add functions to Expr class [datafusion-python]

2025-03-20 Thread via GitHub
deanm opened a new pull request, #1074: URL: https://github.com/apache/datafusion-python/pull/1074 # Which issue does this PR close? Works towards closing #1064 # Rationale for this change To improve ergonomics of the API by adding functions to the Expr class so th

[PR] fix: write hive partitions for any int/uint/float [datafusion]

2025-03-20 Thread via GitHub
christophermcdermott opened a new pull request, #15337: URL: https://github.com/apache/datafusion/pull/15337 ## Which issue does this PR close? Closes #15336 ## Rationale for this change Support additional types in hive partitions. ## What changes a

Re: [PR] Blog post on Parquet pruning in datafusion [datafusion-site]

2025-03-20 Thread via GitHub
kevinjqliu commented on PR #60: URL: https://github.com/apache/datafusion-site/pull/60#issuecomment-2741250429 #62 should fix it

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-20 Thread via GitHub
comphead commented on code in PR #61: URL: https://github.com/apache/datafusion-site/pull/61#discussion_r2006162703 ## content/blog/2025-03-21-parquet-pushdown.md: ## @@ -0,0 +1,259 @@ +--- +layout: post +title: Efficient Filter Pushdown in Parquet +date: 2025-03-21 +author: Xia

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-20 Thread via GitHub
milenkovicm commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2741214382 apparently you found another bug: https://github.com/apache/datafusion-ballista/blob/bb10a1bebd52ebb91515efa7a2a977df740c2d7a/ballista/scheduler/src/scheduler_serv

[I] Write hive partitions for any int/uint/float [datafusion]

2025-03-20 Thread via GitHub
christophermcdermott opened a new issue, #15336: URL: https://github.com/apache/datafusion/issues/15336 ### Is your feature request related to a problem or challenge? I hit this error: DataFusion error: This feature is not implemented: it is not yet supported to write to hive part

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-20 Thread via GitHub
comphead commented on PR #61: URL: https://github.com/apache/datafusion-site/pull/61#issuecomment-2741241689 on content/images/parquet-pushdown/baseline-impl.jpg the flow goes from 3 to 5, I assume it is expected, perhaps it needs a separate comment?

Re: [PR] Blog post on Parquet filter pushdown [datafusion-site]

2025-03-20 Thread via GitHub
comphead commented on code in PR #61: URL: https://github.com/apache/datafusion-site/pull/61#discussion_r2006168286 ## content/blog/2025-03-21-parquet-pushdown.md: ## @@ -0,0 +1,259 @@ +--- +layout: post +title: Efficient Filter Pushdown in Parquet +date: 2025-03-21 +author: Xia

Re: [PR] Fix parquet pruning blog post hyperlink [datafusion-site]

2025-03-20 Thread via GitHub
kevinjqliu commented on PR #62: URL: https://github.com/apache/datafusion-site/pull/62#issuecomment-2741251030 cc @XiangpengHao @alamb

Re: [PR] fix: write hive partitions for any int/uint/float [datafusion]

2025-03-20 Thread via GitHub
Omega359 commented on PR #15337: URL: https://github.com/apache/datafusion/pull/15337#issuecomment-2741362724 LGTM, thanks @christophermcdermott !

[I] Add a Col class instead of just col function to use __getattr__ method [datafusion-python]

2025-03-20 Thread via GitHub
deanm opened a new issue, #1075: URL: https://github.com/apache/datafusion-python/issues/1075 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This would allow columns to be referred to as attr methods of col. For example inste
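The idea in the issue above (referring to columns as attributes of `col`) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Expr` here is a hypothetical stand-in class, not the datafusion-python `Expr`, and the `Col` class is a guess at the shape of the proposal, not the code from the linked PR.

```python
class Expr:
    """Hypothetical stand-in for a column expression (not datafusion's Expr)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"Expr(column={self.name!r})"


class Col:
    """Callable object that also resolves unknown attributes to columns."""

    def __call__(self, name: str) -> Expr:
        # Existing style: col("a")
        return Expr(name)

    def __getattr__(self, name: str) -> Expr:
        # __getattr__ only runs when normal attribute lookup fails.
        # Refuse dunder names so copy/pickle-style probes still get
        # an AttributeError instead of a bogus column.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        # Proposed style: col.a
        return Expr(name)


# A single module-level instance keeps the familiar `col` name.
col = Col()
```

With this sketch, `col.a` and `col("a")` both build an expression for column `a`; names that are not valid Python identifiers would still need the call form.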

Re: [PR] Implement GroupsAccumulator for min/max Duration [datafusion]

2025-03-20 Thread via GitHub
shruti2522 commented on code in PR #15322: URL: https://github.com/apache/datafusion/pull/15322#discussion_r2006224462 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -264,6 +265,7 @@ impl AggregateUDFImpl for Max { | Binary | LargeBinar

Re: [I] Add most functions to the Expr class so that they're chainable. [datafusion-python]

2025-03-20 Thread via GitHub
deanm commented on issue #1064: URL: https://github.com/apache/datafusion-python/issues/1064#issuecomment-2741351917 @timsaucer I put in a [draft PR](https://github.com/apache/datafusion-python/pull/1074) that does all the one input arg functions. Is your reluctance to putting *

Re: [PR] feat: Support serde for FileScanConfig `batch_size` [datafusion]

2025-03-20 Thread via GitHub
westhide commented on PR #15335: URL: https://github.com/apache/datafusion/pull/15335#issuecomment-2740991253 > Thank you @westhide > > Should we remove the `batch_size` from JSON source too? > > https://github.com/apache/datafusion/blob/dd9c3a815d7b4af2ef503ea557332ecc700af318

Re: [PR] SET statements: scope modifier for multiple assignments [datafusion-sqlparser-rs]

2025-03-20 Thread via GitHub
iffyio commented on code in PR #1772: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1772#discussion_r2006914091 ## src/parser/mod.rs: ## @@ -11145,17 +11145,16 @@ impl<'a> Parser<'a> { } /// Parse a `SET ROLE` statement. Expects SET to be consumed alre

Re: [PR] fix: make register_object_store use same session_env as file scan [datafusion-comet]

2025-03-20 Thread via GitHub
wForget commented on code in PR #1555: URL: https://github.com/apache/datafusion-comet/pull/1555#discussion_r2006926052 ## native/core/src/parquet/mod.rs: ## @@ -641,6 +640,8 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat session_timezo

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-20 Thread via GitHub
viirya commented on code in PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2006927319 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -255,8 +256,7 @@ object CometConf extends ShimCometConf { val COMET_MEMORY_OVERHEAD_MIN_MI

Re: [PR] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-20 Thread via GitHub
viirya commented on code in PR #1561: URL: https://github.com/apache/datafusion-comet/pull/1561#discussion_r2006927110 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -236,17 +236,18 @@ object CometConf extends ShimCometConf { val COMET_MEMORY_OVERHEAD: Optio

Re: [I] Add most functions to the Expr class so that they're chainable. [datafusion-python]

2025-03-20 Thread via GitHub
timsaucer commented on issue #1064: URL: https://github.com/apache/datafusion-python/issues/1064#issuecomment-2740837049 This sounds like a great idea! I don’t think we want it for *every* function, but definitely many of them - probably a majority. Things like regr_count prob

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-03-20 Thread via GitHub
westhide commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2740838704 The test in version `45.0.0` seems blocking, try fixing

Re: [PR] chore(deps): bump blake3 from 1.6.1 to 1.7.0 [datafusion]

2025-03-20 Thread via GitHub
xudong963 merged PR #15331: URL: https://github.com/apache/datafusion/pull/15331 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-03-20 Thread via GitHub
alamb commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2740855773 > I did not fully get this part. DF has semi join support and some rewrites to utilize it in similar cases? > The query transformation in SQL as given by @xudong963 is optim

Re: [PR] Enforce JOIN plan to require condition [datafusion]

2025-03-20 Thread via GitHub
goldmedal commented on code in PR #15334: URL: https://github.com/apache/datafusion/pull/15334#discussion_r2005913705 ## datafusion/optimizer/src/push_down_limit.rs: ## @@ -861,167 +849,6 @@ mod test { assert_optimized_plan_equal(outer_query, expected) } -#[t

[PR] [WIP] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-20 Thread via GitHub
andygrove opened a new pull request, #1561: URL: https://github.com/apache/datafusion-comet/pull/1561 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1560 ## Rationale for this change ## What changes are included

Re: [PR] perf: Reuse row converter during sort [datafusion]

2025-03-20 Thread via GitHub
Dandandan commented on code in PR #15302: URL: https://github.com/apache/datafusion/pull/15302#discussion_r2005752259 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -688,15 +707,29 @@ impl ExternalSorter { let fetch = self.fetch; let expressions: LexOrd

Re: [PR] [WIP] chore: Fix some inconsistencies in memory pool configuration [datafusion-comet]

2025-03-20 Thread via GitHub
andygrove closed pull request #1561: [WIP] chore: Fix some inconsistencies in memory pool configuration URL: https://github.com/apache/datafusion-comet/pull/1561 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[PR] feat: Support serde for FileScanConfig `batch_size` [datafusion]

2025-03-20 Thread via GitHub
westhide opened a new pull request, #15335: URL: https://github.com/apache/datafusion/pull/15335 ## Which issue does this PR close? - Closes None. - Reference [Support serde for batch_size](https://github.com/apache/datafusion/pull/15311#discussion_r2004114426) ## Ra

Re: [PR] Migrate physical plan tests to `insta` (Part-1) [datafusion]

2025-03-20 Thread via GitHub
Shreyaskr1409 commented on code in PR #15313: URL: https://github.com/apache/datafusion/pull/15313#discussion_r2005883069 ## datafusion/physical-plan/Cargo.toml: ## @@ -58,6 +58,7 @@ futures = { workspace = true } half = { workspace = true } hashbrown = { workspace = true } i

Re: [I] Missing 46.0.1 release for the `datafusion` crate [datafusion]

2025-03-20 Thread via GitHub
vadimpiven commented on issue #15328: URL: https://github.com/apache/datafusion/issues/15328#issuecomment-2740896190 Hi! I can report that without `datafusion` crate release the issue https://github.com/apache/datafusion/issues/15122 still reproduces and still requires hotfix ``` [dep

Re: [PR] chore(deps): Update sqlparser to 0.55.0 [datafusion]

2025-03-20 Thread via GitHub
jonahgao commented on code in PR #15183: URL: https://github.com/apache/datafusion/pull/15183#discussion_r2005949005 ## datafusion/sql/src/planner.rs: ## @@ -560,11 +558,11 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { SQLDataType::SmallInt(_) | SQLDataType::Int

Re: [PR] Fix extended tests by restore datafusion-testing submodule [datafusion]

2025-03-20 Thread via GitHub
alamb commented on PR #15318: URL: https://github.com/apache/datafusion/pull/15318#issuecomment-2740909824 Thanks @adriangb and @ozankabak -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Add doc for the `statistics_from_parquet_meta_calc method` [datafusion]

2025-03-20 Thread via GitHub
xudong963 commented on code in PR #15330: URL: https://github.com/apache/datafusion/pull/15330#discussion_r2005013876 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -797,10 +797,34 @@ pub async fn fetch_statistics( statistics_from_parquet_meta_calc(&metadata, ta

Re: [PR] docs: various improvements to tuning guide [datafusion-comet]

2025-03-20 Thread via GitHub
andygrove commented on code in PR #1525: URL: https://github.com/apache/datafusion-comet/pull/1525#discussion_r2003622238 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -274,11 +272,9 @@ object CometConf extends ShimCometConf { .createWithDefault(true)

Re: [PR] fix: make register_object_store use same session_env as file scan [datafusion-comet]

2025-03-20 Thread via GitHub
parthchandra commented on code in PR #1555: URL: https://github.com/apache/datafusion-comet/pull/1555#discussion_r2005685215 ## native/core/src/parquet/mod.rs: ## @@ -641,6 +640,8 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat session_t

Re: [PR] Migrate tests to insta [datafusion]

2025-03-20 Thread via GitHub
xudong963 merged PR #15288: URL: https://github.com/apache/datafusion/pull/15288 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Migrate the following tests to `insta` [datafusion]

2025-03-20 Thread via GitHub
xudong963 closed issue #15282: Migrate the following tests to `insta` URL: https://github.com/apache/datafusion/issues/15282 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] fix: check if handle has been initialized before closing [datafusion-comet]

2025-03-20 Thread via GitHub
parthchandra commented on PR #1554: URL: https://github.com/apache/datafusion-comet/pull/1554#issuecomment-2740525922 > > Is it possible to add a unit test for this? > > Do we need to add an error test case for this obvious anomaly? It's not necessary. I just wanted to see in wh
