Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609145891 > @berkaysynnada can we merge this PR in now? Or shall we wait for the statistics revamp that is underway? No need to wait for underway PR as it does not depend which sta

Re: [I] [EPIC] Improve examples to make them easier to navigate [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #11172: URL: https://github.com/apache/datafusion/issues/11172#issuecomment-2609198048 > [@alamb](https://github.com/alamb) I think there is one more thing we can do, though perhaps it is only my preference. > > We can consolidate them into folders, a big `sql

Re: [I] Move `ProjectionPushdown` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-23 Thread via GitHub
buraksenn commented on issue #14184: URL: https://github.com/apache/datafusion/issues/14184#issuecomment-2609203160 > > take > > [@alamb](https://github.com/alamb) I thought this issue was assigned to me, is this an issue with the bot? I did not saw any assignments to this sorr

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub
gatesn commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609166991 I can't think of any other statistical quantities that would immediately help operators, so from our perspective it's only "sum" (we may also use sum to mean true-count for booleans).

Re: [PR] Update datafusion-testing pin [datafusion]

2025-01-23 Thread via GitHub
alamb commented on PR #14242: URL: https://github.com/apache/datafusion/pull/14242#issuecomment-2609202940 Woohoo -- I checked and the tests are passing again: https://github.com/apache/datafusion/actions/runs/12919983033/job/36031496271 -- This is an automated message from the Apache Git

Re: [I] External sorting not working for (maybe only for string columns??) [datafusion]

2025-01-23 Thread via GitHub
xuchen-plus commented on issue #12136: URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2609230218 We have also encountered this issue. After some debugging (by adding debug logs before every call to `try_grow`), it seems that the memory counting of batches returned by `S

Re: [I] Update ClickBench benchmarks with DataFusion `44.0.0` [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #13983: URL: https://github.com/apache/datafusion/issues/13983#issuecomment-2609242828 Someone pointed out to me the other day that DataFusion 43 is no longer on top of the [ClickBench Parquet Leaderboard](https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
graup commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2609249058 Thanks for the review @iffyio, I have applied your suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] support for `array_repeat` array function [datafusion-comet]

2025-01-23 Thread via GitHub
jatin510 commented on code in PR #1205: URL: https://github.com/apache/datafusion-comet/pull/1205#discussion_r1925689396 ## native/core/src/execution/planner.rs: ## @@ -775,6 +776,49 @@ impl PhysicalPlanner { Ok(Arc::new(case_expr)) } +

Re: [I] Remove dependency on physical-optimizer on functions-aggregates [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on issue #14243: URL: https://github.com/apache/datafusion/issues/14243#issuecomment-2609476826 I've an idea. You're moving all tests to core/physical_optimizer with https://github.com/apache/datafusion/pull/14244. So, as of now, if we are writing tests core/physical

Re: [PR] consolidate physical_optimizer tests into core/tests/physical_optimizer [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on PR #14244: URL: https://github.com/apache/datafusion/pull/14244#issuecomment-2609482172 I shared my comments in https://github.com/apache/datafusion/issues/14243 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [I] Jan 18, 2025: This week(s) in DataFusion [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #14179: URL: https://github.com/apache/datafusion/issues/14179#issuecomment-2609473769 This is a neat "zero lake" idea (DataFusion via WASM in browser) https://gh-sparkling-cherry-6975.fly.dev/ https://github.com/user-attachments/assets/a280fa61-fe90-414

Re: [I] Remove dependency on physical-optimizer on functions-aggregates [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on issue #14243: URL: https://github.com/apache/datafusion/issues/14243#issuecomment-2609479704 > I've an idea. You're moving all tests to core/physical_optimizer with [#14244](https://github.com/apache/datafusion/pull/14244). So, as of now, if we are writing tests c

[PR] Improve docs [datafusion]

2025-01-23 Thread via GitHub
alamb opened a new pull request, #14248: URL: https://github.com/apache/datafusion/pull/14248 ## Which issue does this PR close? - Part of #7013 ## Rationale for this change While messing around with other things I noticed there was some documentation that could be impr

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2609552911 I think `ReturnInfo` would also need to be updated to return a `Field` rather than a subset: ``` /// Return metadata for this function. /// /// See [`ScalarUDFImpl:

[PR] Expose more components from sqllogictest [datafusion]

2025-01-23 Thread via GitHub
xudong963 opened a new pull request, #14249: URL: https://github.com/apache/datafusion/pull/14249 ## Which issue does this PR close? Follow up https://github.com/apache/datafusion/pull/14233 ## Rationale for this change For datafusion users, if they want to bu

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
milenkovicm commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2609570265 Just checking if we're on the same page @alamb It would be great if this would could support use case like https://clickhouse.com/blog/a-new-powerful-json-data-type-f

[I] Update ClickBench benchmarks with DataFusion `45.0.0` (When Published) [datafusion]

2025-01-23 Thread via GitHub
alamb opened a new issue, #14246: URL: https://github.com/apache/datafusion/issues/14246 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-23 Thread via GitHub
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1926635986 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4247,8 +4247,10 @@ logical_plan physical_plan 01)CoalesceBatchesExec: target_batch_size=3, fetch=2 0

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-23 Thread via GitHub
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1926647223 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2217,11 +2217,6 @@ async fn write_parquet_with_order() -> Result<()> { let df = ctx.sql("SELECT * FROM d

Re: [I] Fails to parse TypedString expressions in BigQuery with triple quoted strings [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
graup commented on issue #1673: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1673#issuecomment-2609359758 The issue here is that `parse_literal_string` which is used by `TypedString` doesn't support quote styles other than single or double quoted strings. Trying to f

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
graup commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2609437559 Thanks, should be fixed. (I found the helpful ` HINT: use test_utils::number` 😄) -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609448004 > Statistics can be helpful for optimizer rules, but they also allow short-circuiting computations. For example, min/max can be used to avoid evaluating a filter over a record

[I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
alamb opened a new issue, #14247: URL: https://github.com/apache/datafusion/issues/14247 ### Is your feature request related to a problem or challenge? It seems the design of Arrow extension types is nearing consensus and will arrive soon - https://github.com/apache/arrow-rs/pull/5

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2609530014 FYI @jayzhan211 and @kylebarron -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2609536311 I poked around with this a litte bit and I think it may be more invasive -- we probably have to clean up some other plumbing. -- This is an automated message from the Apache Gi

Re: [I] Extension Types [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-2609539786 I started thinking about how the ExtensionType trait would be used in DataFusion. I think we would need to improve the APIs a bit to be in terms of `Field` rather than `DataType`.

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
jayzhan211 commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2609644634 Given we have `to_field` already, maybe `Fields` is easier to implement ```rust pub struct ReturnTypeArgs<'a> { /// The schema fields of the arguments. Fields

Re: [I] Update ClickBench benchmarks with DataFusion `45.0.0` (When Published) [datafusion]

2025-01-23 Thread via GitHub
Dandandan commented on issue #14246: URL: https://github.com/apache/datafusion/issues/14246#issuecomment-2609650375 Would be nice to get this in as well https://github.com/apache/datafusion/pull/13681 -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Support specific `GroupsAccumulator` for `median` [datafusion]

2025-01-23 Thread via GitHub
Dandandan commented on code in PR #13681: URL: https://github.com/apache/datafusion/pull/13681#discussion_r1926876243 ## datafusion/functions-aggregate/src/median.rs: ## @@ -230,6 +276,200 @@ impl Accumulator for MedianAccumulator { } } +/// The median groups accumulato

Re: [PR] Support specific `GroupsAccumulator` for `median` [datafusion]

2025-01-23 Thread via GitHub
Dandandan commented on code in PR #13681: URL: https://github.com/apache/datafusion/pull/13681#discussion_r1926880943 ## datafusion/functions-aggregate/src/median.rs: ## @@ -230,6 +276,199 @@ impl Accumulator for MedianAccumulator { } } +/// The median groups accumulato

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-23 Thread via GitHub
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1926634800 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4225,7 +4225,7 @@ query B SELECT * FROM t0 FULL JOIN t1 ON t0.c2 >= t1.c2 LIMIT 2; 2 2 2 2

Re: [PR] Feat: support array_compact function [datafusion-comet]

2025-01-23 Thread via GitHub
kazantsev-maksim closed pull request #1321: Feat: support array_compact function URL: https://github.com/apache/datafusion-comet/pull/1321 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609417451 > I can't think of any other statistical quantities that would immediately help operators, so from our perspective it's only "sum" (we may also use sum to mean true-count for b

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
iffyio commented on PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#issuecomment-2609295511 @graup could you take a look at the ci failures when you get a chance? -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [I] Add support for filter selectivity for more expression types [datafusion]

2025-01-23 Thread via GitHub
berkaysynnada commented on issue #14237: URL: https://github.com/apache/datafusion/issues/14237#issuecomment-2609383884 Hi @ch-sc. I try to address your solution suggestions: > 1. Add support for some missing stuff in interval arithmetics, i.e., temporal data. I highly recommen

[PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
graup opened a new pull request, #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679 Fixes #1673. This is a breaking change. TypedString contained a `String` without any knowledge of the used quote style. The parser used `parse_literal_string` to construct this, wh

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-23 Thread via GitHub
gatesn commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2609426600 Statistics can be helpful for optimizer rules, but they also allow short-circuiting computations. For example, min/max can be used to avoid evaluating a filter over a record batch and

Re: [I] Implement xxhash algorithms as part of the expression API [datafusion]

2025-01-23 Thread via GitHub
ion-elgreco commented on issue #14044: URL: https://github.com/apache/datafusion/issues/14044#issuecomment-2609606375 Perhaps consider adding a non cryptographic module and add wyhash as well (here some of the crates I've used for similar usecase but in polars: https://github.com/ion-elgrec

Re: [PR] Support specific `GroupsAccumulator` for `median` [datafusion]

2025-01-23 Thread via GitHub
Rachelint commented on PR #13681: URL: https://github.com/apache/datafusion/pull/13681#issuecomment-2609661155 I am back and continue working this pr yesterday. The rest work is adding test and sorting out codes, plan to finish it today or tomorrow. Very sorry for long delay for so

Re: [I] Update ClickBench benchmarks with DataFusion `45.0.0` (When Published) [datafusion]

2025-01-23 Thread via GitHub
Rachelint commented on issue #14246: URL: https://github.com/apache/datafusion/issues/14246#issuecomment-260904 > Would be nice to get this in as well [#13681](https://github.com/apache/datafusion/pull/13681) I added fuzzy tests for it in my local, and found some failed case for

Re: [I] Update ClickBench benchmarks with DataFusion `45.0.0` (When Published) [datafusion]

2025-01-23 Thread via GitHub
Dandandan commented on issue #14246: URL: https://github.com/apache/datafusion/issues/14246#issuecomment-2609769396 Thank you very much @Rachelint -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [I] Release DataFusion `45.0.0` [datafusion]

2025-01-23 Thread via GitHub
jayzhan211 commented on issue #14008: URL: https://github.com/apache/datafusion/issues/14008#issuecomment-2609768983 > ValuesExec is now deprecated. The deprecation message is a bit confusing though. It currently states: "Use MemoryExec::try_new_as_values instead", but I think should say: "

Re: [PR] Add support for mysql table hints [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
yoavcloud commented on code in PR #1675: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1675#discussion_r1926542015 ## src/parser/mod.rs: ## @@ -,6 +,64 @@ impl<'a> Parser<'a> { } } +pub fn parse_table_index_hints(&mut self) -> Result,

Re: [I] Add additional regexp functions [datafusion]

2025-01-23 Thread via GitHub
osipovartem commented on issue #11946: URL: https://github.com/apache/datafusion/issues/11946#issuecomment-2609951360 I can implement [regexp_substr()](https://pgpedia.info/r/regexp_substr.html) can I just create a pull request or do I need to be added to some access list? -- This is a

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-23 Thread via GitHub
eliaperantoni commented on PR #13664: URL: https://github.com/apache/datafusion/pull/13664#issuecomment-2610077654 @alamb As agreed, I implemented all the errors that I previously implemented but in a less invasive way, only adding `Span` to `Column`. Here's some examples: htt

[I] Query produces different results after physical plan round trip to bytes [datafusion]

2025-01-23 Thread via GitHub
robtandy opened a new issue, #14253: URL: https://github.com/apache/datafusion/issues/14253 ### Describe the bug Serializing a physical plan to bytes and then recreating it produces a different output. This was observed on TPCH query 3. Note though, that it was also observed

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
findepi commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2610778483 > However, since the `ReturnTypeInfo` only provides `DataType` the the `Field` information will not be present and thus UDF writers will not be able to access extension type inf

Re: [PR] National strings: check if dialect supports backslash escape [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
hansott commented on code in PR #1672: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1672#discussion_r1927058238 ## src/tokenizer.rs: ## @@ -3543,4 +3546,17 @@ mod tests { ]; compare(expected, tokens); } + +#[test] +fn test_national_

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#discussion_r1927085944 ## native/core/src/execution/metrics/utils.rs: ## @@ -55,60 +64,21 @@ pub fn update_comet_metric( Some(metrics.aggregate_by_name()) }; -u

[PR] Minor: Add tests for types of arithmetic operator output types [datafusion]

2025-01-23 Thread via GitHub
alamb opened a new pull request, #14250: URL: https://github.com/apache/datafusion/pull/14250 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/pull/14223 ## Rationale for this change While reviewing https://github.com/apache/dat

Re: [I] Question on: `visit_expressions_mut` for alias expr [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
docteurklein commented on issue #1475: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1475#issuecomment-2609963101 I think I have the same problem with: ```sql select * from (select 'test' order by 1 asc) ``` It seems impossible to write a `visit_statements

Re: [PR] chore: Refactor QueryPlanSerde to allow logic to be moved to individual classes per expression [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1331: URL: https://github.com/apache/datafusion-comet/pull/1331#discussion_r1927462683 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1557,918 +1624,935 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSe

Re: [PR] chore: Bump DataFusion to rev 5592834 [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1927466228 ## native/core/src/execution/operators/scan.rs: ## @@ -304,11 +304,7 @@ fn scan_schema(input_batch: &InputBatch, data_types: &[DataType]) -> SchemaRef {

Re: [PR] chore: Bump DataFusion to rev 5592834 [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1927466775 ## native/core/src/execution/shuffle/row.rs: ## @@ -3435,24 +3435,10 @@ fn builder_to_array( } fn make_batch(arrays: Vec, row_count: usize) -> Result { -

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-01-23 Thread via GitHub
parthchandra commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2610659661 > @parthchandra @mbutrovich, could you take a look since this may impact our current work to move to ParquetExec It looks for the most part that we should be able to move

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-23 Thread via GitHub
jatin510 commented on code in PR #14236: URL: https://github.com/apache/datafusion/pull/14236#discussion_r1927082321 ## datafusion/functions-nested/src/repeat.rs: ## @@ -124,19 +125,47 @@ impl ScalarUDFImpl for ArrayRepeat { &self.aliases } +fn coerce_types(&

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-01-23 Thread via GitHub
mbutrovich commented on code in PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#discussion_r1927134258 ## native/core/src/execution/jni_api.rs: ## @@ -508,9 +505,6 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( let ne

Re: [PR] fix: add support for Decimal128 and Decimal256 types in interval arithmetic [datafusion]

2025-01-23 Thread via GitHub
waynexia commented on PR #14126: URL: https://github.com/apache/datafusion/pull/14126#issuecomment-2610062530 Hi @alamb, I've copied constant vectors from arrow, please take another look -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#discussion_r1927089473 ## native/core/src/execution/jni_api.rs: ## @@ -508,9 +505,6 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( let nex

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#issuecomment-2609980366 @mbutrovich may be interested in reviewing this as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [I] [DISCUSSION]: Inconsistent Behavior Between prefer_existing_sort and AggregateExec's required_input_ordering [datafusion]

2025-01-23 Thread via GitHub
ozankabak commented on issue #14231: URL: https://github.com/apache/datafusion/issues/14231#issuecomment-2610108788 @alamb, before we make an attempt into this, do you have any thoughts you can share? -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] Question on: `visit_expressions_mut` for alias expr [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
docteurklein commented on issue #1475: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1475#issuecomment-2610324878 Given what I see here https://github.com/openobserve/openobserve/blob/ebcb12c4ead3c31d16f2ef5f515707c670dd2611/src/service/search/sql.rs#L1210, maybe the better

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#discussion_r1927142543 ## native/core/src/execution/jni_api.rs: ## @@ -508,9 +505,6 @@ pub unsafe extern "system" fn Java_org_apache_comet_Native_executePlan( let nex

Re: [PR] Only support escape literals for Postgres, Redshift and generic dialect [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
hansott commented on code in PR #1674: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1674#discussion_r1927142721 ## src/tokenizer.rs: ## @@ -982,7 +982,8 @@ impl<'a> Tokenizer<'a> { } } // PostgreSQL accepts "

Re: [I] Jan 18, 2025: This week(s) in DataFusion [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #14179: URL: https://github.com/apache/datafusion/issues/14179#issuecomment-2610174007 @mertak-synnada has a PR that nicely refactors the data source code but may be a non trivial downstream API change - https://github.com/apache/datafusion/pull/14224 -- This

Re: [PR] perf: improve performance of update metrics [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on PR #1329: URL: https://github.com/apache/datafusion-comet/pull/1329#issuecomment-2610028654 Based on a single run, I see approximately 2% improvement in TPC-H (325s on main vs 318s with this PR) -- This is an automated message from the Apache Git Service. To respon

Re: [I] Add additional regexp functions [datafusion]

2025-01-23 Thread via GitHub
Omega359 commented on issue #11946: URL: https://github.com/apache/datafusion/issues/11946#issuecomment-2610141766 > I can implement [regexp_substr()](https://pgpedia.info/r/regexp_substr.html) can I just create a pull request or do I need to be added to some access list? A PR should

Re: [I] [DISCUSSION]: Inconsistent Behavior Between prefer_existing_sort and AggregateExec's required_input_ordering [datafusion]

2025-01-23 Thread via GitHub
alamb commented on issue #14231: URL: https://github.com/apache/datafusion/issues/14231#issuecomment-2610159122 The description of the issue on this ticket makes sense to me > AggregateExec sets its required_input_ordering based solely on its group-by expressions without checking any

Re: [PR] Only support escape literals for Postgres, Redshift and generic dialect [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
hansott commented on PR #1674: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1674#issuecomment-2610413404 Yup, I'm on it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Only support escape literals for Postgres, Redshift and generic dialect [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
hansott commented on PR #1674: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1674#issuecomment-2610424263 @iffyio done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1927082163 ## src/ast/value.rs: ## @@ -97,6 +97,32 @@ pub enum Value { Placeholder(String), } +impl Into for Value { +fn into(self) -> String { +

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-23 Thread via GitHub
jatin510 commented on code in PR #14236: URL: https://github.com/apache/datafusion/pull/14236#discussion_r1927082321 ## datafusion/functions-nested/src/repeat.rs: ## @@ -124,19 +125,47 @@ impl ScalarUDFImpl for ArrayRepeat { &self.aliases } +fn coerce_types(&

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1927082163 ## src/ast/value.rs: ## @@ -97,6 +97,32 @@ pub enum Value { Placeholder(String), } +impl Into for Value { +fn into(self) -> String { +

Re: [PR] Made imdb download (data_imdb) function atomic [datafusion]

2025-01-23 Thread via GitHub
Spaarsh commented on PR #14225: URL: https://github.com/apache/datafusion/pull/14225#issuecomment-2609946746 > Thank you @Spaarsh > > I tried this out locally and found that now the conversion code was not triggered. > > I pushed a fix to your branch locally (moving where the `

Re: [PR] fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers [datafusion]

2025-01-23 Thread via GitHub
alamb commented on code in PR #14223: URL: https://github.com/apache/datafusion/pull/14223#discussion_r1927086851 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -777,29 +777,19 @@ pub fn binary_numeric_coercion( (_, Float32) | (Float32, _) => Some(Float32)

Re: [I] Question on: `visit_expressions_mut` for alias expr [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
cisaacson commented on issue #1475: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1475#issuecomment-2609988461 If someone can provide some guidance on this I would be happy to work on a PR to fix this. -- This is an automated message from the Apache Git Service. To respon

Re: [PR] Improve `ScalarUDFImpl` docs [datafusion]

2025-01-23 Thread via GitHub
alamb commented on PR #14248: URL: https://github.com/apache/datafusion/pull/14248#issuecomment-2610161559 Thank you @xudong963 for the quick review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Improve `ScalarUDFImpl` docs [datafusion]

2025-01-23 Thread via GitHub
alamb merged PR #14248: URL: https://github.com/apache/datafusion/pull/14248 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[I] Breaking change since DF 44: `swap_hash_join` removed [datafusion]

2025-01-23 Thread via GitHub
andygrove opened a new issue, #14251: URL: https://github.com/apache/datafusion/issues/14251 ### Describe the bug Comet uses the public `swap_hash_join` function: https://docs.rs/datafusion/latest/datafusion/physical_optimizer/join_selection/fn.swap_hash_join.html This seems

Re: [I] Breaking change since DF 44: `swap_hash_join` removed [datafusion]

2025-01-23 Thread via GitHub
andygrove commented on issue #14251: URL: https://github.com/apache/datafusion/issues/14251#issuecomment-2610437667 This isn't blocking Comet. I found that I just needed to make this change: ```rust -let swapped_hash_join = -swap_hash_j

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-23 Thread via GitHub
jatin510 commented on code in PR #14236: URL: https://github.com/apache/datafusion/pull/14236#discussion_r1927082321 ## datafusion/functions-nested/src/repeat.rs: ## @@ -124,19 +125,47 @@ impl ScalarUDFImpl for ArrayRepeat { &self.aliases } +fn coerce_types(&

[PR] chore: Bump DataFusion to rev 5592834 [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove opened a new pull request, #1332: URL: https://github.com/apache/datafusion-comet/pull/1332 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1304 ## Rationale for this change Use latest DF in preparatio

Re: [PR] chore: Fix merge conflicts from merging comet-parquet-exec into main [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on PR #1323: URL: https://github.com/apache/datafusion-comet/pull/1323#issuecomment-2610871968 @mbutrovich could you restart CI (maybe trigger by pushing an empty commit?) -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Reject CREATE TABLE/VIEW with duplicate column names [datafusion]

2025-01-23 Thread via GitHub
findepi commented on PR #13517: URL: https://github.com/apache/datafusion/pull/13517#issuecomment-2610880556 Correct. Apparently the only question was about how to enforce the verification: https://github.com/apache/datafusion/pull/13517#discussion_r1864259318 The new builder types are m

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-01-23 Thread via GitHub
alamb commented on code in PR #14224: URL: https://github.com/apache/datafusion/pull/14224#discussion_r1927115530 ## datafusion/core/src/datasource/data_source.rs: ## @@ -0,0 +1,264 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[PR] chore: deprecate function with typo in name (is_expr_constant_accross_partitions) [datafusion]

2025-01-23 Thread via GitHub
andygrove opened a new pull request, #14252: URL: https://github.com/apache/datafusion/pull/14252 ## Which issue does this PR close? N/A ## Rationale for this change Fix typo in public API function ## What changes are included in this PR?

[PR] Support multiple tables in `UPDATE FROM` clause [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
iffyio opened a new pull request, #1681: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1681 Adds support for multiple tables in the `FROM` clause of an update statement. ```sql UPDATE A SET .. FROM T, U, V ``` -- This is an automated message from the Apache

Re: [PR] chore: Bump DataFusion to rev 5592834 [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove commented on code in PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1927363212 ## native/core/src/execution/operators/filter.rs: ## @@ -168,11 +169,15 @@ impl FilterExec { if binary.op() == &Operator::Eq {

Re: [PR] consolidation of examples: date_time_functions [datafusion]

2025-01-23 Thread via GitHub
comphead merged PR #14240: URL: https://github.com/apache/datafusion/pull/14240 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] chore: deprecate function with typo in name (is_expr_constant_accross_partitions) [datafusion]

2025-01-23 Thread via GitHub
comphead commented on code in PR #14252: URL: https://github.com/apache/datafusion/pull/14252#discussion_r1927368794 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -1221,10 +1221,33 @@ impl EquivalenceProperties { /// # Returns /// /// Returns `t

Re: [PR] Feature Unifying source execution plans [datafusion]

2025-01-23 Thread via GitHub
andygrove commented on PR #14224: URL: https://github.com/apache/datafusion/pull/14224#issuecomment-2610476305 @parthchandra @mbutrovich, could you take a look since this may impact our current work to move to ParquetExec -- This is an automated message from the Apache Git Service. To res

Re: [PR] National strings: check if dialect supports backslash escape [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
iffyio commented on code in PR #1672: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1672#discussion_r1927261386 ## src/tokenizer.rs: ## @@ -3543,4 +3546,17 @@ mod tests { ]; compare(expected, tokens); } + +#[test] +fn test_national_s

Re: [PR] National strings: check if dialect supports backslash escape [datafusion-sqlparser-rs]

2025-01-23 Thread via GitHub
iffyio merged PR #1672: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1672 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [I] Change `ReturnTypeInfo` to return a `Field` rather than `DataType` [datafusion]

2025-01-23 Thread via GitHub
paleolimbot commented on issue #14247: URL: https://github.com/apache/datafusion/issues/14247#issuecomment-2610283428 Just a note that this all quite a lot of work to go through to avoid adding an `Extension` member to the `DataType` enum. It seems like you could avoid a breaking change the

[I] Refactor QueryPlanSerde to Use Individual Classes for Expression Translation [datafusion-comet]

2025-01-23 Thread via GitHub
andygrove opened a new issue, #1330: URL: https://github.com/apache/datafusion-comet/issues/1330 ### What is the problem the feature request solves? The QueryPlanSerde class in Comet currently contains a huge match statement that handles the logic for translating Spark expressions int

Re: [PR] consolidation of examples: date_time_functions [datafusion]

2025-01-23 Thread via GitHub
logan-keede commented on PR #14240: URL: https://github.com/apache/datafusion/pull/14240#issuecomment-2610520534 > thanks @logan-keede for removing also `to_char.rs` I doubled checked it and it refers to conversion from dates so it perfectly fits to date and time examples thanks, but

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-23 Thread via GitHub
korowa commented on code in PR #14236: URL: https://github.com/apache/datafusion/pull/14236#discussion_r1927474710 ## datafusion/functions-nested/src/repeat.rs: ## @@ -136,7 +137,17 @@ pub fn array_repeat_inner(args: &[ArrayRef]) -> Result { } let element = &args[0]

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-23 Thread via GitHub
korowa commented on code in PR #14236: URL: https://github.com/apache/datafusion/pull/14236#discussion_r1927495321 ## datafusion/functions-nested/src/repeat.rs: ## @@ -124,19 +125,47 @@ impl ScalarUDFImpl for ArrayRepeat { &self.aliases } +fn coerce_types(&se

Re: [PR] Add casting of `count` to `Int64` in `array_repeat` function to ensure consistent integer type handling [datafusion]

2025-01-23 Thread via GitHub
korowa commented on code in PR #14236: URL: https://github.com/apache/datafusion/pull/14236#discussion_r1927495321 ## datafusion/functions-nested/src/repeat.rs: ## @@ -124,19 +125,47 @@ impl ScalarUDFImpl for ArrayRepeat { &self.aliases } +fn coerce_types(&se

  1   2   3   >