Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-05-01 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2846025987 We are tracking follow on work in - https://github.com/apache/datafusion/issues/15914 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-05-01 Thread via GitHub

alamb closed issue #5600: [DISCUSSION] Add separate crate to cover spark builtin functions URL: https://github.com/apache/datafusion/issues/5600

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-03-12 Thread via GitHub

shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2719645118 https://github.com/apache/datafusion/pull/15168 is ready for review! @alamb @andygrove

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-03-11 Thread via GitHub

shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715756046 > BTW some backstory [@shehabgamin](https://github.com/shehabgamin) is that Google Summer of Code applicants are starting to look for projects and this is one of the ones list

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-03-11 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715751265 BTW some backstory @shehabgamin is that Google Summer of Code applicants are starting to look for projects and this is one of the ones listed for DataFusion: https://datafusion.ap

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-03-11 Thread via GitHub

shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715745098 > There is quite a lot of related discussion on > > * [feat: Add `datafusion-spark` crate #14392](https://github.com/apache/datafusion/pull/14392) > > > C

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-03-11 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715483375 There is quite a lot of related discussion on - https://github.com/apache/datafusion/pull/14392 Currently @shehabgamin I think plans to try and prototype this: - https:

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-02-10 Thread via GitHub

Spaarsh commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2648452637 > I think that would be ok (maybe implement this in datafusion-cli) I'm sorry if I've got this wrong; you're suggesting that we could make an import command in the datafusion

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-02-10 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2648221037 > Since we're planning on having a separate mode for Spark wherein a user can access all Spark functions and also not make the main code dependent on this crate, I was thinking if t

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-02-09 Thread via GitHub

Spaarsh commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2646416918 I began contributing to this repo only a month ago, so please pardon any errors on my side, but I just wanted to suggest something. Since we're planning on having a sepa

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628502728 > @alamb just making sure if I understand, CI will still run and check datafusion/src/functions-spark even if it is not a dependency? Yes, that is my assumption > For exam

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628442110 > My assumption is this is a totally optional feature for downstream projects (core dependencies do not change). Correct. It is part of the workspace, but no other DataFus

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

findepi commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628404269 I feel positive in principle about creating a crate as a home for Spark-compatible functions, especially if it is going to be maintained ("not dead project") and released together with

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628311123 I created a draft PR https://github.com/apache/datafusion/pull/14392

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

kazuyukitanimura commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628231531 +1 for including spark functions in DataFusion main > Do not add a dependency on datafusion crate (datafusion/core) @alamb just making sure if I understand, C

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628070067 In terms of testing, I think some combination of sqllogictest / gold data style tests and maybe even real-spark runs in the [`extended.yml`](https://github.com/apache/datafusion/blo

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628065146 > Perhaps as an alternative we could setup a datafusion-udfs (pick an appropriate name) under the apache umbrella and managed by datafusion pmc's where this could live? Just a thoug

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

comphead commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627708479 > > > With that in mind, a joint effort on something in the main DataFusion repo or a `datafusion-contrib` repo could both work, and we are open to either option. > > > >

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

Omega359 commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627663730 > > With that in mind, a joint effort on something in the main DataFusion repo or a `datafusion-contrib` repo could both work, and we are open to either option. > > I am -

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627587198 Another point worth mentioning. Previously, the Comet release schedule was being driven by progress on performance, but now that we have "ok" performance (2x on TPC-H) and base

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627545351 > I don't have strong opinions on where they should live. So far the keeping of DF and Comet in sync has mostly been fine, but sometimes when DF breaks the API it can take a whi

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627537718 > One challenge currently with Comet's expression is that since Comet operates on the physical plan level, many of the expressions have been written as implementing PhysicalExpr

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

Blizzara commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627521620 I (personally and on behalf of my employer) am very much +1 for having Spark-compatible expressions. We currently use a mix of DF expressions, Comet's stuff, wrappers around DF's

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627492221 It seems to me that we already have an Apache DataFusion project that provides Spark-compatible DataFusion expressions (Comet). I think @shehabgamin's main concern is tha

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627484867 > 3\. I'm not sure we want to say yes to spark but no to other udf suites. This is a valid point also.

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627483100 > With that in mind, a joint effort on something in the main DataFusion repo or a `datafusion-contrib` repo could both work, and we are open to either option. I am -1 on

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-31 Thread via GitHub

Omega359 commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627152543 I'm of the opinion that while I could see the benefit of spark udfs in datafusion I really think they would be best handled as a datafusion-contrib. That is mostly for 3 reasons:

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-30 Thread via GitHub

shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626525138 > > I will say that we have encountered numerous problems relying on downstream DataFusion-based crates,...The issue isn't with the crates themselves but arises when it's time

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-30 Thread via GitHub

kazuyukitanimura commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626192955 Echoing @andygrove 's point. Also if we move the spark-expr to DataFusion core, release management might get harder. E.g. we may want to fix spark-expr bugs quickly, but

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-30 Thread via GitHub

shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626095699 > [@shehabgamin](https://github.com/shehabgamin) if we did that, would you be willing to help implement / upstream some of your implementations and tests? Yes! @a

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-30 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626014384 I almost started a conversation about this but held back. Moving this crate upstream has a lot of value, and I support doing so. However, assuming that most DataFusion con

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-30 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2625783165 @andygrove (and maybe @Omega359 ) given the importance of spark and the fact that comet already has spark compatible expressions what would you say about moving those expressions in

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-30 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2625779458 > I love the idea of collaborating on Spark compatible `UDF`s. > > As of writing, `243/402` Spark functions doc-tests pass on Sail. We haven't focused on performance yet and i

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-27 Thread via GitHub

shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2617036154 I love the idea of collaborating on Spark compatible `UDF`s. As of writing, `243/402` Spark functions doc-tests pass on Sail. We haven't focused on performance yet and i

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-26 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614469808 The crate supports more than 100 expressions so far, most of which are listed here: https://datafusion.apache.org/comet/user-guide/expressions.html

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-26 Thread via GitHub

andygrove commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614468114 We already have a crate with Spark-compatible expressions maintained as part of the Comet subproject. https://docs.rs/datafusion-comet-spark-expr/0.5.0/datafusion_comet_s

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-26 Thread via GitHub

alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614342453 FWIW we have now completed migrating all functions to User Defined Functions and I think there is growing interest in BTW I think there are many people interested in spark co

37 matches
