alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2846025987
We are tracking follow on work in
- https://github.com/apache/datafusion/issues/15914
--
This is an automated message from the Apache Git Service.
To respond to the message, p
alamb closed issue #5600: [DISCUSSION] Add separate crate to cover spark
builtin functions
URL: https://github.com/apache/datafusion/issues/5600
shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2719645118
https://github.com/apache/datafusion/pull/15168 is ready for review! @alamb
@andygrove
shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715756046
> BTW some backstory [@shehabgamin](https://github.com/shehabgamin) is that
Google Summer of Code applicants are starting to look for projects and this is
one of the ones list
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715751265
BTW some backstory @shehabgamin is that Google Summer of Code applicants
are starting to look for projects and this is one of the ones listed for
DataFusion:
https://datafusion.ap
shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715745098
> There is quite a lot of related discussion on
>
> * [feat: Add `datafusion-spark` crate
#14392](https://github.com/apache/datafusion/pull/14392)
>
>
> C
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2715483375
There is quite a lot of related discussion on
- https://github.com/apache/datafusion/pull/14392
Currently @shehabgamin I think plans to try and prototype this:
- https:
Spaarsh commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2648452637
> I think that would be ok (maybe implement this in datafusion-cli)
I'm sorry if I've got this wrong; you're suggesting that we could make an
import command in the datafusion
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2648221037
> Since we're planning on having a separate mode for spark wherein a user
can access all spark functions and also not make the main code dependent on
this crate, I was thinking if t
Spaarsh commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2646416918
I began contributing to this repo only a month ago, so please pardon any
errors on my side, but I just wanted to suggest something.
Since we're planning on having a sepa
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628502728
> @alamb just making sure if I understand, CI will still run and check
datafusion/src/functions-spark even it is not a dependency?
Yes that is my assumption
> For exam
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628442110
> My assumption is this is a totally optional feature for downstream
projects (core dependencies do not change).
Correct. It is part of the workspace, but no other DataFus
findepi commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628404269
I feel positive in principle about creating a crate as a home for
Spark-compatible functions, especially if it is going to be maintained ("not dead
project") and released together with
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628311123
I created a draft PR https://github.com/apache/datafusion/pull/14392
kazuyukitanimura commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628231531
+1 for including spark functions in DataFusion main
> Do not add a dependency on datafusion crate (datafusion/core)
@alamb just making sure if I understand, C
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628070067
In terms of testing, I think some combination of sqllogictest / gold data
style tests and maybe even real-spark runs in the
[`extended.yml`](https://github.com/apache/datafusion/blo
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628065146
> Perhaps as an alternative we could setup a datafusion-udfs (pick an
appropriate name) under the apache umbrella and managed by datafusion pmc's
where this could live? Just a thoug
comphead commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627708479
> > > With that in mind, a joint effort on something in the main DataFusion
repo or a `datafusion-contrib` repo could both work, and we are open to either
option.
> >
> >
Omega359 commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627663730
> > With that in mind, a joint effort on something in the main DataFusion
repo or a `datafusion-contrib` repo could both work, and we are open to either
option.
>
> I am -
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627587198
Another point worth mentioning. Previously, the Comet release schedule was
being driven by progress on performance, but now that we have "ok" performance
(2x on TPC-H) and base
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627545351
> I don't have strong opinions on where they should live. So far the keeping
of DF and Comet in sync has mostly been fine, but sometimes when DF breaks the
API it can take a whi
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627537718
> One challenge currently with Comet's expression is that since Comet
operates on the physical plan level, many of the expressions have been written
as implementing PhysicalExpr
Blizzara commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627521620
I (personally and on behalf of my employer) am very much +1 for having
Spark-compatible expressions. We currently use a mix of DF expressions, Comet's
stuff, wrappers around DF's
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627492221
It seems to me that we already have an Apache DataFusion project that
provides Spark-compatible DataFusion expressions (Comet).
I think @shehabgamin's main concern is tha
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627484867
> 3\. I'm not sure we want to say yes to spark but no to other udf suites.
This is a valid point also.
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627483100
> With that in mind, a joint effort on something in the main DataFusion repo
or a `datafusion-contrib` repo could both work, and we are open to either
option.
I am -1 on
Omega359 commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627152543
I'm of the opinion that while I could see the benefit of spark udfs in
datafusion I really think they would be best handled as a datafusion-contrib.
That is mostly for 3 reasons:
shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626525138
> > I will say that we have encountered numerous problems relying on
downstream DataFusion-based crates,...The issue isn't with the crates
themselves but arises when it's time
kazuyukitanimura commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626192955
Echoing @andygrove's point. Also if we move the spark-expr to DataFusion
core, release management might get harder. E.g. we may want to fix spark-expr
bugs quickly, but
shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626095699
> [@shehabgamin](https://github.com/shehabgamin) if we did that, would you
be willing to help implement / upstream some of your implementations and tests?
Yes!
@a
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2626014384
I almost started a conversation about this but held back. Moving this crate
upstream has a lot of value, and I support doing so.
However, assuming that most DataFusion con
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2625783165
@andygrove (and maybe @Omega359) given the importance of spark and the fact
that comet already has spark compatible expressions what would you say about
moving those expressions in
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2625779458
> I love the idea of collaborating on Spark compatible `UDF`s.
>
> As of writing, `243/402` Spark functions doc-tests pass on Sail. We
haven't focused on performance yet and i
shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2617036154
I love the idea of collaborating on Spark compatible `UDF`s.
As of writing, `243/402` Spark functions doc-tests pass on Sail. We haven't
focused on performance yet and i
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614469808
The crate supports more than 100 expressions so far, most of which are
listed here:
https://datafusion.apache.org/comet/user-guide/expressions.html
-
andygrove commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614468114
We already have a crate with Spark-compatible expressions maintained as part
of the Comet subproject.
https://docs.rs/datafusion-comet-spark-expr/0.5.0/datafusion_comet_s
alamb commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614342453
FWIW we have now completed migrating all functions to User Defined Functions
and I think there is growing interest in
BTW I think there are many people interested in spark co