Re: Extending Spark with a custom ExternalClusterManager

2025-05-08 Thread Dejan Pejchev
Hi all, I just created a JIRA ticket and a work in progress PR. Here is the link to the JIRA ticket - https://issues.apache.org/jira/browse/SPARK-52041 Here is the link to the GitHub PR - https://github.com/apache/spark/pull/50770 I kindly ask for feedback. Kind regards On Wed, Feb 19, 2025 at

is someone else also seeing a hang in DataFrameSubquerySuite.simple uncorrelated scalar subquery - eom?

2025-04-10 Thread Asif Shahid

Re: Extending Spark with a custom ExternalClusterManager

2025-02-19 Thread Enrico Minack
Hi devs, Let me pull some spark-submit developers into this discussion. @dongjoon-hyun @HyukjinKwon @cloud-fan What are your thoughts on making spark-submit fully and generically support ExternalClusterManager implementations? The current situation is that the only way to submit a Spark job

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Jules Damji
Yes, if this becomes a need that surfaces time and again, then it’s worthwhile to start a broader discussion in the form of a high-level proposal, which could trigger favorable discussion leading to next steps. Cheers, Jules. Sent from my iPhone. Pardon the dumb thumb typos :) On Feb 7, 2025, at 8:00 AM

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, everything is possible. Please initiate a discussion on the matter of a proposal to "Create a pluggable cluster manager" and put it to the community. See some examples here https://lists.apache.org/list.html?dev@spark.apache.org HTH Dr Mich Talebzadeh, Architect | Data Science |

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Agreed. If the goal is to make Spark truly pluggable, the spark-submit tool itself should be more flexible in handling different cluster managers and their specific requirements. 1. Back in the day, Spark's initial development focused on a limited set of cluster managers (Standalone,

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
This External Cluster Manager is an amazing concept and I really like the separation. Would it be possible to include a broader group and discuss an approach on how to make Spark more pluggable? It is a bit far-fetched but we would be very much interested in working on this if this resonates well

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread George J
To me, this seems like a gap in the "pluggable cluster manager" implementation. What is the value of making cluster managers pluggable, if spark-submit doesn't accept jobs on those cluster managers? It seems to me, for pluggable cluster managers to work, you would want some parts

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Well, you can try using an environment variable and creating a custom script that modifies the --master URL before invoking spark-submit. This script could replace "k8s://" with another identifier of your choice ("k8s-armada://") and then modify the SparkSubmit code to handle th
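The wrapper-script idea in this message can be sketched roughly as follows. This is a minimal illustration in Python, not code from the thread: the function names `rewrite_master` and `build_command` and the environment variable `SPARK_MASTER_SCHEME` are hypothetical, and only the space-separated `--master <url>` form is handled. It only rewrites the argument list; a stock SparkSubmit would still need to recognize the new scheme, which is the part the message says requires modifying SparkSubmit.

```python
import os


def rewrite_master(args, old_scheme="k8s://", new_scheme="k8s-armada://"):
    """Return a copy of args with the scheme of the --master URL swapped.

    Only the form ["--master", "<url>"] is handled in this sketch.
    """
    out = list(args)
    for i, arg in enumerate(out):
        if arg == "--master" and i + 1 < len(out):
            url = out[i + 1]
            if url.startswith(old_scheme):
                out[i + 1] = new_scheme + url[len(old_scheme):]
    return out


def build_command(args):
    """Build the spark-submit command line with the rewritten master URL.

    SPARK_MASTER_SCHEME is a hypothetical variable used here for illustration.
    """
    new_scheme = os.environ.get("SPARK_MASTER_SCHEME", "k8s-armada://")
    return ["spark-submit"] + rewrite_master(args, new_scheme=new_scheme)
```

A caller could pass the result of `build_command(sys.argv[1:])` to `subprocess.run`. Arguments that do not use the old scheme pass through unchanged.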

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
scenario would be to edit the SparkSubmit, which we are trying to avoid because we don't want to touch Spark codebase. Do you have an idea how to run in cluster deploy mode and load an external cluster manager? Could it be possible to submit a PR for a change in SparkSubmit? Looking forward to

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Mich Talebzadeh
Kubernetes cluster as a separate container, which provides better resource isolation and is more suitable for the type of cluster you are using (Armada). Anyway, you can see how it progresses in debugging mode. HTH Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
I got it to work by running it in client mode and using the `local://*` prefix. My external cluster manager gets injected just fine. On Fri, Feb 7, 2025 at 12:38 AM Dejan Pejchev wrote: > Hello Spark community! > > My name is Dejan Pejchev, and I am a Software Engineer working at >

Re: Extending Spark with a custom ExternalClusterManager

2025-02-07 Thread Dejan Pejchev
| GDPR > >view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > > > > On Thu, 6 Feb 2025 at 23:40, Dejan Pejchev wrote: > >> Hello Spark community! >> >> My name is Dejan Pejchev, and I am a Software Engineer working at

Re: Extending Spark with a custom ExternalClusterManager

2025-02-06 Thread Mich Talebzadeh
> Hello Spark community! > > My name is Dejan Pejchev, and I am a Software Engineer working at > G-Research, and I am a maintainer of our Kubernetes multi-cluster batch > scheduler called Armada. > > We are trying to build an integration with Spark, where we would like to > use t

Extending Spark with a custom ExternalClusterManager

2025-02-06 Thread Dejan Pejchev
Hello Spark community! My name is Dejan Pejchev, and I am a Software Engineer working at G-Research, and I am a maintainer of our Kubernetes multi-cluster batch scheduler called Armada. We are trying to build an integration with Spark, where we would like to use the spark-submit with a master

RE: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-23 Thread Balaji Sudharsanam V
Do we have a Java Client for Spark Connect which is something like PySpark? From: Mich Talebzadeh Sent: 22 January 2025 15:05 To: Hyukjin Kwon Cc: Martin Grund ; Holden Karau ; Dongjoon Hyun ; dev Subject: [EXTERNAL] Re: FYI: A Hallucination about Spark Connect Stability in Spark 4 CI

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Mich Talebzadeh
view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> On Wed, 22 Jan 2025 at 09:26, Hyukjin Kwon wrote: > While it might be a bit too much to talk about its stability, it is true > that the CI dedicated for Spark Connect compat was broken there for a

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Hyukjin Kwon
While it might be a bit too much to talk about its stability, it is true that the CI dedicated for Spark Connect compat was broken there for a couple of weeks, and the errors from the tests look confusing. I agree that tests and builds could be one of the easiest measurements to tell the state of

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Martin Grund
I'm very confused about how we use stability in CI as a measure to discuss the strategy of a particular feature, particularly because we call these "hallucinations." From real-world experience, I can say that we have thousands of clients using Spark Connect across many differe

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Jules Damji
Thanks for the update and for looking into it. Excuse the thumb typos On Tue, 21 Jan 2025 at 4:09 PM, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that is > still in de

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Ángel
I'm passionate about and have lots of experience fixing OOMs. Contact me if you need some help. On Wed, 22 Jan 2025, 1:10, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test f

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
Thank you, Hyukjin! Dongjoon On Tue, Jan 21, 2025 at 16:10 Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that is > still in development. > I made an umbrella

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
Just a quick note on that: the major reason is 1. OOM we should figure out and fix the CI environment. 2. structured streaming test failure that is still in development. I made an umbrella JIRA (https://issues.apache.org/jira/browse/SPARK-50907), and I will work there. Should be easier to look at

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
Let me take a look. Shouldn't be a major issue. On Wed, 22 Jan 2025 at 08:31, Mich Talebzadeh wrote: > As discussed on a thread over the weekend, we agreed among us including > Matei on a shift towards a more stable and version-independent APIs. > Spark Connect IMO is a key e

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Mich Talebzadeh
As discussed on a thread over the weekend, we agreed among us, including Matei, on a shift towards more stable and version-independent APIs. Spark Connect IMO is a key enabler of this shift, allowing users and developers to build applications and libraries that are more resilient to changes in

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
ark 4? From my perspective, this is still actively > > under development with an open end. > > > > The bottom line is `Spark Connect` needs more community love in order to > > be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the > >

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Holden Karau
` needs more community love in order to > be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the > healthy Spark Connect CI in Spark 4. Until then, let's clarify what is > stable in `Spark Connect` and what is not yet. > > Best Regards, > Dongjoon. >

FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
le in `Spark Connect` and what is not yet. Best Regards, Dongjoon. PS. This is a separate thread from the previous flakiness issues. https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq ([FYI] Known `Spark Connect` Test Suite Flakiness)

Re: A documentation change is a user-facing change

2025-01-16 Thread Wenchen Fan
: > Thanks for pointing it out! Based on the discussion, I’ve created a PR: > https://github.com/apache/spark/pull/49534. > > Let me know what you think! > > On Thu, Jan 16, 2025 at 2:25 PM Xiao Li > wrote: > >> Thank you for pointing it out! Let’s update the template to ex

Re: A documentation change is a user-facing change

2025-01-16 Thread Gengliang Wang
Thanks for pointing it out! Based on the discussion, I’ve created a PR: https://github.com/apache/spark/pull/49534. Let me know what you think! On Thu, Jan 16, 2025 at 2:25 PM Xiao Li wrote: > Thank you for pointing it out! Let’s update the template to exclude > documentation changes fr

Re: A documentation change is a user-facing change

2025-01-16 Thread Xiao Li
/apache/spark/pull/47756, the behavior changes were not documented. It’s crucial for all committers to carefully review the PR titles and descriptions to ensure they are accurate and complete before merging. How can we bring more attention to this issue and ensure it becomes a consistent practice

Re: A documentation change is a user-facing change

2025-01-16 Thread Reynold Xin
595c8bb5a58/.github/PULL_REQUEST_TEMPLATE#L34-L36>.” > The original intent may well have been about behavior changes only, but > that’s not reflected in the current text of the PR template. > > > On Jan 16, 2025, at 2:32 PM, Dongjoon Hyun > wrote: > > The original intent is a use

Re: A documentation change is a user-facing change

2025-01-16 Thread Nicholas Chammas
cted in the current text of the PR template. > On Jan 16, 2025, at 2:32 PM, Dongjoon Hyun wrote: > > The original intent is a user-facing *behavior* change technically > which is the same with Apache Spark migration guide. > > If so, does it make sense to you? > > Probably,

Re: A documentation change is a user-facing change

2025-01-16 Thread Dongjoon Hyun
The original intent is a user-facing *behavior* change, which is technically the same as in the Apache Spark migration guide. If so, does it make sense to you? Probably, since the template was kept short to be concise, it could be interpreted in more ways than we thought. Dongjoon. On Thu, Jan 16, 2025

Re: A documentation change is a user-facing change

2025-01-16 Thread Nicholas Chammas
ggests that the author of this policy did mean that yes, even a typo fix in a user-facing documentation page merits a “yes” response to this question. It’s strict but it’s also clear and unambiguous. No one has to think about whether their user-facing change is big enough to merit a yes. IMO t

Re: A documentation change is a user-facing change

2025-01-16 Thread Dongjoon Hyun
I understand your concern, Nicholas. However, isn't it too strict? For the above example, adding a new HTML page is a user-facing change. https://github.com/apache/spark/pull/48852 (This is a new doc) [SPARK-50309][DOCS] Document SQL Pipe Syntax https://github.com/apache/spark/pull/49098

A documentation change is a user-facing change

2025-01-16 Thread Nicholas Chammas
This is not a big deal at all, but I figure it’s worth bringing up briefly because the pull request template does emphasize <https://github.com/apache/spark/blob/ffb31565e5af6f9ab2f8f7b500fbd595c8bb5a58/.github/PULL_REQUEST_TEMPLATE#L34-L36>: > ### Does this PR introduce _any_ us

Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-26 Thread Wenchen Fan
uced at least two subtle bugs > that many reviewers weren't able to catch and those two bugs would not have > been possible to introduce if we had a single pass analyzer. Single pass > can make the whole framework more robust. > > > > > > > On Tue, Aug 20, 2024 a

Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-20 Thread Reynold Xin
+1 on this too. When I implemented "group by all", I introduced at least two subtle bugs that many reviewers weren't able to catch and those two bugs would not have been possible to introduce if we had a single pass analyzer. Single pass can make the whole framework more robust.

[DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-20 Thread Xiao Li
This sounds like a good idea! The Analyzer is complex. The changes in the new Analyzer should not affect the existing one. The users could add the QO rules and rely on the existing structures and patterns of the logical plan trees generated by the current one. The new Analyzer needs to generate

Re: [External Mail] Re: Welcoming a new PMC member

2024-08-14 Thread yangjie01
Congratulations! From: Matei Zaharia Date: Wednesday, August 14, 2024 06:03 To: Wenchen Fan Cc: Ruifeng Zheng , Martin Grund , Peter Toth , dev Subject: [External Mail] Re: Welcoming a new PMC member Congrats and welcome Kent! On Aug 13, 2024, at 7:27 AM, Wenchen Fan wrote: Congratulations! On Tue, Aug 13, 2024

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Vladimir Golubev
2: Support main datasources, ...). Running both analyzers in mixed mode may lead to unexpected logical plan problems, because that would introduce a completely different chain of transformations On Wed, Aug 14, 2024 at 3:58 PM Herman van Hovell wrote: > +1(000) on this! > > This should

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Herman van Hovell
+1(000) on this! This should massively reduce allocations done in the analyzer, and it is much more efficient. I also can't count the times that I had to increase the number of iterations. This sounds like a no-brainer to me. I do have two questions: - How do we ensure that we

Re: Welcoming a new PMC member

2024-08-14 Thread Reynold Xin
>>>>> On Mon, Aug 12, 2024 at 8:46 PM Dongjoon Hyun < > dongjoon.h...@gmail.com <mailto:dongjoon.h...@gmail.com>> wrote: > > >>>>>> Congratulations, Kent. > > >>>>>> > > >>>>>> Dongjoon. > >

Re: Welcoming a new PMC member

2024-08-14 Thread Kent Yao
>> Congratulations Kent ! > >>>>> > >>>>> Regards, > >>>>> Mridul > >>>>> > >>>>> On Mon, Aug 12, 2024 at 8:46 PM Dongjoon Hyun >>>>> <mailto:dongjoon.h...@gmail.com>> wrote: > >>>

Re: Welcoming a new PMC member

2024-08-13 Thread Matei Zaharia
>>>> <mailto:dongjoon.h...@gmail.com>> wrote: >>>>>> Congratulations, Kent. >>>>>> >>>>>> Dongjoon. >>>>>> >>>>>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li >>>>> <mailto:gatorsm...@gmail.com>> wrote: >>>>>>> Congratulations ! >>>>>>> >>>>>>> Hyukjin Kwon mailto:gurwls...@apache.org>> >>>>>>> wrote on Mon, Aug 12, 2024 at 17:20: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join >>>>>>>> me in welcoming him to his new role! >>>>>>>>

Re: Welcoming a new PMC member

2024-08-13 Thread Wenchen Fan
n, Aug 12, 2024 at 8:46 PM Dongjoon Hyun >>>> wrote: >>>> >>>>> Congratulations, Kent. >>>>> >>>>> Dongjoon. >>>>> >>>>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >>>>> >>>>>> Congratulations ! >>>>>> >>>>>> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join >>>>>>> me in welcoming him to his new role! >>>>>>> >>>>>>>

Re: Welcoming a new PMC member

2024-08-13 Thread Ruifeng Zheng
> >>>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >>>> >>>>> Congratulations ! >>>>> >>>>> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >>>>> >>>>>> Hi all, >>>>>> >>>>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join >>>>>> me in welcoming him to his new role! >>>>>> >>>>>>

Re: Welcoming a new PMC member

2024-08-13 Thread Martin Grund
> On Mon, Aug 12, 2024 at 8:46 PM Dongjoon Hyun >> wrote: >> >>> Congratulations, Kent. >>> >>> Dongjoon. >>> >>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >>> >>>> Congratulations ! >>>> >>>> Hyukjin Kwon

Re: Welcoming a new PMC member

2024-08-13 Thread Peter Toth
> >> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >> >>> Congratulations ! >>> >>> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >>> >>>> Hi all, >>>> >>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>>> in welcoming him to his new role! >>>> >>>>

Re: Welcoming a new PMC member

2024-08-12 Thread Gengliang Wang
>>> Congratulations ! >>> >>> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >>> >>>> Hi all, >>>> >>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>>> in welcoming him to his new role! >>>> >>>>

Re: Welcoming a new PMC member

2024-08-12 Thread Denny Lee
Congrats, Kent! On Tue, Aug 13, 2024 at 9:06 AM Dongjoon Hyun wrote: > Congratulations, Kent. > > Dongjoon. > > On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: > >> Congratulations ! >> >> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >> >>> Hi all, >

Re: Welcoming a new PMC member

2024-08-12 Thread huaxin gao
>> >>> Congratulations ! >>> >>> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >>> >>>> Hi all, >>>> >>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>>> in welcoming him to his new role! >>>> >>>>

Re: Welcoming a new PMC member

2024-08-12 Thread Mridul Muralidharan
>>> Hi all, >>> >>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>> in welcoming him to his new role! >>> >>>

Re: Welcoming a new PMC member

2024-08-12 Thread Jungtaek Lim
Congrats, Kent! On Tue, Aug 13, 2024 at 10:06 AM Dongjoon Hyun wrote: > Congratulations, Kent. > > Dongjoon. > > On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: > >> Congratulations ! >> >> Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: >> >>> Hi all, >

Re: Welcoming a new PMC member

2024-08-12 Thread XiDuo You
Congratulations! Yuming Wang wrote on Tue, Aug 13, 2024 at 08:28: > > Congratulations! > > On Mon, Aug 12, 2024 at 5:20 PM Hyukjin Kwon wrote: >> >> Hi all, >> >> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me

Re: Welcoming a new PMC member

2024-08-12 Thread Yuming Wang
Congratulations! On Mon, Aug 12, 2024 at 5:20 PM Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me in > welcoming him to his new role! > >

Re: Welcoming a new PMC member

2024-08-12 Thread Dongjoon Hyun
Congratulations, Kent. Dongjoon. On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: > Congratulations ! > > Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: > >> Hi all, >> >> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >> in welcoming him to his new role! >> >>

Re: Welcoming a new PMC member

2024-08-12 Thread Xiao Li
Congratulations! Hyukjin Kwon wrote on Mon, Aug 12, 2024 at 17:20: > Hi all, > > The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me in > welcoming him to his new role! > >

Welcoming a new PMC member

2024-08-12 Thread Hyukjin Kwon
Hi all, The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me in welcoming him to his new role!

[Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-09 Thread Vladimir Golubev
non-obvious, so it’s hard to introduce changes without having the full knowledge. By modifying one rule, the whole chain of transformations can change in a non-obvious way. Since we can hit the maximum number of iterations, there’s no guarantee that the plan is going to be resolved. And from a
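The iteration-cap concern raised here can be illustrated with a toy sketch in Python. This is not Catalyst code: the executor `run_to_fixed_point` and the two rules are hypothetical stand-ins, and a "plan" is just a tuple of strings. The sketch shows only the general shape of fixed-point rule application, where rules are reapplied until the plan stops changing or a cap is reached.

```python
def run_to_fixed_point(plan, rules, max_iterations=100):
    """Apply all rules in order, repeatedly, until the plan stops changing.

    Returns (plan, converged). converged is False when the iteration cap
    was reached before a pass left the plan unchanged.
    """
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan, True
        plan = new_plan
    return plan, False


def resolve_star(plan):
    # Hypothetical rule: expand "*" into concrete column names.
    return tuple("a, b" if node == "*" else node for node in plan)


def expand_alias(plan):
    # Hypothetical rule: rewrite "alias" into "*", which resolve_star
    # can only pick up on the next pass because it runs earlier.
    return tuple("*" if node == "alias" else node for node in plan)
```

With the rules ordered as `[resolve_star, expand_alias]`, the plan `("alias",)` needs a second pass to be resolved, so a cap of one iteration returns a still-unresolved plan. Reordering or editing one rule changes how many passes are needed, which is the fragility the message describes.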

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
tps://en.wikipedia.org/wiki/Wernher_von_Braun>Von > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Wed, 8 May 2024 at 13:41, Prem Sahoo wrote: > >> Could any one help me here ? >> Sent from my iPhone >> >> > On May 7, 2024

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Mich Talebzadeh
e > > > On May 7, 2024, at 4:30 PM, Prem Sahoo wrote: > > > > > > Hello Folks, > > in Spark I have read a file and done some transformation and finally > writing to hdfs. > > > > Now I am interested in writing the same dataframe to MapRFS but for this

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
Could anyone help me here? Sent from my iPhone > On May 7, 2024, at 4:30 PM, Prem Sahoo wrote: > > > Hello Folks, > in Spark I have read a file and done some transformation and finally writing > to hdfs. > > Now I am interested in writing the same dataframe to MapR

caching a dataframe in Spark takes lot of time

2024-05-07 Thread Prem Sahoo
Hello folks, in Spark I have read a file, applied some transformations, and finally written the result to HDFS. Now I am interested in writing the same DataFrame to MapRFS, but for this Spark will execute the full DAG again (recomputing all the previous steps: the read plus all the transformations). I don't

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten currently already support Velox backend and Clickhouse backend. > data fusion support is also proposed but no one worked on it. > > Gluten isn't a POC. It's under actively developing but some companies > al

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Binwei Yang
Gluten already supports the Velox and ClickHouse backends. DataFusion support has also been proposed, but no one has worked on it. Gluten isn't a POC. It's under active development, but some companies already use it. On 2024/04/11 03:32:01 Dongjoon Hyun wrote: > I'm

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Dongjoon Hyun
I'm interested in your claim. Could you elaborate or provide some evidence for your claim, *a door for all native libraries*, Binwei? For example, is there any POC for that claim? Maybe, did I miss something in that SPIP? Dongjoon. On Wed, Apr 10, 2024 at 8:19 PM Binwei Yang wrote: >

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Binwei Yang
The SPIP is not for the current Gluten; rather, it opens a door to support for all native libraries and accelerators. On 2024/04/11 00:27:43 Weiting Chen wrote: > Yes, the 1st Apache release(v1.2.0) for Gluten will be in September. > For Spark version support, currently Gluten v1.1.1 support Spark3.2 a

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Weiting Chen
project is still under active development now, and doesn't have a > stable release. > > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1 > > In the Apache Spark community, Apache Spark 3.2 and 3.3 is the end of > support. > And, 3.4 will have 3.4.3 next

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Weiting. Do you think you can share the future milestone of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under active development now, and doesn't have a s

Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-08 Thread WeitingChen
Hi all, We are excited to introduce a new Apache incubating project called Gluten. Gluten serves as a middleware layer designed to offload Spark to native engines like Velox or ClickHouse. For more detailed information, please visit the project repository at https://github.com/apache/incubator

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-23 Thread Jay Han
> Some of you may be aware that Databricks community Home | Databricks >>> have just launched a knowledge sharing hub. I thought it would be a >>> good idea for the Apache Spark user group to have the same, especially >>> for repeat questions on Spark core, Spark SQL, Spa

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
I concur. Whilst Databricks' (a commercial entity) Knowledge Sharing Hub can be a useful resource for sharing knowledge and engaging with their respective community, ASF likely prioritizes platforms and channels that align more closely with its principles of open source, and vendor neutr

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Steve Loughran
ASF will be unhappy about this, and Stack Overflow exists. Otherwise: Apache Confluent and LinkedIn exist; LI is the option I'd point at On Mon, 18 Mar 2024 at 10:59, Mich Talebzadeh wrote: > Some of you may be aware that Databricks community Home | Databricks > have just launched

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
n entertain this idea. They seem to have a well defined structure for hosting topics. Let me know your thoughts Thanks <https://community.databricks.com/t5/knowledge-sharing-hub/bd-p/Knowledge-Sharing-Hub> Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kin

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Varun Shah
+1 Great initiative. QQ: Stack Overflow has a similar feature called "Collectives", but I am not sure of the expenses to create one for Apache Spark. With SO being used (at least before ChatGPT became quite the norm for searching questions), it already has a lot of questions asked an

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Deepak Sharma
>> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID >> *Date: *Monday, March 18, 2024 at 6:36 AM >> *To: *user @spark , Spark dev list < >> dev@spark.apache.org>, Mich Talebzadeh >> *Cc: *Matei Zaharia >> *Subject: *R

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Hyukjin Kwon
org/wiki/Wernher_von_Braun>)". > > > On Mon, 18 Mar 2024 at 16:23, Parsian, Mahmoud > wrote: > >> Good idea. Will be useful >> >> >> >> +1 >> >> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
OK thanks for the update. What does officially blessed signify here? Can we have and run it as a sister site? The reason this comes to my mind is that the interested parties should have easy access to this site (from ISUG Spark sites) as a reference repository. I guess the advice would be that

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Reynold Xin
>> >>> >>> >>> +1 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> *From:* ashok34668@ yahoo. com. INVAL

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
ars 2024 kl. 17:26 skrev Parsian, Mahmoud > : > >> Good idea. Will be useful >> >> >> >> +1 >> >> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID >> *Date: *Monday, March 18, 2024 at 6:36 AM >> *To: *

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Bjørn Jørgensen
y, March 18, 2024 at 6:36 AM > *To: *user @spark , Spark dev list < > dev@spark.apache.org>, Mich Talebzadeh > *Cc: *Matei Zaharia > *Subject: *Re: A proposal for creating a Knowledge Sharing Hub for Apache > Spark Community > > External message, be mindful when clicking l

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
> dev@spark.apache.org>, Mich Talebzadeh > *Cc: *Matei Zaharia > *Subject: *Re: A proposal for creating a Knowledge Sharing Hub for Apache > Spark Community > > External message, be mindful when clicking links or attachments > > > > Good idea. Will be useful >

A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
Some of you may be aware that the Databricks community (Home | Databricks) has just launched a knowledge sharing hub. I thought it would be a good idea for the Apache Spark user group to have the same, especially for repeat questions on Spark core, Spark SQL, Spark Structured Streaming, Spark MLlib and

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Mich Talebzadeh
> shuffle and better memory management have been introduced, we plan to > publish the benchmark results (at least TPC-H) in the repo. > > > Compared to standard Spark, what kind of performance gains can be > expected with Comet? > > Currently, users could benefit from Comet in a few a

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Chao Sun
epo. > Compared to standard Spark, what kind of performance gains can be expected with Comet? Currently, users could benefit from Comet in a few areas: - Parquet read: a few improvements have been made against reading from S3 in particular, so users can expect better scan performance in this sc

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-16 Thread Mich Talebzadeh
Hi Chao, As a cool feature - Compared to standard Spark, what kind of performance gains can be expected with Comet? - Can one use Comet on k8s in conjunction with something like a Volcano addon? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-15 Thread Mich Talebzadeh
ources but of course cannot be guaranteed. It is essential to note that, as with any advice, one verified and tested result holds more weight than a thousand expert opinions. On Thu, 15 Feb 2024 at 01:18, Chao Sun wrote: > Hi Praveen, > > We will add a "Getting Started" sectio

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
Hi Praveen, We will add a "Getting Started" section in the README soon, but basically comet-spark-shell <https://github.com/apache/arrow-datafusion-comet/blob/main/bin/comet-spark-shell> in the repo should provide a basic tool to build Comet and launch a Spark shell with it. Note

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Liu(Laswift) Cao
wrote: > >> > >> Absolutely thrilled to see the project going open-source! Huge congrats > to Chao and the entire team on this milestone! > >> > >> Yufei > >> > >> > >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: > >>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
team on this milestone! >> >> Yufei >> >> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: >>> >>> Hi all, >>> >>> We are very happy to announce that Project Comet, a plugin to >>> accelerate Spark query execution via leve

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread John Zhuge
>> Hi all, >> >> We are very happy to announce that Project Comet, a plugin to >> accelerate Spark query execution via leveraging DataFusion and Arrow, >> has now been open sourced under the Apache Arrow umbrella. Please >> check the project repo >> ht

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Yufei Gu
Absolutely thrilled to see the project going open-source! Huge congrats to Chao and the entire team on this milestone! Yufei On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query e

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Mich Talebzadeh
Sure, thanks for the clarification. I gather what you are alluding to is -- in a distributed environment, when one does operations that involve shuffling or repartitioning of data, the order in which this data is processed across partitions is not guaranteed. So when repartitioning a dataframe, the

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Jack Goodson
Apologies if it wasn't clear, I meant the difficulty of debugging, not floating point precision :) On Wed, Feb 14, 2024 at 2:03 AM Mich Talebzadeh wrote: > Hi Jack, > > "most SQL engines suffer from the same issue..." > > Sure. This behavior is

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Holden Karau
This looks really cool :) Out of interest, what are the differences in the approach between this and Gluten? On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query execution via leveragin

Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Chao Sun
Hi all, We are very happy to announce that Project Comet, a plugin to accelerate Spark query execution via leveraging DataFusion and Arrow, has now been open sourced under the Apache Arrow umbrella. Please check the project repo https://github.com/apache/arrow-datafusion-comet for more details if

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Mich Talebzadeh
Hi Jack, "most SQL engines suffer from the same issue..." Sure. This behavior is not a bug, but rather a consequence of the limitations of floating-point precision. The numbers involved in the example (see SPIP [SPARK-47024] Sum of floats/doubles may be incorre

Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Jack Goodson
I may be ignorant of other debugging methods in Spark but the best success I've had is using smaller datasets (if runs take a long time) and adding intermediate output steps. This is quite different from application development in non-distributed systems where a debugger is trivial to attach
