Re: [Spark SQL] [DISK_ONLY Persistence] getting "this.inMemSorter" is null exception

2024-11-12 Thread Gurunandan
You should be able to split a large job into more manageable jobs based on stages using checkpoints. If a job fails, it can be restarted from the latest checkpoint, saving time and resources, so checkpoints can be used as recovery points. Smaller stages can be optimized independently, leading to be
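The recovery-point idea described above can be illustrated with a small Spark-free sketch. This is a toy model under stated assumptions, not Spark's implementation: the helper `run_with_checkpoints`, the stage names, and the JSON checkpoint files are all hypothetical. In Spark itself the corresponding calls would be `SparkContext.setCheckpointDir` and `DataFrame.checkpoint()`.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoints(stages, data, checkpoint_dir):
    """Run stages in order, persisting each output; reuse outputs that already exist.

    Hypothetical helper for illustration only (not a Spark API).
    Returns the final result and the names of the stages that actually ran.
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for index, (name, stage) in enumerate(stages):
        path = checkpoint_dir / f"{index:02d}_{name}.json"
        if path.exists():
            # A previous run finished this stage: load its output instead of recomputing.
            data = json.loads(path.read_text())
            continue
        data = stage(data)
        path.write_text(json.dumps(data))
        executed.append(name)
    return data, executed


# Simulate a stage that fails on its first attempt only.
attempts = {"count": 0}


def flaky_filter(values):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated failure")
    return [v for v in values if v % 4 == 0]


stages = [
    ("double", lambda values: [v * 2 for v in values]),
    ("filter", flaky_filter),
    ("total", sum),
]

with tempfile.TemporaryDirectory() as checkpoint_dir:
    try:
        run_with_checkpoints(stages, [1, 2, 3, 4], checkpoint_dir)
    except RuntimeError:
        pass  # the first run fails after "double" has been checkpointed
    # The restart skips "double" and resumes from its saved output.
    result, executed = run_with_checkpoints(stages, [1, 2, 3, 4], checkpoint_dir)
```

On the restart only the failed stage and the stages after it run again, which is the time and resource saving the message refers to.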

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
👍 On Wed, 13 Nov 2024, 3:52, Holden Karau wrote: > So it’s deprecated but I will review some basic graph X PRs as I would > like us to bring graph X back to life — but under our current release > structure we need to deprecate now if we want to be able to remove it in > the next few years. >

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Holden Karau
So it’s deprecated but I will review some basic graph X PRs as I would like us to bring graph X back to life — but under our current release structure we need to deprecate now if we want to be able to remove it in the next few years. Twitter: https://twitter.com/holdenkarau Fight Health Insurance:

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
I thought that too ... until I read the message from Matei Zaharia: "Votes to deprecate both SparkR and GraphX have passed. These components will officially be deprecated in Spark 4." Didn't know in open source you could deprecate things that have been there years so lightly without carrying out

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Russell Jurney
Angel, okay, I see the announcement. Thanks for bringing that to my attention. So, I started out getting up to speed on GraphFrames and doing a little maintenance. Next I'm going to go in and fix some bugs in GraphX. On the GraphFrames side, there is actually a bug converting GraphFrames to GraphX

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Russell Jurney
That is unfortunate. I saw someone volunteer to review my PRs. I thought there was a holdout? On Tue, Nov 12, 2024 at 12:56 PM Ángel wrote: > Nope. didn't miss that, in fact, I mentioned that graphframes used GraphX > under the hood. > > The thing is ... even though we were trying to get maintai

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
Nope, I didn't miss that; in fact, I mentioned that GraphFrames used GraphX under the hood. The thing is ... even though we were trying to get maintainers, the deprecation of GraphX passed suddenly in the middle of that discussion. On Tue, 12 Nov 2024, 21:47, Russell Jurney wrote: > I guess you

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Russell Jurney
I guess you missed where Reynold Xin suggested we instead bring GraphFrames into Spark and others agreed? On Tue, Nov 12, 2024 at 12:08 PM Ángel wrote: > You only have to look at the subject of this thread of mails. It says > nothing about graphframes. I thought we were "fighting" against deprec

Re: Which shuffle operations trigger AQE and which don't?

2024-11-12 Thread Mich Talebzadeh
Yep, AQE is a useful optimization technique that dynamically adjusts the query execution plan based on runtime statistics. It is designed to improve query performance. You are correct that AQE is primarily triggered at shuffle boundaries. These are points in the query plan where data is shuffled b
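One adjustment AQE can make at a shuffle boundary is coalescing small post-shuffle partitions, using the partition sizes observed at runtime. The following is a Spark-free toy model of that idea, not Spark's actual algorithm: the function name `coalesce_partitions` and the example sizes are hypothetical. In Spark, the behavior is governed by settings such as `spark.sql.adaptive.enabled` and `spark.sql.adaptive.coalescePartitions.enabled`.

```python
def coalesce_partitions(sizes_bytes, target_bytes):
    """Greedily merge adjacent partitions so each group stays within target_bytes.

    Hypothetical illustration only. A single partition larger than the target
    is kept as its own group. Returns lists of original partition indices.
    """
    groups = []
    current, current_size = [], 0
    for index, size in enumerate(sizes_bytes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Sizes as they might be reported once the map side of a shuffle has finished.
observed_sizes = [10, 20, 30, 100, 5, 5]
groups = coalesce_partitions(observed_sizes, target_bytes=64)
```

The point of the sketch is that the grouping depends on sizes known only after the shuffle has written its output, which is why this kind of decision happens at shuffle boundaries rather than at planning time.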

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
You only have to look at the subject of this mail thread. It says nothing about GraphFrames. I thought we were "fighting" against deprecating GraphX because it seemed not to have had any maintainers in quite some time. Maybe I got it wrong. On Tue, 12 Nov 2024, 19:12, Russell Jurney wrote: > No

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Russell Jurney
Not sure what you mean? GraphX is the core Apache Spark technology underneath GraphFrames - parts of GraphFrames use it. `git grep -i graphx | wc -l` shows 147 hits for `graphx` in GraphFrames master branch as of now. I started out getting familiar with the GraphFrames codebase with some low hangi

Getting "Cannot broadcast the table that is larger than 8GB error" - Clarification

2024-11-12 Thread Lakshminarayana Chari
Hi Spark Community, I raised an issue on Stack Overflow (https://stackoverflow.com/staging-ground/79182163). Cross-posting here. Can someone help? *** This is more of a clarification on my understanding. I get the error "Cannot broadcast the table that is larger than 8GB error" Here is m

Which shuffle operations trigger AQE and which don't?

2024-11-12 Thread Perfect Stranger
I thought that AQE is triggered after every kind of shuffle operation. But it seems that it isn't. Is there a list of operations that trigger and don't trigger AQE? For example I noticed that repartition(partitionsNumber) does not trigger AQE.

Unsubscribe

2024-11-12 Thread rakesh sharma
Get Outlook for Android

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
But wasn't the goal to fix bugs in GraphX? What does that have to do with GraphFrames? On Tue, 12 Nov 2024, 12:58, Russell Jurney wrote: > I started working on GraphFrames this weekend, got it building and started > with some docs PRs. A lot of the example code no longer worked, so I fixed > it. I'

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Russell Jurney
I started working on GraphFrames this weekend, got it building and started with some docs PRs. A lot of the example code no longer worked, so I fixed it. I'm updating the docs to indicate our plan to integrate it with Apache Spark. I'll announce a hackathon in the next week or so :) Russell On W

Job Opportunities in India,UK,Australia,UAE,Singapore or USA

2024-11-12 Thread sri hari kali charan Tummala
Hello Spark Community, As a seasoned Data Engineering professional with 12+ years of experience, I specialize in Apache Spark, particularly in Structured Streaming. I'm currently exploring job opportunities in India, the UK, Australia, UAE, Singapore, and the USA. If anyone is aware of openings o