Ted Yu suggested posting the improved numbers to this thread, and I think
it's a good idea, so I'm also posting them here. I also think explaining the
rationale behind my issues would help clarify why I'm submitting a couple of
patches, so I'll explain that first. (Sorry to post a wall of text.)
tl;dr. SP
Bump. I have been having a hard time making additional PRs, since some of
them rely on non-merged PRs, so I'm spending extra time decoupling these
things where possible.
https://github.com/apache/spark/pulls/HeartSaVioR
Five PRs are pending so far, and I may add more sooner or later.
Thanks,
Jung
Hi Team,
Could I get a review of the patch below? I would love to hear suggestions
on it. I had to reopen SPARK-20168 because of this bug.
https://github.com/apache/spark/pull/21541
https://issues.apache.org/jira/browse/SPARK-20168
Cheers,
Yash
Does anyone who has worked on Structured Streaming (SS), or is familiar
with it, have an opinion on this?
https://issues.apache.org/jira/browse/SPARK-24699
https://github.com/apache/spark/pull/21676
I don't like the idea of making Trigger.Once execute two batches, but it is
the simplest way to get the desired batch state c
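For readers less familiar with it, here is a minimal sketch of how
Trigger.Once is normally used (the rate source, console sink, and checkpoint
path are my own illustrative choices); the question in the PR is whether the
desired state handling justifies running a second batch under this trigger:

  // Minimal Trigger.Once sketch for context on SPARK-24699; the source,
  // sink, and checkpoint path below are illustrative assumptions.
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.Trigger

  val spark = SparkSession.builder()
    .appName("trigger-once-demo")
    .master("local[*]")
    .getOrCreate()

  val input = spark.readStream.format("rate").load()

  // Trigger.Once() fires a single batch over all available data, then
  // stops; the concern above is whether state cleanup should force a
  // second batch.
  val query = input.writeStream
    .format("console")
    .trigger(Trigger.Once())
    .option("checkpointLocation", "/tmp/trigger-once-demo") // hypothetical path
    .start()

  query.awaitTermination()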
Looking into the code of ShuffleBlockFetcherIterator, it seems that it
loads all the shuffle blocks needed by one task at once and releases all
of them when the task finishes.
Please correct me if I'm wrong.
If my understanding is correct, does it mean those shuffle blocks will stay
in memory even thou
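For concreteness, here is a minimal sketch (my own illustration, not from
the original mail) of a job whose reduce side goes through
ShuffleBlockFetcherIterator; the input path is hypothetical:

  // Illustrative only: the reduce tasks of reduceByKey fetch map outputs
  // through ShuffleBlockFetcherIterator, which is what the question above
  // is about.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("shuffle-read-demo")
    .master("local[*]")
    .getOrCreate()
  val sc = spark.sparkContext

  val counts = sc.textFile("/tmp/input.txt") // hypothetical input
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _) // shuffle read happens here

  counts.count()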
Currently all broadcast data is put on the heap, since the StorageLevel is
hard-coded to MEMORY_AND_DISK_SER in TorrentBroadcast.
So it's quite likely that broadcast data will go to the old generation and
then add pressure on full GC.
So what about putting broadcast data off-heap?
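For reference, a minimal sketch of the pieces involved (my own
illustration): Spark already exposes an off-heap memory pool via the
spark.memory.offHeap.* settings, but TorrentBroadcast does not consult it
today:

  // Illustrative sketch: enabling Spark's off-heap memory pool. Note that
  // TorrentBroadcast currently always stores broadcast data on-heap as
  // MEMORY_AND_DISK_SER, which is what the proposal above would change.
  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SparkSession

  val conf = new SparkConf()
    .set("spark.memory.offHeap.enabled", "true")
    .set("spark.memory.offHeap.size", "2g") // illustrative size

  val spark = SparkSession.builder()
    .config(conf)
    .appName("offheap-demo")
    .master("local[*]")
    .getOrCreate()

  // Today this lands on-heap regardless of the off-heap settings above.
  val lookup = spark.sparkContext.broadcast(Map("a" -> 1, "b" -> 2))
  println(lookup.value("a"))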
Chrysan Wu
Phone: +86 17717640807
Prem, sure. Thanks for the suggestion.
On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure wrote:
> try .pipe(.py) on RDD
>
> Thanks,
> Prem
>
> On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote:
>
>> Can someone please suggest? Thanks.
>>
>> On Tue, 3 Jul 2018 at 5:28 PM, Chetan Khatri wrote:
>>
>>>
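For anyone following this thread later, here is a minimal sketch of Prem's
.pipe suggestion above ("process.py" is a hypothetical script that reads
lines from stdin and writes lines to stdout):

  // Sketch of the RDD.pipe approach: each element becomes one stdin line
  // of the external process, and each stdout line of the process becomes
  // one element of the resulting RDD.
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("pipe-demo")
    .master("local[*]")
    .getOrCreate()
  val sc = spark.sparkContext

  val data = sc.parallelize(Seq("1", "2", "3"))
  val piped = data.pipe("python process.py") // hypothetical script
  piped.collect().foreach(println)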