Hi,
While testing a use case where the query had an outer join, the joining key
of the left outer table had either a valid value or a random value (salting
to avoid skew).
The case was reported to produce incorrect results in the event of a node
failure with retry.
On debugging the code, have found followi
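For context, the salting pattern mentioned above can be sketched outside Spark with plain Python (the table contents, salt count, and inner-join simplification below are all illustrative, not the reporter's actual query):

```python
import random

# Toy data: the left table's join key is heavily skewed toward "hot",
# which is the situation salting tries to spread across partitions.
NUM_SALTS = 4
left = [("hot", i) for i in range(8)] + [("rare", 99)]
right = [("hot", "H"), ("rare", "R")]

# Salt the left side: append a random bucket number to each key.
salted_left = [((k, random.randrange(NUM_SALTS)), v) for k, v in left]

# Explode the right side: replicate each row once per salt bucket,
# so every salted left key still finds its match.
salted_right = [((k, s), v) for k, v in right for s in range(NUM_SALTS)]

# Hash join on the (key, salt) pair.
build = {}
for key, v in salted_right:
    build.setdefault(key, []).append(v)
joined = [(k, lv, rv)
          for (k, s), lv in salted_left
          for rv in build.get((k, s), [])]

# The salted join must produce the same rows as the plain join.
plain = [(k, lv, rv) for k, lv in left for rk, rv in right if k == rk]
assert sorted(joined) == sorted(plain)
```

One thing worth noting when debugging retry-related correctness: because the salt is drawn randomly per row, a retried task can assign different salts than the original attempt unless the randomness is made deterministic per input row.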
That's me. It's not anywhere yet and it's WIP, as mentioned in the talk. I'm
still working through its design.
On Sat, Jan 25, 2025 at 1:00 AM Deependra Patel wrote:
> Hi all,
> There are ways through the `addArtifacts` API in an existing session but
> for that we need to have dependencies properly
OK, so the Catalyst optimizer will use this method of inline key counting to
give the Spark optimizer prior notification so that it can identify the hot
keys? What is this inline key counting based on? Likely the Count-Min Sketch
algorithm!
HTH
Mich Talebzadeh,
Architect | Data Science | Financial Crime
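Incidentally, Spark does ship a Count-Min Sketch implementation (`org.apache.spark.util.sketch.CountMinSketch`, also exposed via `DataFrameStatFunctions`); whether it is used for the hot-key detection discussed above is speculation in this thread. For reference, the core of the data structure is small (this is a generic sketch, not Spark's code):

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: approximate per-key counts in fixed,
    sublinear space. Estimates can overcount (hash collisions) but
    never undercount, so `estimate(k) >= true_count(k)` always holds."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, key, row):
        # One independent hash per row, derived by mixing in the row index.
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._bucket(key, row)] += 1

    def estimate(self, key):
        # The minimum across rows is the least-collided counter.
        return min(self.table[row][self._bucket(key, row)]
                   for row in range(self.depth))

# A skewed stream: one hot key among many rare ones.
cms = CountMinSketch()
for _ in range(1000):
    cms.add("hot")
for i in range(50):
    cms.add(f"k{i}")
```

Because the sketch only ever overcounts, a key whose estimate is large relative to the total is a reliable hot-key candidate, which is what makes it attractive for inline key counting.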
Hi Spark devs,
I recently worked on a prototype to make it easier to identify the root
cause of data skew in Spark. I wanted to see if the community was
interested in it before working on contributing the changes (SPIP and PRs).
*Problem*
When a query has data skew today, you see outlier tasks ta
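As a toy illustration of the kind of per-key signal such a diagnostic might surface (the function name and threshold below are hypothetical, not part of the prototype):

```python
from collections import Counter

def hot_keys(keys, threshold=0.5):
    """Return the keys whose share of rows exceeds `threshold` -- a toy
    stand-in for the per-key metrics a skew root-cause tool could report."""
    counts = Counter(keys)
    total = len(keys)
    return {k for k, c in counts.items() if c / total > threshold}

# One key accounts for 80% of the rows: classic skew.
rows = ["user_42"] * 80 + ["user_7"] * 10 + ["user_9"] * 10
assert hot_keys(rows) == {"user_42"}
```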
Hi, All.
SPARK-49700 landed one hour ago.
Since this is another huge package redesign, touching 399 files in Spark 4.0,
please check that you are not accidentally affected.
Best Regards,
Dongjoon.
Hi all,
There are ways through the `addArtifacts` API in an existing session, but
for that we need to have the dependencies properly gzipped. In the case of a
different kernel/OS between client and server, it won't work either, I
believe. What I am interested in is doing some sort of `pip install
https://y
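For reference, the "properly gzipped" packaging step can be done with the standard library alone; this is a sketch (the package name and layout are made up), and the resulting archive would then be registered with something like `spark.addArtifacts(archive, archive=True)` per the PySpark Connect API, whose exact signature may vary by Spark version:

```python
import os
import tarfile
import tempfile

def package_dependency(src_dir: str, out_path: str) -> str:
    """Pack a local package directory into a .tar.gz suitable for
    shipping to the server as a session artifact."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return out_path

# Usage with a throwaway package directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "mypkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("VERSION = '0.1'\n")
    archive = package_dependency(pkg, os.path.join(tmp, "mypkg.tar.gz"))
    with tarfile.open(archive) as tar:
        names = tar.getnames()
```

This only helps for pure-Python dependencies; anything with compiled extensions runs into exactly the kernel/OS mismatch mentioned above, which is where a server-side `pip install` story would be needed.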