Thanks for the reply!
To clarify, for issue 2, it can still break a query apart into multiple jobs
without AQE; I had turned AQE off in my posted example.
For 1, an end user just needs to turn a knob on or off to use stage-level
scheduling for Spark SQL; I am considering adding a comp…
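To make the knob concrete, here is a minimal sketch of what I have in mind;
the config name below is made up purely for illustration and does not exist
in Spark today:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Hypothetical knob, illustration only; no such conf exists in Spark.
    spark.conf.set("spark.sql.stageLevelScheduling.enabled", "true")

    // The SQL user writes a plain query; per-stage resource profiles would
    // be chosen and attached internally by the planner, never surfaced in
    // the user-facing API.
    spark.sql("SELECT count(*) FROM range(1000000)").show()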
See the original SPIP for why we only support the RDD API:
https://issues.apache.org/jira/browse/SPARK-27495
The main problem is exactly what you are referring to. The RDD level is not
exposed to the user when using the SQL or DataFrame API. This is on purpose,
and the user shouldn't have to know anything about the underlying RDDs.
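For reference, this is roughly what the existing RDD-level hook from that
SPIP looks like; a minimal sketch, assuming Spark 3.1+ with dynamic
allocation enabled on YARN/K8s/Standalone:

    import org.apache.spark.resource.{ExecutorResourceRequests,
      ResourceProfileBuilder, TaskResourceRequests}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val sc = spark.sparkContext

    // Build a ResourceProfile describing what the stage needs.
    val execReqs = new ExecutorResourceRequests().cores(4).memory("8g")
    val taskReqs = new TaskResourceRequests().cpus(2)
    val profile = new ResourceProfileBuilder()
      .require(execReqs).require(taskReqs).build()

    // withResources() is only defined on RDDs, which is exactly why SQL and
    // DataFrame users cannot reach it without dropping out of the optimizer.
    val counts = sc.parallelize(1 to 1000000)
      .map(i => (i % 100, 1))
      .withResources(profile)
      .reduceByKey(_ + _)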
Thanks for the clarification, Tom!
A bit more background on what we want to do: we have proposed a fine-grained
(stage-level) resource optimization approach in VLDB 2022
(https://www.vldb.org/pvldb/vol15/p3098-lyu.pdf) and would like to try it on
Spark. Our approach can recommend the resource configuration…
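For contrast, the resource knobs Spark exposes to a SQL workload today are
coarse and fixed for the whole application; a stage-level optimizer would
pick these per stage instead. A sketch of the status quo, using only
existing confs:

    import org.apache.spark.sql.SparkSession

    // Application-wide settings: every stage of every query gets the same
    // executor shape and shuffle parallelism, however different their needs.
    val spark = SparkSession.builder()
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "8g")
      .config("spark.sql.shuffle.partitions", "200")
      .getOrCreate()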
+1 to documenting it; a seed argument would be great if possible.
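Concretely, both hash() and xxhash64() hash with a fixed internal seed of 42
and expose no seed parameter today; a small example of the current behavior:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, hash, xxhash64}

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val df = spark.range(3).toDF("id")

    // hash() is Murmur3-based and xxhash64() is xxHash64-based; both use
    // the hardcoded seed 42, and neither function accepts a seed argument.
    df.select(hash(col("id")), xxhash64(col("id"))).show()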
From: Sean Owen
Sent: Monday, September 26, 2022 5:26:26 PM
To: Nicholas Gustafson
Cc: dev
Subject: Re: Why are hash functions seeded with 42?
Oh yeah, I get why we love to pick 42 for random things. I'm guessing…