Hi both, Thanks for your replies!
Sean, your proposal to use a driver-side future wrapping the blocking call does sound a lot easier. But I want to make sure that cancelling the future in the driver code also kills the corresponding tasks on all executors. If I wrap the driver-side call in a standard Scala or Java future it will not be cancellable, will it? I think I would need to interrupt the thread that executes the future somehow. As you can see I am far from an expert on this topic, so sorry if I misunderstood your proposal.

Cheers,
Antonin

On 07/08/2020 19:53, Edward Mitchell wrote:
> I will agree that the side effects of using Futures in driver code tend
> to be tricky to track down.
>
> If you forget to clear the job description and job group information,
> then the LocalProperties on the SparkContext remain intact -
> SparkContext#submitJob makes sure to pass down the localProperties.
>
> This has led to us doing this hack:
>
> image.png
>
> This can also cause problems with Spark Streaming, where the Streaming UI
> can get messed up from the various streaming-related properties being
> cleared or re-used.
>
> On Fri, Aug 7, 2020 at 10:38 AM Sean Owen <sro...@gmail.com> wrote:
>
>     Why do you need to do it, and can you just use a future in your
>     driver code?
>
>     On Fri, Aug 7, 2020 at 9:01 AM Antonin Delpeuch (lists)
>     <li...@antonin.delpeuch.eu> wrote:
>     >
>     > Hi all,
>     >
>     > Following my request on the user mailing list [1], there does not
>     > seem to be any simple way to save RDDs to the file system in an
>     > asynchronous way. I am looking into implementing this, so I am
>     > first checking whether there is consensus around the idea.
>     >
>     > The goal would be to add methods such as `saveAsTextFileAsync` and
>     > `saveAsObjectFileAsync` to the RDD API.
> > > > I am thinking about doing this by: > > > > - refactoring SparkHadoopWriter to allow for submitting jobs > > asynchronously (with `submitJob` rather than `runJob`) > > > > - add a `saveAsHadoopFileAsync` method in `PairRDDFunctions`, > > counterpart to the existing `saveAsHadoopFile` > > > > - add a `saveAsTextFileAsync` (and other formats) in > `AsyncRDDActions`. > > > > Because SparkHadoopWriter is private, it is complicated to reimplement > > this functionality outside of Spark as a user, so I think this > would be > > an API worth offering. It should be possible to implement this without > > too much code duplication hopefully. > > > > Cheers, > > > > Antonin > > > > [1]: > > > > http://apache-spark-user-list.1001560.n3.nabble.com/Async-API-to-save-RDDs-td38320.html > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > <mailto:dev-unsubscr...@spark.apache.org> > > > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > <mailto:dev-unsubscr...@spark.apache.org> > --------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org