I am currently trying to use Future/Await to set a timeout inside the map operation. However, the tasks now fail instead of getting stuck, even though I have a Try/match to catch the exception. Does anyone have an idea why?
The code looks like this:

```Scala
files.map { file =>
  Try {
    def tmpFunc(): Boolean = {
      // FILE CONVERSION ON HDFS
    }
    val tmpFuture = Future[Boolean] { tmpFunc() }
    Await.result(tmpFuture, 600.seconds)
  } match {
    case Failure(e) => "F"
    case Success(r) => "S"
  }
}
```

The converter is created in a lazy function inside a broadcast object, which shouldn't be a problem.

Best Regards
Wei

On Wed, Jul 10, 2019 at 3:16 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Is there a way you can identify those patterns in a file or in its name
> and then just tackle them in separate jobs? I use the function
> input_file_name() to find the name of the input file of each record and
> then filter out certain files.
>
> Regards,
> Gourav
>
> On Wed, Jul 10, 2019 at 6:47 AM Wei Chen <weic...@apache.org> wrote:
>
>> Hello All,
>>
>> I am using Spark to process some files in parallel.
>> While most files can be processed within 3 seconds,
>> we occasionally get stuck on 1 or 2 files that never finish
>> (or would take more than 48 hours).
>> Since it is a 3rd-party file conversion tool, we are not able to debug
>> why the converter gets stuck.
>>
>> Is it possible to set a timeout for our process and throw exceptions
>> for those tasks,
>> while still continuing with the other successful tasks?
>>
>> Best Regards
>> Wei
>>
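For completeness, below is a minimal, self-contained sketch of the same timeout pattern that can be run outside Spark; slowConversion() is just a hypothetical stand-in for the 3rd-party converter, and the timeout is shortened so the failure path is easy to see:

```Scala
import java.util.concurrent.TimeoutException

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object TimeoutSketch {
  // Hypothetical stand-in for the 3rd-party conversion; sleeps to simulate a hang.
  def slowConversion(): Boolean = { Thread.sleep(10000); true }

  def main(args: Array[String]): Unit = {
    val status = Try {
      val f = Future[Boolean] { slowConversion() }
      // Await.result throws a java.util.concurrent.TimeoutException once the
      // timeout elapses; the surrounding Try turns that into a Failure.
      Await.result(f, 2.seconds)
    } match {
      case Success(_)                   => "S"
      case Failure(_: TimeoutException) => "F (timeout)"
      case Failure(_)                   => "F (other)"
    }
    println(status) // prints "F (timeout)" for this deliberately slow conversion
  }
}
```

One thing to keep in mind with this pattern is that Await.result only stops waiting; the thread running the conversion keeps going in the background, so a hung conversion still occupies a thread after the timeout fires.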