Is there a way to identify those problematic files, by content patterns or by name, and then tackle them in separate jobs? I use the input_file_name() function to find the name of the input file for each record and then filter out certain files.
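A minimal sketch of that approach (the input path, the text format, and the list of slow files are assumptions for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, input_file_name}

    val spark = SparkSession.builder().appName("filter-by-source-file").getOrCreate()

    // Tag each record with the path of the file it was read from.
    val df = spark.read.text("/data/input/*.txt")        // hypothetical input path
      .withColumn("source_file", input_file_name())

    // Files known (from earlier runs) to hang the converter; route them to a separate job.
    // Note: input_file_name() returns a URI (e.g. "file:///data/input/bad1.txt"),
    // so the exclusion list must use the same form.
    val slowFiles = Seq("file:///data/input/bad1.txt", "file:///data/input/bad2.txt")

    // Process everything except the problematic files in this job.
    val fastDf = df.filter(!col("source_file").isin(slowFiles: _*))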
Regards,
Gourav

On Wed, Jul 10, 2019 at 6:47 AM Wei Chen <weic...@apache.org> wrote:
> Hello All,
>
> I am using Spark to process some files in parallel.
> While most files can be processed within 3 seconds,
> we occasionally get stuck on 1 or 2 files that never finish (or
> would take more than 48 hours).
> Since it is a 3rd-party file conversion tool, we are not able to debug why
> the converter gets stuck.
>
> Is it possible to set a timeout for our process and throw exceptions for
> those tasks, while still continuing with the other successful tasks?
>
> Best Regards
> Wei
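On the timeout question in the quoted message: as far as I know, Spark does not expose a per-task timeout, but one possible workaround is to wrap each call to the converter in a Future and stop waiting after a deadline. A sketch, assuming a hypothetical convert() wrapper around the 3rd-party tool and a 3-minute budget:

    import java.util.concurrent.TimeoutException
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    // Hypothetical wrapper around the 3rd-party conversion tool.
    def convert(path: String): String = ???

    // Hypothetical list of input files; inside Spark the same wrapper
    // would be applied in the map over the distributed collection.
    val filePaths: Seq[String] = Seq("/data/input/a.xml", "/data/input/b.xml")

    val results = filePaths.map { path =>
      val work = Future(convert(path))
      try Right(Await.result(work, 3.minutes))
      catch {
        // Give up waiting, record the failure, and move on to the next file.
        // Caveat: the stuck converter thread keeps running in the background;
        // Await only stops waiting for it, it does not kill it.
        case _: TimeoutException => Left(s"timed out: $path")
      }
    }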