Ted,

Thanks very much for your reply. It took me almost a week, but I have finally had a chance to implement what you noted, and it appears to work locally. However, when I launch this on an EC2 cluster, it doesn't work reliably.
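For context, what I implemented is roughly an in-process timeout along the lines of the signal-based recipe from the answer you linked. This is a simplified sketch, not our exact code -- the function names and the 10-second default are just placeholders:

    import signal

    class ExtractorTimeout(Exception):
        pass

    def _raise_timeout(signum, frame):
        raise ExtractorTimeout()

    def run_with_alarm(func, args=(), timeout_secs=10, default=None):
        # SIGALRM-based timeout: Unix only, and the handler can only run
        # when the interpreter is between bytecodes -- it will not fire
        # while a C extension is holding the GIL.
        old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
        signal.alarm(timeout_secs)
        try:
            return func(*args)
        except ExtractorTimeout:
            return default
        finally:
            signal.alarm(0)
            signal.signal(signal.SIGALRM, old_handler)

As I understand it, a handler like this only gets a chance to run when the interpreter regains control, which is exactly the problem I describe below.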
To expand: I think the issue is that some of our code holds the Python GIL, so no in-process timeout will ever fire. That is why I was hoping to learn of a task-level timeout -- something at the Spark (management) level -- so that Spark itself can decide a task has taken too long, kill it, and move on. Does this make sense? Are you familiar with any such option?

Best,
- Bill

On Sat, Jun 27, 2015 at 9:26 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Have you looked at:
>
> http://stackoverflow.com/questions/2281850/timeout-function-if-it-takes-too-long-to-finish
>
> FYI
>
> On Sat, Jun 27, 2015 at 8:33 AM, wasauce <wferr...@gmail.com> wrote:
>
>> Hello!
>>
>> We use pyspark to run a set of data extractors (think regex). The
>> extractors (regexes) generally run quite quickly, find a few matches,
>> and the matches are returned and stored in a database.
>>
>> My question is: is it possible to give the function that runs the
>> extractors a timeout? I.e., if the extractor runs for more than X
>> seconds on a given file, can it be terminated and return a default
>> value?
>>
>> Here is a code snippet of what we are doing, with comments marking
>> the function I am looking to time out.
>>
>> Code: https://gist.github.com/wasauce/42a956a1371a2b564918
>>
>> Thank you
>>
>> - Bill
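P.S. For anyone who finds this thread later: one possible workaround for the GIL problem would be to run each extractor call in a short-lived child process, since the parent can terminate a separate process from the outside even if the code inside it never releases the GIL. A rough sketch -- the helper name, the 10-second limit, and the default value are placeholders, not taken from our gist:

    import multiprocessing

    def run_in_subprocess(func, args=(), timeout_secs=10, default=None):
        # Run func(*args) in a single-worker process pool. Unlike a
        # signal- or thread-based timeout, the parent can terminate the
        # worker even if the extractor is stuck in C code holding the GIL.
        # Note: func must be picklable (defined at module top level).
        pool = multiprocessing.Pool(processes=1)
        async_result = pool.apply_async(func, args)
        try:
            return async_result.get(timeout=timeout_secs)
        except multiprocessing.TimeoutError:
            return default
        finally:
            pool.terminate()
            pool.join()

The usual caveats apply: the wrapped function has to be picklable, and spawning a process per call adds overhead, so it may only be worth wrapping the extractors that are known to get stuck.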