I'm sending a job to Aurora that consists of multiple processes. One of them fails, failing the job. I'd like to get the return code for the failed process.
Going to the Thermos observer on the slave that runs the job tells me which process failed, but there's no return code specified either in the table or the JSON. However, the thermos_runner.INFO logfile in the root of the sandbox contains this line:

    runner.py:139] Process(my_failing_ps) failed [rc=1]

So Thermos, at some point, knows that my process failed with return code 1, but it's not making it back up to either the web or JSON interfaces.

If it helps, the failed process has a start_time field, but is missing a stop_time.

Any clues?

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard
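
P.S. As a stopgap, the return code could be scraped from the runner log. The following is only a minimal sketch: it assumes the log line format quoted above is stable across Thermos versions, which is not confirmed, and the name failed_return_codes is hypothetical, not part of Thermos or Aurora.

```python
import re

# Pattern inferred from the single log line quoted above (an assumption):
#   runner.py:139] Process(my_failing_ps) failed [rc=1]
FAILED_RE = re.compile(r"Process\((?P<name>[^)]+)\) failed \[rc=(?P<rc>-?\d+)\]")


def failed_return_codes(lines):
    """Return {process_name: return_code} for every matching 'failed' line.

    `lines` is any iterable of strings, e.g. an open file object.
    If a process fails more than once, the last return code seen wins.
    """
    codes = {}
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            codes[match.group("name")] = int(match.group("rc"))
    return codes


# Usage, run from the root of the sandbox:
#   with open("thermos_runner.INFO") as log:
#       print(failed_return_codes(log))
```

This only works around the missing field; it does not explain why the observer omits the return code.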