I'm using Amazon EMR 5.0.0 and am having real difficulties with paragraphs
aborting.  I have a number of paragraphs that I want to run daily, in sequence.
If I run them in sequence with the scheduler, one minute apart, I think I hit
this problem: https://issues.apache.org/jira/browse/ZEPPELIN-1480

To work around this, I added a paragraph at the end of each notebook that
calls the REST API to trigger execution of the next notebook.  The first
notebook runs fine (except that its "Last updated" time doesn't change, which
seems odd), but then the paragraphs in the second notebook fail with status
"ABORT".  They don't always abort at the same point: sometimes it's one
paragraph, sometimes another.  Below are log excerpts for three different
interpreter settings.  All of them show the same ABORT behaviour, but each
produces different errors.
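For reference, here is a minimal sketch of what such a final "trigger" paragraph
might look like in Python (e.g. in a %pyspark or %python paragraph).  It assumes
Zeppelin's "run all paragraphs" endpoint, POST /api/notebook/job/<noteId>, the
EMR default Zeppelin port 8890, and no authentication (with Shiro enabled you
would also need a session cookie).  The host and note ID are hypothetical
placeholders.

```python
import urllib.request

# Assumption: Zeppelin on the EMR master node listens on port 8890.
ZEPPELIN_URL = "http://localhost:8890"

# Hypothetical placeholder: the next notebook's ID, as shown in its URL
# (.../#/notebook/<noteId>).
NEXT_NOTE_ID = "2XXXXXXXX"


def build_run_note_request(base_url, note_id):
    """Build a POST to /api/notebook/job/<noteId>, which asks Zeppelin
    to run all paragraphs of the given note."""
    url = "{}/api/notebook/job/{}".format(base_url.rstrip("/"), note_id)
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_note(base_url, note_id, timeout=30):
    """Send the request and return the HTTP status code."""
    req = build_run_note_request(base_url, note_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Usage, as the last paragraph of a notebook:
# print(trigger_note(ZEPPELIN_URL, NEXT_NOTE_ID))
```

Note that this only queues the next note; it does not wait for it to finish.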

Shared Interpreter for note:
Zeppelin Log: http://hastebin.com/raw/zerexepozu
Spark Interpreter Log: http://hastebin.com/raw/urowuvodat

Scoped Interpreter for note:
Zeppelin Log: http://hastebin.com/raw/leraxaleze
Spark Interpreter Log: http://hastebin.com/raw/gizepuyaqi

Isolated Interpreter for note:
Zeppelin Log: http://hastebin.com/raw/yebudaloze
Spark Interpreter Log: http://hastebin.com/raw/etuqaxejis

I think https://issues.apache.org/jira/browse/ZEPPELIN-1270 is causing some of
the errors in those logs, but I don't know to what extent the "ABORT" problems
are related to it.  Note that if I run the notebooks with the "Run all
paragraphs" button, the problem occurs only some of the time.  I've checked
the container logs and there's nothing obvious in them, and as far as I can
tell it isn't a memory issue.
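To see which paragraphs aborted without clicking through the UI, one option is
to query the job status endpoint, GET /api/notebook/job/<noteId>, and group
paragraphs by state.  This is a sketch only: the response shape
({"status": "OK", "body": [{"id": ..., "status": ...}, ...]}) and the state
names (PENDING, RUNNING, FINISHED, ERROR, ABORT) are assumptions to check
against your Zeppelin version.

```python
import json
import urllib.request

# Assumed names of the states in which a paragraph is queued or executing.
ACTIVE_STATES = {"PENDING", "RUNNING"}


def fetch_job_status(base_url, note_id, timeout=30):
    """GET /api/notebook/job/<noteId> and return the decoded JSON response."""
    url = "{}/api/notebook/job/{}".format(base_url.rstrip("/"), note_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def paragraphs_in_state(status_json, state):
    """Return the IDs of paragraphs whose status equals `state`."""
    return [p.get("id") for p in status_json.get("body", [])
            if p.get("status") == state]


def is_note_done(status_json):
    """True when no paragraph is still pending or running."""
    return all(p.get("status") not in ACTIVE_STATES
               for p in status_json.get("body", []))


# Usage (hypothetical host and note ID):
# status = fetch_job_status("http://localhost:8890", "2XXXXXXXX")
# print(paragraphs_in_state(status, "ABORT"))
```

Polling is_note_done() before triggering the next note would also avoid relying
on a fixed one-minute gap.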

Does anyone have any ideas?  I'm stumped!  Thanks!

Jonathan
