Following up on my previous email about loading external jars and the Null Pointer Exception (NPE).
'/usr/lib/zeppelin/local-repo' doesn't exist for user 'hadoop' on the master node. Is it supposed to?

I created '/var/lib/zeppelin/local-repo', then ran 'ln -s /var/lib/zeppelin/local-repo /usr/lib/zeppelin/local-repo', but I'm still getting the NPE. Any suggestions?

Btw, on an unrelated topic, does Zeppelin support emailing a user the output of a note? Just as a Unix process returns a status code, a Zeppelin note could return at minimum true (success) or false (failure).

On Sat, Dec 5, 2015 at 12:18 AM Work <jonathaka...@gmail.com> wrote:

> 1. EMR does not currently provide anything like this for Zeppelin. (Good
> idea though!) Zeppelin's built-in S3 notebook storage might help you,
> especially if you turn on bucket versioning, I suppose, but I have not
> tried this.
>
> 2. Yes, if you go to the ResourceManager on port 8088 then click the
> ApplicationMaster link next to the Zeppelin app, you can get to the Spark
> UI associated with the Zeppelin SparkContext (assuming you have first run a
> notebook containing Spark code, otherwise the Zeppelin YARN app won't exist
> yet).
>
> 3. Sorry, I have not tried using Zeppelin's notebook scheduler, but yes,
> DataPipelines would probably provide you more reliability for production
> batch ETL jobs. I don't know what your use case is, but maybe you could use
> DataPipelines to generate some dataset that you store in S3 and can query
> via Zeppelin?
>
> 4. This is a limitation of Zeppelin (really though, of Spark), not
> specifically of Zeppelin on EMR, in that you must load any dependencies
> before running any Spark code because the dependencies can only be loaded
> once. However, once you solve this issue, you will run into a known issue
> with Zeppelin on EMR where you hit a weird NPE that is caused by the
> zeppelin user not having write access to /usr/lib/zeppelin/local-repo. I
> would suggest creating /var/lib/zeppelin/local-repo then creating a symlink
> from /usr/lib/zeppelin/local-repo to /var/lib/zeppelin/local-repo. We will
> fix this in emr-4.3.0.
>
> ~ Jonathan
>
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> On Fri, Dec 4, 2015 at 11:18 PM, armen donigian <donig...@gmail.com>
> wrote:
>
>> Hi all,
>> Installed Zeppelin on Amazon EMR and it's running swell. Had a few
>> questions...
>>
>> 1. How do we version control Zeppelin notes?
>>
>> 2. How do you check for status of a long running Zeppelin task? Is there
>> a web UI for this or do you simply check the Resource Manager UI
>> @master-node:8088 (in case of AWS)?
>>
>> 3. Are there any known issues/limitations of running Zeppelin note
>> scheduler in production for batch ETL jobs? Trying to assess it vs Amazon
>> Data Pipelines.
>>
>> 4. When trying to add an external jar, I'm getting this error.
>> %dep
>> z.reset()
>> z.load("com.databricks:spark-redshift_2.10:0.5.2")
>> Must be used before SparkInterpreter (%spark) initialized
>>
>> Thanks
>>
>
>
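
For reference, the symlink workaround from the quoted reply can be sketched roughly as below. This is only a sketch under assumptions: the function name `link_local_repo` is made up for illustration, and the optional `chown` step is my own guess based on the reply's remark that the zeppelin user lacks write access. Nothing in this thread confirms that it resolves the NPE.

```shell
#!/bin/sh
# Sketch only: create a writable directory and point the old path at it.
# Arguments: 1 = real directory, 2 = symlink path, 3 = optional owner.
# link_local_repo is a hypothetical helper name, not part of Zeppelin or EMR.
link_local_repo() {
    target=$1
    link=$2
    owner=${3:-}

    # Create the real directory, including any missing parents.
    mkdir -p "$target" || return 1

    # Assumption: handing the directory to the given user gives it write access.
    if [ -n "$owner" ]; then
        chown "$owner" "$target" || return 1
    fi

    # Refuse to clobber a real file or directory already at the link path.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace existing path: $link" >&2
        return 1
    fi

    # -n avoids descending into an existing symlinked directory; -f replaces it.
    ln -sfn "$target" "$link"
}
```

Run as root on the master node, the call matching the paths in this thread would be `link_local_repo /var/lib/zeppelin/local-repo /usr/lib/zeppelin/local-repo zeppelin`, and `ls -ld` on both paths then shows the link and the directory's owner.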