Re: Zeppelin notes version control, scheduler and external deps

armen donigian Sat, 05 Dec 2015 10:45:07 -0800

Following up on my previous email about loading external jars and the
NullPointerException (NPE).


'/usr/lib/zeppelin/local-repo' doesn't exist for user 'hadoop' on the master
node. Is it supposed to?
I created '/var/lib/zeppelin/local-repo', then ran 'ln -s
/var/lib/zeppelin/local-repo /usr/lib/zeppelin/local-repo', but I'm still
getting the NPE. Any suggestions?
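For reference, the workaround from Jonathan's reply (quoted below) can be scripted. This Python sketch assumes the Zeppelin server runs as a user named 'zeppelin' (the user Jonathan mentions) and that the script runs with root privileges, because chown needs them. It also makes one step explicit that is easy to miss: if the target directory was created as 'hadoop' or root, the zeppelin user may still be unable to write to it, which could explain a lingering NPE. The function name relocate_local_repo is hypothetical.

```python
import os
import shutil
from typing import Optional


def relocate_local_repo(
    link_path: str = "/usr/lib/zeppelin/local-repo",
    target_dir: str = "/var/lib/zeppelin/local-repo",
    owner: Optional[str] = "zeppelin",
) -> str:
    """Make link_path a symlink to target_dir and return the resolved path."""
    # Create the real directory that the Zeppelin process should write to.
    os.makedirs(target_dir, exist_ok=True)
    # Assumption: the server runs as `owner`; give it ownership so it can write.
    if owner is not None:
        shutil.chown(target_dir, user=owner)
    if os.path.islink(link_path):
        # Accept an existing symlink only if it already points at target_dir.
        if os.path.realpath(link_path) != os.path.realpath(target_dir):
            raise RuntimeError(f"{link_path} already links elsewhere")
    elif os.path.exists(link_path):
        # Refuse to clobber a real directory; move or remove it by hand first.
        raise RuntimeError(f"{link_path} exists and is not a symlink")
    else:
        # Equivalent of: ln -s <target_dir> <link_path>
        os.symlink(target_dir, link_path)
    return os.path.realpath(link_path)
```

Calling relocate_local_repo() with the defaults reproduces the mkdir and ln -s steps plus a chown; pass owner=None to skip the ownership change.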

Btw, on an unrelated topic: does Zeppelin support emailing a user the
output of a note? Just as a Unix process returns a status code, a Zeppelin
note could at minimum return true (success) or false (failure).


On Sat, Dec 5, 2015 at 12:18 AM Work <jonathaka...@gmail.com> wrote:

> 1. EMR does not currently provide anything like this for Zeppelin. (Good
> idea though!) Zeppelin's built-in S3 notebook storage might help you,
> especially if you turn on bucket versioning, I suppose, but I have not
> tried this.
>
> 2. Yes. If you go to the ResourceManager UI on port 8088 and click the
> ApplicationMaster link next to the Zeppelin app, you can get to the Spark
> UI associated with the Zeppelin SparkContext (assuming you have already run
> a notebook containing Spark code; otherwise the Zeppelin YARN app won't
> exist yet).
>
> 3. Sorry, I have not tried using Zeppelin's notebook scheduler, but yes,
> Data Pipeline would probably give you more reliability for production
> batch ETL jobs. I don't know what your use case is, but maybe you could use
> Data Pipeline to generate a dataset that you store in S3 and then query
> via Zeppelin?
>
> 4. This is a limitation of Zeppelin (really, of Spark), not specifically
> of Zeppelin on EMR: you must load any dependencies before running any
> Spark code, because dependencies can only be loaded once. However, once
> you solve this, you will run into a known issue with Zeppelin on EMR: a
> weird NPE caused by the zeppelin user not having write access to
> /usr/lib/zeppelin/local-repo. I would suggest creating
> /var/lib/zeppelin/local-repo and then creating a symlink from
> /usr/lib/zeppelin/local-repo to /var/lib/zeppelin/local-repo. We will
> fix this in emr-4.3.0.
>
> ~ Jonathan
>
>
> On Fri, Dec 4, 2015 at 11:18 PM, armen donigian <donig...@gmail.com>
> wrote:
>
>> Hi all,
>> Installed Zeppelin on Amazon EMR and it's running swell. Had a few
>> questions...
>>
>> 1. How do we version control Zeppelin notes?
>>
>> 2. How do you check the status of a long-running Zeppelin task? Is there
>> a web UI for this, or do you simply check the ResourceManager UI at
>> master-node:8088 (in the case of AWS)?
>>
>> 3. Are there any known issues/limitations with running the Zeppelin note
>> scheduler in production for batch ETL jobs? I'm trying to assess it vs.
>> AWS Data Pipeline.
>>
>> 4. When trying to add an external jar with this paragraph:
>>
>>     %dep
>>     z.reset()
>>     z.load("com.databricks:spark-redshift_2.10:0.5.2")
>>
>> I'm getting this error:
>>
>>     Must be used before SparkInterpreter (%spark) initialized
>>
>> Thanks
>>
>
>
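On question 1, Jonathan suggests Zeppelin's S3 notebook storage combined with S3 bucket versioning, while noting he has not tried it. Assuming the notes are already stored in S3, a minimal sketch for turning versioning on and listing stored versions of a note could look like this. The bucket name and key prefix are hypothetical, and the functions take an already-created client (for example boto3.client("s3")) so they carry no AWS setup of their own.

```python
from typing import Any, List, Optional, Tuple


def enable_bucket_versioning(s3_client: Any, bucket: str) -> Optional[str]:
    """Enable versioning on the bucket holding Zeppelin notes; return its status."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # The response has no "Status" key if versioning was never enabled.
    return s3_client.get_bucket_versioning(Bucket=bucket).get("Status")


def list_note_versions(
    s3_client: Any, bucket: str, prefix: str
) -> List[Tuple[str, str, Any]]:
    """List (key, version id, last-modified) for each stored version under prefix."""
    # Sketch only: a single call, so results past the first page are ignored.
    resp = s3_client.list_object_versions(Bucket=bucket, Prefix=prefix)
    return [
        (v["Key"], v["VersionId"], v["LastModified"])
        for v in resp.get("Versions", [])
    ]
```

With a real client, enable_bucket_versioning(boto3.client("s3"), "my-zeppelin-notes") should report "Enabled", and an older copy of a note can then be fetched with get_object(..., VersionId=...).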

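On question 2, the information Jonathan describes in the ResourceManager UI is also exposed by the YARN ResourceManager REST API (/ws/v1/cluster/apps). The sketch below assumes the Zeppelin SparkContext shows up in YARN with "Zeppelin" in its application name; check that on your cluster first. Parsing is kept separate from the HTTP call so it can be tried on a saved response. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def find_apps(payload: Dict[str, Any], name_substring: str = "Zeppelin") -> List[Dict[str, Any]]:
    """Pick apps whose name contains name_substring from an RM /apps response."""
    # The RM returns {"apps": null} when nothing matches, so guard against None.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        {
            "id": app.get("id"),
            "state": app.get("state"),
            "progress": app.get("progress"),
            # trackingUrl is the ApplicationMaster link that leads to the Spark UI.
            "trackingUrl": app.get("trackingUrl"),
        }
        for app in apps
        if name_substring in app.get("name", "")
    ]


def fetch_running_apps(rm_host: str, port: int = 8088) -> Dict[str, Any]:
    """Query the ResourceManager REST API for RUNNING applications."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, find_apps(fetch_running_apps("master-node")), where "master-node" stands in for your master's hostname, lists the running Zeppelin app with its progress and Spark UI link.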