Hi Jongyoul- I followed the exact same steps for compiling and setting up the new build from source as I did for 0.5.6 (the only difference is that I acquired the source for the latest build using "git clone").
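For completeness, the clone and build were along these lines (roughly; the exact Maven profile flags here are a guess and would need to match your Spark/Hadoop versions):

git clone https://github.com/apache/zeppelin.git
cd zeppelin
mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn -Ppyspark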
hive-site.xml was copied to the conf directory, but the spark interpreter is not talking to the Hive metastore. Both the 0.5.6 and the latest builds are running on the same machine. In 0.5.6, when I run the command below, I see 116 databases listed, as expected, and I'm able to run my notebooks built on those databases.

[image: Inline image 1]

Thanks,
Pradeep

On Wed, Aug 31, 2016 at 2:52 AM, Jongyoul Lee <jongy...@gmail.com> wrote:

> Hello,
>
> Did you copy your hive-site.xml to the proper location?
>
> On Wed, Aug 31, 2016 at 3:52 PM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>
>> Nothing obvious. I will stick to the 0.5.6 build until the latest builds stabilize.
>>
>> On Wed, Aug 31, 2016 at 1:39 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Then I guess you may be connecting to a different database. Why not use 'z.show(sql("show databases"))' to display the databases? That will give you a hint about what's going on.
>>>
>>> On Wed, Aug 31, 2016 at 2:36 PM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>
>>>> Yes... I didn't want to show the names of the databases we have in our data lake in that screenshot, so that's why I chose to display the count. The latest Zeppelin build shows a count of just 1, which is the "default" database.
>>>>
>>>> Thanks,
>>>> Pradeep
>>>>
>>>> On Wed, Aug 31, 2016 at 1:33 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> 116 is the database count. Do you expect a list of databases? Then you need to use 'z.show(sql("show databases"))'.
>>>>>
>>>>> On Wed, Aug 31, 2016 at 2:26 PM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>>>
>>>>>> Here it is, Jeff.
>>>>>>
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> On Wed, Aug 31, 2016 at 1:24 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Pradeep,
>>>>>>>
>>>>>>> I don't see the databases in your screenshot (the second one, for 0.5.6), so I think the output is correct.
>>>>>>>
>>>>>>> On Wed, Aug 31, 2016 at 12:55 PM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Jeff- I was able to make Kerberos work in the 0.5.6 Zeppelin build. It seems that Kerberos not working and Spark not being able to talk to the shared Hive metastore are both defects in the current build.
>>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 11:09 PM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Jeff-
>>>>>>>>>
>>>>>>>>> I switched to local mode now. I'm able to summon the implicit objects like sc, sqlContext, etc., but it doesn't show my databases and tables; it shows just one database, "default".
>>>>>>>>>
>>>>>>>>> Zeppelin latest build:
>>>>>>>>>
>>>>>>>>> [image: Inline image 3]
>>>>>>>>>
>>>>>>>>> Zeppelin 0.5.6, running on the same machine, is able to show my databases and tables:
>>>>>>>>>
>>>>>>>>> [image: Inline image 4]
>>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 8:20 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> > the spark interpreter is not showing my tables & databases; maybe it's running in an isolated mode... I'm just getting an empty list, so I attempted Kerberos authentication to work around that issue, and bumped into this roadblock.
>>>>>>>>>>
>>>>>>>>>> Kerberos would not help here; actually, I think it would make the problem more complicated. You first need to check the log to see why you get an empty list.
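>>>>>>>>>>
>>>>>>>>>> For example, something like this in a notebook paragraph should narrow it down (just a sketch; sqlContext is the object Zeppelin injects, and on Spark 1.x it should be a HiveContext when zeppelin.spark.useHiveContext is true):
>>>>>>>>>>
>>>>>>>>>> // If this prints org.apache.spark.sql.SQLContext rather than
>>>>>>>>>> // org.apache.spark.sql.hive.HiveContext, useHiveContext is not taking effect.
>>>>>>>>>> println(sqlContext.getClass.getName)
>>>>>>>>>>
>>>>>>>>>> // If it is a HiveContext but only "default" comes back, the interpreter is
>>>>>>>>>> // most likely creating a fresh local Derby metastore because hive-site.xml
>>>>>>>>>> // is not on its classpath.
>>>>>>>>>> sqlContext.sql("show databases").show()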
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 31, 2016 at 8:56 AM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Jeff- After running kdestroy, I was also able to successfully run spark-shell with the command below and could get to my Hive tables.
>>>>>>>>>>>
>>>>>>>>>>> spark-shell --conf spark.yarn.keytab=$HOME/pradeep.x.alla.keytab --conf spark.yarn.principal=pradeep.x.alla --deploy-mode client --master yarn --queue <QUEUE_NAME>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:34 PM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Jeff.. I have always used Zeppelin in local mode, but when I migrated from 0.5.6 to this version, the spark interpreter is not showing my tables & databases; maybe it's running in an isolated mode... I'm just getting an empty list, so I attempted Kerberos authentication to work around that issue, and bumped into this roadblock.
>>>>>>>>>>>>
>>>>>>>>>>>> Below is the configuration. I also tested my keytab file and it's working fine.
>>>>>>>>>>>>
>>>>>>>>>>>> *Kerberos test:*
>>>>>>>>>>>> $ kdestroy
>>>>>>>>>>>>
>>>>>>>>>>>> $ klist
>>>>>>>>>>>> *klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_12027)*
>>>>>>>>>>>>
>>>>>>>>>>>> $ kinit -kt pradeep_x_alla.keytab -V pradeep.x.alla
>>>>>>>>>>>> *Using default cache: /tmp/krb5cc_12027*
>>>>>>>>>>>> *Using principal: pradeep.x.alla@<DOMAIN1>*
>>>>>>>>>>>> *Using keytab: pradeep_x_alla.keytab*
>>>>>>>>>>>> *Authenticated to Kerberos v5*
>>>>>>>>>>>>
>>>>>>>>>>>> $ klist
>>>>>>>>>>>> *Ticket cache: FILE:/tmp/krb5cc_12027*
>>>>>>>>>>>> *Default principal: pradeep.x.alla@<DOMAIN1>*
>>>>>>>>>>>>
>>>>>>>>>>>> *Valid starting       Expires              Service principal*
>>>>>>>>>>>> *08/30/16 20:25:19    08/31/16 06:25:19    krbtgt/<DOMAIN1>@<DOMAIN1>*
>>>>>>>>>>>> *        renew until 08/31/16 20:25:19*
>>>>>>>>>>>>
>>>>>>>>>>>> *zeppelin-env.sh*
>>>>>>>>>>>>
>>>>>>>>>>>> export HADOOP_CONF_DIR=/etc/hadoop/conf:/etc/hive/conf
>>>>>>>>>>>> export SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark
>>>>>>>>>>>> export SPARK_SUBMIT_OPTIONS="--deploy-mode client --master yarn --num-executors 2 --executor-memory 2g --queue <QUEUE_NAME>"
>>>>>>>>>>>>
>>>>>>>>>>>> *Interpreter.json (Spark interpreter config)*
>>>>>>>>>>>> "2BUTFVN89": {
>>>>>>>>>>>>   "id": "2BUTFVN89",
>>>>>>>>>>>>   "name": "spark",
>>>>>>>>>>>>   "group": "spark",
>>>>>>>>>>>>   "properties": {
>>>>>>>>>>>>     "spark.cores.max": "",
>>>>>>>>>>>>     "zeppelin.spark.printREPLOutput": "true",
>>>>>>>>>>>>     "master": "yarn-client",
>>>>>>>>>>>>     "zeppelin.spark.maxResult": "1000",
>>>>>>>>>>>>     "zeppelin.dep.localrepo": "local-repo",
>>>>>>>>>>>>     "spark.app.name": "Zeppelin",
>>>>>>>>>>>>     "spark.executor.memory": "",
>>>>>>>>>>>>     "zeppelin.spark.importImplicit": "true",
>>>>>>>>>>>>     "zeppelin.spark.sql.stacktrace": "true",
>>>>>>>>>>>>     "zeppelin.spark.useHiveContext": "true",
>>>>>>>>>>>>     "zeppelin.interpreter.localRepo": "/home/pradeep.x.alla/zeppelin/local-repo/2BUTFVN89",
>>>>>>>>>>>>     "zeppelin.spark.concurrentSQL": "false",
>>>>>>>>>>>>     "args": "",
>>>>>>>>>>>>     "zeppelin.pyspark.python": "python",
>>>>>>>>>>>>     "spark.yarn.keytab": "/home/pradeep.x.alla/pradeep.x.alla.keytab",
>>>>>>>>>>>>     "spark.yarn.principal": "pradeep.x.alla",
>>>>>>>>>>>>     "zeppelin.dep.additionalRemoteRepository": "spark-packages,http://dl.bintray.com/spark-packages/maven,false;"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "status": "READY",
>>>>>>>>>>>>   "interpreterGroup": [
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name": "spark",
>>>>>>>>>>>>       "class": "org.apache.zeppelin.spark.SparkInterpreter",
>>>>>>>>>>>>       "defaultInterpreter": true
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name": "sql",
>>>>>>>>>>>>       "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
>>>>>>>>>>>>       "defaultInterpreter": false
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name": "dep",
>>>>>>>>>>>>       "class": "org.apache.zeppelin.spark.DepInterpreter",
>>>>>>>>>>>>       "defaultInterpreter": false
>>>>>>>>>>>>     },
>>>>>>>>>>>>     {
>>>>>>>>>>>>       "name": "pyspark",
>>>>>>>>>>>>       "class": "org.apache.zeppelin.spark.PySparkInterpreter",
>>>>>>>>>>>>       "defaultInterpreter": false
>>>>>>>>>>>>     }
>>>>>>>>>>>>   ],
>>>>>>>>>>>>   "dependencies": [],
>>>>>>>>>>>>   "option": {
>>>>>>>>>>>>     "remote": true,
>>>>>>>>>>>>     "port": -1,
>>>>>>>>>>>>     "perNoteSession": false,
>>>>>>>>>>>>     "perNoteProcess": false,
>>>>>>>>>>>>     "isExistingProcess": false,
>>>>>>>>>>>>     "setPermission": false,
>>>>>>>>>>>>     "users": []
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:52 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This looks like a Kerberos configuration issue. Do you mind sharing your configuration? Or you can first try to run spark-shell with spark.yarn.keytab & spark.yarn.principal to verify them.
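>>>>>>>>>>>>>
>>>>>>>>>>>>> One more pointer: the "Invalid rule" in the trace below is thrown by KerberosName.parseRules, which parses hadoop.security.auth_to_local, and the stray "L" in front of each RULE looks as if the line breaks in the rules string are getting mangled on the way in. For comparison, the property in core-site.xml normally carries one rule per line, roughly like this (EXAMPLE.COM is just a placeholder realm):
>>>>>>>>>>>>>
>>>>>>>>>>>>> <property>
>>>>>>>>>>>>>   <name>hadoop.security.auth_to_local</name>
>>>>>>>>>>>>>   <value>
>>>>>>>>>>>>>     RULE:[1:$1@$0](.*@\QEXAMPLE.COM\E$)s/@\QEXAMPLE.COM\E$//
>>>>>>>>>>>>>     DEFAULT
>>>>>>>>>>>>>   </value>
>>>>>>>>>>>>> </property>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So it is worth checking whether the core-site.xml that Zeppelin picks up through HADOOP_CONF_DIR still looks like that.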
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Aug 31, 2016 at 6:12 AM, Pradeep Reddy <pradeepreddy.a...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi- I recently built Zeppelin from source and configured Kerberos authentication. For Kerberos I added "spark.yarn.keytab" & "spark.yarn.principal" and also set master to "yarn-client". But I keep getting this error whenever I use the spark interpreter in a notebook:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> INFO [...] - Job remoteInterpretJob_1472593536728 started by scheduler org.apache.zeppelin.spark.SparkInterpreter335845091
>>>>>>>>>>>>>> ERROR [2016-08-30 17:45:37,237] ({pool-2-thread-2} Job.java[run]:189) - Job failed
>>>>>>>>>>>>>> java.lang.IllegalArgumentException: Invalid rule: L
>>>>>>>>>>>>>> RULE:[2:$1@$0](.*@\Q<DOMAIN1>.COM\E$)s/@\Q<DOMAIN1>\E$//L
>>>>>>>>>>>>>> RULE:[1:$1@$0](.*@\Q<DOMAIN2>\E$)s/@\Q<DOMAIN2>\E$//L
>>>>>>>>>>>>>> RULE:[2:$1@$0](.*@\Q<DOMAIN2>\E$)s/@\Q<DOMAIN2>\E$//L
>>>>>>>>>>>>>> DEFAULT
>>>>>>>>>>>>>>         at org.apache.hadoop.security.authentication.util.KerberosName.parseRules(KerberosName.java:321)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.authentication.util.KerberosName.setRules(KerberosName.java:386)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:75)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:214)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:275)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:269)
>>>>>>>>>>>>>>         at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:820)
>>>>>>>>>>>>>>         at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:539)
>>>>>>>>>>>>>>         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>>>>>>>>>>>>>>         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>>>>>>>>>>>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:383)
>>>>>>>>>>>>>>         at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>>>>>>>>>>>>>>         at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>>>>>>>>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>>>>>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>>>>>>>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>>>>>>>>>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>>>>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>>>>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>>>>>>>>>> INFO [2016-08-30 17:45:37,247] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1472593536728 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter335845091
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Pradeep
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jeff Zhang
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net