I think you should add these notes to the JIRA issue, as it is not clear from the issue itself. (Sorry that this doesn't help solve the problem itself :-))
On Thu, Jul 2, 2015 at 2:06 PM Ophir Cohen <oph...@gmail.com> wrote:

> It does not happen in local mode.
> Actually, whenever it runs in the same process it works great.
> It looks like the Zeppelin jar somehow is not distributed to the nodes.
> Still, it is strange, as registering the UDF and the UDF itself do not need
> ZeppelinContext (at least not explicitly).
>
> And yes, filteredNc is a local table; I just use it so I can call the
> UDF. You can try that on any table.
>
> On Thu, Jul 2, 2015 at 1:23 PM, IT CTO <goi....@gmail.com> wrote:
>
>> Does this happen in local mode as well, or just on an external cluster?
>> With regard to the repro - %sql select getNum() from filteredNc limit 1 -
>> I guess filteredNc is some table you have? Because when I tried it on my
>> local machine I got:
>> no such table filteredNc; line 1 pos 21
>> Eran
>>
>> On Thu, Jul 2, 2015 at 12:44 PM Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> Thank you Moon.
>>> Here is the link:
>>> https://issues.apache.org/jira/browse/ZEPPELIN-150
>>>
>>> Please let me know how I can help further.
>>>
>>> On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <m...@apache.org> wrote:
>>>
>>>> I really appreciate you sharing the problem.
>>>> Very interesting. Do you mind filing an issue on JIRA?
>>>>
>>>> Best,
>>>> moon
>>>>
>>>> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> BTW, this isn't working either:
>>>>>
>>>>> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
>>>>> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>>>>> sidNameDF2.registerTempTable("tmp_sid_name2")
>>>>>
>>>>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> I've made some progress on this issue and I think it's a bug...
>>>>>>
>>>>>> Apparently, when trying to use registered UDFs on tables that come
>>>>>> from Hive, it returns the above exception (ClassNotFoundException:
>>>>>> org.apache.zeppelin.spark.ZeppelinContext).
>>>>>> When I create a new table and register it, UDFs work as expected.
>>>>>> See below for the full details and an example.
>>>>>>
>>>>>> Can someone tell me if this is the expected behavior or a bug?
>>>>>> BTW, I don't mind working on this bug - if you can give me a pointer
>>>>>> to the right places.
>>>>>>
>>>>>> BTW2: registering the SAME DataFrame as a temp table does not solve
>>>>>> the problem - only creating a new table out of a new DataFrame (see
>>>>>> below).
>>>>>>
>>>>>> Detailed example
>>>>>> 1. I have a table in Hive called 'hive_table' with a string field
>>>>>> called 'name' and an int field called 'sid'.
>>>>>>
>>>>>> 2. I registered a UDF:
>>>>>> def getStr(str: String) = str + "_str"
>>>>>> hc.udf.register("getStr", getStr _)
>>>>>>
>>>>>> 3. Running the following in Zeppelin:
>>>>>> %sql select getStr(name), * from hive_table
>>>>>> yields the exception:
>>>>>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>>>>
>>>>>> 4. Creating a new table, as follows:
>>>>>> case class SidName(sid: Int, name: String)
>>>>>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>>>>>   .collectAsList().map(row => new SidName(row.getInt(0), row.getString(1)))
>>>>>> val sidNameDF = hc.createDataFrame(sidNameList)
>>>>>> sidNameDF.registerTempTable("tmp_sid_name")
>>>>>>
>>>>>> 5. Querying the new table in the same fashion:
>>>>>> %sql select getStr(name), * from tmp_sid_name
>>>>>>
>>>>>> This time I get the expected results!
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>
>>>>>>> BTW
>>>>>>> The same query, on the same cluster but in the Spark shell, returns
>>>>>>> the expected results.
>>>>>>>
>>>>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>
>>>>>>>> It looks like the Zeppelin jar is not distributed to the Spark nodes,
>>>>>>>> though I can't understand why it is needed for the UDF.
>>>>>>>>
>>>>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the response,
>>>>>>>>> I'm not sure what you mean; it is exactly what I tried, and it failed.
>>>>>>>>> As I wrote above, 'hc' is just a different name for sqlc (which is
>>>>>>>>> a different name for z.sqlContext).
>>>>>>>>>
>>>>>>>>> I get the same results.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ophir,
>>>>>>>>>>
>>>>>>>>>> Can you try the below?
>>>>>>>>>>
>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>   100
>>>>>>>>>> }
>>>>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>>>>
>>>>>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin,
>>>>>>>>>> and it uses a HiveContext as the sqlContext by default
>>>>>>>>>> (if you did not change useHiveContext to "false" in the
>>>>>>>>>> interpreter menu).
>>>>>>>>>>
>>>>>>>>>> Hope it helps.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Guys?
>>>>>>>>>>> Somebody?
>>>>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Guys,
>>>>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>>>>> Using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>>>>>> HiveContext):
>>>>>>>>>>>> 1.
>>>>>>>>>>>> Create and register the UDF:
>>>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>>>   100
>>>>>>>>>>>> }
>>>>>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Try to use it on an existing table:
>>>>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>>>>
>>>>>>>>>>>> Or:
>>>>>>>>>>>> 3. Try using hc directly:
>>>>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>>>>
>>>>>>>>>>>> Both of them fail with
>>>>>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>>>>>> (see the full exception below).
>>>>>>>>>>>>
>>>>>>>>>>>> And my questions are:
>>>>>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark
>>>>>>>>>>>> nodes?
>>>>>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>>>>>
>>>>>>>>>>>> The exception:
>>>>>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>>>>> at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>>>>> at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>>>>> at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>>>>> at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>>>>
>>>>>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>>>>>
>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>> ...
>>>>>>>>>>>> 103 more
>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>> at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>>>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>>>>> ... 105 more
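[Editor's note] For anyone hitting this thread later: the stack trace shows an executor failing to deserialize a class that has a ZeppelinContext field, which suggests the closure registered as the UDF captures the Zeppelin REPL's generated wrapper object (the one holding the `z` / ZeppelinContext binding), not that the UDF itself needs ZeppelinContext. A minimal sketch of one possible workaround, not confirmed against the cluster in this thread: move the UDF bodies into a standalone serializable object, so the function value passed to `udf.register` does not drag the REPL wrapper along. `Udfs` is a hypothetical name; `hc` is the HiveContext from the thread.

```scala
// Hypothetical workaround sketch: define UDF logic in a top-level object.
// A method reference on a plain object serializes without capturing any
// REPL-generated wrapper (and therefore no ZeppelinContext reference).
object Udfs extends Serializable {
  def getStr(s: String): String = s + "_str"
  def getNum(): Int = 100
}

// In a Zeppelin paragraph (hc is the HiveContext Zeppelin binds), the
// registration from the thread would then read:
//   hc.udf.register("getStr", Udfs.getStr _)
//   hc.udf.register("getNum", Udfs.getNum _)
//   hc.sql("select getStr(name), * from hive_table").show()
```

Alternatively, making the zeppelin-spark jar visible to the executors (for example via the `spark.jars` configuration) should make the class resolvable, though that treats the symptom (missing class) rather than the cause (the closure capturing it).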