BTW, this isn't working either:
    val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
    val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
    sidNameDF2.registerTempTable("tmp_sid_name2")
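For comparison, here is the only variant that does work for me so far - a
minimal, self-contained sketch of steps 2-5 from the detailed example below
(the table, column, and case-class names follow that example; the *_hive /
*_local temp-table names are just for this sketch, and my assumption is that
the collect() is the significant difference, since the new DataFrame is built
from rows already on the driver and carries no lineage back to the Hive scan):

    case class SidName(sid: Int, name: String)

    def getStr(str: String) = str + "_str"
    hc.udf.register("getStr", getStr _)

    // Fails: the temp table is still backed by the Hive scan
    hc.sql("select sid, name from hive_table limit 10")
      .registerTempTable("tmp_sid_name_hive")
    // hc.sql("select getStr(name) from tmp_sid_name_hive").collect()
    //   => ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext

    // Works: the rows are materialized on the driver first
    val local = hc.sql("select sid, name from hive_table limit 10")
      .collect()
      .map(row => SidName(row.getInt(0), row.getString(1)))
    hc.createDataFrame(local).registerTempTable("tmp_sid_name_local")
    hc.sql("select getStr(name) from tmp_sid_name_local").collect()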
On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:

> I've made some progress on this issue and I think it's a bug...
>
> Apparently, when trying to use registered UDFs on tables that come from
> Hive, it returns the above exception (ClassNotFoundException:
> org.apache.zeppelin.spark.ZeppelinContext).
> When I create a new table and register it, UDFs work as expected.
> Full details and an example are below.
>
> Can someone tell me whether this is expected behavior or a bug?
>
> BTW, I don't mind working on the bug if you can give me a pointer to the
> right places.
>
> BTW2: Registering the SAME DataFrame as a temp table does not solve the
> problem - only creating a new table out of a new DataFrame does (see below).
>
> Detailed example:
>
> 1. I have a table in Hive called 'hive_table', with a string field called
>    'name' and an int field called 'sid'.
>
> 2. I registered a UDF:
>
>        def getStr(str: String) = str + "_str"
>        hc.udf.register("getStr", getStr _)
>
> 3. Running the following in Zeppelin:
>
>        %sql select getStr(name), * from hive_table
>
>    yields the exception:
>
>        ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>
> 4. Creating a new table as follows:
>
>        case class SidName(sid: Int, name: String)
>        // collect() returns an Array[Row] we can map over on the driver
>        val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>          .collect()
>          .map(row => SidName(row.getInt(0), row.getString(1)))
>        val sidNameDF = hc.createDataFrame(sidNameList)
>        sidNameDF.registerTempTable("tmp_sid_name")
>
> 5. Querying the new table in the same fashion:
>
>        %sql select getStr(name), * from tmp_sid_name
>
> This time I get the expected results!
>
> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> BTW, the same query on the same cluster, but in the Spark shell, returns
>> the expected results.
>>
>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> It looks like the Zeppelin jar is not distributed to the Spark nodes,
>>> though I can't understand why it is needed for the UDF.
>>>
>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> Thanks for the response, but I'm not sure what you mean - that is
>>>> exactly what I tried, and it failed.
>>>> As I wrote above, 'hc' is just another name for sqlc (which in turn is
>>>> another name for z.sqlContext).
>>>>
>>>> I get the same results.
>>>>
>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>
>>>>> Hi Ophir,
>>>>>
>>>>> Can you try the below?
>>>>>
>>>>>     def getNum(): Int = {
>>>>>       100
>>>>>     }
>>>>>     sqlc.udf.register("getNum", getNum _)
>>>>>     sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>
>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin and uses a
>>>>> HiveContext as the sqlContext by default (if you did not change
>>>>> useHiveContext to "false" in the interpreter menu).
>>>>>
>>>>> Hope it helps.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> Guys? Somebody?
>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>
>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>> I'm using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>
>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext == HiveContext):
>>>>>>>
>>>>>>> 1. Create and register the UDF:
>>>>>>>
>>>>>>>        def getNum(): Int = {
>>>>>>>          100
>>>>>>>        }
>>>>>>>        hc.udf.register("getNum", getNum _)
>>>>>>>
>>>>>>> 2. Then try to use it on an existing table:
>>>>>>>
>>>>>>>        %sql select getNum() from filteredNc limit 1
>>>>>>>
>>>>>>> 3. Or try it through hc directly:
>>>>>>>
>>>>>>>        hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>
>>>>>>> Both of them fail with
>>>>>>> "java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>> (see the full exception below).
>>>>>>>
>>>>>>> My questions are:
>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark nodes?
>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>
>>>>>>> The exception:
>>>>>>>
>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>         at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>         at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>         at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>         at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>
>>>>>>> <many more ObjectStreamClass frames>
>>>>>>>
>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>         at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>         ... 103 more
>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>         at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>         at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>         at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>         at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>         at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>         ... 105 more
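PS: Regarding my own question below - why is ZeppelinContext needed at all -
my current guess (unverified) is that a function defined in a notebook
paragraph is compiled into REPL wrapper classes, and eta-expanding it
(getNum _) produces a closure over the wrapper instance, whose fields include
ZeppelinContext. Below is a small standalone Scala demo of that capture
mechanism. Outer, its heavy field, and ClosureCaptureDemo are stand-ins I
made up, not Zeppelin internals; and the symptom differs - here serialization
fails on the driver, while in Zeppelin serialization succeeds and the
executor instead fails to load the captured class:

    import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

    // Stand-in for a REPL wrapper object; not Serializable, like a context object.
    class Outer {
      val heavy = new Object                  // stand-in for a field such as ZeppelinContext
      def getStr(s: String): String = s + "_str"
    }

    object ClosureCaptureDemo {
      def main(args: Array[String]): Unit = {
        val outer = new Outer
        // Eta expansion: the resulting function object holds a reference to `outer`.
        val f: String => String = outer.getStr _
        try {
          new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
        } catch {
          case e: NotSerializableException =>
            // The method reference dragged its enclosing instance along.
            println("closure captured the enclosing instance: " + e)
        }
      }
    }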