Really appreciate you sharing the problem. Very interesting. Do you mind filing an issue in JIRA?
Best,
moon

On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:

> BTW, this isn't working either:
>
> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
> sidNameDF2.registerTempTable("tmp_sid_name2")
>
> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> I've made some progress on this issue and I think it's a bug...
>>
>> Apparently, trying to use registered UDFs on tables that come from
>> Hive returns the above exception (ClassNotFoundException:
>> org.apache.zeppelin.spark.ZeppelinContext).
>> When I create a new table and register it, UDFs work as expected.
>> See below for the full details and an example.
>>
>> Can someone tell me whether this is the expected behavior or a bug?
>>
>> BTW
>> I don't mind working on that bug - if you can give me a pointer to
>> the right places.
>>
>> BTW2
>> Registering the SAME DataFrame as a temp table does not solve the
>> problem - only creating a new table out of a new DataFrame does (see
>> below).
>>
>> Detailed example
>> 1. I have a table in Hive called 'hive_table' with a string field
>> called 'name' and an int field called 'sid'.
>>
>> 2. I registered a UDF:
>> def getStr(str: String) = str + "_str"
>> hc.udf.register("getStr", getStr _)
>>
>> 3. Running the following in Zeppelin:
>> %sql select getStr(name), * from hive_table
>> yields the exception:
>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>
>> 4. Creating a new table, as follows:
>> case class SidName(sid: Int, name: String)
>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>   .collect().map(row => SidName(row.getInt(0), row.getString(1)))
>> val sidNameDF = hc.createDataFrame(sidNameList)
>> sidNameDF.registerTempTable("tmp_sid_name")
>>
>> 5. Querying the new table in the same fashion:
>> %sql select getStr(name), * from tmp_sid_name
>>
>> This time I get the expected results!
>>
>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> BTW
>>> The same query, on the same cluster but in the Spark shell, returns
>>> the expected results.
>>>
>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> It looks like the Zeppelin jar does not get distributed to the
>>>> Spark nodes, though I can't understand why it is needed for the UDF.
>>>>
>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> Thanks for the response,
>>>>> I'm not sure what you mean - that is exactly what I tried, and it
>>>>> failed.
>>>>> As I wrote above, 'hc' is just a different name for sqlc (which in
>>>>> turn is a different name for z.sqlContext).
>>>>>
>>>>> I get the same results.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>
>>>>>> Hi Ophir,
>>>>>>
>>>>>> Can you try the below?
>>>>>>
>>>>>> def getNum(): Int = {
>>>>>>   100
>>>>>> }
>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>
>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin, and
>>>>>> it uses a HiveContext as the sqlContext by default (unless you
>>>>>> changed useHiveContext to "false" in the interpreter menu).
>>>>>>
>>>>>> Hope it helps.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Guys?
>>>>>>> Somebody?
>>>>>>> Can it be that Zeppelin does not support UDFs?
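A minimal sketch of one possible explanation for the behavior described above, assuming (this is not confirmed anywhere in the thread) that a UDF closure built in Zeppelin's REPL captures the interpreter's line object, which holds a ZeppelinContext field, so serializing the closure drags ZeppelinContext onto the executors. Defining the function inside a standalone serializable object should keep that reference out of the closure; "UdfHolder" is an illustrative name, not part of Zeppelin or Spark:

    // Assumption: the ClassNotFoundException comes from the UDF closure
    // capturing the REPL wrapper that references ZeppelinContext.
    // A function defined in a top-level object carries no such reference.
    object UdfHolder extends Serializable {
      def getStr(str: String): String = str + "_str"
    }

    hc.udf.register("getStr", UdfHolder.getStr _)
    hc.sql("select getStr(name) from hive_table limit 10").show()

Whether the REPL still wraps the object in a way that re-captures ZeppelinContext is exactly what is in question here; the fact that the same registration works in a plain Spark shell, as noted in the thread, is at least consistent with the capture hypothesis.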
>>>>>>>
>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Guys,
>>>>>>>> One more problem I have encountered using Zeppelin,
>>>>>>>> using Spark 1.3.1 on YARN with Hadoop 2.4.
>>>>>>>>
>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>> HiveContext):
>>>>>>>>
>>>>>>>> 1. Create and register the UDF:
>>>>>>>> def getNum(): Int = {
>>>>>>>>   100
>>>>>>>> }
>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>
>>>>>>>> 2. Try to use it on an existing table:
>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>
>>>>>>>> Or:
>>>>>>>> 3. Try using hc directly:
>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>
>>>>>>>> Both of them fail with
>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>> (see the full exception below).
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark
>>>>>>>> nodes?
>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>
>>>>>>>> The exception:
>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID
>>>>>>>> 1626, ip-10-216-204-246.ec2.internal):
>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>> at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>> at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>> at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>> at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>
>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>> ... 103 more
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>> at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>> ... 105 more
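One cheap way to probe question 1 above (whether ZeppelinContext simply never reaches the Spark nodes) is to ship the Zeppelin Spark interpreter jar to the executors by hand with sc.addJar. The jar path below is hypothetical; the real zeppelin-spark-*.jar has to be located under the Zeppelin installation first:

    // Hypothetical path - adjust to the actual Zeppelin installation.
    sc.addJar("/opt/zeppelin/interpreter/spark/zeppelin-spark-0.5.0-incubating-SNAPSHOT.jar")

    // Re-run the failing query; if the ClassNotFoundException disappears,
    // the class really was missing on the executors.
    hc.sql("select getNum() from filteredNc limit 1").collect()

If the exception persists even with the jar on the executors' classpath, the closure-capture angle (why the UDF references ZeppelinContext at all) becomes the more interesting question.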