Really appreciate you sharing the problem. Very interesting. Do you mind filing an issue in JIRA?
Best,
moon

On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:

> BTW, this isn't working either:
>
> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
> sidNameDF2.registerTempTable("tmp_sid_name2")
>
> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> I've made some progress on this issue and I think it's a bug...
>>
>> Apparently, trying to use registered UDFs on tables that come from
>> Hive returns the above exception (ClassNotFoundException:
>> org.apache.zeppelin.spark.ZeppelinContext).
>> When I create a new table and register it, UDFs work as expected.
>> See below for the full details and an example.
>>
>> Can someone tell me whether this is the expected behavior or a bug?
>>
>> BTW
>> I don't mind working on that bug - if you can give me a pointer to
>> the right places.
>>
>> BTW2
>> Registering the SAME DataFrame as a temp table does not solve the
>> problem - only creating a new table out of a new DataFrame does (see
>> below).
>>
>> Detailed example
>> 1. I have a table in Hive called 'hive_table' with a string field
>> called 'name' and an int field called 'sid'.
>>
>> 2. I registered a UDF:
>> def getStr(str: String) = str + "_str"
>> hc.udf.register("getStr", getStr _)
>>
>> 3. Running the following in Zeppelin:
>> %sql select getStr(name), * from hive_table
>> yields the exception:
>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>
>> 4. Creating a new table, as follows:
>> case class SidName(sid: Int, name: String)
>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>   .collect().map(row => SidName(row.getInt(0), row.getString(1)))
>> val sidNameDF = hc.createDataFrame(sidNameList)
>> sidNameDF.registerTempTable("tmp_sid_name")
>>
>> 5. Querying the new table in the same fashion:
>> %sql select getStr(name), * from tmp_sid_name
>>
>> This time I get the expected results!
>>
>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> BTW
>>> The same query, on the same cluster but in the Spark shell, returns
>>> the expected results.
>>>
>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> It looks like the Zeppelin jar does not get distributed to the
>>>> Spark nodes, though I can't understand why it is needed for the UDF.
>>>>
>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> Thanks for the response,
>>>>> I'm not sure what you mean - that is exactly what I tried, and it
>>>>> failed.
>>>>> As I wrote above, 'hc' is just a different name for sqlc (which in
>>>>> turn is a different name for z.sqlContext).
>>>>>
>>>>> I get the same results.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>
>>>>>> Hi Ophir,
>>>>>>
>>>>>> Can you try the below?
>>>>>>
>>>>>> def getNum(): Int = {
>>>>>>   100
>>>>>> }
>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>
>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin, and
>>>>>> it uses a HiveContext as the sqlContext by default (unless you
>>>>>> changed useHiveContext to "false" in the interpreter menu).
>>>>>>
>>>>>> Hope it helps.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Guys?
>>>>>>> Somebody?
>>>>>>> Can it be that Zeppelin does not support UDFs?
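A minimal sketch of one possible explanation for the behavior described above, assuming (this is not confirmed anywhere in the thread) that a UDF closure built in Zeppelin's REPL captures the interpreter's line object, which holds a ZeppelinContext field, so serializing the closure drags ZeppelinContext onto the executors. Defining the function inside a standalone serializable object should keep that reference out of the closure; "UdfHolder" is an illustrative name, not part of Zeppelin or Spark:

    // Assumption: the ClassNotFoundException comes from the UDF closure
    // capturing the REPL wrapper that references ZeppelinContext.
    // A function defined in a top-level object carries no such reference.
    object UdfHolder extends Serializable {
      def getStr(str: String): String = str + "_str"
    }

    hc.udf.register("getStr", UdfHolder.getStr _)
    hc.sql("select getStr(name) from hive_table limit 10").show()

Whether the REPL still wraps the object in a way that re-captures ZeppelinContext is exactly what is in question here; the fact that the same registration works in a plain Spark shell, as noted in the thread, is at least consistent with the capture hypothesis.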
>>>>>>>
>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Guys,
>>>>>>>> One more problem I have encountered using Zeppelin,
>>>>>>>> using Spark 1.3.1 on YARN with Hadoop 2.4.
>>>>>>>>
>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>> HiveContext):
>>>>>>>>
>>>>>>>> 1. Create and register the UDF:
>>>>>>>> def getNum(): Int = {
>>>>>>>>   100
>>>>>>>> }
>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>
>>>>>>>> 2. Try to use it on an existing table:
>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>
>>>>>>>> Or:
>>>>>>>> 3. Try using hc directly:
>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>
>>>>>>>> Both of them fail with
>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>> (see the full exception below).
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark
>>>>>>>> nodes?
>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>
>>>>>>>> The exception:
>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID
>>>>>>>> 1626, ip-10-216-204-246.ec2.internal):
>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>> at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>> at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>> at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>> at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>
>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>> ... 103 more
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>> at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>> ... 105 more
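One cheap way to probe question 1 above (whether ZeppelinContext simply never reaches the Spark nodes) is to ship the Zeppelin Spark interpreter jar to the executors by hand with sc.addJar. The jar path below is hypothetical; the real zeppelin-spark-*.jar has to be located under the Zeppelin installation first:

    // Hypothetical path - adjust to the actual Zeppelin installation.
    sc.addJar("/opt/zeppelin/interpreter/spark/zeppelin-spark-0.5.0-incubating-SNAPSHOT.jar")

    // Re-run the failing query; if the ClassNotFoundException disappears,
    // the class really was missing on the executors.
    hc.sql("select getNum() from filteredNc limit 1").collect()

If the exception persists even with the jar on the executors' classpath, the closure-capture angle (why the UDF references ZeppelinContext at all) becomes the more interesting question.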