BTW, this isn't working either:
    val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
    val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
    sidNameDF2.registerTempTable("tmp_sid_name2")
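For comparison, here is the only variant that does work for me so far - a
minimal, self-contained sketch of steps 2-5 from the detailed example below
(the table, column, and case-class names follow that example; the *_hive /
*_local temp-table names are just for this sketch, and my assumption is that
the collect() is the significant difference, since the new DataFrame is built
from rows already on the driver and carries no lineage back to the Hive scan):

    case class SidName(sid: Int, name: String)

    def getStr(str: String) = str + "_str"
    hc.udf.register("getStr", getStr _)

    // Fails: the temp table is still backed by the Hive scan
    hc.sql("select sid, name from hive_table limit 10")
      .registerTempTable("tmp_sid_name_hive")
    // hc.sql("select getStr(name) from tmp_sid_name_hive").collect()
    //   => ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext

    // Works: the rows are materialized on the driver first
    val local = hc.sql("select sid, name from hive_table limit 10")
      .collect()
      .map(row => SidName(row.getInt(0), row.getString(1)))
    hc.createDataFrame(local).registerTempTable("tmp_sid_name_local")
    hc.sql("select getStr(name) from tmp_sid_name_local").collect()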
On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:

> I've made some progress on this issue and I think it's a bug...
>
> Apparently, when trying to use registered UDFs on tables that come from
> Hive, it returns the above exception (ClassNotFoundException:
> org.apache.zeppelin.spark.ZeppelinContext).
> When I create a new table and register it, UDFs work as expected.
> Full details and an example are below.
>
> Can someone tell me whether this is expected behavior or a bug?
>
> BTW, I don't mind working on the bug if you can give me a pointer to the
> right places.
>
> BTW2: Registering the SAME DataFrame as a temp table does not solve the
> problem - only creating a new table out of a new DataFrame does (see below).
>
> Detailed example:
>
> 1. I have a table in Hive called 'hive_table', with a string field called
>    'name' and an int field called 'sid'.
>
> 2. I registered a UDF:
>
>        def getStr(str: String) = str + "_str"
>        hc.udf.register("getStr", getStr _)
>
> 3. Running the following in Zeppelin:
>
>        %sql select getStr(name), * from hive_table
>
>    yields the exception:
>
>        ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>
> 4. Creating a new table as follows:
>
>        case class SidName(sid: Int, name: String)
>        // collect() returns an Array[Row] we can map over on the driver
>        val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>          .collect()
>          .map(row => SidName(row.getInt(0), row.getString(1)))
>        val sidNameDF = hc.createDataFrame(sidNameList)
>        sidNameDF.registerTempTable("tmp_sid_name")
>
> 5. Querying the new table in the same fashion:
>
>        %sql select getStr(name), * from tmp_sid_name
>
> This time I get the expected results!
>
> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> BTW, the same query on the same cluster, but in the Spark shell, returns
>> the expected results.
>>
>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> It looks like the Zeppelin jar is not distributed to the Spark nodes,
>>> though I can't understand why it is needed for the UDF.
>>>
>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> Thanks for the response, but I'm not sure what you mean - that is
>>>> exactly what I tried, and it failed.
>>>> As I wrote above, 'hc' is just another name for sqlc (which in turn is
>>>> another name for z.sqlContext).
>>>>
>>>> I get the same results.
>>>>
>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>
>>>>> Hi Ophir,
>>>>>
>>>>> Can you try the below?
>>>>>
>>>>>     def getNum(): Int = {
>>>>>       100
>>>>>     }
>>>>>     sqlc.udf.register("getNum", getNum _)
>>>>>     sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>
>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin and uses a
>>>>> HiveContext as the sqlContext by default (if you did not change
>>>>> useHiveContext to "false" in the interpreter menu).
>>>>>
>>>>> Hope it helps.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> Guys? Somebody?
>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>
>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>> I'm using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>
>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext == HiveContext):
>>>>>>>
>>>>>>> 1. Create and register the UDF:
>>>>>>>
>>>>>>>        def getNum(): Int = {
>>>>>>>          100
>>>>>>>        }
>>>>>>>        hc.udf.register("getNum", getNum _)
>>>>>>>
>>>>>>> 2. Then try to use it on an existing table:
>>>>>>>
>>>>>>>        %sql select getNum() from filteredNc limit 1
>>>>>>>
>>>>>>> 3. Or try it through hc directly:
>>>>>>>
>>>>>>>        hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>
>>>>>>> Both of them fail with
>>>>>>> "java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>> (see the full exception below).
>>>>>>>
>>>>>>> My questions are:
>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark nodes?
>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>
>>>>>>> The exception:
>>>>>>>
>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>         at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>         at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>         at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>         at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>
>>>>>>> <many more ObjectStreamClass frames>
>>>>>>>
>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>         at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>         ... 103 more
>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>         at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>         at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>         at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>         at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>         at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>         ... 105 more
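PS: Regarding my own question below - why is ZeppelinContext needed at all -
my current guess (unverified) is that a function defined in a notebook
paragraph is compiled into REPL wrapper classes, and eta-expanding it
(getNum _) produces a closure over the wrapper instance, whose fields include
ZeppelinContext. Below is a small standalone Scala demo of that capture
mechanism. Outer, its heavy field, and ClosureCaptureDemo are stand-ins I
made up, not Zeppelin internals; and the symptom differs - here serialization
fails on the driver, while in Zeppelin serialization succeeds and the
executor instead fails to load the captured class:

    import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

    // Stand-in for a REPL wrapper object; not Serializable, like a context object.
    class Outer {
      val heavy = new Object                  // stand-in for a field such as ZeppelinContext
      def getStr(s: String): String = s + "_str"
    }

    object ClosureCaptureDemo {
      def main(args: Array[String]): Unit = {
        val outer = new Outer
        // Eta expansion: the resulting function object holds a reference to `outer`.
        val f: String => String = outer.getStr _
        try {
          new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
        } catch {
          case e: NotSerializableException =>
            // The method reference dragged its enclosing instance along.
            println("closure captured the enclosing instance: " + e)
        }
      }
    }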