You can still manually configure all the environment variables and properties for pyspark, but it is suggested to build with -Ppyspark from now on.

Thanks,
moon
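For example, assuming Spark 1.3 and Hadoop 2.6 (the versions used later in this thread; adjust to your environment), the full build command looks like:

    mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests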
On Wed, Jul 8, 2015 at 10:59 PM IT CTO <[email protected]> wrote:

> Does this mean that everyone who wants pySpark to work should use this
> option in the build from now on, or is that going to be the default, as in
> Spark 1.4?
> Eran
>
> On Thu, Jul 9, 2015 at 12:14 AM moon soo Lee <[email protected]> wrote:
>
>> Is your source code older than 3 days? -Ppyspark was merged about 3 days ago.
>>
>> Thanks,
>> moon
>>
>> On Wed, Jul 8, 2015 at 1:58 PM Vadla, Karthik <[email protected]> wrote:
>>
>>> I'm using this .zip: https://github.com/apache/incubator-zeppelin
>>>
>>> Thanks
>>> Karthik
>>>
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Wednesday, July 8, 2015 1:37 PM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>
>>> Are you building on latest master?
>>>
>>> On Wed, Jul 8, 2015 at 1:34 PM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi Moon,
>>>
>>> Yes, I tried the command below. The build was successful, but at the end
>>> I got this warning message:
>>>
>>> [WARNING] The requested profile "pyspark" could not be activated because it does not exist.
>>>
>>> Pyspark exists on the machine. Do I need to do anything further?
>>>
>>> Thanks
>>> Karthik
>>>
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Wednesday, July 8, 2015 10:58 AM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>
>>> Hi,
>>>
>>> I meant adding the -Ppyspark profile, like:
>>>
>>> mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Jul 8, 2015 at 10:43 AM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi Moon,
>>>
>>> You mean to say I need to build something like this:
>>>
>>> mvn clean package -Ppyspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>
>>> I had previously built my Zeppelin with the command below:
>>>
>>> mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>
>>> Thanks
>>> Karthik
>>>
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Wednesday, July 8, 2015 10:20 AM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>
>>> Hi,
>>>
>>> If you build the latest master branch with the -Ppyspark maven profile,
>>> it'll help pyspark work without setting those environment variables.
>>>
>>> Hope this helps.
>>>
>>> Best,
>>> moon
>>>
>>> On Tue, Jul 7, 2015 at 3:47 PM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> This part is commented out in zeppelin-env.sh in my conf folder:
>>>
>>> # Pyspark (supported with Spark 1.2.1 and above)
>>> # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
>>> # export PYSPARK_PYTHON    # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
>>> # export PYTHONPATH        # extra PYTHONPATH.
>>>
>>> Can anyone help with how to set those up?
>>>
>>> Appreciate your help.
>>>
>>> Thanks
>>> Karthik
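For anyone who does want to set these by hand instead of rebuilding, a minimal sketch of what the two exports might look like in conf/zeppelin-env.sh (the python path and the py4j zip name below are assumptions; match them to your own Spark distribution, and note that SPARK_HOME must point at it):

    # assumption: same python path on the driver (Zeppelin) and all workers
    export PYSPARK_PYTHON=/usr/bin/python
    # assumption: py4j version is the one shipped under your Spark's python/lib
    export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:${PYTHONPATH}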
>>> From: Vadla, Karthik [mailto:[email protected]]
>>> Sent: Tuesday, July 7, 2015 3:29 PM
>>> To: [email protected]
>>> Subject: RE: Not able to see registered table records and Pyspark not working
>>>
>>> Hi Moon,
>>>
>>> Thanks for that. The problem was with my parsing. I resolved it.
>>>
>>> I have another question to ask.
>>> I'm just trying to run a print command using the pyspark interpreter.
>>> It is not responding.
>>>
>>> When I look at the log, I don't have any information except this:
>>>
>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} SchedulerFactory.java[jobStarted]:132) - Job paragraph_1436305204170_601291630 started by scheduler remoteinterpreter_267235421
>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:194) - run paragraph 20150707-144004_475199059 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@33a625a7
>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:211) - RUN : list=range(1,10)
>>> print(list)
>>> INFO [2015-07-07 15:19:18,060] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>> INFO [2015-07-07 15:19:18,678] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>> INFO [2015-07-07 15:19:19,278] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>> INFO [2015-07-07 15:19:19,879] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>
>>> Do I need to do any config settings in zeppelin-env.sh or zeppelin-site.xml?
>>>
>>> Thanks
>>> Karthik
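Once the interpreter does respond, a quick way to separate a Python problem from a Spark problem is a paragraph that exercises both (a sketch; sc is the SparkContext that Zeppelin's pyspark interpreter provides):

    %pyspark
    xs = list(range(1, 10))          # plain Python, no Spark involved
    print(xs)                        # expect [1, 2, ..., 9]
    print(sc.parallelize(xs).sum())  # exercises the SparkContext; expect 45

If the first print never appears, the python process itself is not starting, which points back at the build profile or the PYSPARK_PYTHON/PYTHONPATH settings above.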
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Friday, July 3, 2015 2:31 PM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records
>>>
>>> Hi,
>>>
>>> Could you try this branch?
>>> https://github.com/apache/incubator-zeppelin/pull/136
>>>
>>> It'll give you a better stacktrace than just displaying "java.lang.reflect.InvocationTargetException".
>>>
>>> Thanks,
>>> moon
>>>
>>> On Thu, Jul 2, 2015 at 10:34 AM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> I just registered a table using the code below:
>>>
>>> val eduText = sc.textFile("hdfs://ip.address/user/karthik/education.csv")
>>>
>>> case class Education(unitid: Integer, instnm: String, addr: String, city: String, stabbr: String, zip: Integer)
>>>
>>> val education = eduText.map(s => s.split(",")).filter(s => s(0) != "UNITID").map(
>>>   s => Education(s(0).toInt,
>>>     s(1).replaceAll("\"", ""),
>>>     s(2).replaceAll("\"", ""),
>>>     s(3).replaceAll("\"", ""),
>>>     s(4).replaceAll("\"", ""),
>>>     s(5).replaceAll("\"", "").toInt
>>>   )
>>> )
>>>
>>> // Below line works only in spark 1.3.0.
>>> // For spark 1.1.x and spark 1.2.x,
>>> // use bank.registerTempTable("bank") instead.
>>> education.toDF().registerTempTable("education")
>>>
>>> When I run "%sql show tables", it displays the table "education".
>>>
>>> But when I try to run "%sql select count(*) from education", it throws the error below:
>>>
>>> java.lang.reflect.InvocationTargetException
>>>
>>> Can anyone help me with this?
>>> Appreciate your help.
>>>
>>> I have enclosed the .csv file used to register the table.
>>>
>>> Thanks
>>> Karthik
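A closing note on the error above: Karthik mentions upthread that the problem turned out to be his parsing. A naive split(",") splits inside quoted fields that contain commas, and because map/filter are lazy the resulting exception only surfaces when the table is actually queried, e.g. at %sql time. A minimal sketch with a made-up row of the same shape as his data:

    val row = """100654,"Some University","1 Main St","Normal, AL","AL",35762"""
    val cols = row.split(",")
    // 7 columns instead of 6: the comma inside "Normal, AL" splits the city
    // field, shifting every later column by one.
    println(cols.length)  // 7
    println(cols(5))      // "AL" (with quotes), not the zip code, so
                          // cols(5).replaceAll("\"", "").toInt throws
                          // NumberFormatException when the RDD is evaluated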
