Does this mean that everyone who wants PySpark to work should use this
option in the build from now on, or is that going to be the default,
like spark 1.4?
Eran

On Thu, Jul 9, 2015 at 12:14 AM moon soo Lee <[email protected]> wrote:

> Is your source code older than 3 days? The -Ppyspark profile was merged
> about 3 days ago.
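>
> A quick way to confirm is to list the profiles your checkout actually
> defines (mvn help:all-profiles prints every profile declared in the
> POMs):
>
>     mvn help:all-profiles | grep -i pyspark
>
> If pyspark doesn't show up, your sources predate the merge.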
>
> Thanks,
> moon
>
>
> On Wed, Jul 8, 2015 at 1:58 PM Vadla, Karthik <[email protected]>
> wrote:
>
>>  I’m using this .zip https://github.com/apache/incubator-zeppelin
>>
>>
>>
>> Thanks
>>
>> Karthik
>>
>>
>>
>> *From:* moon soo Lee [mailto:[email protected]]
>> *Sent:* Wednesday, July 8, 2015 1:37 PM
>> *To:* [email protected]
>> *Subject:* Re: Not able to see registered table records and Pyspark not
>> working
>>
>>
>>
>> Are you building from the latest master?
>>
>> On Wed, Jul 8, 2015 at 1:34 PM Vadla, Karthik <[email protected]>
>> wrote:
>>
>>  Hi Moon,
>>
>>
>>
>>  Yeah, I tried the command below. The build was successful, but at the
>> end I got this warning message:
>>
>> [WARNING] The requested profile "pyspark" could not be activated because
>> it does not exist.
>>
>>
>>
>>
>>
>> PySpark exists on the machine. Do I need to do anything further?
>>
>>
>>
>> Thanks
>>
>> Karthik
>>
>> *From:* moon soo Lee [mailto:[email protected]]
>> *Sent:* Wednesday, July 8, 2015 10:58 AM
>>
>>
>> *To:* [email protected]
>> *Subject:* Re: Not able to see registered table records and Pyspark not
>> working
>>
>>
>>
>> Hi
>>
>>
>>
>> I meant adding the -Ppyspark profile, like:
>>
>>
>>
>> *mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0
>> -Phadoop-2.6 -DskipTests*
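>>
>> After rebuilding, restart the Zeppelin daemon so the new pyspark
>> interpreter is picked up (zeppelin-daemon.sh ships in the bin directory
>> of the build):
>>
>>     ./bin/zeppelin-daemon.sh restart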
>>
>> Thanks,
>>
>> moon
>>
>> On Wed, Jul 8, 2015 at 10:43 AM Vadla, Karthik <[email protected]>
>> wrote:
>>
>>  Hi Moon,
>>
>>
>>
>>  Do you mean I need to build it with something like this?
>>
>> *mvn clean package -Ppyspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0
>> -Phadoop-2.6 -DskipTests*
>>
>>
>>
>> I previously built my Zeppelin with the command below:
>>
>> *mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0
>> -Phadoop-2.6 -DskipTests*
>>
>>
>>
>>
>>
>> Thanks
>>
>> Karthik
>>
>> *From:* moon soo Lee [mailto:[email protected]]
>> *Sent:* Wednesday, July 8, 2015 10:20 AM
>> *To:* [email protected]
>> *Subject:* Re: Not able to see registered table records and Pyspark not
>> working
>>
>>
>>
>> Hi,
>>
>>
>>
>> If you build the latest master branch with the -Ppyspark maven profile,
>> pyspark will work without setting those environment variables.
>>
>> Hope this helps.
>>
>>
>>
>> Best,
>>
>> moon
>>
>>
>>
>> On Tue, Jul 7, 2015 at 3:47 PM Vadla, Karthik <[email protected]>
>> wrote:
>>
>>  Hi All,
>>
>>
>>
>> This part is commented out in *zeppelin-env.sh* in my conf folder.
>>
>>
>>
>> # Pyspark (supported with Spark 1.2.1 and above)
>>
>> # To configure pyspark, you need to set spark distribution's path to
>> 'spark.home' property in Interpreter setting screen in Zeppelin GUI
>>
>> # export PYSPARK_PYTHON          # path to the python command. must be
>> the same path on the driver(Zeppelin) and all workers.
>>
>> # export PYTHONPATH              # extra PYTHONPATH.
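>>
>> For reference, uncommented these might look something like the lines
>> below. Both values are assumptions that depend on your Spark
>> distribution (the py4j zip version varies between Spark releases):
>>
>>     export PYSPARK_PYTHON=/usr/bin/python
>>     export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip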
>>
>>
>>
>> Can anyone help me set those up?
>>
>>
>>
>> Appreciate your help.
>>
>>
>>
>> Thanks
>>
>> Karthik
>>
>>
>>
>> *From:* Vadla, Karthik [mailto:[email protected]]
>> *Sent:* Tuesday, July 7, 2015 3:29 PM
>> *To:* [email protected]
>> *Subject:* RE: Not able to see registered table records and Pyspark not
>> working
>>
>>
>>
>> Hi Moon,
>>
>>
>>
>> Thanks for that.
>> The problem was with my parsing. I've resolved it.
>>
>>
>>
>> I have another question to ask.
>>
>> I’m just trying to run a *print* command using the pyspark interpreter,
>> but it is not responding.
>>
>>
>>
>> When I look at the log, I don’t see any information except this:
>>
>>
>>
>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41}
>> SchedulerFactory.java[jobStarted]:132) - Job
>> paragraph_1436305204170_601291630 started by scheduler
>> remoteinterpreter_267235421
>>
>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41}
>> Paragraph.java[jobRun]:194) - run paragraph 20150707-144004_475199059 using
>> pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@33a625a7
>>
>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41}
>> Paragraph.java[jobRun]:211) - RUN : list=range(1,10)
>>
>> print(list)
>>
>> INFO [2015-07-07 15:19:18,060] ({Thread-255}
>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>
>> INFO [2015-07-07 15:19:18,678] ({Thread-255}
>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>
>> INFO [2015-07-07 15:19:19,278] ({Thread-255}
>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>
>> INFO [2015-07-07 15:19:19,879] ({Thread-255}
>> NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>
>>
>>
>>
>>
>> Do I need to change any config settings in *zeppelin-env.sh* or
>> *zeppelin-site.xml*?
>>
>>
>>
>>
>>
>> Thanks
>>
>> Karthik
>>
>>
>>
>>
>>
>>
>>
>> *From:* moon soo Lee [mailto:[email protected]]
>> *Sent:* Friday, July 3, 2015 2:31 PM
>> *To:* [email protected]
>> *Subject:* Re: Not able to see registered table records
>>
>>
>>
>> Hi,
>>
>>
>>
>> Could you try this branch?
>> https://github.com/apache/incubator-zeppelin/pull/136
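>>
>> One way to build that branch locally, assuming you fetch the pull
>> request head ref that GitHub exposes (the local branch name pr-136 is
>> arbitrary):
>>
>>     git fetch https://github.com/apache/incubator-zeppelin pull/136/head:pr-136
>>     git checkout pr-136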
>>
>>
>>
>> It'll give you a better stack trace than just displaying
>> "java.lang.reflect.InvocationTargetException".
>>
>>
>>
>> Thanks,
>>
>> moon
>>
>>
>>
>> On Thu, Jul 2, 2015 at 10:34 AM Vadla, Karthik <[email protected]>
>> wrote:
>>
>>  Hi All,
>>
>>
>>
>> I just registered a table using the code below:
>>
>>
>>
>> val eduText = sc.textFile("hdfs://ip.address/user/karthik/education.csv")
>>
>> case class Education(unitid: Integer, instnm: String, addr: String,
>>     city: String, stabbr: String, zip: Integer)
>>
>> val education = eduText.map(s => s.split(",")).filter(s => s(0) != "UNITID").map(
>>     s => Education(s(0).toInt,
>>             s(1).replaceAll("\"", ""),
>>             s(2).replaceAll("\"", ""),
>>             s(3).replaceAll("\"", ""),
>>             s(4).replaceAll("\"", ""),
>>             s(5).replaceAll("\"", "").toInt
>>         )
>> )
>>
>> // The line below works only in Spark 1.3.0.
>> // For Spark 1.1.x and 1.2.x,
>> // use education.registerTempTable("education") instead.
>>
>> education.toDF().registerTempTable("education")
>>
>>
>>
>> When I run *%sql show tables*, it displays the table *education*.
>>
>> But when I try to run *%sql select count(*) from education*, it throws
>> the error below:
>>
>>
>>
>> java.lang.reflect.InvocationTargetException
>>
>>
>>
>>
>>
>>
>>
>> Can anyone help me with this?
>>
>> Appreciate your help.
>>
>>
>>
>> I've attached the .csv file used to register the table.
>>
>>
>>
>> Thanks
>>
>> Karthik
>>
>>
