I’m using this .zip https://github.com/apache/incubator-zeppelin
Thanks,
Karthik

From: moon soo Lee [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 1:37 PM
To: [email protected]
Subject: Re: Not able to see registered table records and Pyspark not working

Are you building on latest master?

On Wed, Jul 8, 2015 at 1:34 PM Vadla, Karthik <[email protected]> wrote:

Hi Moon,

Yeah, I tried the command below. The build was successful, but at the end I got this warning:

[WARNING] The requested profile "pyspark" could not be activated because it does not exist.

Pyspark exists on the machine. Do I need to do anything further?

Thanks,
Karthik

From: moon soo Lee [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 10:58 AM
To: [email protected]
Subject: Re: Not able to see registered table records and Pyspark not working

Hi,

I meant adding the -Ppyspark profile, like:

mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests

Thanks,
moon

On Wed, Jul 8, 2015 at 10:43 AM Vadla, Karthik <[email protected]> wrote:

Hi Moon,

You mean I need to build something like this?

mvn clean package -Ppyspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests

I previously built my Zeppelin with the command below:

mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests

Thanks,
Karthik

From: moon soo Lee [mailto:[email protected]]
Sent: Wednesday, July 8, 2015 10:20 AM
To: [email protected]
Subject: Re: Not able to see registered table records and Pyspark not working

Hi,

If you build the latest master branch with the -Ppyspark maven profile, pyspark will work without setting those environment variables.

Hope this helps.

Best,
moon

On Tue, Jul 7, 2015 at 3:47 PM Vadla, Karthik <[email protected]> wrote:

Hi All,

This part is commented out in zeppelin-env.sh in my conf folder:
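A sketch of one way to chase down the "profile pyspark does not exist" warning: the -Ppyspark profile only exists on recent master, so a release .zip checked out earlier will not declare it. The directory name and grep below are assumptions about a typical checkout, not verified paths. Note also that -DskipTests must use a plain ASCII hyphen; an en dash (–DskipTests) is silently treated as a program argument rather than an option.

```shell
cd incubator-zeppelin

# List every profile declared in the POMs; "pyspark" should appear here
# if the checked-out source is new enough to support it.
mvn help:all-profiles | grep -i pyspark

# If it appears, build with the profile enabled (all flags use plain
# ASCII hyphens):
mvn clean package -Pspark-1.3 -Ppyspark \
    -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
```

If grep prints nothing, pulling the latest master before building would be the next step.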
# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set the spark distribution's path in the
# 'spark.home' property on the Interpreter setting screen in the Zeppelin GUI
# export PYSPARK_PYTHON   # path to the python command. Must be the same path on the driver (Zeppelin) and all workers.
# export PYTHONPATH       # extra PYTHONPATH.

Can anyone help me set those up? Appreciate your help.

Thanks,
Karthik

From: Vadla, Karthik [mailto:[email protected]]
Sent: Tuesday, July 7, 2015 3:29 PM
To: [email protected]
Subject: RE: Not able to see registered table records and Pyspark not working

Hi Moon,

Thanks for that. The problem was with my parsing; I resolved it.

I have another question. I'm trying to run a simple print command using the pyspark interpreter, and it is not responding. When I look at the log, I don't see anything except this:

INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} SchedulerFactory.java[jobStarted]:132) - Job paragraph_1436305204170_601291630 started by scheduler remoteinterpreter_267235421
INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:194) - run paragraph 20150707-144004_475199059 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@33a625a7
INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:211) - RUN : list=range(1,10)
print(list)
INFO [2015-07-07 15:19:18,060] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
INFO [2015-07-07 15:19:18,678] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
INFO [2015-07-07 15:19:19,278] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
INFO [2015-07-07 15:19:19,879] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS

Do I need to do any config settings in zeppelin-env.sh or zeppelin-site.xml?
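For reference, a hypothetical sketch of what uncommenting those zeppelin-env.sh entries could look like. Every path below (the python binary, the Spark install location, the py4j zip version) is a placeholder assumption for this machine, not a known value; moon's advice above (building with -Ppyspark) makes these unnecessary on latest master.

```shell
# conf/zeppelin-env.sh -- illustrative values only.

# Python executable used by the pyspark interpreter; must resolve to the
# same path on the Zeppelin driver and on every worker node.
export PYSPARK_PYTHON=/usr/bin/python

# Extra PYTHONPATH entries so the interpreter can import pyspark itself;
# SPARK_HOME and the py4j zip name are assumed, check the local install.
export SPARK_HOME=/opt/spark
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
```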
Thanks,
Karthik

From: moon soo Lee [mailto:[email protected]]
Sent: Friday, July 3, 2015 2:31 PM
To: [email protected]
Subject: Re: Not able to see registered table records

Hi,

Could you try this branch? https://github.com/apache/incubator-zeppelin/pull/136

It'll give you a better stacktrace than just displaying "java.lang.reflect.InvocationTargetException".

Thanks,
moon

On Thu, Jul 2, 2015 at 10:34 AM Vadla, Karthik <[email protected]> wrote:

Hi All,

I just registered a table using the code below:

val eduText = sc.textFile("hdfs://ip.address/user/karthik/education.csv")

case class Education(unitid: Integer, instnm: String, addr: String, city: String, stabbr: String, zip: Integer)

val education = eduText.map(s => s.split(","))
  .filter(s => s(0) != "UNITID")
  .map(s => Education(
    s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(4).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt))

// The line below works only in Spark 1.3.0.
// For Spark 1.1.x and 1.2.x,
// use education.registerTempTable("education") instead.
education.toDF().registerTempTable("education")

When I run "%sql show tables", it displays the table "education". But when I run "%sql select count(*) from education", it throws the error below:

java.lang.reflect.InvocationTargetException

Can anyone help me with this? Appreciate your help. I've enclosed the .csv file used to register the table.

Thanks,
Karthik
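One way to try the pull-request branch moon links above is to fetch it directly from GitHub's pull refs; the local branch name "pr-136" below is arbitrary, and the build flags repeat the command used earlier in the thread.

```shell
git clone https://github.com/apache/incubator-zeppelin.git
cd incubator-zeppelin

# GitHub exposes each pull request's head under refs/pull/<n>/head.
git fetch origin pull/136/head:pr-136
git checkout pr-136

# Rebuild with the same options as before.
mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
```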
