You can still manually configure all the environment variables and properties for pyspark, but it is suggested to build with -Ppyspark from now on.

Thanks,
moon
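For example, assuming Spark 1.3 and Hadoop 2.6 (the versions used later in this thread; adjust to your environment), the full build command looks like:

    mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests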
On Wed, Jul 8, 2015 at 10:59 PM IT CTO <[email protected]> wrote:

> Does this mean that everyone who wants pySpark to work should use this
> option in the build from now on, or is that going to be the default, as in
> Spark 1.4?
> Eran
>
> On Thu, Jul 9, 2015 at 12:14 AM moon soo Lee <[email protected]> wrote:
>
>> Is your source code older than 3 days? -Ppyspark was merged about 3 days ago.
>>
>> Thanks,
>> moon
>>
>> On Wed, Jul 8, 2015 at 1:58 PM Vadla, Karthik <[email protected]> wrote:
>>
>>> I'm using this .zip: https://github.com/apache/incubator-zeppelin
>>>
>>> Thanks
>>> Karthik
>>>
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Wednesday, July 8, 2015 1:37 PM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>
>>> Are you building on latest master?
>>>
>>> On Wed, Jul 8, 2015 at 1:34 PM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi Moon,
>>>
>>> Yes, I tried the command below. The build was successful, but at the end
>>> I got this warning message:
>>>
>>> [WARNING] The requested profile "pyspark" could not be activated because it does not exist.
>>>
>>> Pyspark exists on the machine. Do I need to do anything further?
>>>
>>> Thanks
>>> Karthik
>>>
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Wednesday, July 8, 2015 10:58 AM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>
>>> Hi,
>>>
>>> I meant adding the -Ppyspark profile, like:
>>>
>>> mvn clean package -Pspark-1.3 -Ppyspark -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, Jul 8, 2015 at 10:43 AM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi Moon,
>>>
>>> You mean to say I need to build something like this:
>>>
>>> mvn clean package -Ppyspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>
>>> I had previously built my Zeppelin with the command below:
>>>
>>> mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0-cdh5.4.0 -Phadoop-2.6 -DskipTests
>>>
>>> Thanks
>>> Karthik
>>>
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Wednesday, July 8, 2015 10:20 AM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records and Pyspark not working
>>>
>>> Hi,
>>>
>>> If you build the latest master branch with the -Ppyspark maven profile,
>>> it'll help pyspark work without setting those environment variables.
>>>
>>> Hope this helps.
>>>
>>> Best,
>>> moon
>>>
>>> On Tue, Jul 7, 2015 at 3:47 PM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> This part is commented out in zeppelin-env.sh in my conf folder:
>>>
>>> # Pyspark (supported with Spark 1.2.1 and above)
>>> # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
>>> # export PYSPARK_PYTHON    # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
>>> # export PYTHONPATH        # extra PYTHONPATH.
>>>
>>> Can anyone help with how to set those up?
>>>
>>> Appreciate your help.
>>>
>>> Thanks
>>> Karthik
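For anyone who does want to set these by hand instead of rebuilding, a minimal sketch of what the two exports might look like in conf/zeppelin-env.sh (the python path and the py4j zip name below are assumptions; match them to your own Spark distribution, and note that SPARK_HOME must point at it):

    # assumption: same python path on the driver (Zeppelin) and all workers
    export PYSPARK_PYTHON=/usr/bin/python
    # assumption: py4j version is the one shipped under your Spark's python/lib
    export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip:${PYTHONPATH}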
>>> From: Vadla, Karthik [mailto:[email protected]]
>>> Sent: Tuesday, July 7, 2015 3:29 PM
>>> To: [email protected]
>>> Subject: RE: Not able to see registered table records and Pyspark not working
>>>
>>> Hi Moon,
>>>
>>> Thanks for that. The problem was with my parsing. I resolved it.
>>>
>>> I have another question to ask.
>>> I'm just trying to run a print command using the pyspark interpreter.
>>> It is not responding.
>>>
>>> When I look at the log, I don't have any information except this:
>>>
>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} SchedulerFactory.java[jobStarted]:132) - Job paragraph_1436305204170_601291630 started by scheduler remoteinterpreter_267235421
>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:194) - run paragraph 20150707-144004_475199059 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@33a625a7
>>> INFO [2015-07-07 15:19:17,702] ({pool-1-thread-41} Paragraph.java[jobRun]:211) - RUN : list=range(1,10)
>>> print(list)
>>> INFO [2015-07-07 15:19:18,060] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>> INFO [2015-07-07 15:19:18,678] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>> INFO [2015-07-07 15:19:19,278] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>> INFO [2015-07-07 15:19:19,879] ({Thread-255} NotebookServer.java[broadcast]:251) - SEND >> PROGRESS
>>>
>>> Do I need to do any config settings in zeppelin-env.sh or zeppelin-site.xml?
>>>
>>> Thanks
>>> Karthik
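Once the interpreter does respond, a quick way to separate a Python problem from a Spark problem is a paragraph that exercises both (a sketch; sc is the SparkContext that Zeppelin's pyspark interpreter provides):

    %pyspark
    xs = list(range(1, 10))          # plain Python, no Spark involved
    print(xs)                        # expect [1, 2, ..., 9]
    print(sc.parallelize(xs).sum())  # exercises the SparkContext; expect 45

If the first print never appears, the python process itself is not starting, which points back at the build profile or the PYSPARK_PYTHON/PYTHONPATH settings above.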
>>> From: moon soo Lee [mailto:[email protected]]
>>> Sent: Friday, July 3, 2015 2:31 PM
>>> To: [email protected]
>>> Subject: Re: Not able to see registered table records
>>>
>>> Hi,
>>>
>>> Could you try this branch?
>>> https://github.com/apache/incubator-zeppelin/pull/136
>>>
>>> It'll give you a better stacktrace than just displaying "java.lang.reflect.InvocationTargetException".
>>>
>>> Thanks,
>>> moon
>>>
>>> On Thu, Jul 2, 2015 at 10:34 AM Vadla, Karthik <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> I just registered a table using the code below:
>>>
>>> val eduText = sc.textFile("hdfs://ip.address/user/karthik/education.csv")
>>>
>>> case class Education(unitid: Integer, instnm: String, addr: String, city: String, stabbr: String, zip: Integer)
>>>
>>> val education = eduText.map(s => s.split(",")).filter(s => s(0) != "UNITID").map(
>>>   s => Education(s(0).toInt,
>>>     s(1).replaceAll("\"", ""),
>>>     s(2).replaceAll("\"", ""),
>>>     s(3).replaceAll("\"", ""),
>>>     s(4).replaceAll("\"", ""),
>>>     s(5).replaceAll("\"", "").toInt
>>>   )
>>> )
>>>
>>> // Below line works only in spark 1.3.0.
>>> // For spark 1.1.x and spark 1.2.x,
>>> // use bank.registerTempTable("bank") instead.
>>> education.toDF().registerTempTable("education")
>>>
>>> When I run "%sql show tables", it displays the table "education".
>>>
>>> But when I try to run "%sql select count(*) from education", it throws the error below:
>>>
>>> java.lang.reflect.InvocationTargetException
>>>
>>> Can anyone help me with this?
>>> Appreciate your help.
>>>
>>> I have enclosed the .csv file used to register the table.
>>>
>>> Thanks
>>> Karthik
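A closing note on the error above: Karthik mentions upthread that the problem turned out to be his parsing. A naive split(",") splits inside quoted fields that contain commas, and because map/filter are lazy the resulting exception only surfaces when the table is actually queried, e.g. at %sql time. A minimal sketch with a made-up row of the same shape as his data:

    val row = """100654,"Some University","1 Main St","Normal, AL","AL",35762"""
    val cols = row.split(",")
    // 7 columns instead of 6: the comma inside "Normal, AL" splits the city
    // field, shifting every later column by one.
    println(cols.length)  // 7
    println(cols(5))      // "AL" (with quotes), not the zip code, so
                          // cols(5).replaceAll("\"", "").toInt throws
                          // NumberFormatException when the RDD is evaluated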
