RE: newbie question for reduce

2022-01-27 Thread Christopher Robson
Hi, The reduce lambda accepts as its first argument the return value of the previous invocation. The first time, it is invoked with x = ("a", 1), y = ("b", 2) and returns 1+2=3. The second time, it is invoked with x = 3, y = ("c", 3), so you can see why it raises the error that you are seeing. There a…
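A minimal local sketch reproducing the failure Christopher describes, assuming the original poster reduced key-value tuples with a lambda that indexes both of its arguments (the actual code is truncated above):

    from functools import reduce  # plain Python reduce; RDD.reduce applies the lambda the same way

    pairs = [("a", 1), ("b", 2), ("c", 3)]

    # 1st call: x = ("a", 1), y = ("b", 2) -> returns 1 + 2 = 3, an int
    # 2nd call: x = 3, y = ("c", 3) -> 3[1] raises TypeError,
    #           because the running result is no longer a tuple
    total = reduce(lambda x, y: x[1] + y[1], pairs)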

Re: newbie question for reduce

2022-01-18 Thread Sean Owen
The problem is that you are reducing a list of tuples, but you are producing an int. The resulting int can't be combined with the remaining tuples by your function; reduce() has to produce the same type as its arguments. rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y) ... would work. On Tue, Jan 18, 2022…
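A complete version of Sean's fix, with made-up sample data (the poster's data isn't shown):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("reduce-fix").getOrCreate()
    rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("c", 3)])

    # Project down to the ints first, so reduce combines ints with ints
    # and the result type matches the argument type.
    total = rdd.map(lambda x: x[1]).reduce(lambda x, y: x + y)
    print(total)  # 6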

Re: Newbie question on how to extract column value

2018-08-07 Thread James Starks
Because of some legacy issues I can't immediately upgrade the Spark version. But I tried filtering the data before loading it into Spark, based on the suggestion: val df = sparkSession.read.format("jdbc").option(...).option("dbtable", "(select .. from ... where url <> '') table_name").load() df…
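For reference, a sketch of the same pushdown, shown in PySpark to stay consistent with the other examples here (the JDBC URL, credentials, and table/column names are placeholders, not from the thread). Wrapping the filtered SELECT in the dbtable option makes the database apply the WHERE clause before Spark loads any rows.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-pushdown").getOrCreate()

    # The subquery alias ("t") is required by the dbtable contract;
    # connection details below are placeholders.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/mydb")
          .option("dbtable", "(select id, url from events where url <> '') t")
          .option("user", "user")
          .option("password", "password")
          .load())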

Re: Newbie question on how to extract column value

2018-08-07 Thread Gourav Sengupta
Hi James, It is always advisable to use the latest Spark version. That said, can you please give dataframes and UDFs a try if possible? I think that would be a much more scalable way to address the issue. Also, where possible, it is always advisable to use the filter option before fetching the d…

Re: newbie question about RDD

2016-11-22 Thread Mohit Durgapal
Hi Raghav, Please refer to the following code: SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp"); // creating the Java Spark context JavaSparkContext sc = new JavaSparkContext(sparkConf); // reading a file from HDFS into a Spark RDD; the name node is localhost JavaRDD p…
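Mohit's Java snippet is cut off above; a sketch of the same flow, shown in PySpark to stay consistent with the other examples here (the namenode port and file path are placeholders):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[2]").setAppName("PersonApp")
    sc = SparkContext(conf=conf)

    # Read a file from HDFS; host, port, and path are placeholders,
    # substitute your own namenode address and file path.
    people = sc.textFile("hdfs://localhost:9000/user/raghav/people.txt")
    print(people.count())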

Re: newbie question about RDD

2016-11-21 Thread Raghav
Sorry, I forgot to ask: how can I use the Spark context here? I have the HDFS directory path of the files, as well as the name node of the HDFS cluster. Thanks for your help. On Mon, Nov 21, 2016 at 9:45 PM, Raghav wrote: > Hi > > I am extremely new to Spark. I have to read a file from HDFS, and get it > in…

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Jon Gregg
Piggybacking off this - how are you teaching DataFrames and Datasets to new users? I haven't taken the edX courses, but I don't see Spark SQL covered heavily in the syllabus. I've dug through the Databricks documentation, but I think it's a lot of information for a new user - hoping there is a…

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Rishikesh Teke
Integrate Spark with Apache Zeppelin (https://zeppelin.apache.org/); it's again a very handy way to bootstrap with Spark.

Re: Newbie question - Best way to bootstrap with Spark

2016-11-10 Thread jggg777
A couple options: (1) You can start locally by downloading Spark to your laptop: http://spark.apache.org/downloads.html , then jump into the Quickstart docs: http://spark.apache.org/docs/latest/quick-start.html (2) There is a free Databricks community edition that runs on AWS: https://databricks.

Re: Newbie question - Best way to bootstrap with Spark

2016-11-07 Thread Raghav
Thanks a ton, guys. On Sun, Nov 6, 2016 at 4:57 PM, raghav wrote: > I am a newbie in the world of big data analytics, and I want to teach myself > Apache Spark, and want to be able to write scripts to tinker with data. > > I have some understanding of MapReduce but have not had a chance to get my…

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee
The one you're looking for is the Data Sciences and Engineering with Apache Spark at https://www.edx.org/xseries/data-science-engineering-apacher-sparktm. Note, a great quick start is the Getting Started with Apache Spark on Databricks at https://databricks.com/product/getting-started-guide HTH!

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Raghav
Can you please point out the right courses from EDX/Berkeley? Many thanks. On Sun, Nov 6, 2016 at 6:08 PM, ayan guha wrote: > I would start with Spark documentation, really. Then you would probably > start with some older videos from youtube, especially spark summit > 2014, 2015 and 2016 videos…

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread ayan guha
I would start with the Spark documentation, really. Then you would probably move on to some older videos from YouTube, especially Spark Summit 2014, 2015 and 2016 videos. Regarding practice, I would strongly suggest Databricks cloud (or download a prebuilt Spark from the Spark site). You can also take courses from E…

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
Right, well I don't think the issue is with how you're compiling the Scala. I think it's a conflict between different versions of several libs. I had similar issues with my Spark modules. You need to make sure you're not loading a different version of the same lib that is clobbering another depe…

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran
Added these to the pom and still the same error :-(. I will look into sbt as well. On Fri, Mar 11, 2016 at 2:31 PM, Tristan Nixon wrote: > You must be relying on IntelliJ to compile your Scala, because you haven't > set up any Scala plugin to compile it from Maven. > You should have something…

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Tristan Nixon
You must be relying on IntelliJ to compile your Scala, because you haven't set up any Scala plugin to compile it from Maven. You should have something like this in your plugins:

    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>scala-compile-first</id>
          <phase>process-resources</phase>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski
Hi, Doh! My eyes are bleeding from going through XMLs... 😁 Where did you specify the Scala version? Dunno how it's done in Maven. P.S. I *strongly* recommend sbt. Jacek 11.03.2016 8:04 PM "Vasu Parameswaran" wrote: > Thanks Jacek. Pom is below (Currently set to Spark 1.6.1 but I started > out with 1.…

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran
Thanks Jacek. Pom is below (Currently set to Spark 1.6.1 but I started out with 1.6.0 with the same problem). <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.or…

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Vasu Parameswaran
Thanks Ted. I haven't explicitly specified Scala (I tried different versions in pom.xml as well). For what it is worth, this is what I get when I do a maven dependency tree. I wonder if the 2.11.2 coming from scala-reflect matters: [INFO] | | \- org.scala-lang:scalap:jar:2.11.0:compile [I

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Jacek Laskowski
Hi, Why do you use Maven, not sbt, for Scala? Can you show the entire pom.xml and the command used to execute the app? Jacek 11.03.2016 7:33 PM "vasu20" wrote: > Hi > > Any help appreciated on this. I am trying to write a Spark program using > IntelliJ. I get a runtime error as soon as new Sp…

Re: Newbie question - Help with runtime error on augmentString

2016-03-11 Thread Ted Yu
Looks like Scala version mismatch. Are you using 2.11 everywhere ? On Fri, Mar 11, 2016 at 10:33 AM, vasu20 wrote: > Hi > > Any help appreciated on this. I am trying to write a Spark program using > IntelliJ. I get a run time error as soon as new SparkConf() is called from > main. Top few li

Re: Newbie question

2016-01-07 Thread dEEPU
If the method is not final or static, then you can. On Jan 8, 2016 12:07 PM, yuliya Feldman wrote: Hello, I am new to Spark and have a most likely basic question - can I override a method from SparkContext? Thanks

Re: Newbie question

2016-01-07 Thread yuliya Feldman
Thank you. From: Deepak Sharma To: yuliya Feldman Cc: "user@spark.apache.org" Sent: Thursday, January 7, 2016 10:41 PM Subject: Re: Newbie question Yes, you can do it unless the method is marked static/final. Most of the methods in SparkContext are marked static, so…

Re: Newbie question

2016-01-07 Thread censj
You can try it. > On 2016-01-08 at 14:44, yuliya Feldman wrote: > > invoked…

Re: Newbie question

2016-01-07 Thread yuliya Feldman
e.org" Sent: Thursday, January 7, 2016 10:38 PM Subject: Re: Newbie question why to override a method from SparkContext? 在 2016年1月8日,14:36,yuliya Feldman 写道: Hello, I am new to Spark and have a most likely basic question - can I override a method from SparkContext? Thanks

Re: Newbie question

2016-01-07 Thread Deepak Sharma
Yes, you can do it unless the method is marked static/final. Most of the methods in SparkContext are marked static, so you definitely can't override those; otherwise an override would usually work. Thanks Deepak On Fri, Jan 8, 2016 at 12:06 PM, yuliya Feldman wrote: > Hello, > > I am new to Spark and…
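As an illustration of the override itself, a minimal PySpark sketch (the thread doesn't say which language or method the poster has in mind; textFile and the input path here are just example choices, and in Scala this only works for non-final methods):

    from pyspark import SparkContext

    class VerboseSparkContext(SparkContext):
        # Override an inherited method, then delegate to the parent.
        def textFile(self, name, minPartitions=None, use_unicode=True):
            print("loading:", name)
            return super().textFile(name, minPartitions, use_unicode)

    sc = VerboseSparkContext("local[2]", "override-demo")
    sc.textFile("README.md").count()  # prints "loading: README.md" first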

Re: Newbie question

2016-01-07 Thread censj
Why override a method from SparkContext? > On 2016-01-08 at 14:36, yuliya Feldman wrote: > > Hello, > > I am new to Spark and have a most likely basic question - can I override a > method from SparkContext? > > Thanks

Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Corey Nolet
1) Spark only needs to shuffle when data must be repartitioned across the workers in an all-to-all fashion. 2) Multi-stage jobs that would normally require several MapReduce jobs (forcing data to be dumped to disk between jobs) can instead keep intermediate results cached in memory.
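A small PySpark sketch of the second point (the input path and transformations are placeholders): caching an intermediate RDD lets several downstream jobs reuse it without a disk round-trip between stages.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "cache-demo")

    # Placeholder input path. In classic MapReduce, each of the two
    # jobs below would be a separate MR job, with the intermediate
    # data written to and re-read from disk.
    words = sc.textFile("hdfs://localhost:9000/data/words.txt") \
              .flatMap(lambda line: line.split())
    words.cache()  # keep the intermediate RDD in executor memory

    print(words.count())             # first job: computes and caches
    print(words.distinct().count())  # second job: reads from the cache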

Re: Newbie question: what makes Spark run faster than MapReduce

2015-08-07 Thread Hien Luu
This blog outlines a few things that make Spark faster than MapReduce - https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html On Fri, Aug 7, 2015 at 9:13 AM, Muler wrote: > Consider the classic word count application over a 4-node cluster with a > sizable working data set. What makes Spark…

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Thanks! On Wed, Aug 5, 2015 at 5:24 PM, Saisai Shao wrote: > Yes, finally shuffle data will be written to disk for reduce stage to > pull, no matter how large you set to shuffle memory fraction. > > Thanks > Saisai > > On Thu, Aug 6, 2015 at 7:50 AM, Muler wrote: > >> thanks, so if I have enoug

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Yes, ultimately shuffle data will be written to disk for the reduce stage to pull, no matter how large you set the shuffle memory fraction. Thanks Saisai On Thu, Aug 6, 2015 at 7:50 AM, Muler wrote: > thanks, so if I have enough large memory (with enough > spark.shuffle.memory) then shuffle (in-memory…

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Muler
Thanks. So if I have large enough memory (with enough spark.shuffle.memory), then shuffle spill doesn't happen (per node), but shuffle data still ultimately has to be written to disk so that the reduce stage pulls it across the network? On Wed, Aug 5, 2015 at 4:40 PM, Saisai Shao wrote:…

Re: Newbie question: can shuffle avoid writing and reading from disk?

2015-08-05 Thread Saisai Shao
Hi Muler, Shuffle data will be written to disk no matter how much memory you have; large memory can alleviate shuffle spill, where temporary files are generated when memory is not enough. Yes, each node writes shuffle data to file, and it is pulled from disk in the reduce stage through the network framework (d…
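For context, a sketch of the knob the thread alludes to. spark.shuffle.memoryFraction was the Spark 1.x setting (later replaced by unified memory management), and the value here is only illustrative:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("shuffle-tuning")
            # Spark 1.x knob: fraction of the heap used for in-memory
            # shuffle buffers. Raising it reduces spill to temporary
            # files, but the final shuffle output is still written to
            # disk for the reduce stage to fetch.
            .set("spark.shuffle.memoryFraction", "0.4"))
    sc = SparkContext(conf=conf)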

Re: Newbie Question on How Tasks are Executed

2015-01-19 Thread davidkl
Hello Mixtou, if you want to look at the partition ID, I believe you want to use mapPartitionsWithIndex.
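A minimal PySpark sketch of the suggestion (the data and partition count are assumptions):

    from pyspark import SparkContext

    sc = SparkContext("local[3]", "partition-ids")
    rdd = sc.parallelize(range(9), 3)

    # The function receives (partition_index, iterator) and must
    # return an iterator; here each element is tagged with the ID
    # of the partition it lives in.
    def tag(index, iterator):
        return ((index, x) for x in iterator)

    print(rdd.mapPartitionsWithIndex(tag).collect())
    # e.g. [(0, 0), (0, 1), (0, 2), (1, 3), ..., (2, 8)]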

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Akhil Das
Your proxy/DNS could be blocking it. Thanks Best Regards On Tue, Oct 28, 2014 at 4:06 PM, Yanbo Liang wrote: > Maybe you have a wrong sbt proxy configuration. > > 2014-10-28 18:27 GMT+08:00 nl19856 : > >> Hi, >> I have downloaded the binary Spark distribution. >> When building the package with…

Re: newbie question quickstart example sbt issue

2014-10-28 Thread nl19856
Sigh! Sorry, I did not read the error message properly. 2014-10-28 11:39 GMT+01:00 Yanbo Liang [via Apache Spark User List] < ml-node+s1001560n17478...@n3.nabble.com>: > Maybe you have a wrong sbt proxy configuration. > > 2014-10-28 18:27 GMT+08:00 nl19856 <[hidden email] >…

Re: newbie question quickstart example sbt issue

2014-10-28 Thread Yanbo Liang
Maybe you have a wrong sbt proxy configuration. 2014-10-28 18:27 GMT+08:00 nl19856 : > Hi, > I have downloaded the binary Spark distribution. > When building the package with sbt package I get the following: > [root@nlvora157 ~]# sbt package > [info] Set current project to Simple Project (in buil…