Re: GraphX and Spark

2014-11-04 Thread Kamal Banga
GraphX is built on *top* of Spark, so Spark can achieve whatever GraphX can. On Wed, Nov 5, 2014 at 9:41 AM, Deep Pradhan wrote: > Hi, > Can Spark achieve whatever GraphX can? > Keeping aside the performance comparison between Spark and GraphX, if I > want to implement any graph algorithm and I

Re: about aggregateByKey and standard deviation

2014-11-03 Thread Kamal Banga
I don't think this can be done directly with .aggregateByKey(), because we will need the count of values per key (for the average). Maybe we can use .countByKey(), which returns a map, and .foldByKey(0)(_+_) (or aggregateByKey()), which gives the sum of values per key. I'm not sure how to proceed myself. Regards On Fri, Oct 31,
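
One way to get both the count and the sum in a single pass is to carry a (count, sum, sum of squares) triple per key. The sketch below is plain Scala on local collections with no Spark dependency; the object name, the helper aggregateLocally, and the choice of population (not sample) standard deviation are assumptions made for illustration, not something from the thread. The two functions play the roles of the seqOp and combOp arguments that aggregateByKey takes.

```scala
// Plain-Scala sketch; all names here are hypothetical.
object StdDevByKey {
  // Per-key accumulator: (count, sum, sum of squares)
  type Acc = (Long, Double, Double)
  val zero: Acc = (0L, 0.0, 0.0)

  // Fold one value into a partial accumulator (the "seqOp" role)
  def seqOp(acc: Acc, v: Double): Acc =
    (acc._1 + 1, acc._2 + v, acc._3 + v * v)

  // Merge two partial accumulators (the "combOp" role)
  def combOp(a: Acc, b: Acc): Acc =
    (a._1 + b._1, a._2 + b._2, a._3 + b._3)

  // Population standard deviation from the accumulated moments
  def stdDev(acc: Acc): Double = {
    val (n, sum, sumSq) = acc
    val mean = sum / n
    // Clamp at zero to guard against small negative rounding errors
    math.sqrt(math.max(0.0, sumSq / n - mean * mean))
  }

  // Local stand-in for a per-key aggregation over (key, value) pairs
  def aggregateLocally(pairs: Seq[(String, Double)]): Map[String, Acc] =
    pairs.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).foldLeft(zero)(seqOp)
    }
}
```

On an RDD of (key, value) pairs, the same zero, seqOp and combOp could presumably be passed to aggregateByKey and the result mapped through stdDev. Note that the sum-of-squares formula can lose precision when values are large relative to their spread.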

Re: Using a Database to persist and load data from

2014-10-31 Thread Kamal Banga
You can also use PairRDDFunctions' saveAsNewAPIHadoopFile that takes an OutputFormat class. So you will have to write a custom OutputFormat class that extends OutputFormat. In this class, you will have to implement a getRecordWriter which returns a custom RecordWriter. So you will also have to writ

Re: Scaladoc

2014-10-30 Thread Kamal Banga
In IntelliJ, Tools > Generate Scaladoc. Kamal On Fri, Oct 31, 2014 at 5:35 AM, Alessandro Baretta wrote: > How do I build the scaladoc html files from the spark source distribution? > > Alex Bareta >

Re: What executes on worker and what executes on driver side

2014-10-28 Thread Kamal Banga
/spark-user/201310.mbox/%3CCAF_KkPwk7iiQVD2JzOwVVhQ_U2p3bPVM=-bka18v4s-5-lp...@mail.gmail.com%3E > > > Regards > - Saurabh Wadhawan > > > > On 20-Oct-2014, at 4:56 pm, Kamal Banga wrote: > > 1. All RDD operations are executed in workers. So reading a text file > or executing val x = 1

Re: Batch of updates

2014-10-28 Thread Kamal Banga
Hi Flavio, Doing batch += ... shouldn't work. It will create a new batch for each element in myRDD (also, val declares an immutable variable; var is for mutable variables). You can use something like accumulators. val a

Re: What executes on worker and what executes on driver side

2014-10-20 Thread Kamal Banga
1. All RDD operations are executed in workers. So reading a text file or executing val x = 1 will happen on worker. (link) 2. a. Without broadcast: Let's say you have 'n' nodes. You can set Hadoop's replication factor to n

Re: Spark Concepts

2014-10-20 Thread Kamal Banga
may contain confidential or legally privileged > information and is intended only for the use of the intended recipient(s). > Any unauthorized disclosure, dissemination, distribution, copying or the > taking of any action in reliance on the information herein is prohibited. > > -

preservesPartitioning

2014-07-17 Thread Kamal Banga
Hi All, The function *mapPartitions* in RDD.scala takes a boolean parameter *preservesPartitioning*. It seems if that parameter is passed as *false*, the passed function f will operate on the data only