Re: Java 8 vs Scala

2015-07-16 Thread Marius Danciu
If you take time to actually learn Scala starting from its fundamental concepts AND, quite importantly, get familiar with general functional programming concepts, you'd immediately realize the things you'd really miss going back to Java (8). On Fri, Jul 17, 2015 at 8:14 AM Wojciech Pituła wrote:

DataFrame from RDD[Row]

2015-07-16 Thread Marius Danciu
Hi, This is an ugly solution because it requires pulling out a row: val rdd: RDD[Row] = ... ctx.createDataFrame(rdd, rdd.first().schema) Is there a better alternative to get a DataFrame from an RDD[Row], since toDF won't work as Row is not a Product? Thanks, Marius
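
A minimal sketch of the alternative (with hypothetical column names and types): build the StructType by hand and pass it to createDataFrame, so no row has to be pulled out of the RDD.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sc = new SparkContext(new SparkConf().setAppName("rows-to-df").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Hypothetical schema; in practice it mirrors whatever the rows actually contain.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))

val rdd = sc.parallelize(Seq(Row(1, "a"), Row(2, "b")))

// Passing the schema explicitly avoids pulling a row out with rdd.first().
val df = sqlContext.createDataFrame(rdd, schema)
df.show()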

Re: Optimizations

2015-07-03 Thread Marius Danciu
. Then run a map operation to perform the > join and whatever else you need to do. This will remove a shuffle stage but > you will still have to collect the joined RDD and broadcast it. All depends > on the size of your data if it’s worth it or not. > > From: Marius Danciu > Date:
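
A minimal sketch of the broadcast approach described above, with placeholder RDD names and value types: the smaller side is collected and broadcast, and the join plus any follow-up work run in a single map stage with no shuffle.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-join").setMaster("local[*]"))

// Placeholder data sets; "small" is the side assumed to fit in memory.
val large = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
val small = sc.parallelize(Seq((1, 10.0), (3, 30.0)))

// Collect the small side and broadcast it, so no shuffle is needed for the join.
val smallMap = sc.broadcast(small.collectAsMap())

// The join and the downstream per-partition work happen in one map stage.
val joined = large.mapPartitions { iter =>
  iter.flatMap { case (k, v) =>
    smallMap.value.get(k).map(w => (k, (v, w))) // inner-join semantics
  }
}

joined.collect().foreach(println)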

Optimizations

2015-07-03 Thread Marius Danciu
Hi all, If I have something like: rdd.join(...).mapPartitionsToPair(...) It looks like mapPartitionsToPair runs in a different stage than the join. Is there a way to piggyback this computation inside the join stage? ... such that each result partition after the join is passed to the mapPartitionsToPair function

Re: Spark partitioning question

2015-05-05 Thread Marius Danciu
Turned out that it was sufficient to do repartitionAndSortWithinPartitions ... so far so good ;) On Tue, May 5, 2015 at 9:45 AM Marius Danciu wrote: > Hi Imran, > > Yes that's what MyPartitioner does. I do see (using traces from > MyPartitioner) that the key is partitioned o
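
A minimal sketch of that call, with a simple modulo partitioner standing in for MyPartitioner: repartitionAndSortWithinPartitions routes each record to the partition chosen by the partitioner and sorts by key within each partition in one shuffle.

import org.apache.spark.{Partitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("repartition-sort").setMaster("local[*]"))

// Hypothetical stand-in for MyPartitioner: routes keys by a simple modulo.
class ModPartitioner(parts: Int) extends Partitioner {
  def numPartitions: Int = parts
  def getPartition(key: Any): Int = math.abs(key.hashCode) % parts
}

val pairs = sc.parallelize(Seq((5, "e"), (1, "a"), (3, "c"), (2, "b")))

// One shuffle: data lands in the partitions chosen by the partitioner,
// already sorted by key within each partition.
val partitionedAndSorted =
  pairs.repartitionAndSortWithinPartitions(new ModPartitioner(2))

partitionedAndSorted.glom().collect().foreach(p => println(p.mkString(", ")))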

Re: Spark partitioning question

2015-05-04 Thread Marius Danciu
the same, but most probably close enough, and avoids doing > another expensive shuffle). If you can share a bit more information on > your partitioner, and what properties you need for your "f", that might > help. > > thanks, > Imran > > > On Tue, Apr 28, 2015

Re: Spark partitioning question

2015-04-28 Thread Marius Danciu
need to sort and repartition, try using > repartitionAndSortWithinPartitions to do it in one shot. > > Thanks, > Silvio > > From: Marius Danciu > Date: Tuesday, April 28, 2015 at 8:10 AM > To: user > Subject: Spark partitioning question > >

Spark partitioning question

2015-04-28 Thread Marius Danciu
Hello all, I have the following Spark (pseudo)code: rdd = mapPartitionsWithIndex(...) .mapPartitionsToPair(...) .groupByKey() .sortByKey(comparator) .partitionBy(myPartitioner) .mapPartitionsWithIndex(...) .mapPartitionsToPair( *f* ) The input data
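
A hypothetical Scala rendering of the pipeline above (the mapPartitionsToPair steps become pair-producing mapPartitions transformations, and a HashPartitioner stands in for myPartitioner). Note that partitionBy after sortByKey shuffles the data again, so the sorted order within the new partitions is not guaranteed, which is what the repartitionAndSortWithinPartitions suggestion in the replies addresses.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("partitioning-question").setMaster("local[*]"))
val myPartitioner = new HashPartitioner(4) // stand-in for the custom partitioner

val input = sc.parallelize(1 to 20, 4)

val result = input
  .mapPartitionsWithIndex { (idx, iter) => iter.map(x => (x % 5, s"p$idx-$x")) } // key the records
  .groupByKey()
  .sortByKey()                 // total sort by key across partitions
  .partitionBy(myPartitioner)  // shuffles again; per-partition sort order is not preserved
  .mapPartitionsWithIndex { (idx, iter) =>
    // f: whatever per-partition processing needs the grouped keys
    iter.map { case (k, vs) => (idx, k, vs.size) }
  }

result.collect().foreach(println)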

Re: Shuffle question

2015-04-22 Thread Marius Danciu
Thank you Iulian! That's precisely what I discovered today. Best, Marius On Wed, Apr 22, 2015 at 3:31 PM Iulian Dragoș wrote: > On Tue, Apr 21, 2015 at 2:38 PM, Marius Danciu > wrote: > >> Hello anyone, >> >> I have a question regarding the sort shuffle. Roughly

Re: Shuffle question

2015-04-22 Thread Marius Danciu
Anyone? On Tue, Apr 21, 2015 at 3:38 PM Marius Danciu wrote: > Hello anyone, > > I have a question regarding the sort shuffle. Roughly I'm doing something > like: > > rdd.mapPartitionsToPair(f1).groupByKey().mapPartitionsToPair(f2) > > The problem is that in

Shuffle question

2015-04-21 Thread Marius Danciu
Hello anyone, I have a question regarding the sort shuffle. Roughly I'm doing something like: rdd.mapPartitionsToPair(f1).groupByKey().mapPartitionsToPair(f2) The problem is that in f2 I don't see the keys being sorted. The keys are Java Comparable, not scala.math.Ordered or scala.math.Ordering
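
A minimal sketch, with a hypothetical key class, of bridging a Java Comparable key into the scala.math.Ordering that Spark's sorting operators need. groupByKey alone makes no ordering guarantee, so an explicit sort (sortByKey, or repartitionAndSortWithinPartitions as below) is required before f2 can rely on sorted keys.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical key type that only implements java.lang.Comparable.
case class MyKey(id: Int) extends Comparable[MyKey] {
  override def compareTo(other: MyKey): Int = id.compareTo(other.id)
}

// Bridge the Java Comparable into the scala.math.Ordering that Spark's
// sorting operators require.
implicit val myKeyOrdering: Ordering[MyKey] = new Ordering[MyKey] {
  def compare(a: MyKey, b: MyKey): Int = a.compareTo(b)
}

val sc = new SparkContext(new SparkConf().setAppName("sorted-keys").setMaster("local[*]"))
val pairs = sc.parallelize(Seq(MyKey(3) -> "c", MyKey(1) -> "a", MyKey(2) -> "b"))

// groupByKey alone does not sort; sort explicitly instead.
val sortedWithinPartitions =
  pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))

sortedWithinPartitions.mapPartitions { iter =>
  // f2 now sees keys in sorted order within each partition.
  iter.map { case (k, v) => s"${k.id} -> $v" }
}.collect().foreach(println)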