If you take the time to actually learn Scala starting from its fundamental
concepts and, quite importantly, get familiar with general functional
programming concepts, you'll immediately realize which things you'd
really miss going back to Java (8).
On Fri, Jul 17, 2015 at 8:14 AM Wojciech Pituła wrote:
Hi,
This is an ugly solution because it requires pulling out a row:
val rdd: RDD[Row] = ...
ctx.createDataFrame(rdd, rdd.first().schema)
Is there a better alternative for getting a DataFrame from an RDD[Row],
since toDF won't work because Row is not a Product?
Thanks,
Marius
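One cleaner alternative (an assumption based on the Spark 1.x API, not something confirmed in this thread) is to build the StructType schema explicitly and pass it to createDataFrame, instead of pulling it off the first row. As for why toDF is unavailable here: Spark derives the necessary machinery only for Product types, and every case class is a Product while Row is not. A pure-Scala sketch of that distinction, no Spark required (Person is an invented example class):

```scala
// Why toDF works for RDD[SomeCaseClass] but not RDD[Row]:
// every case class automatically mixes in Product, which is
// what Spark's case-class-based conversion relies on; Row does not.
case class Person(name: String, age: Int) // hypothetical example class

val p = Person("Ann", 42)
println(p.isInstanceOf[Product]) // true: case classes extend Product
println(p.productArity)          // 2: one slot per constructor field
```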
. Then run a map operation to perform the
> join and whatever else you need to do. This will remove a shuffle stage but
> you will still have to collect the joined RDD and broadcast it. All depends
> on the size of your data if it’s worth it or not.
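The collect-and-broadcast pattern described above amounts to a map-side join: the small side becomes an in-memory map, and the join is an ordinary lookup inside a single map pass, so no shuffle is needed. A plain-Scala sketch with invented toy data (the Spark broadcast machinery itself is not shown):

```scala
// Map-side join sketch: keep the small side as a Map, then
// "join" inside a single pass over the large side.
val small: Map[Int, String] = Map(1 -> "a", 2 -> "b")          // the broadcast side
val large: Seq[(Int, Double)] = Seq(1 -> 1.0, 2 -> 2.0, 3 -> 3.0)

// Inner join: keep only keys present on the small side.
val joined: Seq[(Int, (Double, String))] =
  large.flatMap { case (k, v) => small.get(k).map(s => (k, (v, s))) }

println(joined) // List((1,(1.0,a)), (2,(2.0,b)))
```

Whether this is worth it depends, as noted above, on the small side actually fitting in memory on every executor.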
>
> From: Marius Danciu
> Date:
Hi all,
If I have something like:
rdd.join(...).mapPartitionsToPair(...)
It looks like mapPartitionsToPair runs in a different stage than join. Is
there a way to piggyback this computation inside the join stage? ... such
that each result partition after join is passed to
the mapPartitionsToPair function.
It turned out that it was sufficient to do repartitionAndSortWithinPartitions
... so far so good ;)
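For reference, what repartitionAndSortWithinPartitions buys you is that records are assigned to partitions and sorted by key within each partition in the same shuffle. A plain-Scala simulation of the resulting layout, with toy data and a simple hash/modulo partitioner assumed:

```scala
// Simulate repartitionAndSortWithinPartitions on local data:
// assign each record to a partition by key hash, then sort each
// partition by key -- one pass, output both partitioned and sorted.
val numPartitions = 2
val records: Seq[(Int, String)] = Seq(5 -> "e", 2 -> "b", 3 -> "c", 4 -> "d", 1 -> "a")

val partitions: Map[Int, Seq[(Int, String)]] =
  records
    .groupBy { case (k, _) => java.lang.Math.floorMod(k.hashCode, numPartitions) }
    .map { case (pid, recs) => pid -> recs.sortBy(_._1) }

println(partitions(0)) // List((2,b), (4,d))
println(partitions(1)) // List((1,a), (3,c), (5,e))
```

In Spark the sort happens during the shuffle itself, which is why it replaces a separate sortByKey + partitionBy pair.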
On Tue, May 5, 2015 at 9:45 AM Marius Danciu
wrote:
> Hi Imran,
>
> Yes that's what MyPartitioner does. I do see (using traces from
> MyPartitioner) that the key is partitioned o
the same, but most probably close enough, and avoids doing
> another expensive shuffle). If you can share a bit more information on
> your partitioner, and what properties you need for your "f", that might
> help.
>
> thanks,
> Imran
>
>
> On Tue, Apr 28, 2015
need to sort and repartition, try using
> repartitionAndSortWithinPartitions to do it in one shot.
>
> Thanks,
> Silvio
>
> From: Marius Danciu
> Date: Tuesday, April 28, 2015 at 8:10 AM
> To: user
> Subject: Spark partitioning question
>
>
Hello all,
I have the following Spark (pseudo)code:
rdd = mapPartitionsWithIndex(...)
.mapPartitionsToPair(...)
.groupByKey()
.sortByKey(comparator)
.partitionBy(myPartitioner)
.mapPartitionsWithIndex(...)
.mapPartitionsToPair( *f* )
The input data
Thank you Iulian ! That's precisely what I discovered today.
Best,
Marius
On Wed, Apr 22, 2015 at 3:31 PM Iulian Dragoș
wrote:
> On Tue, Apr 21, 2015 at 2:38 PM, Marius Danciu
> wrote:
>
>> Hello anyone,
>>
>> I have a question regarding the sort shuffle. Rou
Anyone ?
On Tue, Apr 21, 2015 at 3:38 PM Marius Danciu
wrote:
> Hello anyone,
>
> I have a question regarding the sort shuffle. Roughly I'm doing something
> like:
>
> rdd.mapPartitionsToPair(f1).groupByKey().mapPartitionsToPair(f2)
>
> The problem is that in
Hello anyone,
I have a question regarding the sort shuffle. Roughly I'm doing something
like:
rdd.mapPartitionsToPair(f1).groupByKey().mapPartitionsToPair(f2)
The problem is that in f2 I don't see the keys being sorted. The keys
implement Java's Comparable, not scala.math.Ordered, and there is no
scala.math.Ordering for them.
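sortByKey (like Scala's own .sorted) needs an implicit scala.math.Ordering for the key type. Keys that only implement java.lang.Comparable don't declare one directly, but Ordering.ordered bridges the gap. A pure-Scala sketch, with EventKey as an invented stand-in for such a key:

```scala
// A key type that implements java.lang.Comparable (as in the question),
// not scala.math.Ordered. EventKey is a made-up example.
final case class EventKey(ts: Long) extends Comparable[EventKey] {
  override def compareTo(that: EventKey): Int = java.lang.Long.compare(ts, that.ts)
}

// Bridge Comparable -> Ordering; with this implicit in scope,
// sorted (and, in Spark, sortByKey) can order EventKey.
implicit val eventKeyOrdering: Ordering[EventKey] = Ordering.ordered[EventKey](identity)

val sorted = List(EventKey(3), EventKey(1), EventKey(2)).sorted
println(sorted.map(_.ts)) // List(1, 2, 3)
```

Note that an Ordering only helps in operations that actually sort; groupByKey by itself makes no ordering guarantee for the values handed to a downstream function.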