[mailto:daniel.dara...@lynxanalytics.com]
Sent: Monday, March 21, 2016 16:20
To: Ted Yu
CC: JOAQUIN GUANTER GONZALBEZ ;
dev@spark.apache.org
Subject: Re: Performance improvements for sorted RDDs
There is related discussion in
https://issues.apache.org/jira/browse/SPARK-8836. It's not too hard to
implement this without modifying Spark and we measured ~10x improvement
over plain RDD joins. I haven't benchmarked against DataFrames -- maybe
they also realize this performance advantage.
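
For readers unfamiliar with the idea, here is a minimal sketch of the kind of merge step a join over sorted data relies on. It is not the code discussed in SPARK-8836 and not the implementation measured above; it is plain Python with no Spark dependency, and the name merge_join is hypothetical. It assumes both inputs are already sorted by key and, in an RDD setting, that matching keys live in the same partition, so the function would run once per pair of partitions.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs that are each sorted by key.

    Hypothetical helper for illustration only. It makes one pass over each
    input and never builds a hash table, which is the advantage sorted
    inputs make possible.
    """
    # Group consecutive pairs that share a key on each side.
    lgroups = groupby(left, key=itemgetter(0))
    rgroups = groupby(right, key=itemgetter(0))
    lk, lg = next(lgroups, (None, None))
    rk, rg = next(rgroups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no partner yet; advance the left side.
            lk, lg = next(lgroups, (None, None))
        elif lk > rk:
            # Right key has no partner yet; advance the right side.
            rk, rg = next(rgroups, (None, None))
        else:
            # Keys match: buffer the right-hand values for this key only,
            # then emit the cross product with the left-hand values.
            rvals = [v for _, v in rg]
            for _, lv in lg:
                for rv in rvals:
                    yield (lk, (lv, rv))
            lk, lg = next(lgroups, (None, None))
            rk, rg = next(rgroups, (None, None))
```

Only the values for the current key on one side are held in memory at a time, so the cost is linear in the size of the two inputs plus the size of the output.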
On Mon
Do you have performance numbers to back up this proposal for the cogroup
operation?
Thanks
On Mon, Mar 21, 2016 at 1:06 AM, JOAQUIN GUANTER GONZALBEZ <
joaquin.guantergonzal...@telefonica.com> wrote:
> Hello devs,
>
>
>
> I have found myself in a situation where Spark is doing sub-optimal
> computat