
RE: Performance improvements for sorted RDDs

2016-03-21 Thread JOAQUIN GUANTER GONZALBEZ

[mailto:daniel.dara...@lynxanalytics.com] Sent: Monday, March 21, 2016 16:20 To: Ted Yu CC: JOAQUIN GUANTER GONZALBEZ; dev@spark.apache.org Subject: Re: Performance improvements for sorted RDDs There is related discussion in https://issues.apache.org/jira/browse/SPARK-8836. It's not too ha

Re: Performance improvements for sorted RDDs

2016-03-21 Thread Daniel Darabos

There is related discussion in https://issues.apache.org/jira/browse/SPARK-8836. It's not too hard to implement this without modifying Spark and we measured ~10x improvement over plain RDD joins. I haven't benchmarked against DataFrames -- maybe they also realize this performance advantage. On Mon
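For readers unfamiliar with the idea, here is a minimal sketch of the kind of merge join that becomes possible when both inputs are already sorted by key. It is written in plain Python rather than against the Spark API, and it is not the implementation measured in the message above or the one discussed in the linked issue. The function name `sorted_merge_join` and the sample data are hypothetical, chosen only for illustration. In a Spark setting, something like this would presumably run once per pair of co-partitioned, key-sorted partitions.

```python
from itertools import groupby
from operator import itemgetter


def sorted_merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, each already sorted by key.

    Hypothetical helper for illustration. Both inputs are walked once, in
    order, so no hash table over a whole input is needed. Only the values of
    the current key on the right side are buffered.
    """
    # Group consecutive pairs that share a key; valid because inputs are sorted.
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    left_item = next(left_groups, None)
    right_item = next(right_groups, None)

    while left_item is not None and right_item is not None:
        left_key, left_rows = left_item
        right_key, right_rows = right_item
        if left_key < right_key:
            # Key only on the left so far: advance the left side.
            left_item = next(left_groups, None)
        elif left_key > right_key:
            # Key only on the right so far: advance the right side.
            right_item = next(right_groups, None)
        else:
            # Matching key: emit every left/right value combination.
            right_values = [value for _, value in right_rows]
            for _, left_value in left_rows:
                for right_value in right_values:
                    yield left_key, (left_value, right_value)
            left_item = next(left_groups, None)
            right_item = next(right_groups, None)


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
    for row in sorted_merge_join(left, right):
        print(row)
```

The sketch assumes keys are comparable with `<` and that each input really is sorted; unsorted input would silently drop matches rather than raise an error.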

Re: Performance improvements for sorted RDDs

2016-03-21 Thread Ted Yu

Do you have performance numbers to back up this proposal for the cogroup operation? Thanks On Mon, Mar 21, 2016 at 1:06 AM, JOAQUIN GUANTER GONZALBEZ <joaquin.guantergonzal...@telefonica.com> wrote: > Hello devs, > > I have found myself in a situation where Spark is doing sub-optimal > computat