Re: Decrease shuffle in TreeAggregate with coalesce ?

2016-04-28 Thread Guillaume Pitel
Long story short, regarding the performance issue: it appeared with a version recompiled from the source TGZ downloaded from the Spark website. The problem disappears with 1.6.2-SNAPSHOT (branch-1.6). Guillaume Do you have code which can reproduce this performance drop in treeReduce? It would be helpful t

Re: Decrease shuffle in TreeAggregate with coalesce ?

2016-04-28 Thread Guillaume Pitel
eByKey_, let's say partitions are numbered like this: (0,p0), (1,p1), (2,p2), (3,p3). Then, after the modulo: (0,p0), (1,p1), (0,p2), (1,p3). As a consequence, W1 will shuffle p2 to W0 and W0 will shuffle p1 to W1. Guillaume On Wed, Apr 27, 2016 at 4:46 AM, Guillaume Pitel mailto:guillaume.pi
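A small pure-Python model of the example above may help. It assumes the layout implied by the message: p0 and p1 are computed on worker W0, p2 and p3 on W1, and reduce partition r (after the modulo) runs on worker W<r>. The key is the partition index modulo the target partition count, which matches how Spark's `treeAggregate` keys its partial results; everything else here (function names, worker names, placement) is hypothetical and does not use Spark.

```python
# Toy model, not Spark code. Placement and reducer locations are assumptions
# taken from the two-worker example in the message.

def modulo_keys(num_partitions, cur_num_partitions):
    """Key each partition index as i % cur_num_partitions."""
    return {i: i % cur_num_partitions for i in range(num_partitions)}

def cross_worker_moves(placement, keys, reducer_worker):
    """Return (partition, from_worker, to_worker) for every partial result
    that must leave the worker which computed it."""
    moves = []
    for part, key in sorted(keys.items()):
        src = placement[part]
        dst = reducer_worker[key]
        if src != dst:
            moves.append((part, src, dst))
    return moves

placement = {0: "W0", 1: "W0", 2: "W1", 3: "W1"}  # hypothetical layout
keys = modulo_keys(4, 2)                           # {0: 0, 1: 1, 2: 0, 3: 1}
reducer_worker = {0: "W0", 1: "W1"}                # hypothetical reducer placement
print(cross_worker_moves(placement, keys, reducer_worker))
# → [(1, 'W0', 'W1'), (2, 'W1', 'W0')]
```

Under these assumptions, two of the four partial results cross the network even though each worker already holds two partitions that could have been combined locally.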

Decrease shuffle in TreeAggregate with coalesce ?

2016-04-27 Thread Guillaume Pitel
eems below optimality to me. There is a huge shuffle cost, while a simple coalesce followed by a partition-level aggregation would probably do the job perfectly well. Have I missed something that requires this reshuffle? Best regards Guillaume Pitel
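The alternative suggested here, a coalesce followed by a partition-level aggregation, can be sketched with the same kind of toy model. It assumes the coalesce groups partitions by the worker that already holds them (Spark's `RDD.coalesce` with `shuffle = false` creates a narrow dependency and tries to respect locality, but the exact grouping is not guaranteed). The helper name, the placement and the per-partition values below are hypothetical.

```python
import operator
from functools import reduce

def coalesce_by_worker(placement, partials, comb_op):
    """Merge the partial results of all partitions that share a worker.
    Models a locality-aware coalesce plus a per-partition aggregation:
    no partial result crosses a worker boundary at this level."""
    merged = {}
    for part in sorted(partials):
        worker = placement[part]
        if worker in merged:
            merged[worker] = comb_op(merged[worker], partials[part])
        else:
            merged[worker] = partials[part]
    return merged

placement = {0: "W0", 1: "W0", 2: "W1", 3: "W1"}  # hypothetical layout
partials = {0: 10, 1: 20, 2: 30, 3: 40}            # hypothetical partial sums
per_worker = coalesce_by_worker(placement, partials, operator.add)
print(per_worker)  # → {'W0': 30, 'W1': 70}

# Only one value per worker is then sent on to the final combine.
total = reduce(operator.add, per_worker.values())
print(total)       # → 100
```

The final result is the same as with the modulo-keyed shuffle, but in this model only one value per worker leaves its node.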

Re: Directly broadcasting (sort of) RDDs

2015-03-23 Thread Guillaume Pitel
his for map joins in Hive on Spark and Spark SQL. -Sandy On Sat, Mar 21, 2015 at 3:11 AM, Guillaume Pitel wrote: Hi, Thanks for your answer. This is precisely the use case I'm interested in, but I already know it; I should have mentioned it. Unfortunately this implementation of BlockM

Re: Directly broadcasting (sort of) RDDs

2015-03-21 Thread Guillaume Pitel
/ml-matrix Another lib: https://github.com/PasaLab/marlin/blob/master/README.md On Sat, Mar 21, 2015 at 12:24 AM, Guillaume Pitel wrote: Hi, I have an idea that I would like to discuss with the Spark devs. The idea comes from a very real problem that I have stru

Directly broadcasting (sort of) RDDs

2015-03-20 Thread Guillaume Pitel
Guillaume -- eXenSa *Guillaume PITEL, Président* +33(0)626 222 431 eXenSa S.A.S. <http://www.exensa.com/> 41, rue Périer - 92120 Montrouge - FRANCE Tel +33(0)184 163 677 / Fax +33(0)972 283 705

Maven profile in MLLib netlib-lgpl not working (1.1.1)

2014-12-10 Thread Guillaume Pitel

Re: TorrentBroadcast slow performance

2014-10-09 Thread Guillaume Pitel
and nodes? Matei On Oct 7, 2014, at 11:42 AM, Davies Liu wrote: Could you create a JIRA for it? Maybe it's a regression after https://issues.apache.org/jira/browse/SPARK-3119. We would appreciate it if you could tell us how to reproduce it. On Mon, Oct 6, 2014 at 1:27 AM, Guillaume Pitel

TorrentBroadcast slow performance

2014-10-06 Thread Guillaume Pitel
ation error from our side, but are unable to pin it down. Does anyone have any idea of the origin of the problem? For now we're sticking with the HttpBroadcast workaround. Guillaume