Re: Strategies for properly load-balanced partitioning

2016-06-03 Thread Takeshi Yamamuro
Hi, you can control this kind of issue in the upcoming v2.0. See https://www.mail-archive.com/user@spark.apache.org/msg51603.html
// maropu
On Sat, Jun 4, 2016 at 10:23 AM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> Hi Saif!
>
> When you say this happens with spark-csv, are the

Re: Strategies for properly load-balanced partitioning

2016-06-03 Thread Silvio Fiorito
Hi Saif! When you say this happens with spark-csv, are the files gzipped by any chance? Gzip is not splittable, so if you’re seeing skew simply from loading data, it could be that you have some extremely large gzip files. In a single-stage job, the tasks reading those files will lag compared to the smal
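
To illustrate the point about gzip not being splittable, here is a minimal sketch in plain Python (not Spark code). It assumes only the standard `gzip` and `zlib` modules; the helper name `can_decompress_from` and the sample rows are hypothetical, chosen for illustration. It shows that a gzip stream can be decoded from its first byte but not from an arbitrary offset, which is why a reader cannot hand different byte ranges of one gzip file to different tasks.

```python
import gzip
import zlib


def can_decompress_from(blob: bytes, offset: int) -> bool:
    """Return True if a gzip stream can be decoded starting at `offset`."""
    try:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        zlib.decompressobj(wbits=31).decompress(blob[offset:])
        return True
    except zlib.error:
        # No valid gzip header at this position, so decoding cannot start here.
        return False


# Hypothetical CSV-like payload, compressed as a single gzip stream.
rows = "\n".join(f"{i},value_{i}" for i in range(10_000)).encode()
blob = gzip.compress(rows)

print(can_decompress_from(blob, 0))               # True: start of the file
print(can_decompress_from(blob, len(blob) // 2))  # False: middle of the file
```

Because the whole file has to be read from the beginning, one very large gzip file ends up in a single task. In Spark, one common mitigation is to call `repartition` on the loaded DataFrame, or to store the input in a splittable format; whether either fits depends on the job.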

RE: Strategies for properly load-balanced partitioning

2016-06-03 Thread Saif.A.Ellafi
A.
Cc: user; Reynold Xin; mich...@databricks.com
Subject: Re: Strategies for properly load-balanced partitioning

I suppose you are running on 1.6. I guess you need a solution based on the features in [1] and [2], which are coming in 2.0.
[1] https://issues.apache.org/jira/browse/SPARK-12538 /

Re: Strategies for properly load-balanced partitioning

2016-06-03 Thread Ovidiu-Cristian MARCU
I suppose you are running on 1.6. I guess you need a solution based on the features in [1] and [2], which are coming in 2.0.
[1] https://issues.apache.org/jira/browse/SPARK-12538 / https://issues.apache.org/jira/browse/SPARK-12394