from:"Qiang Li"

Re: Can not control bucket files number if it was specified

2016-09-19 Thread Qiang Li

Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising f

Re: Can not control bucket files number if it was specified

2016-09-17 Thread Qiang Li

http://talebzadehmich.wordpress.com *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical cont

Can not control bucket files number if it was specified

2016-09-17 Thread Qiang Li

Hi, I use Spark to generate data, then we use Hive/Pig/Presto/Spark to analyze it. However, even when I use bucketBy and sortBy with a bucket number in Spark, the number of result files generated by Spark is always far greater than the bucket number under each partition, so Presto can not recognize the b
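One possible explanation for the symptom described above, offered as an assumption rather than a diagnosis of this thread: a bucketed write can emit one file per bucket from every write task that holds rows for that bucket, so the file count under a partition can approach tasks × buckets. The pure-Python sketch below only simulates that counting for a single partition. The function names and the modulo hash are hypothetical stand-ins (Spark's real bucket hash is different), and no Spark API is called.

```python
from collections import defaultdict


def bucket_of(key: int, num_buckets: int) -> int:
    # Hypothetical stand-in for the real bucket hash; illustration only.
    return key % num_buckets


def count_files(tasks, num_buckets):
    """Count output files, assuming each task writes one file per bucket it holds rows for."""
    files = set()
    for task_id, keys in enumerate(tasks):
        for key in keys:
            files.add((task_id, bucket_of(key, num_buckets)))
    return len(files)


def repartition_by_bucket(tasks, num_buckets):
    """Regroup rows so that all rows of a given bucket end up in the same task."""
    regrouped = defaultdict(list)
    for keys in tasks:
        for key in keys:
            regrouped[bucket_of(key, num_buckets)].append(key)
    return [regrouped[b] for b in sorted(regrouped)]


NUM_BUCKETS = 4
# Three write tasks with eight consecutive keys each, so every task touches every bucket.
tasks = [list(range(start, start + 8)) for start in range(0, 24, 8)]

files_before = count_files(tasks, NUM_BUCKETS)
files_after = count_files(repartition_by_bucket(tasks, NUM_BUCKETS), NUM_BUCKETS)
print(files_before, files_after)
```

Under these assumptions, regrouping rows by bucket before the write brings the file count down to the bucket count. Whether the same approach resolves the case in this thread is not confirmed by the snippet.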

Re: Spark output data to S3 is very slow

2016-09-17 Thread Qiang Li

ive.com/user@spark.apache.org/msg56791.html // maropu On Sat, Sep 17, 2016 at 11:34 AM, Qiang Li wrote: Hi, I ran some jobs with Spark 2.0 on YARN. All tasks finished very quickly, but in the last step Spark spends a lot of time t

Spark output data to S3 is very slow

2016-09-16 Thread Qiang Li

Hi, I ran some jobs with Spark 2.0 on YARN. All tasks finished very quickly, but in the last step Spark spends a lot of time renaming or moving data from the S3 temporary directory to the real directory. I then tried to set spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.exec

5 matches
