From: odeach...@gmail.com [odeach...@gmail.com] on behalf of Deng Ching-Mallete [och...@apache.org]
Sent: Wednesday, October 07, 2015 9:14 PM
To: Younes Naguib
Cc: Cheng Lian; user@spark.apache.org
Subject: Re: Parquet file size
Hi,
In our case, we're using
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.SPLIT_MINSIZE to
increase the size of the RDD partitions when loading text files, so it
would generate larger parquet files. We just set it in the Hadoop
configuration of the Spark context.
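In Spark's Scala API, that looks roughly like the sketch below (run in
spark-shell, where sc is the SparkContext; the 256 MB minimum and the input
path are placeholders, not from the original message):

    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    // SPLIT_MINSIZE is the key "mapreduce.input.fileinputformat.split.minsize".
    // Raising it makes each input split (and hence each RDD partition, and
    // hence each Parquet output file) cover more data. 256 MB is a placeholder.
    sc.hadoopConfiguration.setLong(FileInputFormat.SPLIT_MINSIZE, 256L * 1024 * 1024)

    // Text files loaded after this point should pick up the larger splits.
    val lines = sc.textFile("hdfs:///data/input.tsv")  // placeholder path
    println(lines.partitions.length)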
> Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 |
> younes.naguib@tritondigital.com
> --
> *From:* Cheng Lian [lian.cs@gmail.com]
> *Sent:* Wednesday, October 07, 2015 7:01 PM
> *To:* Younes Naguib; 'user@spark.apache.org'
> *Subject:* Re: Parquet file size
The reason why so many small files are generated is probably the fact
that you are inserting into a partitioned table with three partition columns.
If you want large Parquet files, you may try ...
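The suggestion is cut off above; one common approach in this situation (not
necessarily the one that was about to be suggested) is to repartition the
DataFrame before writing, so each table partition receives fewer, larger
files. A minimal sketch, assuming the Spark 1.5-era DataFrame API; the table
name, partition count, and output path are all placeholders:

    // "raw_tsv", the count of 8, and the path are hypothetical values;
    // repartition-before-write is just one common fix.
    val df = sqlContext.table("raw_tsv")

    df.repartition(8)                        // fewer partitions -> fewer, larger files
      .write
      .partitionBy("year", "month", "day")
      .mode("overwrite")
      .parquet("hdfs:///warehouse/tbl")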
The original TSV file is 600GB and generated 40k files of 15-25MB.
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: October-07-15 3:18 PM
To: Younes Naguib; 'user@spark.apache.org'
Subject: Re: Parquet file size
Why do you want larger files? Doesn't the resulting Parquet file contain
all the data in the original TSV file?
Cheng
On 10/7/15 11:07 AM, Younes Naguib wrote:
Hi,
I’m reading a large tsv file, and creating parquet files using sparksql:
insert overwrite table tbl partition(year, month, day)..
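For reference, a fleshed-out version of that statement might look like the
sketch below, run through a HiveContext; the source table and select list are
hypothetical, since the original message elides them:

    // Dynamic-partition inserts into a Hive table need nonstrict mode when
    // no static partition values are given.
    sqlContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // "raw_tsv" and the column list are placeholders, not from the thread;
    // the partition columns must come last in the SELECT.
    sqlContext.sql("""
      INSERT OVERWRITE TABLE tbl PARTITION (year, month, day)
      SELECT col1, col2, year, month, day
      FROM raw_tsv
    """)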