Re: question about combining small parquet files

2015-11-30 Thread Sabarish Sasidharan

You could use the number of input files to determine the number of output partitions. This assumes your input file sizes are deterministic. Otherwise, you could persist the RDD and then determine its size using the APIs. Regards Sab On 26-Nov-2015 11:13 pm, "Nezih Yigitbasi" wrote: > Hi Spark
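
The suggestion above could be sketched roughly as follows in PySpark. This is only an illustration, not something posted in the thread: the helper names (output_partitions, compact_parquet), the 128 MB target size, and the idea of passing in an average file size are all assumptions. The `spark` argument is assumed to be a SparkSession (or an SQLContext on older versions), both of which expose `.read`.

```python
# Hypothetical default: aim for roughly 128 MB per output file (an assumption).
TARGET_BYTES = 128 * 1024 * 1024


def output_partitions(num_input_files, avg_file_bytes, target_bytes=TARGET_BYTES):
    """Estimate how many output partitions to write.

    Relies on input file sizes being roughly deterministic, so that
    num_input_files * avg_file_bytes approximates the total input size.
    """
    total_bytes = num_input_files * avg_file_bytes
    # Integer ceiling division; always write at least one partition.
    return max(1, -(-total_bytes // target_bytes))


def compact_parquet(spark, src, dst, num_input_files, avg_file_bytes):
    """Read small Parquet files from src and rewrite them as fewer files in dst."""
    n = output_partitions(num_input_files, avg_file_bytes)
    # coalesce(n) reduces the partition count without a full shuffle.
    spark.read.parquet(src).coalesce(n).write.mode("overwrite").parquet(dst)
    return n
```

For instance, 1000 input files of about 1 MB each would be coalesced into 8 output partitions under these assumptions. If file sizes vary a lot, the estimate will be off, which is where the persist-and-measure alternative mentioned above comes in.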

Re: question about combining small parquet files

2015-11-30 Thread Nezih Yigitbasi

This looks interesting, thanks Ruslan. But compaction with Hive is as simple as an insert overwrite statement, since Hive supports CombineFileInputFormat. Is it possible to do the same with Spark? On Thu, Nov 26, 2015 at 9:47 AM, Ruslan Dautkhanov wrote: > An interesting compaction approach of smal

Re: question about combining small parquet files

2015-11-26 Thread Ruslan Dautkhanov

An interesting compaction approach for small files was discussed recently: http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ AFAIK Spark supports views too. -- Ruslan Dautkhanov On Thu, Nov 26, 2015 at 10:43 AM, Nezih Yigitbasi < nyigitb...@netflix

3 matches