Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread Michael Armbrust
this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Control-number-of-parquet-generated-from-JavaSchemaRDD-tp19717p19789.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread tridib

Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread tridib
Ohh... how could I miss that? :( Thanks!

Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread Michael Armbrust
true); //tried with false also. Tried
> repartition(1) too.
>
> claimSchemaRdd.saveAsParquetFile(parquetPath);
> }

Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread tridib
th false also. Tried repartition(1) too.
claimSchemaRdd.saveAsParquetFile(parquetPath);
}

Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread tridib
ilter(new NullFilter());
JavaSchemaRDD claimSchemaRdd = sqlCtx.applySchema(claimRdd, Claim.class);
claimSchemaRdd.coalesce(1);
claimSchemaRdd.saveAsParquetFile(parquetPath);
}
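A note on the snippet above: Spark RDDs are immutable, so coalesce and repartition return a new RDD rather than changing the one they are called on. As posted, the result of coalesce(1) is discarded and saveAsParquetFile still runs on the original RDD, which would explain an unchanged file count. The sketch below illustrates only that pattern. FakeRdd is a hypothetical stand-in and not a Spark class, used so the example runs without Spark on the classpath.

```java
// Hypothetical stand-in for an immutable RDD; NOT part of Spark.
final class FakeRdd {
    private final int numPartitions;

    FakeRdd(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Mirrors the shape of a Spark transformation: returns a NEW object
    // and leaves the receiver untouched. Without a shuffle, coalesce
    // cannot increase the partition count, hence the min().
    FakeRdd coalesce(int n) {
        return new FakeRdd(Math.min(n, numPartitions));
    }

    int partitions() {
        return numPartitions;
    }
}

class CoalesceDemo {
    public static void main(String[] args) {
        FakeRdd rdd = new FakeRdd(1000);

        rdd.coalesce(1);                          // result discarded: no effect on rdd
        System.out.println(rdd.partitions());     // prints 1000

        FakeRdd single = rdd.coalesce(1);         // keep the returned object
        System.out.println(single.partitions());  // prints 1
    }
}
```

In the thread's own terms, that means calling saveAsParquetFile on the object that coalesce returns, not on claimSchemaRdd itself.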

Re: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread Michael Armbrust
> sc.hadoopConfiguration().setInt("parquet.block.size", MB_128);
>
> No luck.
> Is there a way to control the size/number of parquet files generated?
>
> Thanks
> Tridib
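On the size/number question quoted above: parquet.block.size sets the Parquet row-group size inside a file, not the number of files, and each partition of the RDD is written as its own part file. One rough way to pick a partition count is to divide an estimated output size by a target file size. The sketch below is only that arithmetic. PartitionEstimate and partitionsFor are hypothetical names, the 10 GiB total is a made-up figure, and the estimate assumes evenly distributed data. The real Parquet output is usually smaller than the raw input because of columnar encoding and compression.

```java
// Hypothetical helper; not part of Spark or Parquet.
class PartitionEstimate {

    // Ceiling division: the smallest partition count such that, with evenly
    // spread data, each partition holds at most targetBytesPerFile bytes.
    static int partitionsFor(long totalBytes, long targetBytesPerFile) {
        if (totalBytes < 0 || targetBytesPerFile <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        long n = (totalBytes + targetBytesPerFile - 1) / targetBytesPerFile;
        // Always write at least one partition; clamp to the int range.
        return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long MB_128 = 128L * 1024 * 1024;              // same constant name as in the thread
        long estimatedTotal = 10L * 1024 * 1024 * 1024; // assumed 10 GiB, for illustration only
        System.out.println(partitionsFor(estimatedTotal, MB_128)); // prints 80
    }
}
```

The resulting number would then be the argument passed to coalesce or repartition before saving.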

RE: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread Naveen Kumar Pokala
From: tridib [mailto:tridib.sama...@live.com]
Sent: Tuesday, November 25, 2014 9:54 AM
To: u...@spark.incubator.apache.org
Subject: Control number of parquet generated from JavaSchemaRDD
Hello,
I am reading around 1000 input files from disk in an RDD and generating parquet. It always produces

Control number of parquet generated from JavaSchemaRDD

2014-11-24 Thread tridib