Hi Ankur,

I also tried setting a property to write Parquet files of 256 MB each. I am using PySpark; below is how I set the properties, but it is not working for me. How did you set the property?

spark_context._jsc.hadoopConfiguration().setInt("dfs.blocksize", 268435456)
spark_context._jsc.hadoopConfiguration().setInt("parquet.block.size", 268435)

Thanks,
Bijay

On Fri, Jun 10, 2016 at 5:24 AM, Ankur Jain <ankur.j...@yash.com> wrote:

> Thanks maropu.. It worked…
>
> *From:* Takeshi Yamamuro [mailto:linguin....@gmail.com]
> *Sent:* 10 June 2016 11:47 AM
> *To:* Ankur Jain
> *Cc:* user@spark.apache.org
> *Subject:* Re: Saving Parquet files to S3
>
> Hi,
>
> You'd be better off setting `parquet.block.size`.
>
> // maropu
>
> On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann <
> daniel.siegm...@teamaol.com> wrote:
>
> I don't believe there's any way to output files of a specific size. What
> you can do is partition your data into a number of partitions such that the
> amount of data they each contain is around 1 GB.
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain <ankur.j...@yash.com> wrote:
>
> Hello Team,
>
> I want to write parquet files to AWS S3, but I want to size each file
> to 1 GB.
>
> Can someone please guide me on how I can achieve the same?
>
> I am using AWS EMR with spark 1.6.1.
>
> Thanks,
>
> Ankur
>
> Information transmitted by this e-mail is proprietary to YASH Technologies
> and/ or its Customers and is intended for use only by the individual or
> entity to which it is addressed, and may contain information that is
> privileged, confidential or exempt from disclosure under applicable law. If
> you are not the intended recipient or it appears that this mail has been
> forwarded to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly prohibited. In
> such cases, please notify us immediately at i...@yash.com and delete this
> mail from your records.
>
> --
> ---
> Takeshi Yamamuro
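
For reference, here is a rough sketch of the arithmetic behind the two suggestions in this thread: the byte value for a size given in MB, and Daniel's idea of choosing a partition count so that each partition holds roughly the target amount of data. The helper names (`mb_to_bytes`, `partitions_for_target`) and the 10 GB input size are made up for illustration, and the total size would have to be your own estimate of the written output. Note that 256 MB is 268435456 bytes, while 268435 bytes is only about 262 KiB. Also, as far as I understand it, `parquet.block.size` sets the Parquet row group size rather than the size of each output file, so the partition count may matter more for file size.

```python
MB = 1024 * 1024  # binary megabyte, in bytes


def mb_to_bytes(megabytes):
    """Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * MB


def partitions_for_target(total_bytes, target_file_bytes):
    """Hypothetical helper: how many partitions are needed so that each
    one holds at most about target_file_bytes of data (rounded up)."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division; always return at least one partition.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


block_size = mb_to_bytes(256)
print(block_size)  # 268435456

# Assumed example: about 10 GB of output, aiming for about 1 GB per file.
num_partitions = partitions_for_target(10 * 1024 * MB, 1024 * MB)
print(num_partitions)  # 10

# With a live SparkContext `spark_context` and a DataFrame `df` (both
# assumed here, and the output path is a placeholder), the values could
# then be applied like this:
# spark_context._jsc.hadoopConfiguration().setInt("parquet.block.size", block_size)
# df.repartition(num_partitions).write.parquet("s3://your-bucket/your-path")
```

The resulting file sizes will only be approximate, since compression and encoding change how much data each partition produces on disk.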