Hi Ankur,

I also tried setting a property to write parquet file size of 256MB. I am
using pyspark below is how I set the property but it's not working for me.
How did you set the property?


spark_context._jsc.hadoopConfiguration().setInt( "dfs.blocksize", 268435456)
spark_context._jsc.hadoopConfiguration().setInt( "parquet.block.size",
268435)

Thanks,
Bijay

On Fri, Jun 10, 2016 at 5:24 AM, Ankur Jain <ankur.j...@yash.com> wrote:

> Thanks maropu.. It worked…
>
>
>
> *From:* Takeshi Yamamuro [mailto:linguin....@gmail.com]
> *Sent:* 10 June 2016 11:47 AM
> *To:* Ankur Jain
> *Cc:* user@spark.apache.org
> *Subject:* Re: Saving Parquet files to S3
>
>
>
> Hi,
>
>
>
> You'd better off `setting parquet.block.size`.
>
>
>
> // maropu
>
>
>
> On Thu, Jun 9, 2016 at 7:48 AM, Daniel Siegmann <
> daniel.siegm...@teamaol.com> wrote:
>
> I don't believe there's anyway to output files of a specific size. What
> you can do is partition your data into a number of partitions such that the
> amount of data they each contain is around 1 GB.
>
>
>
> On Thu, Jun 9, 2016 at 7:51 AM, Ankur Jain <ankur.j...@yash.com> wrote:
>
> Hello Team,
>
>
>
> I want to write parquet files to AWS S3, but I want to size each file size
> to 1 GB.
>
> Can someone please guide me on how I can achieve the same?
>
>
>
> I am using AWS EMR with spark 1.6.1.
>
>
>
> Thanks,
>
> Ankur
>
> Information transmitted by this e-mail is proprietary to YASH Technologies
> and/ or its Customers and is intended for use only by the individual or
> entity to which it is addressed, and may contain information that is
> privileged, confidential or exempt from disclosure under applicable law. If
> you are not the intended recipient or it appears that this mail has been
> forwarded to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly prohibited. In
> such cases, please notify us immediately at i...@yash.com and delete this
> mail from your records.
>
>
>
>
>
>
>
> --
>
> ---
> Takeshi Yamamuro
> Information transmitted by this e-mail is proprietary to YASH Technologies
> and/ or its Customers and is intended for use only by the individual or
> entity to which it is addressed, and may contain information that is
> privileged, confidential or exempt from disclosure under applicable law. If
> you are not the intended recipient or it appears that this mail has been
> forwarded to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly prohibited. In
> such cases, please notify us immediately at i...@yash.com and delete this
> mail from your records.
>

Reply via email to