What version of Spark were you using? Have you tried increasing
--executor-memory?
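For example (just a sketch; "8g" is a made-up value to tune for your
cluster, and the app name is arbitrary — this conf key is what
--executor-memory sets):

import org.apache.spark.{SparkConf, SparkContext}

// Raise the executor heap via the conf key that --executor-memory maps to.
val conf = new SparkConf()
  .setAppName("parquet-write")          // placeholder app name
  .set("spark.executor.memory", "8g")   // same effect as --executor-memory 8g
val sc = new SparkContext(conf)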
This schema looks pretty normal. And Parquet stores all keys of a map in
a single column.
Cheng
On 9/4/15 4:00 PM, Kohki Nishio wrote:
The stack trace is this
java.lang.OutOfMemoryError: Java heap space
at parquet.bytes.CapacityByteArrayOutputStream.initSlabs(CapacityByteArrayOutputStream.java:65)
at parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:57)
at parquet.column.va
Could you please provide the full stack trace of the OOM exception?
Another common cause of Parquet OOM is super wide tables, say hundreds or
thousands of columns: the writer keeps an in-memory buffer for every
column, so memory pressure scales with the number of columns, and the
number of rows is mostly irrelevant.
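If it is the wide-table case, shrinking the writer's buffers can sometimes
help; a minimal sketch, assuming the standard Parquet Hadoop config keys
(the sizes are arbitrary starting points, not recommendations):

// Smaller row groups / pages mean smaller per-column write buffers.
sc.hadoopConfiguration.set("parquet.block.size", (32 * 1024 * 1024).toString)
sc.hadoopConfiguration.set("parquet.page.size", (512 * 1024).toString)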
Cheng
On 9/4/15 1:24 AM, Kohki Nishio wrote:
let's say I have data like this
ID    | Some1     | Some2     | Some3   |
A1    | kdsfajfsa | dsafsdafa | fdsfafa |
A2    | dfsfafasd | 23jfdsjkj | 980dfs  |
A3    | 99989df   | jksdljas  | 48dsaas |
..
Z00.. | fdsafdsfa | fdsdafdas | 89sdaff |
My understanding is that if I giv
Any code / Parquet schema to provide? I'm not sure I understand which step
fails right there...
On 3 September 2015 at 04:12, Raghavendra Pandey
<raghavendra.pan...@gmail.com> wrote:
Did you specify a partitioning column while saving the data?
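Something like this (a sketch, assuming df is the DataFrame being saved;
the column name and output path are placeholders):

// Write Parquet partitioned by a column; each distinct value of "date"
// becomes its own directory under the output path.
df.write.partitionBy("date").parquet("/data/out")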
On Sep 3, 2015 5:41 AM, "Kohki Nishio" wrote:
Hello experts,
I have a huge JSON file (> 40G) and am trying to use Parquet as a file
format. Each entry has a unique identifier, but other than that it doesn't
have a 'well balanced value' column to partition on. Right now it just
throws OOM and I can't figure out what to do with it.
It would be ide
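For reference, the basic shape of the JSON-to-Parquet conversion being
attempted, in the Spark 1.x SQLContext API, with an explicit repartition so
no single task buffers too much data (a sketch; the path and the partition
count of 200 are made up):

// Read the JSON, spread it over more partitions, write Parquet.
val df = sqlContext.read.json("/data/huge.json")
df.repartition(200).write.parquet("/data/huge.parquet")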