What version of Spark were you using? Have you tried increasing
--executor-memory?
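For example (just a sketch; "8g" is a made-up value to tune for your
cluster, and the app name is arbitrary — this conf key is what
--executor-memory sets):

import org.apache.spark.{SparkConf, SparkContext}

// Raise the executor heap via the conf key that --executor-memory maps to.
val conf = new SparkConf()
  .setAppName("parquet-write")          // placeholder app name
  .set("spark.executor.memory", "8g")   // same effect as --executor-memory 8g
val sc = new SparkContext(conf)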
This schema looks pretty normal. And Parquet stores all keys of a map in
a single column.
Cheng
On 9/4/15 4:00 PM, Kohki Nishio wrote:
The stack trace is this
java.lang.OutOfMemoryError: Java heap space
at parquet.bytes.CapacityByteArrayOutputStream.initSlabs(CapacityByteArrayOutputStream.java:65)
at parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:57)
at parquet.column.va
Could you please provide the full stack trace of the OOM exception?
Another common cause of Parquet OOM is super wide tables, say hundreds or
thousands of columns: the writer keeps an in-memory buffer for every
column, so memory pressure scales with the number of columns, and the
number of rows is mostly irrelevant.
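If it is the wide-table case, shrinking the writer's buffers can sometimes
help; a minimal sketch, assuming the standard Parquet Hadoop config keys
(the sizes are arbitrary starting points, not recommendations):

// Smaller row groups / pages mean smaller per-column write buffers.
sc.hadoopConfiguration.set("parquet.block.size", (32 * 1024 * 1024).toString)
sc.hadoopConfiguration.set("parquet.page.size", (512 * 1024).toString)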
Cheng
On 9/4/15 1:24 AM, Kohki Nishio wrote:
let's say I have data like this
ID    | Some1     | Some2     | Some3   |
A1    | kdsfajfsa | dsafsdafa | fdsfafa |
A2    | dfsfafasd | 23jfdsjkj | 980dfs  |
A3    | 99989df   | jksdljas  | 48dsaas |
..
Z00.. | fdsafdsfa | fdsdafdas | 89sdaff |
My understanding is that if I giv
Any code / Parquet schema to provide? I'm not sure I understand which step
fails right there...
On 3 September 2015 at 04:12, Raghavendra Pandey
<raghavendra.pan...@gmail.com> wrote:
Did you specify a partitioning column while saving the data?
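Something like this (a sketch, assuming df is the DataFrame being saved;
the column name and output path are placeholders):

// Write Parquet partitioned by a column; each distinct value of "date"
// becomes its own directory under the output path.
df.write.partitionBy("date").parquet("/data/out")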
On Sep 3, 2015 5:41 AM, "Kohki Nishio" wrote:
Hello experts,
I have a huge JSON file (> 40G) and am trying to use Parquet as a file
format. Each entry has a unique identifier, but other than that it doesn't
have a 'well balanced value' column to partition on. Right now it just
throws OOM and I can't figure out what to do with it.
It would be ide
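For reference, the basic shape of the JSON-to-Parquet conversion being
attempted, in the Spark 1.x SQLContext API, with an explicit repartition so
no single task buffers too much data (a sketch; the path and the partition
count of 200 are made up):

// Read the JSON, spread it over more partitions, write Parquet.
val df = sqlContext.read.json("/data/huge.json")
df.repartition(200).write.parquet("/data/huge.parquet")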