Pinging back to see if anybody could provide me with some pointers on how
to stream/batch JSON-to-ORC conversion in Spark SQL, or on why I get an OOM
dump with such a small memory footprint?
Thanks,
Alec
On Wed, Nov 15, 2017 at 11:03 AM, Alec Swan wrote:
> Thanks Steve and Vadim for the feedback.
>
Thanks Steve and Vadim for the feedback.
@Steve, are you suggesting creating a custom receiver and somehow piping it
through Spark Streaming/Spark SQL? Or are you suggesting creating smaller
datasets from the stream and using my original code to process them? It'd
be very helpful for a
There's a lot of off-heap memory involved in decompressing Snappy and
compressing ZLib.
Since you're running using `local[*]`, you process multiple tasks
simultaneously, so they all might consume memory.
I don't think that increasing the heap will help, since it looks like you're
hitting the system memory limit.
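[A minimal sketch (not from the thread) of capping concurrency in local mode so that fewer Snappy/ZLib codec buffers are alive at the same time. The local[2] master and the shuffle partition count are illustrative values, not recommendations.]

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    public class LimitedConcurrencyJsonToOrc {
        public static void main(String[] args) {
            // Cap the number of concurrent tasks so fewer codec buffers are in flight at once.
            SparkConf conf = new SparkConf()
                    .setMaster("local[2]")                     // instead of local[*]
                    .set("spark.sql.shuffle.partitions", "8")  // hypothetical value
                    .setAppName("json-to-orc");

            SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
            // ... read the JSON input and write the ORC output as in the original code ...
            spark.stop();
        }
    }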
On 14 Nov 2017, at 15:32, Alec Swan <alecs...@gmail.com> wrote:
But I wonder if there is a way to stream/batch the content of the JSON file in
order to convert it to ORC piecemeal and avoid reading the whole JSON file into
memory in the first place?
That is what you'll need to do; you'd
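[The reply above is cut off, so the following is only a hedged sketch of what piecemeal conversion could look like, not necessarily what Steve had in mind: Structured Streaming can pick up JSON files from a directory and write ORC output incrementally. The schema fields, paths, and maxFilesPerTrigger value are placeholders.]

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class JsonToOrcStreaming {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .master("local[2]")
                    .appName("json-to-orc-streaming")
                    .getOrCreate();

            // Streaming file sources require an explicit schema; these fields are placeholders.
            StructType schema = new StructType()
                    .add("id", DataTypes.StringType)
                    .add("payload", DataTypes.StringType);

            Dataset<Row> json = spark.readStream()
                    .schema(schema)
                    .option("maxFilesPerTrigger", "1")   // process one input file per micro-batch
                    .json("/path/to/json/input");        // hypothetical directory

            StreamingQuery query = json.writeStream()
                    .format("orc")
                    .option("compression", "zlib")
                    .option("checkpointLocation", "/path/to/checkpoint") // hypothetical
                    .start("/path/to/orc/output");                       // hypothetical

            query.awaitTermination();
        }
    }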
Thanks all. I am not submitting a Spark job explicitly. Instead, I am using
the Spark library functionality embedded in my web service, as shown in the
code I included in the previous email. So, effectively, Spark SQL runs in
the web service's JVM. Therefore, the --driver-memory option would not (and did not) help.
If you are running Spark with local[*] as master, there will be a single
process whose memory will be controlled by the --driver-memory command-line
option to spark-submit. Check
http://spark.apache.org/docs/latest/configuration.html
spark.driver.memory (default: 1g): Amount of memory to use for the driver process.
https://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory
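[A minimal sketch of why this matters for the embedded case being discussed; the 8g value is hypothetical. spark.driver.memory set through SparkConf is not honored once the driver JVM is already running, which is exactly the situation when Spark runs inside a web service.]

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    public class EmbeddedDriverMemoryNote {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setMaster("local[*]")
                    .setAppName("json-to-orc")
                    // Only effective if the driver JVM is launched with it, e.g. via
                    // spark-submit --driver-memory 8g or spark-defaults.conf. In an
                    // embedded local[*] service the driver is the web service JVM itself,
                    // so its heap is whatever -Xmx that JVM was started with.
                    .set("spark.driver.memory", "8g"); // hypothetical value

            SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
            spark.stop();
        }
    }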
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 6:22 PM, Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Effectively, my Java service starts up an embedded Spark cluster
Hi Joel,
Here are the relevant snippets of my code and an OOM error thrown
in frameWriter.save(..). Surprisingly, the heap dump is pretty small (~60MB)
even though I am running with -Xmx10G and 4G executor and driver memory, as
shown below.
SparkConf sparkConf = new SparkConf()
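[The snippet above is cut off; a minimal sketch of the kind of setup described (embedded local[*] master, JSON in, ORC/ZLIB out) might look like the following. The option names, memory values, and paths are assumptions for illustration, not the actual code from the thread.]

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class JsonToOrcBatch {
        public static void main(String[] args) {
            // Assumed settings -- the original snippet is truncated, so these only
            // illustrate the kind of setup described in the email.
            SparkConf sparkConf = new SparkConf()
                    .setMaster("local[*]")
                    .setAppName("json-to-orc")
                    .set("spark.executor.memory", "4g")
                    .set("spark.driver.memory", "4g");

            SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();

            // Read the whole JSON/Snappy file into a DataFrame ...
            Dataset<Row> df = spark.read()
                    .format("json")
                    .load("/path/to/input.json.snappy");   // hypothetical path

            // ... and write it back out as ORC with ZLIB compression.
            df.write()
                    .format("orc")
                    .option("compression", "zlib")
                    .mode(SaveMode.Overwrite)
                    .save("/path/to/orc/output");           // hypothetical path

            spark.stop();
        }
    }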
Have you tried increasing driver and executor memory (and the GC overhead limit
too, if required)? Your code snippet and stack trace would be helpful.
On Mon, Nov 13, 2017 at 7:23 PM Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Effectively, my Java service
Hello,
I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
format. Effectively, my Java service starts up an embedded Spark cluster
(master=local[*]) and uses Spark SQL to convert JSON to ORC. However, I
keep getting OOM errors with large (~1GB) files.
I've tried different ways