You can get some more insight by using the Spark history server
(http://spark.apache.org/docs/latest/monitoring.html); it can show you
which task is failing and other information that might help you debug
the issue.
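If event logging isn't already on, a minimal sketch of the setup would look
something like this (the log directory is just a placeholder path):

    # conf/spark-defaults.conf
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///tmp/spark-events
    spark.history.fs.logDirectory    hdfs:///tmp/spark-events

    # then start the history server UI (port 18080 by default)
    ./sbin/start-history-server.sh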
On 05/10/2016 19:00, Babak Alipour wrote:
The issue seems to lie in the RangePartitioner trying to create equal
ranges. [1]
[1] https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/RangePartitioner.html
The *Double* values I'm trying to sort are mostly in the range [0,1] (~70%
of the data, which roughly equates to 1 billion records
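A quick way to confirm the skew described above (a sketch only, reusing the
same `field` variable and the MY_TABLE name from the original query):

    // counts how many values fall in [0, 1] vs. the total row count
    sql("SELECT SUM(CASE WHEN " + field + " BETWEEN 0 AND 1 THEN 1 ELSE 0 END) AS in_unit_range, " +
        "COUNT(*) AS total FROM MY_TABLE").show()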
Thanks Vadim for sharing your experience, but I have tried a multi-JVM setup
(2 workers) and various sizes for spark.executor.memory (8g, 16g, 20g, 32g,
64g) and spark.executor.cores (2-4), and I get the same error all along.
As for the files, these are all .snappy.parquet files, resulting from
inserting some data fr
oh, and try to run even smaller executors, i.e. with
`spark.executor.memory` <= 16GiB. I wonder what result you're going to get.
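For what it's worth, the number in that exception looks like the hard cap on a
single memory page, (2^31 - 1) * 8 bytes, i.e. just under 16 GiB (assuming the
limit comes from Spark's TaskMemoryManager), which is why giving each task less
memory should avoid hitting it:

    scala> ((1L << 31) - 1) * 8L   // maximum page size, in bytes
    res0: Long = 17179869176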
On Sun, Oct 2, 2016 at 1:24 AM, Vadim Semenov wrote:
> Do you mean running a multi-JVM 'cluster' on the single machine?
Yes, that's what I suggested.
You can get some information here:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
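If it's the standalone cluster manager, a minimal sketch of running two worker
JVMs on the same box would be (the values are only examples):

    # conf/spark-env.sh
    export SPARK_WORKER_INSTANCES=2   # two worker JVMs on this host
    export SPARK_WORKER_CORES=4       # cores each worker offers
    export SPARK_WORKER_MEMORY=32g    # memory each worker offers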
To add one more note, I tried running more, smaller executors, each with
32-64g memory and executor.cores 2-4 (with 2 workers as well), and I'm still
getting the same exception:
java.lang.IllegalArgumentException: Cannot allocate a page with more than
17179869176 bytes
at
org.apache.spark.mem
Do you mean running a multi-JVM 'cluster' on the single machine? How would
that affect performance/memory-consumption? If a multi-JVM setup can handle
such a large input, then why can't a single JVM break down the job into
smaller tasks?
I also found that SPARK-9411 mentions making the page_size c
Run more, smaller executors: change `spark.executor.memory` to 32g and
`spark.executor.cores` to 2-4, for example.
Changing the driver's memory won't help because the driver doesn't participate
in execution.
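On spark-submit that would look something like this (a sketch; the values are
only examples and the rest of the command stays as it is):

    spark-submit \
      --conf spark.executor.memory=32g \
      --conf spark.executor.cores=4 \
      ...   # rest of your existing submit arguments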
On Fri, Sep 30, 2016 at 2:58 PM, Babak Alipour wrote:
Thank you for your replies.
@Mich, using LIMIT 100 in the query prevents the exception, but given that
there's enough memory, I don't think this should happen even without LIMIT.
@Vadim, here's the full stack trace:
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page wi
Can you post the whole exception stack trace?
What are your executor memory settings?
Right now I assume that it happens in UnsafeExternalRowSorter ->
UnsafeExternalSorter:insertRecord
Running more executors with lower `spark.executor.memory` should help.
On Fri, Sep 30, 2016 at 12:57 PM, Babak Alipour wrote:
What will happen if you LIMIT the result set to 100 rows only -- select
<field> from <table> order by <field> LIMIT 100. Will that work?
How about running the whole query WITHOUT order by?
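In the style of the query from the original post, the two experiments would be
roughly (same `field` variable and table assumed):

    // 1) keep the sort but cap the result set
    sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC LIMIT 100").show()
    // 2) run the same query without the ORDER BY
    sql("SELECT " + field + " FROM MY_TABLE").show()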
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Greetings everyone,
I'm trying to read a single field of a Hive table stored as Parquet in
Spark (~140GB for the entire table, this single field should be just a few
GB) and look at the sorted output using the following:
sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
But