No problem, Antony. MLlib is tricky! I'd love to chat with you about your
use case - it sounds like we're working on similar problems/scales.
On Fri, Feb 20, 2015 at 1:55 PM Xiangrui Meng wrote:
> Hi Antony, Is it easy for you to try Spark 1.3.0 or master? The ALS
> performance should be improved in 1.3.0. -Xiangrui
Hi Antony, Is it easy for you to try Spark 1.3.0 or master? The ALS
performance should be improved in 1.3.0. -Xiangrui
On Fri, Feb 20, 2015 at 1:32 PM, Antony Mayi wrote:
> Hi Ilya,
>
> thanks for your insight, this was the right clue. I had default parallelism
> already set but it was quite low
Hi Ilya,
thanks for your insight, this was the right clue. I had default parallelism
already set but it was quite low (hundreds) and moreover the number of
partitions of the input RDD was low as well, so the chunks were really too big.
Increased parallelism and repartitioning seem to be helping.
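For readers following the fix Antony describes - raising spark.default.parallelism
and repartitioning the input RDD before calling ALS.trainImplicit - here is a
minimal sketch. The parallelism value, input path, lambda and alpha are
illustrative assumptions, not values taken from this thread; only rank=100 and
15 iterations come from the original post.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// raise the default parallelism so the shuffles inside ALS produce many small
// partitions instead of a few huge ones (2000 is only an example value)
val conf = new SparkConf()
  .setAppName("als-implicit")
  .set("spark.default.parallelism", "2000")
val sc = new SparkContext(conf)

// hypothetical input path and format - the real dataset is not shown here
val ratings = sc.textFile("hdfs:///tmp/ratings.csv").map { line =>
  val Array(user, item, value) = line.split(',')
  Rating(user.toInt, item.toInt, value.toDouble)
}.repartition(2000)  // keep the input RDD from being a handful of big chunks

// rank=100, 15 iterations as in the original post; lambda and alpha are
// assumed, and blocks = -1 lets MLlib pick the number of blocks itself
val model = ALS.trainImplicit(ratings, 100, 15, 0.01, -1, 40.0)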
Hi Antony - you are seeing a problem that I ran into. The underlying issue
is your default parallelism setting. What's happening is that within ALS
certain RDD operations end up changing the number of partitions of your
data. For example, if you start with an RDD of 300 partitions, unless
now, with spark.shuffle.io.preferDirectBufs reverted (to true), I am again
getting GC overhead limit exceeded:
=== spark stdout ===
15/02/19 12:08:08 WARN scheduler.TaskSetManager: Lost task 7.0 in stage 18.0
(TID 5329, 192.168.1.93): java.lang.OutOfMemoryError: GC overhead limit
exceeded at
jav
it is from within the ALS.trainImplicit() call. BTW, the exception varies
between this "GC overhead limit exceeded" and "Java heap space" (which I guess
is just a different outcome of the same problem).
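For context, spark.shuffle.io.preferDirectBufs controls whether the Netty
transfer service uses off-heap direct buffers for shuffle and block transfers;
its default is true. A minimal sketch of how the flag is set when building the
context - the value below is only a placeholder for whichever setting is being
tested, and the app name is invented for the example:

import org.apache.spark.{SparkConf, SparkContext}

// placeholder value: "true" (off-heap direct buffers) is the default;
// "false" forces Netty to allocate the buffers on-heap instead
val conf = new SparkConf()
  .setAppName("als-implicit")
  .set("spark.shuffle.io.preferDirectBufs", "false")
val sc = new SparkContext(conf)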
just tried another run and here are the logs (filtered) - note I tried this run
with spark.shuffle.
Oh OK you are saying you are requesting 25 executors and getting them,
got it. You can consider making fewer, bigger executors to pool rather
than split up your memory, but at some point it becomes
counter-productive. 32GB is a fine executor size.
So you have ~8GB available per task which seems li
based on the Spark UI I am running 25 executors for sure. Why would you expect
four? I submit the task with --num-executors 25 and I get 6-7 executors running
per host (using more, smaller executors allows me better cluster utilization
when running parallel Spark sessions (which is not the case of
This should result in 4 executors, not 25. They should be able to
execute 4*4 = 16 tasks simultaneously. You have them grab 4*32 = 128GB
of RAM, not 1TB.
It still feels like this shouldn't be running out of memory, not by a
long shot though. But just pointing out potential differences between
what
Hi,
I have 4 very powerful boxes (256GB RAM, 32 cores each). I am running Spark
1.2.0 in yarn-client mode with the following layout:
spark.executor.cores=4
spark.executor.memory=28G
spark.yarn.executor.memoryOverhead=4096
I am submitting a bigger ALS trainImplicit task (rank=100, iters=15) on a dataset
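For reference, the layout described in this message can be expressed as a single
configuration. A minimal sketch, assuming yarn-client mode as stated above, with
spark.executor.instances standing in for the --num-executors 25 mentioned
earlier in the thread and the app name invented for the example:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("yarn-client")                          // as stated in this message
  .setAppName("als-implicit")                        // hypothetical name
  .set("spark.executor.instances", "25")             // i.e. --num-executors 25
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "28g")
  .set("spark.yarn.executor.memoryOverhead", "4096")
val sc = new SparkContext(conf)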