Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
… executors' memory in SparkSQL, on which we would do some calculation using UDFs in pyspark. If I run my SQL on only a portion of the data (filtering by one of the attributes), let's say 800 million records, …

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Davies Liu
…STANDARD_ACCOUNT_CITY_SRC, STANDARD_ACCOUNT_CITY_SRC) / CASE WHEN LENGTH(STANDARD_ACCOUNT_CITY_SRC) > LENGTH(STANDARD_ACCOUNT_CITY_SRC) THEN LENGTH(STANDARD_ACCOUNT_CITY_SRC) …

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-09 Thread Zoltan Fedor
…let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors. It seems to me …

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Davies Liu
…on which we would do some calculation using UDFs in pyspark. If I run my SQL on only a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" …

java.lang.OutOfMemoryError: GC overhead limit exceeded when using UDFs in SparkSQL (Spark 2.0.0)

2016-08-08 Thread Zoltan Fedor
…only a portion of the data (filtering by one of the attributes), let's say 800 million records, then all works well. But when I run the same SQL on all the data, then I receive "java.lang.OutOfMemoryError: GC overhead limit exceeded" from basically all of the executors. It seems to me …
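
For context, a minimal Scala sketch of the kind of job the thread describes: a UDF registered for use from SparkSQL and applied behind a filter. The original poster used PySpark; the session setup, UDF body, paths and the WHERE column here are hypothetical stand-ins, not the poster's code (STANDARD_ACCOUNT_CITY_SRC is the column quoted elsewhere in the thread).

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("udf-oom-sketch").getOrCreate()

    // A trivial stand-in for the thread's string-processing UDFs.
    spark.udf.register("norm_len", (s: String) => if (s == null) 0 else s.trim.length)

    spark.read.parquet("/path/to/accounts").createOrReplaceTempView("accounts")

    // Filtering to "a portion of the data" (as the poster did) before applying the UDF.
    val subset = spark.sql(
      """SELECT norm_len(STANDARD_ACCOUNT_CITY_SRC) AS city_len
        |FROM accounts
        |WHERE region = 'subset'""".stripMargin)
    subset.show(10)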

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Adrien Mogenet
…Spark tasks are done. After the spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at …

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Cheng Lian
…dataset successfully. I can see the output in HDFS once all Spark tasks are done. After the spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded …

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Jerry Lam
…successfully. I can see the output in HDFS once all Spark tasks are done. After the spark tasks are done, the job appears to be running for over an hour, until I get the following (full stack trace below): …

Re: df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-12-02 Thread Don Drake
…full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238) I had set the driver memory to be 20GB. I attempted to read in …

df.partitionBy().parquet() java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-28 Thread Don Drake
…hour, until I get the following (full stack trace below): java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:238) I had set the driver memory to be 20GB. I attempted to …
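
For context, a minimal sketch of the write pattern under discussion, runnable in a 1.x-era spark-shell where sc and sqlContext are predefined. The input path, output path and partition column are hypothetical stand-ins, and the summary-metadata setting is a commonly suggested mitigation from that era, not the thread's confirmed fix.

    // The OOM above is raised on the driver after all tasks finish, while Parquet
    // footer/summary metadata for every output file is being processed.
    val df = sqlContext.read.json("hdfs:///data/events.json")

    // Assumption: skipping the driver-side summary-metadata step avoids that merge.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

    df.write
      .partitionBy("event_date")                  // many partition values => many output files
      .parquet("hdfs:///data/events_parquet")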

newbie simple app, small data set: Py4JJavaError java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-18 Thread Andy Davidson
…'An error occurred while calling {0}{1}{2}.\n'.format(target_id, '.', name), value) … raise Py4JError( … Py4JJavaError: An error occurred while calling o65.partitions. : java.lang.OutOfMemoryError: GC overhead limit exceeded …

RE: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-07 Thread Sun, Rui
From: Dhaval Patel [mailto:dhaval1...@gmail.com] Sent: Saturday, November 7, 2015 12:26 AM To: Spark User Group Subject: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded. I have been struggling with this error for the past 3 days and have tried all possible ways/suggestions …

[sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-06 Thread Dhaval Patel
…broadcast_2_piece0 on localhost:39562 in memory (size: 2.4 KB, free: 530.0 MB)
15/11/06 10:45:20 INFO ContextCleaner: Cleaned accumulator 2
15/11/06 10:45:53 WARN ServletHandler: Error for /static/timeline-view.css
java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.zip.Zip…

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread Ted Yu
1.2.0 is quite old. You may want to try 1.5.1, which was released in the past week. Cheers. On Oct 4, 2015, at 4:26 AM, t_ras wrote: I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file. The file is a CSV file, 217GB in size …

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-10-04 Thread t_ras
I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying a count action on a file. The file is a CSV file, 217GB in size. I'm using 10 r3.8xlarge (Ubuntu) machines, CDH 5.3.6 and Spark 1.2.0. Configuration: spark.app.id:local-1443956477103 spark.app.name:Spark shell spark.cores.max…
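
For reference, a minimal sketch of the failing action, runnable in spark-shell where sc is predefined; the path is a hypothetical stand-in. Note that the pasted configuration shows spark.app.id:local-..., which suggests the shell is running with a local master rather than across the 10 machines.

    // Count lines of the large CSV; with a local master every partition is
    // processed inside the single driver JVM.
    val lines = sc.textFile("hdfs:///data/big.csv")   // hypothetical path
    println(lines.count())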

AW: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-08-11 Thread rene.pfitzner
…Sent: Saturday, 11 July 2015 03:58 To: Ted Yu; Robin East; user Subject: Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded. Hello again. So I could compute triangle numbers when running the code from spark shell without workers (with the --driver-memory 15g option) …

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
…html and in the reduce phase we keep the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage …

Re: Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Ted Yu
…-1 signature from the html and in the reduce phase we keep the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage …

Strange Error: "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2015-07-15 Thread Saeed Shahrivari
…keep the html that has the shortest URL. However, after running for 2-3 hours the application crashes due to a memory issue. Here is the exception: 15/07/15 18:24:05 WARN scheduler.TaskSetManager: Lost task 267.0 in stage 0.0 (TID 267, psh-11.nse.ir): java.lang.OutOfMemoryError: GC overhead limit exceeded
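
For illustration, a minimal Scala sketch of the reduce step described above: group pages by a signature computed from the html and keep, per signature, the page with the shortest URL. The signature function, the sample records and the record layout are hypothetical; the thread does not show the actual code.

    // Runnable in spark-shell (sc is predefined). Each record is (url, html).
    def sig(html: String): Long = html.hashCode.toLong        // placeholder signature

    val pages = sc.parallelize(Seq(
      ("http://example.com/a/very/long/url", "<html>same content</html>"),
      ("http://example.com/a",               "<html>same content</html>")
    ))

    // Key each page by its signature, then keep the page whose URL is shortest.
    val deduped = pages
      .map { case (url, html) => (sig(html), (url, html)) }
      .reduceByKey { (a, b) => if (a._1.length <= b._1.length) a else b }

    deduped.values.collect().foreach(println)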

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-10 Thread Roman Sokolov
Hello again. So I could compute triangle numbers when running the code from spark shell without workers (with the --driver-memory 15g option), but with workers I get errors. So I run spark shell: ./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory 6900m --driver-memory 15g and workers (…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Yep, I already found it. So I added 1 line: val graph = GraphLoader.edgeListFile(sc, "", ...) val newgraph = graph.convertToCanonicalEdges() and could successfully count triangles on "newgraph". Next I will test it on bigger (several GB) networks. I am using Spark 1.3 and 1.4 but haven't seen …
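
A minimal end-to-end sketch of that fix, runnable in spark-shell where sc is predefined; the edge-list path is a hypothetical stand-in for the poster's dataset.

    import org.apache.spark.graphx.GraphLoader

    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt", minEdgePartitions = 16)

    // triangleCount() expects srcId < dstId on every edge; convertToCanonicalEdges()
    // (Spark 1.3+) enforces that and merges duplicate/reverse edges.
    val canonical = graph.convertToCanonicalEdges()

    val perVertex = canonical.triangleCount().vertices            // (vertexId, triangles containing it)
    val total = perVertex.map(_._2.toLong).reduce(_ + _) / 3      // each triangle is counted at 3 vertices
    println(s"triangles: $total")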

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Ted Yu
See SPARK-4917, which went into Spark 1.3.0. On Fri, Jun 26, 2015 at 2:27 AM, Robin East wrote: You'll get this issue if you just take the first 2000 lines of that file. The problem is triangleCount() expects srcId < dstId, which is not the case in the file (e.g. vertex 28). You can get round …

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Robin East
You'll get this issue if you just take the first 2000 lines of that file. The problem is triangleCount() expects srcId < dstId, which is not the case in the file (e.g. vertex 28). You can get round this by calling graph.convertToCanonicalEdges(), which removes bi-directional edges and ensures srcId < dstId …

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
Ok, but what does it mean? I did not change the core files of Spark, so is it a bug there? PS: on small datasets (<500 MB) I have no problem. On 25.06.2015 18:02, "Ted Yu" wrote: The assertion failure from TriangleCount.scala corresponds with the following lines: g.outerJoinVertices …

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Ted Yu
The assertion failure from TriangleCount.scala corresponds with the following lines:

    g.outerJoinVertices(counters) { (vid, _, optCounter: Option[Int]) =>
      val dblCount = optCounter.getOrElse(0)
      // double count should be even (divisible by two)
      assert((dblCount & 1) == 0)
      ...

Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Roman Sokolov
Hello! I am trying to compute the number of triangles with GraphX, but I get a memory/heap-size error even though the dataset is very small (1 GB). I run the code in spark-shell on a machine with 16 GB RAM (I also tried with 2 workers on separate machines, 8 GB RAM each). So I have 15x more memory than the dat…

Re: Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread Deng Ching-Mallete
…have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded". The job is trying to process a file of size 4.5G. I've tried the following Spark configuration: --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G. I tried …

Spark job throwing “java.lang.OutOfMemoryError: GC overhead limit exceeded”

2015-06-15 Thread diplomatic Guru
Hello All, I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead limit exceeded". The job is trying to process a file of size 4.5G. I've tried the following Spark configuration: --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G. I tried …
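
For reference, a hedged sketch of the same sizing applied programmatically. The thread passes these as spark-submit flags; the app name is hypothetical, and spark.driver.memory still has to be set on the command line because the driver JVM is already running by the time a SparkConf is evaluated.

    import org.apache.spark.{SparkConf, SparkContext}

    // Equivalent of: --num-executors 6 --executor-memory 6G --executor-cores 6
    val conf = new SparkConf()
      .setAppName("gc-overhead-job")              // hypothetical
      .set("spark.executor.instances", "6")
      .set("spark.executor.memory", "6g")
      .set("spark.executor.cores", "6")
    val sc = new SparkContext(conf)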

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-28 Thread Guru Medasani
…Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. I have YARN configured with yarn.nodemanager.vmem-check-enabled=false and yarn.nodemanager.pmem-check-enabled=false to avoid YARN killing the containers. The stack trace is below. Thanks, Antony. 15/01/27 17:0…

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
…17:02:53 ERROR executor.Executor: Exception in task 21.0 in stage 12.0 (TID 1312)
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Integer.valueOf(Integer.java:642)
        at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70)
        at …

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
Can you attach the logs where this is failing? From: Sven Krasser Date: Tuesday, January 27, 2015 at 4:50 PM To: Guru Medasani Cc: Sandy Ryza, Antony Mayi, "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. Since it's an executor …

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sven Krasser
…Date: Tuesday, January 27, 2015 at 3:33 PM To: Antony Mayi Cc: "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the exec…

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
…Date: Tuesday, January 27, 2015 at 3:33 PM To: Antony Mayi Cc: "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded. Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing for a d…

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Sandy Ryza
Hi Antony, If you look in the YARN NodeManager logs, do you see that it's killing the executors? Or are they crashing for a different reason? -Sandy. On Tue, Jan 27, 2015 at 12:43 PM, Antony Mayi wrote: Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet getting executors crashed …

java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Antony Mayi
Hi, I am using spark.yarn.executor.memoryOverhead=8192 yet my executors crash with this error. Does that mean I genuinely don't have enough RAM, or is it a matter of config tuning? Other config options used: spark.storage.memoryFraction=0.3, SPARK_EXECUTOR_MEMORY=14G, running Spark 1.2.0 on YARN …
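
To make the sizing concrete, a sketch of how these settings combine on YARN. The values are taken from the message above; the container arithmetic is the standard executor-memory-plus-overhead relationship, not something stated in the thread.

    import org.apache.spark.SparkConf

    // SPARK_EXECUTOR_MEMORY=14G plus spark.yarn.executor.memoryOverhead=8192 (MB)
    // means each YARN container has to fit roughly 14g + 8g = ~22g, and
    // spark.storage.memoryFraction=0.3 caps the block cache at ~30% of the 14g heap.
    val conf = new SparkConf()
      .set("spark.executor.memory", "14g")
      .set("spark.yarn.executor.memoryOverhead", "8192")   // Spark 1.x name, value in MB
      .set("spark.storage.memoryFraction", "0.3")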

[Spark Streaming] java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-09-08 Thread Yan Fang
Hi guys, My Spark Streaming application hits this "java.lang.OutOfMemoryError: GC overhead limit exceeded" error in the Spark Streaming driver program. I have done the following to debug it: 1. increased the driver memory from 1GB to 2GB; the error then came after 22 hrs. When the memory w…

Re: Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-08-04 Thread Sean Owen
…cluster and attempting to run a simple spark app that processes about 10-15GB raw data but I keep running into this error: java.lang.OutOfMemoryError: GC overhead limit exceeded. Each node has 8 cores and 2GB memory. I notice the heap size on the executors is set to 512MB wi…

Spark app throwing java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-08-04 Thread buntu
I have a 40-node CDH 5.1 cluster and am attempting to run a simple Spark app that processes about 10-15GB of raw data, but I keep running into this error: java.lang.OutOfMemoryError: GC overhead limit exceeded. Each node has 8 cores and 2GB memory. I notice the heap size on the executors is set to …

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-07-21 Thread Yifan LI
Thanks, Abel. Best, Yifan LI. On Jul 21, 2014, at 4:16 PM, Abel Coronado Iruegas wrote: Hi Yifan, this works for me:

    export SPARK_JAVA_OPTS="-Xms10g -Xmx40g -XX:MaxPermSize=10g"
    export ADD_JARS=/home/abel/spark/MLI/target/MLI-assembly-1.0.jar
    export SPARK_MEM=40g
    ./spark-shell

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-07-21 Thread Abel Coronado Iruegas
Hi Yifan, this works for me:

    export SPARK_JAVA_OPTS="-Xms10g -Xmx40g -XX:MaxPermSize=10g"
    export ADD_JARS=/home/abel/spark/MLI/target/MLI-assembly-1.0.jar
    export SPARK_MEM=40g
    ./spark-shell

Regards. On Mon, Jul 21, 2014 at 7:48 AM, Yifan LI wrote: Hi, I am trying to load the GraphX examp…

java.lang.OutOfMemoryError: GC overhead limit exceeded

2014-07-21 Thread Yifan LI
Hi, I am trying to load the GraphX example dataset (LiveJournal, 1.08GB) through the Scala shell on my standalone multicore machine (8 CPUs, 16GB mem), but an OutOfMemory error was returned while the code below was running:

    val graph = GraphLoader.edgeListFile(sc, path, minEdgePartitions = 16).partitionB…
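
The snippet is cut off after .partitionB…; for reference, a minimal runnable sketch of that loading pattern in spark-shell (sc predefined). The path and the partition strategy are assumptions, not necessarily what the poster used.

    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

    val path = "hdfs:///data/soc-LiveJournal1.txt"            // hypothetical location of the dataset
    val graph = GraphLoader
      .edgeListFile(sc, path, minEdgePartitions = 16)
      .partitionBy(PartitionStrategy.RandomVertexCut)          // assumed strategy

    println(graph.numEdges)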

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Aaron Davidson
…reuse objects similar to MapReduce (HadoopRDD does this by actually using Hadoop's Writables, for instance), but the general Spark APIs don't support this because mutable objects are not friendly to caching or serializing. On Tue, Jul 8, 201…
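
A small Scala sketch of the object-reuse point above: HadoopRDD hands back the same mutable Writable instances for every record, so values must be copied out before caching. Runnable in spark-shell (sc predefined); the input path is a hypothetical stand-in.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/in", 4)

    // Copy an immutable String out of each reused Text object before caching;
    // caching `raw` directly would cache many references to the same objects.
    val lines = raw.map { case (_, text) => text.toString }
    lines.cache()
    println(lines.count())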

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Jerry Lam
…On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com> wrote: Hi all, I ran into the following exception during the map step: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded) …

Re: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Aaron Davidson
…serializing. On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev <kudryavtsev.konstan...@gmail.com> wrote: Hi all, I ran into the following exception during the map step: java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded) java.lang.reflect.Array.newInstance(Array.java:70) …

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)

2014-07-08 Thread Konstantin Kudryavtsev
Hi all, I ran into the following exception during the map step:

    java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
        java.lang.reflect.Array.newInstance(Array.java:70)
        com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read…