Re: NegativeArraySizeException when doing joins on skewed data

2015-05-21 Thread jstripit
I ran into this problem yesterday, but outside of the context of Spark. It's a limitation of Kryo's IdentityObjectIntMap. In Spark, you might try using the built-in Java serializer instead.
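The workaround suggested above (switching Spark from Kryo to the stock Java serializer) is set through Spark configuration. A minimal sketch, assuming a Spark 1.x deployment where this setting is changed in `conf/spark-defaults.conf` rather than in code:

```
# conf/spark-defaults.conf
# Fall back from Kryo to Spark's built-in Java serializer.
spark.serializer  org.apache.spark.serializer.JavaSerializer
```

The same key can be set programmatically via `SparkConf.set("spark.serializer", ...)`. Java serialization is slower and bulkier than Kryo, so this is a workaround for the crash, not a general recommendation.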

Re: NegativeArraySizeException when doing joins on skewed data

2015-03-12 Thread Soila Pertet Kavulya
Hi Tristan, Did upgrading to Kryo 3 help? Thanks, Soila On Sun, Mar 1, 2015 at 2:48 PM, Tristan Blakers wrote: > Yeah, I implemented the same solution. It seems to kick in around the 4B > mark, but looking at the log I suspect it's probably a function of the > number of unique objects more than

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-26 Thread Tristan Blakers
Hi Imran, I can confirm this still happens when calling Kryo serialisation directly from plain Java, outside Spark. The output file is at about 440 MB at the time of the crash. Kryo is version 2.21. When I get a chance I'll see if I can make a shareable test case and try on Kryo 3.0, I doubt they'd be intere

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-26 Thread Imran Rashid
Hi Tristan, at first I thought you were just hitting another instance of https://issues.apache.org/jira/browse/SPARK-1391, but I actually think it's entirely related to Kryo. Would it be possible for you to try serializing your object using Kryo, without involving Spark at all? If you are unfamil
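A standalone round-trip harness of the kind Imran suggests can be sketched as below. This sketch uses stock JDK serialization (`ObjectOutputStream`/`ObjectInputStream`) so it is self-contained; to isolate the Kryo bug itself, the same shape applies with Kryo's `Output`/`Input` streams and a `Kryo` instance swapped in. The class name `RoundTrip` and the sample string are illustrative, not from the thread.

```java
import java.io.*;

public class RoundTrip {
    // Serialize any Serializable object to a byte array.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Read one object back from the byte array.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("skewed-join-record");
        System.out.println(deserialize(bytes)); // prints "skewed-join-record"
    }
}
```

Running the equivalent loop over a few million fat objects with Kryo directly, outside Spark, is what reproduces the crash reported later in this thread.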

Re: NegativeArraySizeException when doing joins on skewed data

2015-02-25 Thread Tristan Blakers
I get the same exception simply by doing a large broadcast of about 6GB. Note that I'm broadcasting a small number (~3M) of fat objects. There's plenty of free RAM. This and related Kryo exceptions seem to crop up whenever an object graph of more than a couple of GB gets passed around. at
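The "couple of GB" threshold reported above is consistent with 32-bit `int` overflow when a growable buffer or hash map doubles its backing array: once the computed capacity exceeds `Integer.MAX_VALUE` it wraps negative, and allocating an array with that size throws `NegativeArraySizeException`. A minimal sketch of that failure mode (the `grow` method mimics naive capacity doubling; it is illustrative, not Kryo's actual code):

```java
public class OverflowDemo {
    // Naive capacity doubling, as used by many growable buffers.
    // Overflows once capacity exceeds 2^30.
    static int grow(int capacity) {
        return capacity * 2;
    }

    public static void main(String[] args) {
        int cap = 1 << 30;        // ~1G entries
        int next = grow(cap);     // 2^31 wraps to Integer.MIN_VALUE
        System.out.println(next); // prints -2147483648
        try {
            byte[] buf = new byte[next];
        } catch (NegativeArraySizeException e) {
            System.out.println("caught NegativeArraySizeException");
        }
    }
}
```

This is why the crash correlates with the size (or unique-object count) of the graph being serialized rather than with available RAM.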