Re: java.lang.NegativeArraySizeException in pyspark

2014-09-26 Thread Brad Miller
> What is the error? Could you file a JIRA for it? Turns out there are actually 3 separate errors (indicated below), one of which **silently returns the wrong value to the user**. Should I file a separate JIRA for each one? What level should I mark these as (critical, major, etc.)? I'm not sure t

Re: java.lang.NegativeArraySizeException in pyspark

2014-09-25 Thread Davies Liu
On Thu, Sep 25, 2014 at 11:25 AM, Brad Miller wrote: > Hi Davies, Thanks for your help. I ultimately re-wrote the code to use broadcast variables, and then received an error when trying to broadcast self.all_models that the size did not fit in an int (recall that broadcasts use 32 bit

Re: java.lang.NegativeArraySizeException in pyspark

2014-09-25 Thread Brad Miller
Hi Davies, Thanks for your help. I ultimately re-wrote the code to use broadcast variables, and then received an error when trying to broadcast self.all_models that the size did not fit in an int (recall that broadcasts use 32 bit ints to store size), suggesting that it was in fact over 2G. I do
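A minimal sketch (standard-library Python only) of why a size just over 2G can surface as a negative number: if a byte count is stored in a signed 32-bit int, as Brad notes broadcasts do, any value at or above 2**31 wraps around to a negative value. The helper name `as_int32` is hypothetical, and this models only the integer overflow, not Spark's actual broadcast code.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a Java int (signed 32-bit) can hold


def as_int32(n: int) -> int:
    """Hypothetical helper: reinterpret the low 32 bits of n as a signed
    32-bit integer, mimicking a large size being stored in a Java int."""
    # Keep only the low 32 bits, pack them as unsigned, then unpack as signed.
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    for size in (INT32_MAX, INT32_MAX + 1, 3 * 2**30):
        print(f"{size:>13,d} bytes -> stored as int32: {as_int32(size):,d}")
```

On the JVM, allocating an array with such a wrapped, negative length (for example `new byte[n]` with `n < 0`) throws `java.lang.NegativeArraySizeException`.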

Re: java.lang.NegativeArraySizeException in pyspark

2014-09-23 Thread Davies Liu
Or maybe there is a bug related to the base64 in py4j; could you dump the serialized bytes of the closure to verify this? You could add a line in spark/python/pyspark/rdd.py: ser = CloudPickleSerializer() pickled_command = ser.dumps(command) + print len(pickled_command), repr(pi
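A standalone sketch of the kind of check Davies describes: serialize a payload, base64-encode it, confirm the round trip, and print its length plus the first few bytes. Assumptions: the standard-library `pickle` stands in for PySpark's `CloudPickleSerializer` (plain `pickle` cannot serialize lambdas), the helper `inspect_payload` and the sample `models` dict are hypothetical, and this only exercises Python's own `base64` module, so it checks sizes and encoding on the Python side but cannot reproduce a decoding bug on the Java (py4j) side.

```python
import base64
import pickle


def inspect_payload(obj, preview=32):
    """Hypothetical helper: pickle obj, base64-encode it, and check the round trip.

    Standard-library pickle is used as a stand-in for PySpark's
    CloudPickleSerializer, which can also pickle lambdas and closures.
    """
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    encoded = base64.b64encode(raw)
    # If encoding lost or altered anything, decoding would not reproduce raw.
    round_trip_ok = base64.b64decode(encoded) == raw
    return {
        "pickled_len": len(raw),
        "base64_len": len(encoded),
        "round_trip_ok": round_trip_ok,
        "head": repr(raw[:preview]),  # first bytes, for eyeballing the pickle header
    }


if __name__ == "__main__":
    # Hypothetical stand-in for data captured by the map function's closure.
    models = {"model_%d" % i: list(range(1000)) for i in range(10)}
    info = inspect_payload(models)
    print(info["pickled_len"], info["base64_len"], info["round_trip_ok"])
    print(info["head"])
```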

Re: java.lang.NegativeArraySizeException in pyspark

2014-09-22 Thread Davies Liu
The traceback said that the serialized closure cannot be parsed (base64) correctly by py4j. A string in Java cannot be longer than 2G, so the serialized closure cannot be longer than 1.5G (there is overhead in base64). Is it possible that the data used in your map function is that big? If so, you
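The 1.5G figure follows from base64 arithmetic. Base64 emits 4 output characters for every 3 input bytes (padding the final group), so a string capped at 2**31 - 1 characters can carry at most floor((2**31 - 1) / 4) * 3 raw bytes. A small Python check of that arithmetic, assuming the 2**31 - 1 character cap Davies describes for Java strings (the function names are hypothetical):

```python
import base64

JAVA_STRING_MAX = 2**31 - 1  # assumed cap on a Java String's length, in chars


def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes raw bytes."""
    return 4 * ((n_bytes + 2) // 3)


def max_raw_bytes(char_limit: int = JAVA_STRING_MAX) -> int:
    """Largest raw payload whose base64 encoding fits in char_limit chars."""
    return (char_limit // 4) * 3


if __name__ == "__main__":
    # Sanity-check the length formula against the real encoder on small inputs.
    for n in range(10):
        assert base64_len(n) == len(base64.b64encode(b"x" * n))
    limit = max_raw_bytes()
    print(limit, limit / 2**30)  # about 1.5 GiB
```

Under that assumption, the pickled closure, including any data it captures, has to stay below roughly 1.5 GiB to pass through this path.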

java.lang.NegativeArraySizeException in pyspark

2014-09-20 Thread Brad Miller
Hi All, I'm experiencing a java.lang.NegativeArraySizeException in a pyspark script I have. I've pasted the full traceback at the end of this email. I have isolated the line of code in my script which "causes" the exception to occur. Although the exception seems to occur deterministically, it is