> What is the error? Could you file a JIRA for it?
Turns out there are actually 3 separate errors (indicated below), one of
which *silently returns the wrong value to the user*. Should I file a
separate JIRA for each one? What level should I mark these as (critical,
major, etc.)?
I'm not sure t
On Thu, Sep 25, 2014 at 11:25 AM, Brad Miller wrote:
Hi Davies,
Thanks for your help.
I ultimately re-wrote the code to use broadcast variables, and then
received an error when trying to broadcast self.all_models because its
size did not fit in an int (recall that broadcasts use 32-bit ints to
store size), suggesting that it was in fact over 2G. I do
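For reference, one way to check whether an object will clear the 2G
broadcast limit before calling sc.broadcast is to pickle it and look at
the length. This is a minimal sketch; plain cPickle is only an
approximation of the serializer PySpark uses internally, and all_models
below is a made-up stand-in for the real object:

    import cPickle as pickle

    MAX_BROADCAST_BYTES = 2 ** 31 - 1  # broadcast sizes are stored in 32-bit ints

    def pickled_size(obj):
        # Length of the object's pickle, a rough proxy for its broadcast size.
        return len(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))

    # all_models stands in for the object being broadcast (e.g. self.all_models).
    all_models = dict(("model_%d" % i, range(1000)) for i in range(10))
    size = pickled_size(all_models)
    print size, "bytes; over the 32-bit limit?", size > MAX_BROADCAST_BYTES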
Or maybe there is a bug related to the base64 encoding in py4j. Could you
dump the serialized bytes of the closure to verify this?
You could add a line in spark/python/pyspark/rdd.py:
ser = CloudPickleSerializer()
pickled_command = ser.dumps(command)
+ print len(pickled_command), repr(pickled_command)
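If patching rdd.py is inconvenient, the closure size can also be measured
from a driver script with PySpark's own serializer. A minimal sketch,
assuming the Spark 1.x import path pyspark.serializers.CloudPickleSerializer;
big_data and big_closure are hypothetical stand-ins for whatever your map
function captures:

    from pyspark.serializers import CloudPickleSerializer

    # Stand-in for data captured by the closure; anything a function
    # references gets pickled along with it.
    big_data = range(10 ** 6)

    def big_closure(x):
        return x in big_data

    ser = CloudPickleSerializer()
    pickled = ser.dumps(big_closure)
    print len(pickled)  # number of bytes the closure serializes to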
The traceback said that the serialized closure cannot be parsed (base64)
correctly by py4j.
A string in Java cannot be longer than 2G, so the serialized closure
cannot be longer than about 1.5G (there is overhead in base64). Is it
possible that the data used in your map function is that big? If it is, you
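The 1.5G figure comes from base64's 4/3 expansion: every 3 raw bytes
become 4 encoded characters, so a 2G Java string can carry at most
3/4 * 2G = 1.5G of serialized closure, slightly less in practice. A quick
sketch showing the ratio:

    import base64

    raw = "\x00" * (3 * 10 ** 6)          # 3 MB of raw bytes
    encoded = base64.b64encode(raw)       # becomes 4 MB of characters
    print len(raw), len(encoded)          # 3000000 4000000
    print float(len(encoded)) / len(raw)  # 1.333..., the 4/3 expansion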
Hi All,
I'm experiencing a java.lang.NegativeArraySizeException in a pyspark script
I have. I've pasted the full traceback at the end of this email.
I have isolated the line of code in my script which "causes" the exception
to occur. Although the exception seems to occur deterministically, it is