[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

dondrake Fri, 13 Feb 2015 13:18:32 -0800

Github user dondrake commented on the pull request:

    https://github.com/apache/spark/pull/4521#issuecomment-74328474
  
    This failure comes from my test, but saving a Long shouldn't fail with an
    exception saying that an Integer can't be cast to a Long.
    
    ```
    File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py", line 1475, in pyspark.sql.SQLContext.parquetFile
    Failed example:
        srdd.saveAsParquetFile(parquetFile)
    Exception raised:
        Traceback (most recent call last):
          File "/usr/lib64/python2.6/doctest.py", line 1253, in __run
            compileflags, 1) in test.globs
          File "<doctest pyspark.sql.SQLContext.parquetFile[4]>", line 1, in <module>
            srdd.saveAsParquetFile(parquetFile)
          File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py", line 1906, in saveAsParquetFile
            self._jschema_rdd.saveAsParquetFile(path)
          File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
            self.target_id, self.name)
          File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
            format(target_id, '.', name), value)
        Py4JJavaError: An error occurred while calling o715.saveAsParquetFile.
        : org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 55.0 failed 1 times, most recent failure: Lost task 3.0 in stage 55.0 (TID 147, localhost): java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
                at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:110)
                at org.apache.spark.sql.catalyst.expressions.GenericRow.getLong(Row.scala:153)
                at org.apache.spark.sql.parquet.MutableRowWriteSupport.consumeType(ParquetTableSupport.scala:350)
                at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:328)
                at org.apache.spark.sql.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:314)
                at 
    ```
    
    It appears to be casting (not converting) an Integer to a Long, which you
    can't do. But why does it think this is an Integer in the first place, when
    it's defined as a LongType in both the Python and the Spark Scala code?
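
The distinction drawn above between casting and converting can be illustrated on the JVM side with a minimal Java sketch. The class and method names (`CastVsConvert`, `castToLong`, `convertToLong`) are hypothetical and are not part of Spark. The sketch only reproduces the general behavior named in the stack trace, where a boxed `Integer` held as an `Object` is unboxed as a `Long`.

```java
// Illustration only: hypothetical names, not Spark code.
final class CastVsConvert {

    // Reference cast followed by unboxing: succeeds only if the
    // object is already a java.lang.Long.
    static long castToLong(Object value) {
        return (Long) value; // throws ClassCastException for an Integer
    }

    // Numeric conversion: accepts any boxed numeric type and widens it.
    static long convertToLong(Object value) {
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        Object boxedInt = Integer.valueOf(42);

        // Converting the boxed Integer works.
        System.out.println(convertToLong(boxedInt));

        // Casting the same object fails at runtime.
        try {
            castToLong(boxedInt);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints `42` for the conversion and then reports the failed cast. The exact exception message depends on the JVM version.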
    
    I can confirm that I saw this in Spark 1.2.0, which is what motivated me to
    open this JIRA and to add this additional test case.



