Yeah, this does look confusing. We are trying to improve the error
reporting by catching similar issues at the end of the analysis phase
and giving more descriptive error messages. Part of that work can be
found here:
https://github.com/apache/spark/blob/0902a11940e550e85a53e110b490fe90e16ddaf4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
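Until that lands, you can catch the mismatch on the driver side before
Parquet does; a minimal sketch (table and column names taken from this
thread, not tested against your data):

val bookingType = Bookings.schema("PolicyType").dataType
val customerType = Customerdetails.schema("PolicyType").dataType
if (bookingType != customerType)
  sys.error(s"PolicyType mismatch: $bookingType vs $customerType")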
Cheng
On 6/9/15 2:51 PM, Bipin Nag wrote:
Cheng, you were right. It works when I remove the field from either
one. I should have checked the types beforehand. What confused me is
that Spark attempted the join and threw the error midway; it isn't
quite there yet. Thanks for the help.
On Mon, Jun 8, 2015 at 8:29 PM Cheng Lian <lian.cs....@gmail.com> wrote:
I suspect that Bookings and Customerdetails both have a PolicyType
field, but one is a string and the other is an int.
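You can confirm this by printing both schemas before joining; a quick
sketch (paths copied from your code below, the output comments are
what I would expect to see if my guess is right):

val bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val customers = sqlContext.load("/home/administrator/stageddata/Customerdetails")
bookings.printSchema()   // e.g. |-- PolicyType: integer (nullable = true)
customers.printSchema()  // e.g. |-- PolicyType: string (nullable = true)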
Cheng
On 6/8/15 9:15 PM, Bipin Nag wrote:
Hi Jeetendra, Cheng
I am using the following code for the join:

val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val Customerdetails = sqlContext.load("/home/administrator/stageddata/Customerdetails")

val CD = Customerdetails.
  where($"CreatedOn" > "2015-04-01 00:00:00.0").
  where($"CreatedOn" < "2015-05-01 00:00:00.0")

// Bookings by CD
val r1 = Bookings.
  withColumnRenamed("ID", "ID2")
val r2 = CD.
  join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")

r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")
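If a duplicate PolicyType column with differing types is indeed the
problem, one workaround is to rename (or drop) the column on one side
before joining; an untested sketch along the lines of the code above,
with a hypothetical replacement name:

val r1 = Bookings.
  withColumnRenamed("ID", "ID2").
  withColumnRenamed("PolicyType", "BookingPolicyType") // avoid two PolicyType columns
val r2 = CD.
  join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")
r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")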
@Cheng I am not appending the joined table to an existing Parquet
file; it is a new file.
@Jeetendra I have a rather large Parquet file, and it also contains
some confidential data. Can you tell me what you need to check in it?
Thanks
On 8 June 2015 at 16:47, Jeetendra Gangele <gangele...@gmail.com> wrote:
When are you loading these Parquet files? Can you please share the
code where you pass the Parquet files to Spark?
On 8 June 2015 at 16:39, Cheng Lian <lian.cs....@gmail.com> wrote:
Are you appending the joined DataFrame whose PolicyType
is string to an existing Parquet file whose PolicyType is
int? The exception indicates that Parquet found a column
with conflicting data types.
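For reference, the same kind of conflict shows up when an append
disagrees with the schema already in the file; a minimal sketch
(hypothetical path and data, 1.3-era DataFrame API):

import org.apache.spark.sql.SaveMode
val ints = sqlContext.createDataFrame(Seq((1, 10))).toDF("ID", "PolicyType")
ints.save("/tmp/policy_demo", "parquet", SaveMode.Overwrite)  // writes PolicyType as int32
val strs = sqlContext.createDataFrame(Seq((2, "Gold"))).toDF("ID", "PolicyType")
strs.save("/tmp/policy_demo", "parquet", SaveMode.Append)     // string vs int32: schema conflict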
Cheng
On 6/8/15 5:29 PM, bipin wrote:
Hi, I get this error message when saving a table:
parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: optional binary PolicyType (UTF8) != optional int32 PolicyType
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
    at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
    at parquet.schema.MessageType.accept(MessageType.java:55)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
I joined two tables, both loaded from Parquet files; the joined table
throws this error when saved. I could not find anything about this
error. Could this be a bug?