Re: BigDecimal problem in parquet file

Bipin Nag Wed, 10 Jun 2015 05:54:13 -0700

Hi Cheng,

I am using Spark 1.3.1 binary available for Hadoop 2.6. I am loading an
existing parquet file, then repartitioning and saving it. Doing this gives
the error. The code for this doesn't look like causing  problem. I have a
feeling the source - the existing parquet is the culprit.


I created that parquet using a jdbcrdd (pulled from microsoft sql server).
First I saved jdbcrdd as an objectfile on disk. Then loaded it again, made
a dataframe from it using a schema then saved it as a parquet.

Following is the code :
For saving jdbcrdd:
 name - fullqualifiedtablename
 pk - string for primarykey
 pklast - last id to pull
    val myRDD = new JdbcRDD( sc, () =>
        DriverManager.getConnection(url,username,password) ,
        "SELECT * FROM " + name + " WITH (NOLOCK) WHERE ? <= "+pk+" and
"+pk+" <= ?",
        1, lastpk, 1, JdbcRDD.resultSetToObjectArray)
    myRDD.saveAsObjectFile("rawdata/"+name);

For applying schema and saving the parquet:
    val myschema = schemamap(name)
    val myrdd =
sc.objectFile[Array[Object]]("/home/bipin/rawdata/"+name).map(x =>
org.apache.spark.sql.Row(x:_*))
    val actualdata = sqlContext.createDataFrame(myrdd, myschema)
    actualdata.saveAsParquetFile("/home/bipin/stageddata/"+name)

Schema structtype can be made manually, though I pull table's metadata and
make one. It is a simple string translation (see sql docs
<https://msdn.microsoft.com/en-us/library/ms378878%28v=sql.110%29.aspx>
and/or spark datatypes
<https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#data-types>)

That is how I created the parquet file. Any help to solve the issue is
appreciated.
Thanks
Bipin


On 9 June 2015 at 20:44, Cheng Lian <lian.cs....@gmail.com> wrote:

> Would you please provide a snippet that reproduce this issue? What version
> of Spark were you using?
>
> Cheng
>
> On 6/9/15 8:18 PM, bipin wrote:
>
>> Hi,
>> When I try to save my data frame as a parquet file I get the following
>> error:
>>
>> java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to
>> org.apache.spark.sql.types.Decimal
>>         at
>>
>> org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:220)
>>         at
>>
>> org.apache.spark.sql.parquet.RowWriteSupport.writeValue(ParquetTableSupport.scala:192)
>>         at
>>
>> org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:171)
>>         at
>>
>> org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:134)
>>         at
>>
>> parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
>>         at
>> parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
>>         at
>> parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
>>         at
>> org.apache.spark.sql.parquet.ParquetRelation2.org
>> $apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:671)
>>         at
>>
>> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>         at
>>
>> org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>         at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>         at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> How to fix this problem ?
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/BigDecimal-problem-in-parquet-file-tp23221.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>>
>

Re: BigDecimal problem in parquet file

Reply via email to