Yeah, this does look confusing. We are trying to improve the error
reporting by catching similar issues at the end of the analysis phase
and giving more descriptive error messages. Part of that work can be
found here:
https://github.com/apache/spark/blob/0902a11940e550e85a53e110b490fe90e16ddaf4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
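Until that lands, you can catch the mismatch on the driver side before
Parquet does; a minimal sketch (table and column names taken from this
thread, not tested against your data):

val bookingType = Bookings.schema("PolicyType").dataType
val customerType = Customerdetails.schema("PolicyType").dataType
if (bookingType != customerType)
  sys.error(s"PolicyType mismatch: $bookingType vs $customerType")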
Cheng
On 6/9/15 2:51 PM, Bipin Nag wrote:
Cheng, you were right. It works when I remove the field from either
one. I should have checked the types beforehand. What confused me is
that Spark attempted the join and threw the error midway; it isn't
quite there yet. Thanks for the help.
On Mon, Jun 8, 2015 at 8:29 PM Cheng Lian <lian.cs....@gmail.com> wrote:
I suspect that Bookings and Customerdetails both have a PolicyType
field, but one is a string and the other is an int.
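You can confirm this by printing both schemas before joining; a quick
sketch (paths copied from your code below, the output comments are
what I would expect to see if my guess is right):

val bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val customers = sqlContext.load("/home/administrator/stageddata/Customerdetails")
bookings.printSchema()   // e.g. |-- PolicyType: integer (nullable = true)
customers.printSchema()  // e.g. |-- PolicyType: string (nullable = true)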
Cheng
On 6/8/15 9:15 PM, Bipin Nag wrote:
Hi Jeetendra, Cheng
I am using the following code for the join:

val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val Customerdetails = sqlContext.load("/home/administrator/stageddata/Customerdetails")

val CD = Customerdetails.
  where($"CreatedOn" > "2015-04-01 00:00:00.0").
  where($"CreatedOn" < "2015-05-01 00:00:00.0")

// Bookings by CD
val r1 = Bookings.
  withColumnRenamed("ID", "ID2")
val r2 = CD.
  join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")

r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")
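If a duplicate PolicyType column with differing types is indeed the
problem, one workaround is to rename (or drop) the column on one side
before joining; an untested sketch along the lines of the code above,
with a hypothetical replacement name:

val r1 = Bookings.
  withColumnRenamed("ID", "ID2").
  withColumnRenamed("PolicyType", "BookingPolicyType") // avoid two PolicyType columns
val r2 = CD.
  join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")
r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")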
@Cheng I am not appending the joined table to an existing Parquet
file; it is a new file.
@Jeetendra I have a rather large Parquet file, and it also contains
some confidential data. Can you tell me what you need to check in it?
Thanks
On 8 June 2015 at 16:47, Jeetendra Gangele <gangele...@gmail.com> wrote:
When are you loading these Parquet files? Can you please share the
code where you pass the Parquet files to Spark?
On 8 June 2015 at 16:39, Cheng Lian <lian.cs....@gmail.com> wrote:
Are you appending the joined DataFrame whose PolicyType
is string to an existing Parquet file whose PolicyType is
int? The exception indicates that Parquet found a column
with conflicting data types.
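For reference, the same kind of conflict shows up when an append
disagrees with the schema already in the file; a minimal sketch
(hypothetical path and data, 1.3-era DataFrame API):

import org.apache.spark.sql.SaveMode
val ints = sqlContext.createDataFrame(Seq((1, 10))).toDF("ID", "PolicyType")
ints.save("/tmp/policy_demo", "parquet", SaveMode.Overwrite)  // writes PolicyType as int32
val strs = sqlContext.createDataFrame(Seq((2, "Gold"))).toDF("ID", "PolicyType")
strs.save("/tmp/policy_demo", "parquet", SaveMode.Append)     // string vs int32: schema conflict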
Cheng
On 6/8/15 5:29 PM, bipin wrote:
Hi, I get this error message when saving a table:
parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: optional binary PolicyType (UTF8) != optional int32 PolicyType
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
    at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
    at parquet.schema.MessageType.accept(MessageType.java:55)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
I joined two tables, both loaded from Parquet files; the joined table
throws this error when saved. I could not find anything about this
error. Could this be a bug?