Hi Cheng, thank you very much for helping me finally get to the bottom of this mystery...
Actually, we defined this external table with:

    SID         STRING,
    REQUEST_ID  STRING,
    TIMES_DQ    TIMESTAMP,
    TOTAL_PRICE FLOAT,
    ...

Yet "DESC ext_fullorders" shows the columns only as:

    # col_name     data_type   comment
    ...
    times_dq       string      from deserializer
    total_price    string      from deserializer

because, as you said, CSVSerde sets all field object inspectors to javaStringObjectInspector, which is also why every comment reads "from deserializer".

The real user-defined types, however, are kept in the StorageDescriptor. With "DESC EXTENDED ext_fullorders" we can see that its sd:StorageDescriptor contains:

    FieldSchema(name:times_dq, type:timestamp, comment:null)
    FieldSchema(name:total_price, type:float, comment:null)

and Spark's HiveContext reads the schema info from exactly this StorageDescriptor:

https://github.com/apache/spark/blob/7e191fe29bb09a8560cd75d453c4f7f662dff406/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L316

So in the SchemaRDD the fields of each Row are filled with strings (via fillObject, all values come out of CSVSerDe through javaStringObjectInspector), while Spark believes some of them are float or timestamp (the schema info was taken from the sd:StorageDescriptor). Crazy...

And sorry for the update on the weekend. A little more about how I found this problem and why it is trouble for us: we use the new Spark Thrift server. Querying normal managed Hive tables works fine, but as soon as we access an external table with a custom SerDe such as this CSVSerDe, we get a ClassCastException like:

    java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Float

The reason is here:

https://github.com/apache/spark/blob/d94a44d7caaf3fe7559d9ad7b10872fa16cf81ca/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala#L104-L105

Spark's Thrift server tries to read a float value out of the Spark Row, because according to the schema info (sd:StorageDescriptor) this column is a float, while the field in the Row was actually filled with a string value...
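To make the whole chain concrete, here is a condensed repro sketch. It is only an illustration under assumptions: sc is an existing SparkContext, the CSVSerDe jar is on the classpath, and the SerDe class name (I wrote com.bizo.hive.serde.csv.CSVSerde here) and the LOCATION path are placeholders for whatever you actually registered:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // External table: DDL declares TIMESTAMP/FLOAT, but the SerDe
    // will hand every value back as a java.lang.String.
    hiveContext.sql("""
      CREATE EXTERNAL TABLE ext_fullorders (
        sid         STRING,
        request_id  STRING,
        times_dq    TIMESTAMP,
        total_price FLOAT
      )
      ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
      LOCATION '/path/to/csv'
    """)

    val rdd = hiveContext.sql("SELECT total_price FROM ext_fullorders")
    rdd.printSchema()               // reports total_price: float (from the StorageDescriptor)
    val v = rdd.first().getFloat(0) // but the cell physically holds a String
    // => java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Float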
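For reference, the SerDe side of the story looks roughly like this. This is only a sketch of what CSVSerde's initialize() effectively does, not its verbatim source: it hands Hive a struct inspector in which every field is javaStringObjectInspector, regardless of the types declared in the DDL:

    import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory}
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory
    import scala.collection.JavaConverters._

    // One javaStringObjectInspector per column, whatever the DDL said;
    // this is why DESC shows every column as "string ... from deserializer".
    val columnNames = Seq("sid", "request_id", "times_dq", "total_price")
    val inspectors: Seq[ObjectInspector] =
      columnNames.map(_ => PrimitiveObjectInspectorFactory.javaStringObjectInspector)
    val rowOI = ObjectInspectorFactory.getStandardStructObjectInspector(
      columnNames.asJava, inspectors.asJava)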
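And the crash behind the linked SparkSQLOperationManager lines then boils down to the following (again a sketch, not Spark's actual code): a schema-driven, unchecked cast applied to a cell that physically holds a String:

    // the Row cell holds the String that the SerDe produced...
    val cell: Any = "12.34"
    // ...but the schema (from the StorageDescriptor) says FLOAT, so the
    // Thrift server reads it as a float, i.e. an unchecked cast:
    val price = cell.asInstanceOf[java.lang.Float]
    // => java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Float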