Hi,
I am converting a JSON RDD to Parquet by saving it as a Parquet file (saveAsParquetFile):
cacheContext.jsonFile("file:///u1/sample.json").saveAsParquetFile("sample.parquet")
I then read the Parquet file back and register it as a table:
val parquet = cacheContext.parquetFile("sample.parquet")
parquet.registerTempTable("sample")
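In case it helps, this is roughly how I inspect the field types the Parquet reader actually picked up (a sketch against the Spark 1.2-era SchemaRDD API, reusing the `parquet` val above):

```scala
// Flat field list of the loaded schema; SchemaRDD.schema returns a
// StructType, so each field carries its name and DataType.
parquet.schema.fields.foreach(f => println(f.name + ": " + f.dataType))
```

This confirms CODE really is read back as IntegerType, not StringType.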
When I print the schema (parquet.printSchema()), I see:
root
|-- SAMPLE: struct (nullable = true)
| |-- CODE: integer (nullable = true)
| |-- DESC: string (nullable = true)
When I query with an integer literal:
cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE=1").collect().foreach(println)
I get:
java.lang.IllegalArgumentException: Column [CODE] was not found in schema!
but if I put the value in single quotes (forcing it to be compared as a string), it works, for example:
cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE='1'").collect().foreach(println)
What am I missing here? I understand Catalyst does implicit type coercion, so the literal's type shouldn't matter that much, but something is off here.
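One thing I am considering trying, on the guess (not confirmed) that the exception comes from Parquet predicate pushdown mishandling the nested SAMPLE.CODE column:

```scala
// Guessed workaround: disable Parquet filter pushdown so the
// SAMPLE.CODE=1 predicate is evaluated by Spark itself instead of
// being pushed into the Parquet reader. cacheContext is the same
// context used in the snippets above.
cacheContext.setConf("spark.sql.parquet.filterPushdown", "false")
cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE=1")
  .collect()
  .foreach(println)
```

If that makes the integer-literal query work, it would point at the pushdown path rather than the schema itself.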
Regards,
Gaurav
Sent from the Apache Spark User List mailing list archive at Nabble.com.