Hi,

I am converting a JSON RDD to Parquet by saving it as a Parquet file (saveAsParquetFile):

    cacheContext.jsonFile("file:///u1/sample.json").saveAsParquetFile("sample.parquet")
I then read the Parquet file back and register it as a table:

    val parquet = cacheContext.parquetFile("sample_trades.parquet")
    parquet.registerTempTable("sample")

When I print the schema, I see:

    root
     |-- SAMPLE: struct (nullable = true)
     |    |-- CODE: integer (nullable = true)
     |    |-- DESC: string (nullable = true)

When I query:

    cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE=1")
      .map(t => t).collect.foreach(println)

I get this error:

    java.lang.IllegalArgumentException: Column [CODE] was not found in schema!

But if I put SAMPLE.CODE in single quotes (forcing it to be treated as a string), it works. For example, this runs fine:

    cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE='1'")
      .map(t => t).collect.foreach(println)

What am I missing here? I understand Catalyst will do optimization, so the data type shouldn't matter that much, but something is off here.

Regards,
Gaurav

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Integer-column-in-schema-RDD-from-parquet-being-considered-as-string-tp21917.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
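[Editor's note: one possible explanation, offered as an assumption rather than a confirmed diagnosis, is that Parquet filter pushdown in some Spark 1.x versions cannot resolve a predicate on a field nested inside a struct (it looks up the bare leaf name `CODE` in the Parquet schema and fails), whereas the string comparison is not pushed down and so is evaluated by Catalyst instead. If that is the cause, disabling pushdown is a candidate workaround. The sketch below assumes `cacheContext` is a SQLContext-like object as in the original post:]

```scala
// Hedged workaround sketch: turn off Parquet filter pushdown so the
// integer predicate on the nested field is evaluated by Spark itself
// instead of being handed to the Parquet reader.
cacheContext.setConf("spark.sql.parquet.filterPushdown", "false")

val parquet = cacheContext.parquetFile("sample_trades.parquet")
parquet.registerTempTable("sample")

// With pushdown disabled, the typed comparison should run in Catalyst.
cacheContext.sql("select SAMPLE.DESC from sample where SAMPLE.CODE = 1")
  .collect()
  .foreach(println)
```

This trades a possible scan-time optimization for correctness on the nested column, which seems acceptable for a small file like this.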