Thanks much Zhan Zhang. I will open a JIRA saying ORC files created using hiveContext.sql can't be read by the DataFrame reader.
Regards,
Umesh

On Oct 4, 2015 10:14, "Zhan Zhang" <zzh...@hortonworks.com> wrote:

> Hi Umesh,
>
> It depends on how you create and read the ORC file, although everything
> happens inside of Spark. There are two paths in Spark to create a table:
> one is through Hive, and the other is through the data frame API. Due to
> version compatibility issues, there may be conflicts between these two
> paths. You have to use dataframe.write and dataframe.read to avoid such
> issues (a sketch of this round trip follows after the thread). The ORC
> path has to be upgraded to the same version as Hive to solve this issue.
>
> ORC has become an independent project now, and we are waiting for it to
> be totally isolated from Hive. Then we can upgrade ORC to the latest
> version and put it into SqlContext. I think you can open a JIRA to track
> this upgrade.
>
> BTW, my name is Zhan Zhang, not Zang.
>
> Thanks.
>
> Zhan Zhang
>
> On Oct 3, 2015, at 2:18 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
> Hi Zhan, any idea why this is happening? I can load ORC files created by
> a Hive table, but I can't load ORC files created by Spark itself. It
> looks like a bug.
>
> On Wed, Sep 30, 2015 at 12:03 PM, Umesh Kacha <umesh.ka...@gmail.com>
> wrote:
>
>> Hi Zhan, thanks much. Please find the code below.
>>
>> Working code, loading data from a path created by a Hive table using the
>> Hive console outside of Spark:
>>
>> DataFrame df =
>> hiveContext.read().format("orc").load("/hdfs/path/to/hive/table/partition")
>>
>> Not working code, loading data from a path created inside Spark by
>> hiveContext.sql insert-into-partition queries:
>>
>> DataFrame df =
>> hiveContext.read().format("orc").load("/hdfs/path/to/hive/table/partition/created/by/spark")
>>
>> As you can see, the code is the same in both cases; the second one just
>> tries to load ORC data created by Spark.
>>
>> On Sep 30, 2015 11:22 AM, "Zhan Zhang" <zzh...@hortonworks.com> wrote:
>>
>>> Hi Umesh,
>>>
>>> The potential reason is that Hive and Spark do not use the same
>>> OrcInputFormat. Newer Hive versions have a NewOrcInputFormat, but it is
>>> not in Spark because of backward compatibility (it is not available in
>>> hive-0.12).
>>> Do you mind posting the code that works and the code that does not work
>>> for you?
>>>
>>> Thanks.
>>>
>>> Zhan Zhang
>>>
>>> On Sep 29, 2015, at 10:05 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>>>
>>> Hi, I can read/load ORC data created by a Hive table into a dataframe,
>>> so why does it throw a Malformed ORC exception when I try to load data
>>> created by hiveContext.sql into a dataframe?
>>>
>>> On Sep 30, 2015 2:37 AM, "Hortonworks" <zzh...@hortonworks.com> wrote:
>>>
>>>> You can try to use the data frame API for both read and write.
>>>>
>>>> Thanks
>>>>
>>>> Zhan Zhang
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 29, 2015, at 1:56 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>>>>
>>>> Hi Zhan, thanks for the response. The table is created using Spark
>>>> hiveContext.sql, and the data is inserted into the table (an insert
>>>> into a partitioned table), also using hiveContext.sql. When I try to
>>>> load the ORC data into a dataframe, I am loading a particular
>>>> partition's data stored in a path like
>>>> /user/xyz/Hive/xyz.db/sparktable/partition1=abc
>>>>
>>>> Regards,
>>>> Umesh
>>>>
>>>> On Sep 30, 2015 02:21, "Hortonworks" <zzh...@hortonworks.com> wrote:
>>>>
>>>>> How was the table generated, by Hive or by Spark?
>>>>>
>>>>> If you generate the table using Hive but read it with the data frame
>>>>> reader, there may be some compatibility issues.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Zhan Zhang
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> > On Sep 29, 2015, at 1:47 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>>> >
>>>>> > Hi, I have a Spark job which creates Hive tables in ORC format with
>>>>> > partitions. It works well; I can read the data back into a Hive
>>>>> > table using the Hive console. But if I try to further process the
>>>>> > ORC files generated by the Spark job by loading them into a
>>>>> > dataframe, I get the following exception:
>>>>> >
>>>>> > Caused by: java.io.IOException: Malformed ORC file
>>>>> > hdfs://localhost:9000/user/hive/warehouse/partorc/part_tiny.txt.
>>>>> > Invalid postscript.
>>>>> >
>>>>> > DataFrame df = hiveContext.read().format("orc").load("to/path");
>>>>> >
>>>>> > Please guide.
>>>>> >
>>>>> > --
>>>>> > View this message in context:
>>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Hive-ORC-Malformed-while-loading-into-spark-data-frame-tp24876.html
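
[Editor's note] Below is a minimal, hypothetical sketch of the dataframe.write / dataframe.read round trip that Zhan Zhang recommends, written against the same Spark 1.x Java API used in the snippets above. The app name, source table, output path, and partition column are assumptions for illustration, not values from the thread; the point is only that the same ORC data-source path handles both the write and the read.

    // Hypothetical sketch: write ORC through the DataFrame writer instead of
    // an "INSERT INTO ... PARTITION" statement issued via hiveContext.sql,
    // then read it back through the DataFrame reader.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.hive.HiveContext;

    public class OrcRoundTripSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("OrcRoundTripSketch");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // Some source data; the table name is a placeholder.
        DataFrame sourceDf = hiveContext.sql("SELECT * FROM some_source_table");

        // Write partitioned ORC via the DataFrame writer (data frame path).
        sourceDf.write()
            .format("orc")
            .partitionBy("partition1")          // hypothetical partition column
            .mode(SaveMode.Append)
            .save("/hdfs/path/to/orc/output");  // hypothetical output path

        // Read it back via the DataFrame reader, filtering on the partition.
        DataFrame df = hiveContext.read()
            .format("orc")
            .load("/hdfs/path/to/orc/output")
            .filter("partition1 = 'abc'");
        df.show();

        jsc.stop();
      }
    }

Writing with hiveContext.sql("INSERT INTO ... PARTITION ...") instead goes through the Hive code path, which is where the version mismatch described in the thread can cause the reader to reject the files.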
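For comparison, the partition Spark wrote can also be read through the Hive metastore table rather than the raw partition directory, which exercises the Hive read path the thread says works from the Hive console. A short hypothetical snippet reusing the hiveContext from the sketch above; the table and partition names are taken from the path Umesh mentions (/user/xyz/Hive/xyz.db/sparktable/partition1=abc):

    // Read the same data via the metastore table (Hive path) instead of
    // loading the partition directory with the ORC data source directly.
    DataFrame viaTable = hiveContext.sql(
        "SELECT * FROM xyz.sparktable WHERE partition1 = 'abc'");
    viaTable.show();

If this works while the path-based load() fails, it supports the explanation that the mismatch lies between the two ORC code paths rather than in the data itself.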