Hi Navis, thank you for sorting this out! I have tried getting around this by using views towards a single master table instead in combination with UDFs instead . Seems to work so far.
/Petter 2013/12/18 Navis류승우 <navis....@nexr.com> > Hive uses path to table(or partition) mapping internally (you can see that > in MapredWork, etc.), which might caused first table overwritten by other. > > I didn't tried symlink on hdfs, which could be a solution. > > > > 2013/12/12 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com> > > Hi, >> >> I have declared several external tables pointing to the same location. >> The things that tells these tables apart (apart from their names) is that >> they have unique properties. These properties help me choose the correct >> rows from the underlying file. I use a single storage handler (accompanied >> by a single InputFormat and a single Serde) . The first columns in all >> tables are the same but the last (a struct) is unique and >> is constructed from the Serde (with help of the serde properties). A >> simplified version of the tables look like so: >> >> CREATE EXTERNAL TABLE Table1 ( >> column1 STRING, >> column2 STRING) >> STORED BY 'MyStorageHandler' >> WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass1') >> LOCATION 'mylocation' >> TBLPROPERTIES('recordreader.filter'='table1_filter'); >> >> CREATE EXTERNAL TABLE Table2 ( >> column1 STRING, >> column2 STRING) >> STORED BY 'MyStorageHandler' >> WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass2') >> LOCATION 'mylocation' >> TBLPROPERTIES('recordreader.filter'='table2_filter'); >> >> >> All works well for simple select queries towards the two tables. The >> following query gives very strange results though: >> >> SELECT * FROM ( >> SELECT column1,'Table1' FROM Table1 WHERE column2 = 'myValue' >> union all >> SELECT column1,'Table2' FROM Table2 WHERE column2 = 'myValue' >> ) my_union >> ORDER BY my_union.column1 >> >> >> It seems like one job task is created per file stored in the table >> location. This task gets the table properties from the second table and in >> the SerDe-step later on it seems like the records gets mixed up. >> >> I would have expected that hive would need to iterated the source files >> two times using two different tasks (with the correct table properties >> passed) in order to get this to work. >> >> Anyone here that can shed some light on this scenario? >> >> Thanks, >> Petter >> >> >> >> >> >> >> >