I tried it, and it worked (as it should).

CREATE EXTERNAL TABLE MasterTable (
  column1 STRING,
  column2 STRING)
LOCATION 'hdfs://localhost:9000/home/navis/my_location';
CREATE VIEW IF NOT EXISTS View1 (column1, column2) AS
  SELECT column1, column2 FROM MasterTable WHERE column1 < 30;

CREATE VIEW IF NOT EXISTS View2 (column1, column2) AS
  SELECT column1, column2 FROM MasterTable WHERE column1 < 100;

SELECT View1.* FROM View1 JOIN View2 ON (View1.column1 = View2.column1);

Below is the result of EXPLAIN. It takes a single scan of the whole master table, handled by two TableScan (TS) operators, each followed by a Filter (FIL) with the expected predicate:

view1:view1:mastertable
  TableScan
    alias: mastertable
    Filter Operator
      predicate:
          expr: (column1 < 30)
          type: boolean
view2:view2:mastertable
  TableScan
    alias: mastertable
    Filter Operator
      predicate:
          expr: (column1 < 100)
          type: boolean
Truncated Path -> Alias:
  hdfs://localhost:9000/home/navis/my_location [view1:view1:mastertable, view2:view2:mastertable]

May I ask which version of Hive you are using?

2014/1/9 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com>

> Hi Navis (and others),
>
> it seems my solution with views does not work after all. That is, it
> works fine as long as I do not use filter pushdown.
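As an aside on the plan above: a single scan feeding two TS/FIL branches behaves, in effect, like one pass over the rows with each view's predicate applied independently. A minimal plain-Java sketch of that behaviour (illustrative only, not Hive code; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SingleScanSketch {

    // One pass over the master rows; each branch keeps its own filter,
    // mirroring the two TableScan/Filter operator pairs in the plan.
    static List<List<Integer>> scan(List<Integer> masterRows) {
        Predicate<Integer> view1 = c -> c < 30;   // view1: column1 < 30
        Predicate<Integer> view2 = c -> c < 100;  // view2: column1 < 100
        List<Integer> v1 = new ArrayList<>();
        List<Integer> v2 = new ArrayList<>();
        for (Integer row : masterRows) {          // the single scan
            if (view1.test(row)) v1.add(row);
            if (view2.test(row)) v2.add(row);
        }
        return Arrays.asList(v1, v2);
    }

    public static void main(String[] args) {
        List<List<Integer>> out = scan(Arrays.asList(10, 50, 120));
        System.out.println("view1 rows: " + out.get(0)); // [10]
        System.out.println("view2 rows: " + out.get(1)); // [10, 50]
    }
}
```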
> My setup is something like below:
>
> CREATE EXTERNAL TABLE MasterTable (
>   column1 STRING,
>   column2 STRING,
>   column3 STRING,
>   column4 STRING)
> PARTITIONED BY (partition INT)
> ROW FORMAT SERDE 'MySerde'
> STORED AS INPUTFORMAT 'MyInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION 'my_location';
>
> CREATE VIEW IF NOT EXISTS View1
>   (column1, column2, column3, column4, partition)
> PARTITIONED ON (partition)
> AS SELECT column1, column2, column3, column4, partition
> FROM MasterTable
> WHERE column1='value1' AND column2='value2';
>
> CREATE VIEW IF NOT EXISTS View2
>   (column1, column2, column3, column4, partition)
> PARTITIONED ON (partition)
> AS SELECT column1, column2, column3, column4, partition
> FROM MasterTable
> WHERE column1='value3' AND column2='value4';
>
> The following query works fine without filter pushdown:
>
> SELECT View1.* FROM View1 JOIN View2 ON (View1.column3 = View2.column3);
>
> Now, if I enable filter pushdown (by setting hive.optimize.index.filter=true)
> and apply the filter in my record reader, I do not get the correct result; I
> do not get any records back at all. It seems only the second filter
> (column1='value3' AND column2='value4') is pushed to my record reader, and
> the underlying file is traversed only once. I would have expected either to
> get an OR expression pushed down ((column1='value1' AND column2='value2') OR
> (column1='value3' AND column2='value4')), or to have the underlying file
> scanned twice, once with each separate expression.
>
> Do you have any thoughts on this?
>
> Thanks,
> Petter
>
> 2013/12/22 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com>
>
> Hi Navis,
>>
>> thank you for sorting this out! I have tried to get around this by using
>> views over a single master table in combination with UDFs. It seems to
>> work so far.
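Regarding the OR expression expected above: if the reader were handed both views' residual filters for the same underlying path, combining them would be straightforward. A hedged sketch of such a merge (plain Java; `mergeFilters` is a hypothetical helper, not a Hive API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class FilterMerge {

    // Combine the residual filter strings of all views that share one
    // underlying path into a single OR expression (hypothetical helper).
    static String mergeFilters(List<String> filters) {
        if (filters.isEmpty()) return null;
        if (filters.size() == 1) return filters.get(0);
        StringJoiner joined = new StringJoiner(") OR (", "(", ")");
        for (String f : filters) joined.add(f);
        return joined.toString();
    }

    public static void main(String[] args) {
        String merged = mergeFilters(Arrays.asList(
                "column1='value1' AND column2='value2'",
                "column1='value3' AND column2='value4'"));
        System.out.println(merged);
        // (column1='value1' AND column2='value2') OR (column1='value3' AND column2='value4')
    }
}
```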
>>
>> /Petter
>>
>> 2013/12/18 Navis류승우 <navis....@nexr.com>
>>
>>> Hive internally uses a path-to-table (or partition) mapping (you can see
>>> this in MapredWork, etc.), which may cause the first table to be
>>> overwritten by the other.
>>>
>>> I haven't tried symlinks on HDFS, which could be a solution.
>>>
>>> 2013/12/12 Petter von Dolwitz (Hem) <petter.von.dolw...@gmail.com>
>>>
>>>> Hi,
>>>>
>>>> I have declared several external tables pointing to the same location.
>>>> What tells these tables apart (besides their names) is that they have
>>>> unique properties. These properties help me choose the correct rows
>>>> from the underlying file. I use a single storage handler (accompanied
>>>> by a single InputFormat and a single SerDe). The first columns in all
>>>> tables are the same, but the last (a struct) is unique and is
>>>> constructed by the SerDe (with help of the SerDe properties). A
>>>> simplified version of the tables looks like this:
>>>>
>>>> CREATE EXTERNAL TABLE Table1 (
>>>>   column1 STRING,
>>>>   column2 STRING)
>>>> STORED BY 'MyStorageHandler'
>>>> WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass1')
>>>> LOCATION 'mylocation'
>>>> TBLPROPERTIES ('recordreader.filter'='table1_filter');
>>>>
>>>> CREATE EXTERNAL TABLE Table2 (
>>>>   column1 STRING,
>>>>   column2 STRING)
>>>> STORED BY 'MyStorageHandler'
>>>> WITH SERDEPROPERTIES ('ser.class'='MyStructSerializationClass2')
>>>> LOCATION 'mylocation'
>>>> TBLPROPERTIES ('recordreader.filter'='table2_filter');
>>>>
>>>> Everything works well for simple SELECT queries against the two tables.
>>>> The following query gives very strange results, though:
>>>>
>>>> SELECT * FROM (
>>>>   SELECT column1, 'Table1' FROM Table1 WHERE column2 = 'myValue'
>>>>   UNION ALL
>>>>   SELECT column1, 'Table2' FROM Table2 WHERE column2 = 'myValue'
>>>> ) my_union
>>>> ORDER BY my_union.column1;
>>>>
>>>> It seems one job task is created per file stored in the table location.
>>>> Each such task gets the table properties from the second table, and in
>>>> the SerDe step later on the records seem to get mixed up.
>>>>
>>>> I would have expected Hive to iterate over the source files twice,
>>>> using two different tasks (each with the correct table properties
>>>> passed), in order to get this to work.
>>>>
>>>> Can anyone here shed some light on this scenario?
>>>>
>>>> Thanks,
>>>> Petter
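Navis's path-to-table explanation can be illustrated with a toy model: if the internal mapping is keyed by location, registering two tables on the same LOCATION leaves only the last one visible, which matches the symptom of every split seeing Table2's properties. This is a simplified sketch of the overwrite, not Hive's actual MapredWork code:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PathMappingSketch {

    // Toy model of a path -> table-descriptor map keyed by location.
    // With two tables on the same LOCATION, the second put() silently
    // replaces the first, so only Table2's properties survive.
    static Map<String, String> register(Map<String, String> tableToLocation) {
        Map<String, String> pathToTable = new HashMap<>();
        for (Map.Entry<String, String> t : tableToLocation.entrySet()) {
            pathToTable.put(t.getValue(), t.getKey()); // location -> table name
        }
        return pathToTable;
    }

    public static void main(String[] args) {
        Map<String, String> tables = new LinkedHashMap<>();
        tables.put("Table1", "mylocation");
        tables.put("Table2", "mylocation"); // same LOCATION as Table1
        Map<String, String> m = register(tables);
        System.out.println(m); // {mylocation=Table2} -- Table1's entry is gone
    }
}
```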