[ 
https://issues.apache.org/jira/browse/HIVE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977003#comment-14977003
 ] 

Bolke de Bruin commented on HIVE-12216:
---------------------------------------

Update: we have noticed this mainly happens with large datasets and smaller 
datasets do not seem to affected. Also we see a similar issue that we will 
report separately.

> WHERE on the FROM table not (always) working when JOIN are present
> ------------------------------------------------------------------
>
>                 Key: HIVE-12216
>                 URL: https://issues.apache.org/jira/browse/HIVE-12216
>             Project: Hive
>          Issue Type: Bug
>          Components: hpl/sql, Parser, Query Processor, SQL
>    Affects Versions: 1.1.0, 1.2.1
>         Environment: CDH 5.4.7 HDP2.3.2 MR TEZ
>            Reporter: Bolke de Bruin
>            Priority: Blocker
>
> In case we use a where clause in a state where also joins are present, the 
> clauses are not (always) respected. We have been able to reproduce this issue 
> consistently with Hive 1.1.0 on MR, Hive 1.2.1 on Tez (MR Fails here). 
> So fo the below query we *do* get results back like:
> 'gs.i_s_c = 23' (and this goes for all clauses!)
> CREATE TABLE tmp_hub_and_sats AS
> SELECT
>    f.dt,
>    f.t_c,
>    sum(f.transaction_amount) as amount,
>    sum(f.amount_euro) amount_euro,
>    IF(f.org_grid is null, f.org_cust, f.org_grid) as org,
>    IF(f.org_grid is null, 0, 1) as is_org_grid,
>    IF(f.org_up is null, if(f.org_grid is null, f.org_cust, f.org_grid), 
> f.org_up) as org_up,
>    IF(f.to_grid is null, f.to_cust, f.to_grid) to,
>    IF(f.to_grid is null, 0, 1) as is_to_grid,
>    IF(f.to_up is null, if(f.to_grid is null, f.to_cust, f.to_grid), f.to_up) 
> as to_up,
>    gh.i_g_c as customer_code_hub,
>    gs.i_g_c as customer_code_satellite
> from x_grid_orders f
> LEFT OUTER JOIN
>  grid.grid gh
>  ON f.org_grid = gh.hashed_gridid
> LEFT OUTER JOIN
>  grid.grid gs
>  ON f.to_grid = gs.hashed_gridid
> where
> IF(f.org_up is null, f.org_cust, f.org_up) <> IF(f.to_up is null, f.to_cust, 
> f.to_up)
>  AND
>  (substring(gh.i_g_c, 1, 2) <> "06" or gh.i_g_c is null)
>  AND
>  (substring(gs.i_g_c, 1, 2) <> "06" or gs.i_g_c is null)
>  AND
>  (gh.i_s_c <> "23" or gh.i_s_c is null)
>  AND
>  (gs.i_s_c <> "23" or gs.i_s_c is null)
> group by
>  f.dt,
>  f.t_c,
>  IF(f.org_grid is null, f.org_cust, f.org_grid),
>  IF(f.org_grid is null, 0, 1),
>  IF(f.org_up is null, if(f.org_grid is null, f.org_cust, f.org_grid), 
> f.org_up),
>  IF(f.to_grid is null, f.to_cust, f.to_grid),
>  IF(f.to_grid is null, 0, 1),
>  IF(f.to_up is null, if(f.to_grid is null, f.to_cust, f.to_grid), f.to_up),
>  gh.i_g_c,
>  gs.i_g_c



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to