I have come across a strange situation in hive and I want to know if there is 
an explanation.  The CASE operation below does not work when the operator is != 
but does work then the operator is =.  Maybe it is true that an = operation is 
valid if a value is NULL.  But an != operation is invalid if a value is NULL.  
That seems bizarre.  Is this a bug or can this be explained?

I am using Amazon EMR w/hadoop v1.0.3 & hive v0.11.0

-- the following SQL results are expected such that the last column is 1 or 0
SELECT
    user_name
  , val0
  , val1
  , CASE WHEN val0 = val1 THEN 1 ELSE 0 END
FROM
    (
    SELECT
        user_name
      , MIN(STR_TO_MAP(kvp, "&", "=")['val0']) AS val0
      , MIN(STR_TO_MAP(kvp, "&", "=")['val1']) AS val1
    FROM
        stgdb.fact_webrequest
    GROUP BY
        user_name
    ) x;

user0 42.01   42.01   1
user1 NULL    14.1301 0
user2 NULL    15.03   0
user3 NULL    43.01   0
user4 NULL    40.05   0
user5 NULL    13.1305 0
user6 51.0913 51.0913 1
user7 NULL    11.0701 0
user8 NULL    52.02   0

-- the following SQL results are strange such that the last column is always 0
SELECT
    user_name
  , val0
  , val1
  , CASE WHEN val0 != val1 THEN 1 ELSE 0 END
FROM
    (
    SELECT
        user_name
      , MIN(STR_TO_MAP(kvp, "&", "=")['val0']) AS val0
      , MIN(STR_TO_MAP(kvp, "&", "=")['val1']) AS val1
    FROM
        stgdb.fact_webrequest
    GROUP BY
        user_name
    ) x;

user0 42.01   42.01   0
user1 NULL    14.1301 0
user2 NULL    15.03   0
user3 NULL    43.01   0
user4 NULL    40.05   0
user5 NULL    13.1305 0
user6 51.0913 51.0913 0
user7 NULL    11.0701 0
user8 NULL    52.02   0


Blaine Elliott
Chegg | Senior Data Engineer
* 805 637 4556 | * bla...@chegg.com<mailto:bla...@chegg.com>

Reply via email to