> If I have a series of entries that look like
...
> { "update", {"baz" : "bar" }}

Due to the way the split distribution works, you need a global ordering
key for each operation.

0, "ADD", "baz", ""
1, "SET", "baz", "bar"
2, "DEL", "baz", null
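To illustrate why the ordering key matters, here is a minimal Python model (not Hive code). It assumes that ADD and SET both simply assign the value and that DEL removes the key. Rows coming back from different splits can arrive in any order. Replaying them in arrival order gives the wrong final state, while sorting by the ordering key first gives the right one. `replay` and the split lists are hypothetical names used only for illustration.

```python
from typing import Optional

# (seq, op, key, value): seq is the global ordering key for each operation.
Op = tuple[int, str, str, Optional[str]]

def replay(ops: list[Op]) -> dict[str, Optional[str]]:
    """Apply ops in the order given. Assumption: ADD/SET assign, DEL removes."""
    state: dict[str, Optional[str]] = {}
    for _seq, op, key, value in ops:
        if op in ("ADD", "SET"):
            state[key] = value
        elif op == "DEL":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown op {op!r}")
    return state

# The three operations above, as two splits might hand them back.
split_a = [(2, "DEL", "baz", None)]
split_b = [(0, "ADD", "baz", ""), (1, "SET", "baz", "bar")]
arrived = split_a + split_b

print(replay(arrived))                                # {'baz': 'bar'} (wrong)
print(replay(sorted(arrived, key=lambda t: t[0])))    # {} (DEL really came last)
```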

If you never get more than one update for the same key within a second,
you could store a timestamp and use that as the ordering key.

Then you can write a windowing function for Hive that orders and merges them.

select flatten_txns(op, key, value) over (partition by key order by ts)
from txns;
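`flatten_txns` is a function you would have to write yourself. The following Python sketch models what it might compute, under the assumption that it folds each key's operations in timestamp order and keeps only the surviving value. A real Hive windowing function emits a value per input row; this model returns only the final state per key. The `txns` rows and timestamps are made up for illustration.

```python
from collections import defaultdict
from typing import Optional

# (ts, op, key, value), mirroring the txns table's columns (assumed layout).
Txn = tuple[int, str, str, Optional[str]]

def flatten_txns(rows: list[Txn]) -> dict[str, Optional[str]]:
    """Model of 'partition by key order by ts': fold each key's ops in ts
    order; keys whose last op is DEL are dropped from the result."""
    partitions: dict[str, list[Txn]] = defaultdict(list)
    for row in rows:
        partitions[row[2]].append(row)                 # partition by key
    result: dict[str, Optional[str]] = {}
    for key, part in partitions.items():
        value: Optional[str] = None
        alive = False
        for _ts, op, _key, val in sorted(part, key=lambda r: r[0]):  # order by ts
            if op in ("ADD", "SET"):
                value, alive = val, True
            elif op == "DEL":
                value, alive = None, False
            else:
                raise ValueError(f"unknown op {op!r}")
        if alive:
            result[key] = value
    return result

txns = [
    (1700000002, "DEL", "baz", None),
    (1700000000, "ADD", "baz", ""),
    (1700000001, "SET", "baz", "bar"),
    (1700000000, "ADD", "qux", ""),
    (1700000005, "SET", "qux", "1"),
]
print(flatten_txns(txns))  # {'qux': '1'}
```

Python's `sorted` is stable, so two operations on the same key with the same second-resolution timestamp fall back to arrival order. That is exactly the ambiguity the one-update-per-second caveat is meant to rule out.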

At this point, you're nearly reinventing what Hive's own
insert/update/delete statements do.

Except that, compared to those, these updates are faster, since each one
is really an unconditional SET.

Cheers,
Gopal
