Re: Hive Query Performance Tuning

2019-12-03 Thread Matthew Dixon
Hi Rajbir, some thoughts to consider. I’m wondering what the row_number() functionality is doing. Because the window specification has no ORDER BY clause, the result may not be deterministic; is this the expected behaviour? I ask because analytic functions can be expensive to compute, so make sure you
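To illustrate why a window without ORDER BY can be non-deterministic, here is a minimal Python sketch (not Hive code) that emulates ROW_NUMBER() OVER (PARTITION BY ...) by numbering rows in the order they arrive. The row_number helper, the (row_id, partition_key, value) schema and the sample rows are all hypothetical. Feeding the same rows in a different order, roughly what can happen when rows reach a reducer in a different order between runs, changes the number a row gets unless an ordering key is supplied.

```python
from collections import defaultdict
from typing import Callable, Optional

Row = tuple[str, str, int]  # hypothetical schema: (row_id, partition_key, value)

def row_number(rows: list[Row],
               order_key: Optional[Callable[[Row], object]] = None) -> dict[str, int]:
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_key [ORDER BY ...]).

    Without order_key, rows are numbered in arrival order, which mirrors a
    window specification that has no ORDER BY clause.
    """
    partitions: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[1]].append(row)
    numbers: dict[str, int] = {}
    for part in partitions.values():
        if order_key is not None:
            part = sorted(part, key=order_key)  # the ORDER BY inside the window
        for n, row in enumerate(part, start=1):
            numbers[row[0]] = n
    return numbers

rows = [("a", "k1", 30), ("b", "k1", 10), ("c", "k2", 20)]
shuffled = [rows[1], rows[2], rows[0]]  # same data, different arrival order

print(row_number(rows)["a"], row_number(shuffled)["a"])  # 1 2
print(row_number(rows, order_key=lambda r: r[2])["a"],
      row_number(shuffled, order_key=lambda r: r[2])["a"])  # 2 2
```

With an ordering key the numbering no longer depends on arrival order, which is the point of adding ORDER BY to the window.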

RE: COMMERCIAL:Re: COMMERCIAL:Re: Hive - regexp_replace function for multiple strings

2015-02-17 Thread Matthew Dixon
be - Ilikelisteningtorockmusic since the IN condition selected this statement and it is replacing all spaces with no space, as per the regexp_replace function. Correct me if I am misunderstanding your solution 2. Thanks, Viral From: Matthew Dixon mailto:matthew.di...@jagex.c

RE: COMMERCIAL:Re: Hive - regexp_replace function for multiple strings

2015-02-06 Thread Matthew Dixon
Below are 2 solutions. Solution 1 uses lookahead and lookbehind but works with bi-grams only. It also doesn’t enforce the pairs you’re asking for, so, for instance, "hip music" would become "hipmusic". Solution 2 uses simple IN syntax with if(), works with n-grams beyond bi-grams and enforces the actual
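The difference between the two approaches can be sketched in Python. This is not the HiveQL from the thread: the phrase list is hypothetical, join_with_lookaround is one guess at what a bi-gram lookaround pattern looks like, and join_listed_phrases swaps the if()/IN approach for an n-gram membership test with the same goal of joining only listed phrases.

```python
import re

# Hypothetical phrase list; the thread's actual phrases are not shown in this excerpt.
PHRASES = frozenset({"rock music", "hip hop", "drum and bass"})

def join_with_lookaround(text: str) -> str:
    """Solution 1 style (bi-grams only): remove a space preceded by any listed
    first word and followed by any listed second word. Pairs are not enforced,
    so 'hip music' is joined as well."""
    bigrams = [p.split() for p in PHRASES if len(p.split()) == 2]
    firsts = sorted({b[0] for b in bigrams})
    seconds = sorted({b[1] for b in bigrams})
    # Python's re needs fixed-width lookbehind, so each word gets its own branch.
    behind = "|".join(f"(?<=\\b{re.escape(w)})" for w in firsts)
    ahead = "|".join(re.escape(w) for w in seconds)
    return re.sub(f"(?:{behind}) (?=(?:{ahead})\\b)", "", text)

def join_listed_phrases(text: str) -> str:
    """Solution 2 style: join a run of words only if the whole n-gram is listed
    (the `in` test stands in for IN (...) inside if()). Works for n > 2.
    Note: split() normalises whitespace to single spaces."""
    tokens = text.split()
    max_n = max(len(p.split()) for p in PHRASES)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so 'drum and bass' wins over shorter n-grams.
        for n in range(min(max_n, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in PHRASES:
                out.append(candidate.replace(" ", ""))
                i += n
                break
        else:
            out.append(tokens[i])  # no listed n-gram starts here; keep the word
            i += 1
    return " ".join(out)

s = "i like hip music and rock music"
print(join_with_lookaround(s))  # i like hipmusic and rockmusic
print(join_listed_phrases(s))   # i like hip music and rockmusic
```

Only the second function leaves "hip music" alone, and it also handles the three-word phrase, which matches the trade-off described for the two solutions.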

RE: COMMERCIAL:Re: Partitioned table and Bucket Map Join

2015-01-30 Thread Matthew Dixon
Not sure if this is going to solve your problems, and I agree with your point about partition join optimisation, but if your query is indeed an inner join (and not A LEFT OUTER JOIN B), then you should arrange your tables in order from smallest to biggest. See this section on the Hive wiki: https:
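A minimal Python sketch of the ordering advice, assuming inner joins only (reordering an outer join such as A LEFT OUTER JOIN B can change its result). Hive's join documentation describes the last table in each join stage as the one streamed through the reducers while the others are buffered, so putting the biggest table last keeps the buffered side small. The Table type, table names, sizes and key column k are made up for illustration, and the rendered text is only a sketch of a join clause.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Table:
    name: str        # hypothetical table name
    size_bytes: int  # e.g. from table statistics; the values used below are made up

def order_for_inner_join(tables: list[Table]) -> list[Table]:
    """Smallest first, biggest last, so the biggest table is the streamed one."""
    return sorted(tables, key=lambda t: t.size_bytes)

def buffered_bytes(ordered: list[Table]) -> int:
    """Rough amount of data on the buffered side: every table except the last."""
    return sum(t.size_bytes for t in ordered[:-1])

def join_clause(ordered: list[Table], key: str) -> str:
    """Render an inner join of all tables on one shared (hypothetical) key column."""
    first = ordered[0].name
    lines = [f"FROM {first}"]
    for t in ordered[1:]:
        lines.append(f"JOIN {t.name} ON ({first}.{key} = {t.name}.{key})")
    return "\n".join(lines)

tables = [Table("big_table", 50_000_000_000),
          Table("mid_table", 2_000_000_000),
          Table("small_table", 10_000)]
ordered = order_for_inner_join(tables)
print(buffered_bytes(tables), buffered_bytes(ordered))  # 52000000000 2000010000
print(join_clause(ordered, "k"))
```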