You can look at the explode() and posexplode() UDTFs in Hive.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
Thanks
Prasanth Jayachandran
On Oct 2, 2014, at 7:15 AM, Kevin Weiler wrote:
> Hi all,
>
> I wanted to note that I figured out a better soluti
Hi all,
I wanted to note that I figured out a better solution to my problem. I was
selecting each percentile I wanted to compute (0.1, 0.5, 0.9, etc.) as an
individual percentile calculation which was blowing up my query. It turns out
that if you do it like this:
SELECT
PERCENTILE(col, array(0
Not an answer to your question, but you can compute approximate percentiles
with the memory overhead of only a single integer (two integers if you
want better results):
http://link.springer.com/chapter/10.1007/978-3-642-40273-9_7
So you could pretty easily implement the above algorithm as a pyth
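
For illustration, here is a minimal Python sketch of a one-integer streaming quantile estimator. It assumes the linked chapter describes the "frugal streaming" approach, in which the estimate is nudged up or down by one for each incoming value; that reading is an assumption, not a transcription of the paper. The function name frugal_quantile, the seed parameter, and the starting estimate of 0 are all hypothetical choices made for this example, and it assumes integer-valued data.

```python
import random


def frugal_quantile(stream, q, seed=None):
    """Estimate the q-quantile (0 < q < 1) of an integer stream.

    Only one integer of state (the running estimate) is kept.
    This is a sketch under the assumptions stated above, not a
    reference implementation of the linked chapter.
    """
    rng = random.Random(seed)
    estimate = 0  # arbitrary starting point (an assumption of this sketch)
    for value in stream:
        r = rng.random()
        if value > estimate and r > 1.0 - q:
            # Value is above the estimate: step up with probability q.
            estimate += 1
        elif value < estimate and r > q:
            # Value is below the estimate: step down with probability 1 - q.
            estimate -= 1
    return estimate


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.randint(0, 1000) for _ in range(200000)]
    print(frugal_quantile(data, 0.5, seed=2))
    print(frugal_quantile(data, 0.9, seed=2))
```

Because the estimate moves by only one unit per row, it needs many rows to travel from its starting point to the true quantile, and the result is approximate and varies with the random draws. The up and down moves balance where a fraction q of the values lies below the estimate, which is why it settles near the requested percentile.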
Hi All,
I have a query that attempts to compute percentiles on some datasets that are
well in excess of 100,000,000 rows, and I have thus opted to use percentile_approx
as we are routinely overrunning the memory. I’m having trouble finding the
threshold at which I want to begin sampling. Before this da