https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-CollectionFunctions
cheers,
-James
On Wed, Mar 21, 2012 at 4:30 PM, Saurabh S wrote:
> How do I get the length of an array in Hive?
>
> Specifically, I'm looking at the following problem: I'm splitting a column
> using the
How do I get the length of an array in Hive?
Specifically, I'm looking at the following problem: I'm splitting a column
using the split() function and a pattern. However, the resulting array can have
a variable number of entries and I want to handle each case separately.
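
For the question above, a minimal HiveQL sketch follows. It relies on the built-in size() collection function, which returns the number of elements in an array, applied to the result of split(). The table name `mytable`, the column name `col`, and the ',' pattern are assumptions made for illustration only; they do not come from the original message.

```sql
-- Hypothetical table `mytable` with a string column `col`.
-- The ',' pattern is an assumption; substitute the real split pattern.
SELECT
  col,
  size(split(col, ',')) AS num_parts,   -- length of the array produced by split()
  CASE size(split(col, ','))            -- branch on the number of entries
    WHEN 1 THEN 'one entry'
    WHEN 2 THEN 'two entries'
    ELSE 'three or more entries'
  END AS shape
FROM mytable;
```

Each WHEN branch can be replaced with whatever handling a given array length needs, for example picking individual elements with split(col, ',')[0].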
I have filed a JIRA that describes the desired 'IF NOT EXISTS' functionality:
https://issues.apache.org/jira/browse/HIVE-2889
From: Gabi D <gabi...@gmail.com>
Reply-To: <user@hive.apache.org>
Date: Wed, 21 Mar 2012 10:52:25 +0200
To: <user@hive.apache.org>
Subject: Re: LOAD
Hi All,
My raw data looks like this:
DateTime,OtherData
01-01-2000-01:00:00,blablabla1
01-01-2000-04:00:00,blablabla2
01-02-2000-02:00:00,blablabla3
I would like to partition on the datepart of DateTime. What does *not* work,
unfortunately, is this:
Create table mytable (DateTime
I figured it out.
To help the future generations:
The problem was the property hive.groupby.mapaggr.checkinterval, which
defaults to 100000. Since I was running a 'group by' query, each row was 4 KB,
and each mapper got only 3 rows, no mapper had an opportunity to do
whatever checkinterval option w
We also do the check before loading the file into hive, but we're not very
happy with this solution. A hack on the backend is better since a hack on
the front end has to happen for every file while a hack on the backend
would actually happen only for duplicate files. So performance-wise, the backend
is