I think in Hive the partition columns are virtual. They are not physical
data columns but directory names corresponding to the partition key values,
which facilitates partition pruning during selects.
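As a rough sketch of the point above (table and column names are illustrative), the partition column never appears in the data files; it only appears in the directory path, and a predicate on it lets Hive skip whole directories:

```sql
-- Partition values live in directory names like .../logs/dt=20120514/,
-- not in the data files themselves.
CREATE TABLE logs (
  host    STRING,
  request STRING
)
PARTITIONED BY (dt STRING);

-- Only the dt=20120514 directory is scanned; all other partitions
-- are pruned before the MapReduce job even starts.
SELECT count(*) FROM logs WHERE dt = '20120514';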
On May 14, 2012 6:29 AM, "Mark Grover" wrote:
> Hi Shin,
> If you could list the query that failed and the q
Hi Jon,
First off, processing the data from (say, Apache) logs in Hive and storing
aggregates in a reporting server like you mentioned is a fairly common paradigm.
You have some large scale data (Apache logs) and some dimension data (user
data). The problem you really have is how to make use of
Hive tables can sit on top of S3 storage, so you don't really need a separate
export process.
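A minimal sketch of that idea (the bucket name and columns are hypothetical): an external table keeps only its metadata in the metastore, while the data stays in S3 and survives the cluster being torn down.

```sql
-- External table over S3: dropping the table (or the EMR cluster)
-- does not delete the underlying data.
CREATE EXTERNAL TABLE update_history (
  user_id    BIGINT,
  status     STRING,
  updated_at STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my-bucket/hive/update_history/';
```

A fresh cluster can recreate this table with the same DDL and immediately query the existing data.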
thanks,
Shrikanth
On May 15, 2012, at 11:35 AM, Jon Palmer wrote:
That seems like a very reasonable approach. However, if we use a technology
like Amazon Elastic Map Reduce my Hive cluster is (potentially) going to be
destroyed and recreated. As a result I'd really need to export the update
history Hive table to some other store (like S3) so that it can be re-
For one, your data size is so small that I am not sure that indexes would help
(the fixed cost of the extra MR job would probably overshadow any benefits
from indexes). AFAIK the difference between compact and bitmap indexes is how
they store the mapping from values to the rows in which the val
I would agree on keeping track of the history of updates in a separate table in
Hive (you may not need to maintain it in the application tier). This pattern
seems to be the "Slowly Changing Dimension" pattern used in other (more
traditional) Data Warehouses... I suspect the challenge here would
On Tue, May 15, 2012 at 5:11 AM, Jon Palmer wrote:
> I can see a few potential solutions:
>
> 1. Don’t solve it. Accept that you have some artifacts in your
> reporting data that cannot be recovered from the source data.
>
> 2. Create status and location history tables in the applicati
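The "Slowly Changing Dimension" pattern mentioned above could be sketched as follows (table and column names are illustrative): each status change appends a row instead of updating in place, so history is never lost.

```sql
-- "Type 2" slowly changing dimension: one row per status interval.
CREATE TABLE user_status_history (
  user_id    BIGINT,
  status     STRING,
  valid_from STRING,  -- date (e.g. YYYYMMDD) the status became effective
  valid_to   STRING   -- date it was superseded; NULL while current
);
```

Reporting queries then join on `valid_from`/`valid_to` to recover the status a user had at any point in time.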
I knocked up the following when we were experimenting with Hive. I've been
meaning to go and tidy it up for a while, but using it with a separator of
"" (empty string) should have the desired effect. (Obviously the UDF throws
an exception if the array is empty; I've been meaning to fix that for a while.)
I will write a UDF for array concatenation and upload it to Git, in case
anyone does not have one already.
On Tue, May 15, 2012 at 7:24 PM, Zoltán Tóth-Czifra <
zoltan.tothczi...@softonic.com> wrote:
Matt, thanks!
Luckily the order of the parts of the date is correct (reordering them would
be the same craziness).
Finally it is:
regexp_replace(
  date_sub(
    to_date(
      from_unixtime(
        unix_timestamp()
      )
    ), 1
  ), "[-]", ""
)
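As an aside, a possibly shorter equivalent (assuming the server's local time zone is acceptable) leans on the fact that `from_unixtime` also accepts a `SimpleDateFormat` pattern:

```sql
-- Yesterday's date formatted directly, no regexp_replace needed.
-- 86400 = seconds in a day; note this ignores DST edge cases.
SELECT from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd');
```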
Nitin, concat apparently doesn't take arrays, and I did not find any other
What about wrapping it in regexp_replace(..., "[-]", "") ? It may not be the
cleanest, but I'd recommend passing variables from the shell :)
Matt Tucker
From: Zoltán Tóth-Czifra [mailto:zoltan.tothczi...@softonic.com]
Sent: Tuesday, May 15, 2012 9:27 AM
To: user@hive.apache.org
Subject: RE: Dat
Maybe something like this will work:
can you try using concat(split(date_sub(), "-"))
split returns the array and then you can concat them as you want.
If this does not work for you, writing a simple UDF is easy as well.
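If your Hive version is recent enough, `concat_ws` may already cover this: HIVE-2203 (in Hive 0.9) extended it to accept arrays, so no custom UDF is needed. A sketch:

```sql
-- concat_ws with an empty separator glues the array elements together,
-- turning YYYY-MM-DD into YYYYMMDD.
SELECT concat_ws(
  '',
  split(date_sub(to_date(from_unixtime(unix_timestamp())), 1), '-')
);
```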
Thanks,
nitin
On Tue, May 15, 2012 at 6:56 PM, Zoltán Tóth-Czifra <
zoltan.tothczi...@softonic.com> wrote:
Nitin,
Thank you. As you see below, I know and use this function. My problem is that
it doesn't give the YYYYMMDD format, but YYYY-MM-DD instead, and formatting it
is not trivial, as you can see.
From: Nitin Pawar [nitinpawar...@gmail.com]
Sent: Tuesday, May 15, 2
you may want to have a look at this function:
date_sub(string startdate, int days): subtracts a number of days from
startdate, e.g. date_sub('2008-12-31', 1) = '2008-12-30'
On Tue, May 15, 2012 at 6:41 PM, Zoltán Tóth-Czifra <
zoltan.tothczi...@softonic.com> wrote:
Hi guys,
Thank you very much in advance for your help.
My problem, in short, is getting the date for yesterday in a YYYYMMDD format.
As I use this format for partitions, I need it in quite a few queries.
So far I have this:
concat(
year( date_sub( to_date( from_unixtime( unix_timestamp(
All,
I'm a relative newcomer to Hadoop/Hive. We have a very standard setup of
multiple webapp servers backed by a MySQL database. We are evaluating Hive as a
high scale solution for our relatively sophisticated reporting and analytics
needs. However, it's not clear what the best practices are a
the problem with the Hive server over JDBC currently is that it does not
handle concurrent connections in a seamless manner, and it chokes on larger
numbers of parallel query executions.
For this one reason, I had actually written a pipeline kind of infra using
shell scripts which used to run queries a
Thanks to all for your replies.
Just now I tried one thing, as follows:
1) I opened two Hive CLI sessions (hive> prompt).
2) I have one query which takes 7 jobs to execute. I submitted that
query in both CLIs.
3) One of the Hive CLIs took 147.319 seconds and the second one took 161.542
seconds.
4) Later I trie
Hi,
I'd like to document HIVE-2810 and possibly HIVE-1634 in the wiki.
Could I get edit access? My username is lars.francke.
I think it'd be great to have a policy to accept patches only with
documentation. There's a lot of stuff in Hive that people I know don't
use because it's not documented an