Hello,
I am running a query like this:
insert overwrite table selection_hourly_clicks partition (date_hour =
PARTNAME) select sel_sid, count(*) cc from (select
split(parse_url(iv.referrer_url,'PATH'), '_')[1] sel_sid from item_raw
iv where iv.date_hour='PARTNAME' AND iv.referrer_url is not null AN
Hi,
There are quite a few online databases of known robots;
http://www.robotstxt.org/db.html
and http://www.botsvsbrowsers.com/category/1/index.html come to mind. The
hardest part is spotting the suspect robots that do not identify
themselves.
From: Ca
CB: does dynamic partitioning meet your needs? I don't fully understand
it, but if it does, awesome.
Otherwise there isn't a for/each construct in HiveQL. You'd have to write an
external program.
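
A minimal sketch of what such an external program could look like, in Python.
The query template is an assumption modelled on the item_view /
item_view_aggregate example from this thread, and build_queries and
run_queries are hypothetical helper names, not part of Hive. It only relies
on the Hive CLI's -e option to run one statement per partition.

```python
import subprocess

# Assumed query shape, modelled on the item_view example in this thread.
QUERY_TEMPLATE = (
    "insert overwrite table item_view_aggregate "
    "partition (date_hour={part}) "
    "select iv.sid, count(*) from item_view iv "
    "where iv.date_hour='{part}' group by iv.sid"
)


def build_queries(partitions):
    """Return one INSERT OVERWRITE statement per partition value."""
    queries = []
    for part in partitions:
        # Partition values are expected to be plain digits such as 2011310116;
        # reject anything else instead of splicing it into the statement.
        if not part.isdigit():
            raise ValueError("unexpected partition value: %r" % part)
        queries.append(QUERY_TEMPLATE.format(part=part))
    return queries


def run_queries(partitions, hive_cmd="hive"):
    """Run each statement through the Hive CLI, stopping at the first failure."""
    for query in build_queries(partitions):
        subprocess.run([hive_cmd, "-e", query], check=True)
```

Calling run_queries(["2011310116", "2011310117"]) would invoke hive -e once
per partition. Starting a new CLI session for every partition is slow, so
this only makes sense for a one-off backfill.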
I'm curious though, do you have to reprocess each partition each day or is
there a partitio
One of the impediments to uptake of the CREATE VIEW feature in Hive has been
the lack of partition awareness. This made it non-transparent to replace a
table with a view, e.g. for renaming purposes. To address this as well as some
other use cases, I'm proposing the first steps towards view pa
Hi Cam,
A bit of information that may be useful to you: Cloudera's Oozie has a Hive
action that you can use from workflow jobs.
Cheers
Alejandro
On Wed, Feb 9, 2011 at 11:44 AM, Cam Bazz wrote:
> Hello,
>
> I am looking over oozie's coordinator. But meanwhile, I managed to
> write a simple j
Try Azkaban - we use it here @ngmoco to run MR jobs (not Hive queries) and it's
pretty good - http://sna-projects.com/azkaban/
Also, it is quick to learn and easy to set up. I have never worked with Oozie,
so I can't compare, but you can google it.
On Feb 8, 2011, at 7:44 PM, Cam Bazz wrote:
> Hello
You can use dynamic partitioning:
insert overwrite table item_view_aggregate partition
(date_hour) select iv.sid, count(*), date_hour from item_view iv where
(iv.date_hour='2011310116' or date_hour='...' or date_hour='...')
group by iv.sid, date_hour;
On 2/9/11 5:49 AM, "Cam Bazz" wrote:
>We
Well, I designed my dataflow to work incrementally based on
partitions. But I already have a number of data files,
and for the first run I have to run, for example:
insert overwrite table item_view_aggregate partition
(date_hour=2011310116) select iv.sid, count(*) from item_view iv where
iv.date_hour='2011