> -----Original Message-----
> From: Cam Bazz
> Date: Wed, 13 Jun 2012 19:46:18
> Reply-To: user@hive.apache.org
> Subject: joining user sessions
hello,
For all the log files I have, I log the session id and user cookie. Now
I need to separate certain items of certain users, so I need to join
all my data to a global cookie table.
What are some common practices for doing this? Just put it in a table and
join? or maybe keep them in some sort of
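One common approach is indeed just a plain join. A minimal sketch, assuming hypothetical tables `logs(session_id, cookie, item_sid)` and `global_cookies(cookie, user_id)`:

```sql
-- Sketch with hypothetical table and column names: map each log row to a
-- user through the global cookie table, then aggregate per user and item.
SELECT g.user_id, l.item_sid, COUNT(*) AS views
FROM logs l
JOIN global_cookies g ON l.cookie = g.cookie
GROUP BY g.user_id, l.item_sid;
```

If the cookie table is small enough to fit in memory, `SET hive.auto.convert.join=true;` lets Hive turn this into a map-side join and skip the shuffle.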
> Are you running the hive query from your code? that will
> be helpful to answer your question
>
Hello,
I have finally written a program to upload my data to amazon s3, start a
cluster on amazon emr, recover my partitions, and issue simple
queries on hive.
now I would like to:
select count(*), itemSid from items group by itemSid <- gives me how
many times an item was viewed
and another
Hello All,
So I had a single node pseudo cluster that has been calculating
some statistics for me, running for a year. It finally grew beyond a
do-it-at-home task.
So I have my data uploaded to s3, and I have configured everything so
that I can load my tables and partitions, and the data is
Hello,
Now I would like to count impressions per item. To achieve this, I
made a logger; for instance when the user goes to a category or search
page and some items are listed, I am logging:
CATPAGE CAT1 1,2,3,4,5
CATPAGE CAT2 6,7,8,9,10
SEARCH keyword 1,6
basically I am logging
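Per-item impression counts can come straight out of lines like these with `split` and `LATERAL VIEW explode`. A sketch, assuming a hypothetical table `impression_raw(page_type, page_key, item_list)` where `item_list` holds the comma-separated item ids:

```sql
-- Explode the comma-separated item list so each impression becomes one row,
-- then count rows per item. Table and column names are hypothetical.
SELECT item_sid, COUNT(*) AS impressions
FROM impression_raw
LATERAL VIEW explode(split(item_list, ',')) t AS item_sid
GROUP BY item_sid;
```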
> the entire table before limiting the data. If your data
> is not partitioned please go ahead and remove that restriction.
> - I join on the date_day columns to make sure the data is correct if the
> tables are not partitioned or the query plan causes table scans because
> there are cha
Hello,
I have three tables: one that counts hits, another unique visits,
and a third clicks on that page.
The query below fails to produce correct results (the number of
uniques is wrong, always set to 8, the same number for all):
select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null,
c.c
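A common cause of one value repeating across every output row is joining unaggregated tables, so matching rows multiply before the counts are taken. A hedged sketch, with hypothetical table names, that aggregates each source to one row per `sel_sid` before joining:

```sql
-- Aggregate each source in a subquery first, so the joins cannot multiply
-- rows. hit_raw and click_raw are hypothetical table names.
SELECT h.sel_sid,
       h.hits,
       u.uniques,
       COALESCE(c.clicks, 0) AS clicks
FROM (SELECT sel_sid, COUNT(*) AS hits
      FROM hit_raw GROUP BY sel_sid) h
LEFT OUTER JOIN
     (SELECT sel_sid, COUNT(DISTINCT ip_number, session_id) AS uniques
      FROM hit_raw GROUP BY sel_sid) u ON h.sel_sid = u.sel_sid
LEFT OUTER JOIN
     (SELECT sel_sid, COUNT(*) AS clicks
      FROM click_raw GROUP BY sel_sid) c ON h.sel_sid = c.sel_sid;
```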
>> select item_sid, count(distinct ip_number, session_id) from item_raw where
>> date_day = '20110202' group by item_sid;
>> On Mon, Feb 21, 2011 at 9:42 PM, Cam Bazz wrote:
>>>
>>> The query you have produced multiple item_sid's.
>>>
>>> This is rather
> count(distinct ip_number, session_id) (I've not tested it, maybe it should be
> concat(ip_number, session_id) instead of ip_number, session_id)
> is what you want.
>
> 2011/2/21 Cam Bazz
> and ip_number from the group by
> clause) and then join with the parent table to get session_id and
> ip_number.
>
> -Ajo
>
Hello,
So I have a table of item views with item_sid, ip_number, session_id.
I know it will not be exact, but I want to get unique views per
item, and I will accept the (ip_number, session_id) tuple as a unique view.
When I want to query just item hits I say: select item_sid, count(*)
from item_raw
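Putting the thread's suggestion together, hits and unique views can come from one query over the `item_raw` table described here, using Hive's multi-argument `count(distinct ...)`:

```sql
-- One row per item: total hits, plus approximate unique views counted as
-- distinct (ip_number, session_id) pairs, as the poster defines them.
SELECT item_sid,
       COUNT(*) AS hits,
       COUNT(DISTINCT ip_number, session_id) AS unique_views
FROM item_raw
GROUP BY item_sid;
```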
Hello,
When we do a left outer join and the right table does not have a row,
it will return NULLs for those values.
Is there any way to turn those nulls into 0's? Since it is a counting
operation, if the right table does not have the row, it means 0,
not null.
best regards,
-c.b.
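Hive ships both `COALESCE` and `IF`, and either turns the NULLs from an unmatched left outer join into 0. A sketch with hypothetical table and column names:

```sql
-- COALESCE returns its first non-null argument; IF is the ternary form.
-- hits and clicks_per_id are hypothetical tables.
SELECT a.id,
       COALESCE(b.clicks, 0) AS clicks,
       IF(b.clicks IS NULL, 0, b.clicks) AS clicks_alt  -- equivalent
FROM hits a
LEFT OUTER JOIN clicks_per_id b ON a.id = b.id;
```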
That's good news, but does it run on Linux?
On Tue, Feb 15, 2011 at 6:48 PM, Guy Doulberg wrote:
> Hey,
>
> I started using Toad for querying hive, looks promising
>
> http://nosql.mypopescu.com/post/2913202510/hive-and-hbase-in-toad-for-cloud-demo
>
> http://toadforcloud.com/index.jspa
>
Hello,
So all my statistics are finally being calculated and the results
processed etc; I have a 1 node cluster, mainly taking 3 aggregate logs
from my apache logs.
How far will this setup go? I have another machine ready to be hooked
up to my setup, and I wonder if it is worth it at the moment to add
Hello,
When we write to an external file I noticed that it creates a
directory and files like:
attempt_201102130126_0014_m_00_0, attempt_201102130126_0014_m_01_0
with different parts of the data inside them.
I will be loading these files to an rdbms for quick lookup, that is at a
remote se
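One file per map/reduce task is Hive's default output shape. Hive's merge settings can consolidate them at the end of the job; a config sketch (these properties exist in Hive of this era, the size value is illustrative):

```sql
-- Merge the many per-task output files into fewer, larger files
-- after the job finishes.
SET hive.merge.mapfiles=true;            -- merge map-only job outputs
SET hive.merge.mapredfiles=true;         -- merge map-reduce job outputs
SET hive.merge.size.per.task=256000000;  -- target merged file size, bytes
```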
java:167)
at hivecommander.Main.main(Main.java:124)
how can I know this is due to some preexisting partition, and not
some other error?
Best regards,
-c.b.
Hello,
Is it possible to delete rows belonging to a partition? Or is it
undeletable, like a table's data?
best regards,
-c.b.
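A whole partition can be dropped, even though classic Hive has no row-level `DELETE`. A sketch using the table and partition names from elsewhere in this thread (the partition value is illustrative):

```sql
-- Drop an entire partition and its data:
ALTER TABLE item_raw DROP PARTITION (date_hour='2011021300');

-- To remove only *some* rows, overwrite the partition with the rows to keep
-- (the WHERE condition here is a hypothetical example):
INSERT OVERWRITE TABLE item_raw PARTITION (date_hour='2011021300')
SELECT view_time, ip_number, session_id
FROM item_raw
WHERE date_hour = '2011021300' AND session_id IS NOT NULL;
```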
ok, is there a way to do it from the
http://localhost:50075/browseDirectory.jsp interface?
leaves a file handle open or if it's some
> other process. my script does not run on the same box so it's definitely not
> my script that is holding onto file handles.
> -Viral
> On Fri, Feb 11, 2011 at 5:20 PM, Cam Bazz wrote:
>>
>> yes, i have a lot of small fi
> That's the web page for your hdfs
> namenode. It has status information on your hdfs including size.
>
> Pat
>
> -----Original Message-----
> From: Cam Bazz [mailto:camb...@gmail.com]
> Sent: Friday, February 11, 2011 4:55 PM
> To: user@hive.apache.org
> Subject: Re: e
best regards,
-cam
On Sat, Feb 12, 2011 at 2:44 AM, Christopher, Pat wrote:
> Is your hdfs hitting its space limits?
>
> Pat
>
> -----Original Message-----
> From: Cam Bazz [mailto:camb...@gmail.com]
> Sent: Friday, February 11, 2011 4:38 PM
> To: user@hive.apache.org
Hello,
I set up my one node pseudo distributed system and left it with a cronjob
copying data from a remote server, loading it into hadoop, and
doing some calculations per hour.
It stopped working today, giving me this error. I deleted everything
and made it reprocess from the beginning, and I still g
Hello,
I sometimes need to delete everything in hdfs and recreate the tables.
The question is: how do I clear everything in hdfs and hive?
I delete everything in /tmp, hadoop/logs and any metastore_db I can find,
then hadoop namenode -format.
Is this enough?
best regards,
c.b.
Hello,
The query below produces a blank file when no results are found:
insert overwrite table selection_hourly_clicks partition (date_hour = PARTNAME)
select sel_sid, count(*) cc from (
select split(parse_url(iv.referrer_url,'PATH'), '_')[1] sel_sid from
item_raw iv where iv.date_hour='PARTNAME
itions that I processed that day.
b. I don't know how to calculate the name of the new partition, like
20110210, without resorting to an external program.
Any ideas greatly appreciated,
Best Regards,
-Cam Bazz
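Hive's own date functions can compute a yyyyMMdd partition name inside the query, without an external program. A sketch, here for "yesterday":

```sql
-- unix_timestamp() is now in epoch seconds; subtract one day and format.
SELECT from_unixtime(unix_timestamp() - 86400, 'yyyyMMdd');
```

The same expression can be used in a select list together with a dynamic partition insert, so the partition name never has to be templated in from outside.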
Hello,
I am making a query such that:
insert overwrite table selection_hourly_clicks partition (date_hour =
PARTNAME) select sel_sid, count(*) cc from (select
split(parse_url(iv.referrer_url,'PATH'), '_')[1] sel_sid from item_raw
iv where iv.date_hour='PARTNAME' AND iv.referrer_url is not null AN
> assuming dateval is a field in your table and partition is your partition
> field.
>
> Lemme know if you need more or if it doesn't work. I'll check it out
> tomorrow at work.
>
> Pat
>
> -- Sent from my Palm Pre
>
Hello,
How can I do some processing for each partition of another table?
For example, let's say table A has partitions 1, 2, 3.
I want to be able to say:
for each partition in A do {
  select * from A where partition is ? into some other table where partition is ?
}
Best Regards,
C.B.
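Hive's dynamic partition insert does this loop in a single statement: each row is routed to the output partition matching its partition-column value. A sketch with hypothetical table and column names:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Every partition of table a is copied into the matching partition of
-- other_table; the partition column goes last in the select list.
INSERT OVERWRITE TABLE other_table PARTITION (part)
SELECT col1, col2, part
FROM a;
```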
Hello,
Is there a practical way to filter the logs left by crawlers like google?
They usually have user-agent strings like:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
is there a database for the
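Since well-behaved crawlers identify themselves in the user-agent, one simple heuristic is a regex filter with `RLIKE` (a Java regex; `(?i)` makes it case-insensitive). Table and column names are hypothetical, and the bot list is illustrative, not complete:

```sql
-- Keep only rows whose user-agent does not look like a known crawler.
SELECT *
FROM access_log
WHERE user_agent NOT RLIKE '(?i)(googlebot|bingbot|slurp|baiduspider|crawler|spider)';
```

Note that a NULL user_agent makes the predicate NULL, so such rows are filtered too; add `OR user_agent IS NULL` if they should be kept.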
wrote:
> Hey Cam,
> You should use Oozie's
> Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
> Regards,
> Jeff
>
Hello,
What kind of strategy must I follow in order to periodically run
certain things?
For example, each hour I want to look at log files in a certain dir,
and for new files, I need to run:
load data local inpath '/home/cam/logs/log.2011310120' into table
item_view_raw partition (date_hour=20
Hello,
After doing some aggregate counting, I now have data in a table like this:
id  count  date_hour (this is a partition name)
1   3      2011310115
1   1      2011310116
2   1      2011310117
2   1      2011310118
and I need to turn this into:
1 [2011310115,2011310
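The desired "id followed by a list of hours" shape matches Hive's `collect_set` aggregate, which gathers a group's values into an array. A sketch, with `counts` standing in for the poster's table:

```sql
-- One row per id with all its date_hour values in an array. collect_set
-- removes duplicates and does not guarantee ordering within the array.
SELECT id, collect_set(date_hour) AS date_hours
FROM counts
GROUP BY id;
```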
thank you very much, exactly what i needed.
On Mon, Jan 31, 2011 at 9:57 PM, Adam O'Donnell wrote:
> I would create separate partitions, one for each day worth of data,
> and then drop the partitions that are no longer needed.
>
> On Mon, Jan 31, 2011 at 11:56 AM, Cam Bazz
Hello,
I understand there is no way to delete data stored in a table, like a
`delete from table_name` in the hive language.
All this is fine, because all the query results are inserted into
another table, and the previous data in it is overwritten.
When we need to store data collectively into a
Cryans wrote:
> It seems the hadoop version you're running isn't the same as the one
> that hive is using. Check the lib/ folder and if it's not the same,
> replace the hadoop jars with the ones from the version you're running.
>
> J-D
>
> On Mon, Jan 31, 2
Hello,
I described a problem in my previous email. I have now tried: select
item_view_raw.* from item_view_raw WHERE log_level = 'INFO';
and I get the same error. select * from item_view_raw works just fine,
but when I add a WHERE clause on any column I get the same exception:
Total MapReduce jobs
Hello,
I just started hive today. Following the instructions, I set it up and
made it work to play with my web server log files.
I created two tables:
CREATE TABLE item_view(view_time BIGINT, ip_number STRING, session_id
STRING, session_cookie STRING, referrer_url STRING, eser_sid INT,
sale_stat