Re: joining user sessions

2012-06-13 Thread Cam Bazz
t.join=true; > Regards > Bejoy KS > > Sent from handheld, please excuse typos. > > -Original Message- > From: Cam Bazz > Date: Wed, 13 Jun 2012 19:46:18 > To: > Reply-To: user@hive.apache.org > Subject: joining user sessions > > hello, > > for

joining user sessions

2012-06-13 Thread Cam Bazz
Hello, for all the log files I have, I log the session id and user cookie. Now I need to separate certain items of certain users, so I need to join all my data to a global cookie table. What are some common practices for doing this? Just put it in a table and join? Or maybe keep them in some sort of

Re: counting number of pageviews and unique pageviews

2012-06-11 Thread Cam Bazz
g the hive query from your code? that will > be helpful to answer your question > > > On Mon, Jun 11, 2012 at 6:15 PM, Cam Bazz wrote: >> >> Hello, >> >> I have finally wrote a program to upload my data to amazon s3, start a >> cluster on amazon emr, and

counting number of pageviews and unique pageviews

2012-06-11 Thread Cam Bazz
Hello, I have finally written a program to upload my data to Amazon S3, start a cluster on Amazon EMR, and recover my partitions, and I can issue simple queries on Hive. Now I would like to: select count(*),itemSid from items group by itemSid <- gives me how many times an item was viewed and another

amazon elastic mapreduce

2011-12-11 Thread Cam Bazz
Hello All, So I had a single-node pseudo-cluster that has been calculating some statistics for me for a year. It has finally grown beyond a do-it-at-home task. So I have my data uploaded to S3, and I have configured everything so that I can load my tables, and load the partitions, and the data is

counting impressions strategy

2011-03-01 Thread Cam Bazz
Hello, Now I would like to count impressions per item. To achieve this, I made a logger, for instance when the user goes in a category or search page, and some items are listed, I am logging: CATPAGE CAT11,2,3,4,5 CATPAGE CAT26,7,8,9,10 SEARCH keyword 1,6 basically I am logging

Re: why this query gives wrong results

2011-02-23 Thread Cam Bazz
he entire table before limiting the data. If your data > is not partitioned please go ahead and remove that restriction. > - I join on the date_day columns to make sure the data is correct if the > tables are not partitioned or the query plan causes table scans because > there are cha

why this query gives wrong results

2011-02-23 Thread Cam Bazz
Hello, I have three tables, one that counts hits, the other unique visits, and the other clicks on that page: The query below will fail to produce correct results: (number of uniques is wrong, always set to 8, same number for all) select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.c

Re: calculating unique views based on ip, session_id

2011-02-23 Thread Cam Bazz
d, count(distinct ip_number, session_id) from item_raw where >> date_day = '20110202' group by item_sid; >> On Mon, Feb 21, 2011 at 9:42 PM, Cam Bazz wrote: >>> >>> The query you have produced multiple item_sid's. >>> >>> This is rath

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Cam Bazz
er, session_id (I've not tested it, maybe it should be > concat(ip_number, session_id) instead of ip_number, session_id ) > is what you want. > > 2011/2/21 Cam Bazz >> >> Hello, >> >> So I have table of item views with item_sid, ip_number, session_id >>

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Cam Bazz
and ip_number from group by > clause) and then join with the parent table to get session_id and > ip_number. > > -Ajo > > On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz wrote: >> >> Hello, >> >> So I have table of item views with item_sid, ip_number, session_id >>

calculating unique views based on ip, session_id

2011-02-21 Thread Cam Bazz
Hello, So I have a table of item views with item_sid, ip_number, session_id. I know it will not be exact, but I want to get unique views per item, and I will accept an (ip_number, session_id) tuple as a unique view. When I want to query just item hits I say: select item_sid, count(*) from item_raw
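One way to read the question above is "count the distinct (ip_number, session_id) pairs per item_sid". The sketch below is only an illustration under stated assumptions: it runs the SQL in SQLite through Python's sqlite3 module rather than in Hive, the sample rows are made up, and the inner SELECT DISTINCT is one possible formulation, not necessarily the one the thread settled on.

```python
import sqlite3

# Hypothetical sample rows: (item_sid, ip_number, session_id).
ROWS = [
    (1, "10.0.0.1", "s1"),
    (1, "10.0.0.1", "s1"),  # repeat view by the same visitor
    (1, "10.0.0.2", "s2"),
    (2, "10.0.0.1", "s1"),
]


def unique_views(rows):
    """Return [(item_sid, uniques)], one unique view per distinct (ip, session) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_raw (item_sid INTEGER, ip_number TEXT, session_id TEXT)"
    )
    conn.executemany("INSERT INTO item_raw VALUES (?, ?, ?)", rows)
    # Deduplicate the triples first, then count what is left per item.
    query = """
        SELECT item_sid, COUNT(*) AS uniques
        FROM (SELECT DISTINCT item_sid, ip_number, session_id FROM item_raw) t
        GROUP BY item_sid
        ORDER BY item_sid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(unique_views(ROWS))
```

The subquery form avoids relying on a multi-column count(distinct ...), which not every SQL engine accepts.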

left outer join and nulls

2011-02-18 Thread Cam Bazz
Hello, When we do a left outer join and the right table does not have a matching row, it returns NULLs for those values. Is there any way to turn those NULLs into 0s? Since it is a counting operation, a missing row in the right table means 0, not NULL. Best regards, -c.b.
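A common answer to this kind of question is to wrap the right-hand column in COALESCE, which returns its first non-NULL argument. The sketch below demonstrates the idea in SQLite via Python's sqlite3 module, not in Hive; the table and column names (hits, clicks, sid) are hypothetical and chosen only for illustration.

```python
import sqlite3


def hits_with_clicks(hits, clicks):
    """Left-join hits to clicks, reporting 0 clicks where no click row exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (sid INTEGER, hits INTEGER)")
    conn.execute("CREATE TABLE clicks (sid INTEGER, clicks INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", hits)
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
    # COALESCE(c.clicks, 0) replaces the NULL produced by an unmatched right row.
    query = """
        SELECT h.sid, h.hits, COALESCE(c.clicks, 0) AS clicks
        FROM hits h
        LEFT OUTER JOIN clicks c ON h.sid = c.sid
        ORDER BY h.sid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(hits_with_clicks([(1, 10), (2, 5)], [(1, 3)]))
```

The if(c.clicks is not null, ...) form that appears elsewhere in this listing expresses the same intent; COALESCE is just the shorter spelling.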

Re: TOAD for hive

2011-02-15 Thread Cam Bazz
That's good news, but does it run on Linux? On Tue, Feb 15, 2011 at 6:48 PM, Guy Doulberg wrote: > Hey, > > > > I started using Toad for querying hive, looks promising > > > > http://nosql.mypopescu.com/post/2913202510/hive-and-hbase-in-toad-for-cloud-demo > > http://toadforcloud.com/index.jspa >

how far can I go with a 1 node cluster

2011-02-13 Thread Cam Bazz
Hello, So all my statistics are finally being calculated, results being processed, etc. I have a 1-node cluster, mainly taking 3 aggregate logs from my Apache logs. How far will this setup go? I have another machine ready to be hooked up to my setup, and I wonder whether it is worth it at the moment to add

outputting to external file

2011-02-12 Thread Cam Bazz
Hello, When we write to an external file I noticed that it creates a directory and files like: attempt_201102130126_0014_m_00_0, attempt_201102130126_0014_m_01_0 with different parts of the data inside them. I will be loading these files into an RDBMS for quick lookup, that is at a remote se
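The message is cut off, so the exact request is unknown. Assuming the goal is to combine the per-task part files into a single file before shipping it to the remote database, a minimal sketch could look like this. The function name merge_parts and both path arguments are hypothetical, and the output directory is assumed to have been copied to the local filesystem first.

```python
import shutil
from pathlib import Path


def merge_parts(output_dir, merged_path):
    """Concatenate every attempt_* file in output_dir into merged_path.

    Files are appended in name order; the names of the merged parts are returned.
    """
    parts = sorted(
        p
        for p in Path(output_dir).iterdir()
        if p.is_file() and p.name.startswith("attempt_")
    )
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the bytes so large part files are not held in memory.
                shutil.copyfileobj(src, merged)
    return [p.name for p in parts]
```

Keep merged_path outside output_dir, or at least give it a name that does not start with attempt_, so a second run does not pick up its own output.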

Re: how to delete data for a partition

2011-02-12 Thread Cam Bazz
java:167) at hivecommander.Main.main(Main.java:124) how can i know, this is due to some partition preexisting, and not some other error? Best regards, -c.b. On Sun, Feb 13, 2011 at 12:36 AM, Edward Capriolo wrote: > On Sat, Feb 12, 2011 at 5:30 PM, Cam Bazz wrote: >> ok, is there a

how to delete data for a partition

2011-02-12 Thread Cam Bazz
Hello, Is it possible to delete rows belonging to a partition? or is it undeletable like a table's data? best regards, -c.b.

Re: how to delete data for a partition

2011-02-12 Thread Cam Bazz
ok, is there a way to do it from the http://localhost:50075/browseDirectory.jsp interface? On Sun, Feb 13, 2011 at 12:19 AM, Edward Capriolo wrote: > On Sat, Feb 12, 2011 at 5:17 PM, Cam Bazz wrote: >> Hello, >> >> Is it possible to delete rows belonging to a partition? or

Re: error out of all sudden

2011-02-12 Thread Cam Bazz
leaves a file handle open or if it's some > other process. my script does not run on the same box so it's definitely not > my script that is holding onto file handles. > -Viral > On Fri, Feb 11, 2011 at 5:20 PM, Cam Bazz wrote: >> >> yes, i have a lot of small fi

Re: error out of all sudden

2011-02-11 Thread Cam Bazz
hat's the web page for your hdfs > namenode.  It has status information on your hdfs including size. > > Pat > > -Original Message- > From: Cam Bazz [mailto:camb...@gmail.com] > Sent: Friday, February 11, 2011 4:55 PM > To: user@hive.apache.org > Subject: Re: e

Re: reset hive and hadoop

2011-02-11 Thread Cam Bazz
Feb 11, 2011 at 2:51 PM, Cam Bazz wrote: >> >> Hello, >> >> I sometimes need to delete everything in hdfs and  recreate the tables. >> >> The question is: how do I clear everything in the hdfs and hive? >> >> I delete everything in /tmp, hadoop/logs

Re: error out of all sudden

2011-02-11 Thread Cam Bazz
. best regards, -cam On Sat, Feb 12, 2011 at 2:44 AM, Christopher, Pat wrote: > Is your hdfs hitting its space limits? > > Pat > > -Original Message----- > From: Cam Bazz [mailto:camb...@gmail.com] > Sent: Friday, February 11, 2011 4:38 PM > To: user@hive.apache.org

error out of all sudden

2011-02-11 Thread Cam Bazz
Hello, I set up my one-node pseudo-distributed system and left it with a cron job copying data from a remote server, loading it into Hadoop, and doing some calculations every hour. It stopped working today, giving me this error. I deleted everything and made it reprocess from the beginning, and I still g

reset hive and hadoop

2011-02-11 Thread Cam Bazz
Hello, I sometimes need to delete everything in HDFS and recreate the tables. The question is: how do I clear everything in HDFS and Hive? I delete everything in /tmp, hadoop/logs and any metastore_db I can find, then run hadoop namenode -format. Is this enough? Best regards, c.b.

why some queries produce blank files

2011-02-11 Thread Cam Bazz
Hello, The query below produces a blank file, when no results are found: insert overwrite table selection_hourly_clicks partition (date_hour = PARTNAME) select sel_sid, count(*) cc from ( select split(parse_url(iv.referrer_url,'PATH'), '_')[1] sel_sid from item_raw iv where iv.date_hour='PARTNAME

dynamic partition

2011-02-10 Thread Cam Bazz
itions that I processed that day. b. I don't know how to calculate the name of the new partition, like: 20110210, without resorting to an external program. Any ideas greatly appreciated, Best Regards, -Cam Bazz
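The poster would prefer to avoid an external program, so the following is only a fallback sketch in case a small helper turns out to be acceptable. It assumes the partition name is the date formatted as YYYYMMDD, which matches the 20110210 example; the function name day_partition is hypothetical.

```python
from datetime import date


def day_partition(day):
    """Format a date as a YYYYMMDD partition name, e.g. 20110210."""
    return day.strftime("%Y%m%d")


if __name__ == "__main__":
    # Partition name for the day the script runs.
    print(day_partition(date.today()))
```

The resulting string could then be substituted into the query text by whatever script submits the job.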

query returns sometext instead of none

2011-02-09 Thread Cam Bazz
Hello, I am making a query such that: insert overwrite table selection_hourly_clicks partition (date_hour = PARTNAME) select sel_sid, count(*) cc from (select split(parse_url(iv.referrer_url,'PATH'), '_')[1] sel_sid from item_raw iv where iv.date_hour='PARTNAME' AND iv.referrer_url is not null AN

Re: for each partition

2011-02-09 Thread Cam Bazz
uming dateval is a field in your table and partition is your partition > field. > > Lemme know if you need more or if it doesn't work.  I'll check it out > tomorrow at work. > > Pat > > > > -- Sent from my Palm Pre > > >

for each partition

2011-02-08 Thread Cam Bazz
Hello, How can I run some process for each partition of some other table? For example, let's say table A has partitions 1,2,3. I want to be able to say: for each partition in A do { select * from A where partition is ? into some othertable where partition is ? } Best Regards, C.B.
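One way to approximate the "for each partition" loop described above is to generate one statement per partition from a driver script. The sketch below assumes the partition values are already known (for example from the output of SHOW PARTITIONS) and that an explicit column list is supplied, since selecting * from a partitioned table would also return the partition column. All table, column and function names here are hypothetical.

```python
def per_partition_statements(source, target, partition_col, columns, partitions):
    """Build one INSERT OVERWRITE statement per partition value."""
    template = (
        "INSERT OVERWRITE TABLE {target} PARTITION ({col}='{value}') "
        "SELECT {columns} FROM {source} WHERE {col}='{value}'"
    )
    return [
        template.format(
            target=target,
            source=source,
            col=partition_col,
            value=value,
            columns=", ".join(columns),
        )
        for value in partitions
    ]


if __name__ == "__main__":
    for statement in per_partition_statements(
        "a", "b", "part", ["item_sid", "hits"], ["1", "2", "3"]
    ):
        print(statement + ";")
```

The partition values are interpolated without escaping, so this is only suitable for trusted, well-formed partition names.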

filtering out crawlers

2011-02-08 Thread Cam Bazz
Hello, Is there a practical way to filter the logs left by crawlers like google? They usually have user-agent strings like Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) is there a database for the
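As a rough illustration of user-agent filtering, the sketch below matches a handful of substrings that commonly appear in crawler user-agent strings. The pattern list is an assumption and far from complete; a maintained crawler database, which is what the message asks about, would be more reliable.

```python
import re

# Assumed, incomplete list of substrings that often identify crawlers.
BOT_PATTERN = re.compile(r"googlebot|bingbot|crawler|spider|slurp", re.IGNORECASE)


def is_crawler(user_agent):
    """Return True when the user-agent string looks like a known crawler."""
    return bool(BOT_PATTERN.search(user_agent))


def human_agents(user_agents):
    """Keep only the user-agent strings that do not match the crawler pattern."""
    return [ua for ua in user_agents if not is_crawler(ua)]
```

The same regular expression could be applied before loading the logs, or inside a query with a regex match operator such as RLIKE.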

Re: periodic execution

2011-02-08 Thread Cam Bazz
wrote: > Hey Cam, > You should use Oozie's > Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases. > Regards, > Jeff > > On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz wrote: >> >> Hello, >> >> What kind of strategy must i follow, in order t

periodic execution

2011-02-08 Thread Cam Bazz
Hello, What kind of strategy must I follow in order to periodically run certain things? For example, each hour, I want to look up log files from a certain dir, and for new files, I need to run: load data local inpath '/home/cam/logs/log.2011310120' into table item_view_raw partition (date_hour=20
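Whatever scheduler ends up running the job (cron, or the Oozie coordinator suggested in the reply), the hourly step needs to work out which log files are new and emit one load statement per file. The sketch below assumes that the partition value is the suffix of the file name after "log.", which the truncated example suggests but does not confirm, and that the caller keeps track of which files were already loaded. The function name is hypothetical.

```python
def load_statements(log_names, already_loaded,
                    log_dir="/home/cam/logs", table="item_view_raw"):
    """Build a load statement for each log file that has not been loaded yet."""
    statements = []
    for name in sorted(log_names):
        if name in already_loaded or not name.startswith("log."):
            continue
        # Assumption: the date_hour partition equals the file-name suffix.
        date_hour = name[len("log."):]
        statements.append(
            f"load data local inpath '{log_dir}/{name}' "
            f"into table {table} partition (date_hour={date_hour})"
        )
    return statements


if __name__ == "__main__":
    for statement in load_statements(["log.2011310120"], set()):
        print(statement + ";")
```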

data question

2011-01-31 Thread Cam Bazz
Hello, After doing some aggregate counting, I now have data in a table like this: id count date_hour (this is a partition name) 1 3 2011310115 1 1 2011310116 2 1 2011310117 2 1 2011310118 and I need to turn this into: 1 [2011310115,2011310
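The target layout is cut off, but it appears to be one row per id carrying the list of its date_hour values. Under that assumption, the transformation can be sketched in plain Python on the four sample rows from the message; inside Hive, a collecting aggregate would be the natural counterpart.

```python
from collections import defaultdict

# The sample rows from the message: (id, count, date_hour).
ROWS = [
    (1, 3, "2011310115"),
    (1, 1, "2011310116"),
    (2, 1, "2011310117"),
    (2, 1, "2011310118"),
]


def hours_by_id(rows):
    """Group the date_hour values under each id, preserving input order."""
    grouped = defaultdict(list)
    for item_id, _count, date_hour in rows:
        grouped[item_id].append(date_hour)
    return dict(grouped)


if __name__ == "__main__":
    for item_id, hours in hours_by_id(ROWS).items():
        print(item_id, hours)
```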

Re: always insert overwrite, so how do we collect data?

2011-01-31 Thread Cam Bazz
thank you very much, exactly what i needed. On Mon, Jan 31, 2011 at 9:57 PM, Adam O'Donnell wrote: > I would create separate partitions, one for each day worth of data, > and then drop the partitions that are no longer needed. > > On Mon, Jan 31, 2011 at 11:56 AM, Cam Bazz

always insert overwrite, so how do we collect data?

2011-01-31 Thread Cam Bazz
Hello, I understand there is no way to delete data stored in a table, like a `delete from table_name` in the Hive language. All this is fine, because all the query results are inserted into another table, and the previous data in it is overwritten. When we need to store data collectively into a

Re: problem with hadoop or hive

2011-01-31 Thread Cam Bazz
Cryans wrote: > It seems the hadoop version you're running isn't the same as the one > that hive is using. Check the lib/ folder and if it's not the same, > replace the hadoop jars with the ones from the version you're running. > > J-D > > On Mon, Jan 31, 2

problem with hadoop or hive

2011-01-31 Thread Cam Bazz
Hello, I described a problem in my previous email. I have now tried: select item_view_raw.* from item_view_raw WHERE log_level = 'INFO'; and I get the same error. select * from item_view_raw works just fine, but when I add a WHERE clause on any column I get the same exception: Total MapReduce jobs

trouble loading from raw data table to data table

2011-01-31 Thread Cam Bazz
Hello, I just started with Hive today. Following the instructions, I set it up and got it working with my web server log files. I created two tables: CREATE TABLE item_view(view_time BIGINT, ip_number STRING, session_id STRING, session_cookie STRING, referrer_url STRING, eser_sid INT, sale_stat