Re: Unique users analysis

2011-09-07 Thread Radek Maciaszek
Hello,

In case anyone is looking for a similar solution in the future, I wrote a short blog post on this subject: http://www.dataminelab.com/blog/calculating-unique-visitors-in-hadoop-and-hive/

Best,
Radek

On 14 January 2011 12:50, Radek Maciaszek wrote:
> Hi Itai,
>
> I did not th

Unit testing Hive script

2011-02-18 Thread Radek Maciaszek
Hello,

I was wondering whether anyone has managed to unit test Hive scripts and could share their experience. My first thought was to prepare sample data, run the Hive scripts to generate output, and then compare the generated output with the expected output. Sounds fairly simple but it may be a bit compli
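The approach described in this message (prepare sample data, run the script, compare generated output with expected output) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not something taken from the thread: the helper names `run_hive_script`, `parse_rows` and `outputs_match` are hypothetical, the Hive CLI is assumed to be on the PATH, and the output is assumed to be tab-separated text. Rows are compared as a multiset because a query without ORDER BY does not guarantee row order.

```python
import subprocess
from collections import Counter


def run_hive_script(script_path):
    """Run a Hive script through the Hive CLI and return its stdout.

    Assumes a `hive` executable is available; -S is silent mode and
    -f reads the queries from a file.
    """
    result = subprocess.run(
        ["hive", "-S", "-f", script_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_rows(text):
    """Turn tab-separated output into a multiset of rows, skipping blank lines."""
    return Counter(
        tuple(line.split("\t")) for line in text.splitlines() if line.strip()
    )


def outputs_match(actual, expected):
    """Compare two outputs while ignoring row order but not duplicate counts."""
    return parse_rows(actual) == parse_rows(expected)
```

A test would then call `run_hive_script` on a script that reads the sample data and pass its result, together with the contents of an expected-output file, to `outputs_match`.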

Re: Unique users analysis

2011-01-14 Thread Radek Maciaszek
s with 00.
>
> At the end we multiply by 256 and get a pretty close number to the real number.
>
> Itai
>
> On 01/14/2011 01:14 PM, Radek Maciaszek wrote:
>
>> Hi,
>>
>> I am working on some large scale unique users analysis (think hund
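The beginning of the quoted reply is cut off in this listing, so the following is only a guess at the kind of sampling it describes: keep the user IDs whose hash ends in "00" (one byte out of 256 possible values), count the distinct IDs in that sample, and multiply by 256. The choice of MD5, the hex-suffix test, and the function names `in_sample` and `estimate_unique_users` are assumptions made for this sketch, not details confirmed by the thread.

```python
import hashlib

SAMPLE_FACTOR = 256  # two hex digits "00" match 1 in 256 hash values


def in_sample(user_id):
    """True when the MD5 hex digest of the ID ends with "00".

    Hashing makes the choice depend only on the ID, so every record of
    a given user is either always kept or always dropped.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return digest.endswith("00")


def estimate_unique_users(user_ids):
    """Estimate the distinct count from the roughly 1/256 hash sample."""
    sampled = {uid for uid in user_ids if in_sample(uid)}
    return len(sampled) * SAMPLE_FACTOR
```

Because only about 1 in 256 IDs has to be held in memory, the distinct count runs over a much smaller set; the price is a sampling error that shrinks as the number of users grows.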

Unique users analysis

2011-01-14 Thread Radek Maciaszek
Hi, I am working on some large scale unique users analysis (think hundreds of millions of records per day). Since the total number of records per month runs into many billions, I am hoping that there may be some alternative to running "SELECT DISTINCT user_unique_id...", such as sampling data or perhaps d