Hello,
In case anyone is looking for a similar solution in the future, I wrote a
short blog post on this subject:
http://www.dataminelab.com/blog/calculating-unique-visitors-in-hadoop-and-hive/
Best,
Radek
On 14 January 2011 12:50, Radek Maciaszek wrote:
> Hi Itai,
>
> I did not th
Hello,
I was wondering if anyone has managed to unit test Hive scripts and could
share their experience? My first thought was to prepare sample data, run the
Hive scripts to generate output, and then compare the generated output
with the expected output. Sounds fairly simple but it may be a bit
compli
s with 00.
>
> At the end we multiply by 256 and get a number pretty close to the real
> count.
>
>
>
> Itai
>
>
>
> On 01/14/2011 01:14 PM, Radek Maciaszek wrote:
>
> Hi,
>>
>> I am working on some large scale unique users analysis (think hund
Hi,
I am working on some large scale unique users analysis (think hundreds of
millions of records per day). Since the number of records per month runs
into many billions, I am hoping that there may be some alternative to running
"SELECT DISTINCT user_unique_id..." such as sampling data or perhaps
d
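
For anyone reading the archive later, here is a rough sketch of the sampling idea discussed in this thread: keep only the user IDs whose hash ends with 00, count the distinct IDs in that sample, and multiply by 256. It is only an illustration under assumptions. The thread does not say which hash function was used, so MD5 here is a guess, and the names in_sample and estimate_unique are made up for this example.

```python
import hashlib


def in_sample(user_id: str) -> bool:
    """Return True when the MD5 hex digest of the ID ends with "00".

    MD5 is an assumption; any well-mixed hash should behave similarly.
    The last digest byte is 0x00 for roughly 1 in 256 IDs.
    """
    return hashlib.md5(user_id.encode("utf-8")).digest()[-1] == 0


def estimate_unique(user_ids, scale: int = 256) -> int:
    """Estimate the distinct count from the hash-based sample.

    The same ID always hashes to the same value, so repeated records
    for one user never inflate the sample.
    """
    sampled = {uid for uid in user_ids if in_sample(uid)}
    return len(sampled) * scale
```

Because the filter depends only on the ID, it can run before any DISTINCT step, which shrinks the data to de-duplicate by a factor of about 256. The estimate gets noisier as the true number of users gets smaller.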