Hello,
I'm using Hive to query data like yours. In my case I have about 300-500 GB
of data per day, so it is much larger. We use Flume to load data into
Hive - data is rolled every day (this can be changed).
Hive queries - ad-hoc or scheduled - usually take at least 10-20 seconds,
and possibly hours; it won't speed up your processing. Hive shows its
power when you have far more data than several GB per month.
I think that in your case Hive is not a good solution; you'd be better
off using more powerful MySQL servers.
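To give a concrete idea of the overhead, here is a sketch of how such a table and an ad-hoc query might look in Hive (table name, columns, and path are made up for illustration):

```sql
-- Hypothetical stats table mirroring the MySQL layout described below.
CREATE EXTERNAL TABLE visit_stats (
  country    STRING,
  user_agent STRING,
  referrer   STRING,
  search     STRING,
  num_hits   BIGINT
)
PARTITIONED BY (dt STRING)   -- one partition per day, as rolled by Flume
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/flume/visit_stats';

-- A typical ad-hoc aggregation. Even on a few GB this launches a
-- MapReduce job, so the startup cost alone is tens of seconds.
SELECT country, SUM(num_hits) AS hits
FROM visit_stats
WHERE dt BETWEEN '2011-09-01' AND '2011-09-27'
GROUP BY country
ORDER BY hits DESC
LIMIT 20;
```

That fixed job-launch cost is why Hive only pays off once a single MySQL server can no longer hold or scan the data.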
On 27.09.2011 11:14, Benjamin Fonze wrote:
Dear All,
I'm new to this list, and I hope I'm sending this to the right place.
I'm currently using MySQL to store a large amount of visitor statistics.
(Visits, clicks, etc....)
Basically, each visit is logged to a text file, and every 15 minutes a job
consolidates it into MySQL, into tables that look like this:
COUNTRY | DATE | USER_AGENT | REFERRER | SEARCH | ... | NUM_HITS
This generates millions of rows a month, and several GB of data. Then,
querying these tables typically takes a few seconds. (Yes, there
are indexes, etc.)
I was thinking of moving all that data to a NoSQL DB like Hive, but I want to
make sure it is suited to my purpose. Can you confirm that Hive is a good
fit for such statistical data? More importantly, can you confirm that ad-hoc
queries on that data will be much faster than in MySQL?
Thanks in advance!
Benjamin.