----- Original Message -----
> From: "Manivannan S." <manivanna...@spanservices.com>
> 
> Hi all,
> 
> [lots of data]
> [slow reports]
> [wooo NoSQL magic]

Not that I want to discourage you, but my standard first question is "why do 
you think NoSQL (let alone any specific product) is the right solution?" :-)

Don't get me wrong, it might be; but from what little I know about your 
environment so far, it sounds like applying some data warehousing techniques 
might suffice - and being the cynical dinosaur that I am, I have a healthy 
reluctance to weld new technology onto a stable environment.

To speed up reporting (note that these techniques are often applied even when 
implementing NoSQL solutions), a good first step is usually to set up a 
process of data summarization.

Basically, you pre-calculate the averages, medians, groupings, whatever you 
need for your reports, and the summarization job also saves the ID of the last 
record it has processed. On the next run, it only reads the new records and 
updates your summary tables to incorporate the new data.
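
In MySQL terms that could look roughly like this - a sketch only, with table 
and column names I've made up for illustration:

-- Hypothetical source table holding the raw rows
CREATE TABLE measurements (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  val INT NOT NULL
);

-- Summary table: a single row holding the running aggregate
CREATE TABLE measurements_summary (
  avg_val       DOUBLE NOT NULL,  -- pre-calculated average
  element_count INT    NOT NULL,  -- number of rows summarized so far
  last_seen     INT    NOT NULL   -- highest measurements.id already processed
);

-- First run: summarize whatever is in the table right now
-- (assumes measurements already contains at least one row)
INSERT INTO measurements_summary (avg_val, element_count, last_seen)
SELECT AVG(val), COUNT(*), MAX(id) FROM measurements;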

Suppose I have a table like this:

ID | Val
--------
 1     1
 2     7
 3     5
 4    13

I want to report the average on a daily basis, and calculating that over those 
rows is unbearably slow because I'm running the process on a wristwatch from 
1860 :-)

So I set up a summary table, calculate (1+7+5+13)/4 = 6.5, and store a record 
saying this:

Avg | elementCount | lastSeen
-----------------------------
6.5              4          4

Now, over the course of the day, the elements 4, 17 and 2 get added with 
sequential row numbers. Instead of calculating (1+7+5+13+4+17+2)/7, which 
would be slow, I can substitute Avg*elementCount for the part that has already 
been summarized. Thus I calculate (6.5*4 + 4+17+2)/7 = 7, which is a lot 
faster, and my summary table now looks like this:

Avg | elementCount | lastSeen
-----------------------------
  7              7          7

This is of course a trivial example, but the approach saves you a lot of time 
once you already have a summary of several thousand elements and only need to 
fold in a handful of new ones. Similar tricks are possible for a lot of 
typical reporting work - you don't need to re-calculate data for past months 
over and over again, for instance - and that is what makes your reports run 
fast.
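
For what it's worth, the incremental update from the example above could be 
sketched in SQL along these lines - same made-up table names as before, and a 
real job would of course wrap this in whatever scheduling and error handling 
you use:

-- Remember how far the previous run got
SELECT last_seen INTO @last FROM measurements_summary;

-- Aggregate only the rows added since then
SELECT COALESCE(SUM(val), 0), COUNT(*), MAX(id)
  INTO @new_sum, @new_count, @new_last
  FROM measurements
 WHERE id > @last;

-- Fold the new rows into the existing summary.
-- avg_val * element_count reconstructs the old sum, so the rows that
-- were already summarized never need to be read again.
UPDATE measurements_summary
   SET avg_val       = (avg_val * element_count + @new_sum)
                       / (element_count + @new_count),
       element_count = element_count + @new_count,
       last_seen     = COALESCE(@new_last, last_seen)
 WHERE @new_count > 0;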


Just my 2 cents :-)
/johan

-- 
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it is soon an ass
