Re: [algogeeks] Maintaining frequency of millions of user

Rohit Saraf Wed, 09 Dec 2009 03:32:02 -0800

@mayur: surely it would not be just for finding frequency. It's a server
which may have to do insert/delete operation.


A web server will have lot's of things to do and this is just a tiny part.
I guess WORM is not apt for this...
In general, a web server will have to maintain a queue or something similar.

Rohit Saraf
Sophomore
Computer Science and Engineering
IIT Bombay


On Wed, Dec 9, 2009 at 4:49 PM, Mayur <[email protected]> wrote:

> We have a server hit by millions of users. Sever log files contains
> the user ids of all of them. How do we find the frequency of login of
> each user. What will the most efficient way to store the users, and
> access them to find their frequency(The log files are very huge!!)
>
> I thought of using B+ tree indexing with user ids as the key. Leaf
> nodes will have the pointers to bucket of user ids. One item of bucket
> will contain user id and frequency of this user.
>   For insertion, search complexity will be ~O(logn)
>
> Any potential problem with approach? Are there any better approach to
> tackle this problem?
>
> Not problems really, but opportunities. Here're a few things:
> 1. They're append-only (no inserts, no deletes). Why then have a B+ tree on
> the log file itself, which is a structure optimized for both retrieval and
> inserts?
> 2. If all that you want to query is the frequency of login of each user,
> one might ask the following questions:
>     i) How often do you want this frequency data? Is the data required to
> be updated in real-time?
>   ii) Is it okay to query this data each time your log's rotated?
> If you require the data as and when it's created (meaning as soon as it is
> logged), it may be a good idea to change the application receiving the call
> to explicitly update the frequency, in say, a database.
>
> On the other hand, if you want it periodically, you can simply run a
> script/program that takes the "old" size of the log file as input, and
> begins scanning the log from that offset, thus saving you time.
> The frequency data itself can be stored in a hash file / b+ tree or
> whatever else you'd like (so that it can be updated).
>
> If you still wish to index your log files, search for Write-Once Read-Many
> (WORM) data structures on the web.
>
> Apologies for the non-algorithmic nature of my response, if it isn't apt
> for the group.
>
> On Wed, Dec 9, 2009 at 3:54 PM, Rohit Saraf 
> <[email protected]>wrote:
>
>> if you don't know abt fibonacci heaps then check out
>> http://en.wikipedia.org/wiki/Fibonacci_heap
>>
>>
>>
>>
>>
>> Rohit Saraf
>> Sophomore
>> Computer Science and Engineering
>> IIT Bombay
>>
>>
>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Algorithm Geeks" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<algogeeks%[email protected]>
>> .
>> For more options, visit this group at
>> http://groups.google.com/group/algogeeks?hl=en.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected]<algogeeks%[email protected]>
> .
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
>

--

You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

Re: [algogeeks] Maintaining frequency of millions of user

Reply via email to