@mayur: surely it would not be just for finding frequency. It's a server which may have to do insert/delete operation.
A web server will have lot's of things to do and this is just a tiny part. I guess WORM is not apt for this... In general, a web server will have to maintain a queue or something similar. Rohit Saraf Sophomore Computer Science and Engineering IIT Bombay On Wed, Dec 9, 2009 at 4:49 PM, Mayur <[email protected]> wrote: > We have a server hit by millions of users. Sever log files contains > the user ids of all of them. How do we find the frequency of login of > each user. What will the most efficient way to store the users, and > access them to find their frequency(The log files are very huge!!) > > I thought of using B+ tree indexing with user ids as the key. Leaf > nodes will have the pointers to bucket of user ids. One item of bucket > will contain user id and frequency of this user. > For insertion, search complexity will be ~O(logn) > > Any potential problem with approach? Are there any better approach to > tackle this problem? > > Not problems really, but opportunities. Here're a few things: > 1. They're append-only (no inserts, no deletes). Why then have a B+ tree on > the log file itself, which is a structure optimized for both retrieval and > inserts? > 2. If all that you want to query is the frequency of login of each user, > one might ask the following questions: > i) How often do you want this frequency data? Is the data required to > be updated in real-time? > ii) Is it okay to query this data each time your log's rotated? > If you require the data as and when it's created (meaning as soon as it is > logged), it may be a good idea to change the application receiving the call > to explicitly update the frequency, in say, a database. > > On the other hand, if you want it periodically, you can simply run a > script/program that takes the "old" size of the log file as input, and > begins scanning the log from that offset, thus saving you time. > The frequency data itself can be stored in a hash file / b+ tree or > whatever else you'd like (so that it can be updated). > > If you still wish to index your log files, search for Write-Once Read-Many > (WORM) data structures on the web. > > Apologies for the non-algorithmic nature of my response, if it isn't apt > for the group. > > On Wed, Dec 9, 2009 at 3:54 PM, Rohit Saraf > <[email protected]>wrote: > >> if you don't know abt fibonacci heaps then check out >> http://en.wikipedia.org/wiki/Fibonacci_heap >> >> >> >> >> >> Rohit Saraf >> Sophomore >> Computer Science and Engineering >> IIT Bombay >> >> >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Algorithm Geeks" group. >> To post to this group, send email to [email protected]. >> To unsubscribe from this group, send email to >> [email protected]<algogeeks%[email protected]> >> . >> For more options, visit this group at >> http://groups.google.com/group/algogeeks?hl=en. >> > > -- > You received this message because you are subscribed to the Google Groups > "Algorithm Geeks" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]<algogeeks%[email protected]> > . > For more options, visit this group at > http://groups.google.com/group/algogeeks?hl=en. > -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
