On Thu, Dec 3, 2009 at 1:19 PM, Orchid Fairy (兰花仙子) <practicalp...@gmail.com> wrote:
> Thanks all.
> How about parsing files of huge size (about 1T each day)?
>
> The basic logic is:
>
> read each line of every file (many files, each is gzipped)
> look for special info (like IP, request URL, session_id, datetime, etc.)
> count them and write the result into a database
> generate the daily report and monthly report
>
> I'm afraid Perl can't finish the daily job, so I want to know the speed
> difference between Perl and C for this case.
>
> // Xiao lan

A daily job that, by the sound of it, will not change much and will just get executed pretty much until the end of time... C is your friend. Perl would certainly get the job done on time without too much trouble, but if you are worried, there isn't much that will outperform C/C++ when it comes to raw speed. If you are not planning to change the code in the foreseeable future, the extra readability of Perl should not really matter, since you do not expect to be making changes on a regular basis anyway.

My biggest worry would not be the 1T (of logs, I guess) the code needs to parse now, but the likely doubled amount of data four or five years from now. It sounds a lot like you are parsing logs from a web server, or actually quite a few web servers, or something along those lines. If the world keeps spinning the way it has for the past couple of million years, I am willing to bet that the size of the logs you are parsing will grow a lot, making speed more and more important. (Good argument for C right there.)

Also, if you have a lot of files, think about running a lot of parsers/processes in parallel, however you arrange it.
Every machine tasked with processing that amount of data will have more than a single CPU, and will therefore benefit from more than one process doing the work at a time, regardless of the language you use. Personally, I don't really see a role for Perl in something like this: as the data to parse grows, the desire for speed will only get bigger. I would simply do it in C and forgo the ease of Perl for parsing logs. You might save some development time choosing Perl, but you will likely have to redo the work in C at some point, as the input data will just keep getting bigger.