Hi Xiao!

On Thursday 03 Dec 2009 14:19:10 Orchid Fairy (兰花仙子) wrote:
> Thanks all.
> How about parsing files of huge size (about 1 TB each day)?
> 
> The basic logic is:
> 
> read each line of every file (many files, each gzipped)
> look for specific info (like IP, request URL, session_id, datetime, etc.)
> count them and write the results into a database
> generate the daily report and the monthly report
> 
> I'm afraid Perl can't finish the daily job, so I want to know the speed
> difference between Perl and C for this case.
> 
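
The per-line step in the quoted list (read a line, pull out the IP, request
URL, datetime, etc.) might look like the following in C. This is only a
minimal sketch assuming the logs are in Apache's common log format; the names
`struct log_fields` and `parse_log_line()` are hypothetical, and session_id is
left out because where it lives (cookie, query string, ...) depends on the
actual format.

```c
#include <assert.h>
#include <string.h>

/* Fields pulled out of one log line.
 * ASSUMPTION: Apache common log format, e.g.
 *   127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /x.gif HTTP/1.0" 200 2326 */
struct log_fields {
    char ip[64];
    char datetime[64];
    char url[1024];
};

/* Copy n bytes of src into dst (capacity dstsz), truncating if needed,
 * and always NUL-terminate. */
static void copy_field(char *dst, size_t dstsz, const char *src, size_t n)
{
    assert(dstsz > 0);
    if (n >= dstsz)
        n = dstsz - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Returns 0 on success, -1 if the line does not match the assumed format. */
static int parse_log_line(const char *line, struct log_fields *out)
{
    const char *sp = strchr(line, ' ');            /* end of client IP */
    const char *lb = strchr(line, '[');            /* start of timestamp */
    const char *rb = lb ? strchr(lb, ']') : NULL;  /* end of timestamp */
    const char *q  = rb ? strchr(rb, '"') : NULL;  /* start of request */
    const char *u;

    if (!sp || !lb || !rb || !q)
        return -1;

    copy_field(out->ip, sizeof out->ip, line, (size_t)(sp - line));
    copy_field(out->datetime, sizeof out->datetime, lb + 1,
               (size_t)(rb - lb - 1));

    /* The request is "METHOD URL PROTOCOL"; the URL is the second token. */
    u = strchr(q + 1, ' ');
    if (!u)
        return -1;
    u++;
    copy_field(out->url, sizeof out->url, u, strcspn(u, " \""));
    return 0;
}
```

Each line would come from a gzip reader (for example zlib's gzgets(), or a
`zcat` pipe on stdin), and the extracted fields would then be handed to the
counting step.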

OK, I assume you've already tested the Perl code. If not, try it: writing it
in Perl would take much less time than writing it in C, and the result would
also serve as a useful prototype.

I can imagine that Perl would be unable to handle such a load. However, with
1 TB of gzipped data, it's very possible that your problem is bound by I/O and
by decompression (gunzip) rather than by the parsing itself. If so, you may
need to throw better iron at the problem. I know many insurance companies,
banks, etc. are still using IBM mainframe machines (zSeries, iSeries, etc.)
because they have really good I/O, which cannot easily be matched with
commodity PC hardware. (I'm not suggesting you go to that extreme, but you
may have to buy better hardware of a similar kind.)
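
To put the I/O point in numbers: getting through about 1 TB (taken here as
10^12 bytes) in 24 hours means sustaining roughly 11.6 MB/s of compressed
input, around the clock, before any parsing is done. A cheap way to check
whether I/O and gunzip are already the bottleneck is to time a reader that
does nothing but pull lines. The sketch below does both; the function names
are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sustained rate, in MB/s (1 MB = 10^6 bytes), needed to process
 * bytes_per_day bytes within 24 hours. */
static double required_mb_per_sec(double bytes_per_day)
{
    return bytes_per_day / (24.0 * 60 * 60) / 1e6;
}

/* Baseline reader: reads lines from `in` and does nothing else.
 * Stores the number of '\n'-terminated lines in *lines and returns
 * the total number of bytes read. */
static unsigned long long drain_lines(FILE *in, unsigned long long *lines)
{
    static char buf[65536];   /* lines longer than this arrive in chunks */
    unsigned long long bytes = 0;
    size_t n;

    assert(in != NULL && lines != NULL);
    *lines = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        n = strlen(buf);
        bytes += n;
        if (n > 0 && buf[n - 1] == '\n')
            (*lines)++;
    }
    return bytes;
}
```

Wrapped in a main() that calls drain_lines(stdin, ...), it can be timed with
something like `time zcat file.gz | ./baseline`. If that alone cannot keep up
with a day's worth of files, a faster parser, in C or any other language, will
not help.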

That aside, if you still want to try writing the program in C or C++, then I
suggest looking at some of the abstraction libraries listed here:

http://www.shlomifish.org/open-source/portability-libs/

They are helpful: they provide APIs similar to the Perl built-ins and to some
CPAN modules, and let you write Perl-like code in C without having to
implement the lower-level details yourself (the code will still be wordier
and less idiomatic than in Perl, but that is expected of C and C++). Because
they are generic and written with general-purpose use in mind, they may incur
a small run-time overhead, but I doubt it will be a problem in most cases.
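
As an illustration of those "lower-level details": the counting step, which
in Perl is a one-liner with a hash (`$count{$ip}++`), has to be spelled out by
hand in plain C if you use no library. A minimal sketch with hypothetical
names, a fixed-size table and no resizing (simplifications that a real
program, or one of the libraries above, would not make):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4096   /* number of slots; must be a power of two */

/* Minimal string -> count table using open addressing, linear probing. */
struct counter {
    char *keys[TABLE_SIZE];           /* NULL marks an empty slot */
    unsigned long counts[TABLE_SIZE];
    size_t used;
};

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Slot holding `key`, or the empty slot where it would be inserted.
 * Terminates because counter_incr() always leaves one slot empty. */
static size_t find_slot(const struct counter *c, const char *key)
{
    size_t i = hash_str(key) & (TABLE_SIZE - 1);
    while (c->keys[i] != NULL && strcmp(c->keys[i], key) != 0)
        i = (i + 1) & (TABLE_SIZE - 1);
    return i;
}

/* Equivalent of $count{$key}++: returns the new count, or 0 if the
 * table is full or memory runs out. */
static unsigned long counter_incr(struct counter *c, const char *key)
{
    size_t i, len;

    assert(c != NULL && key != NULL);
    i = find_slot(c, key);
    if (c->keys[i] != NULL)
        return ++c->counts[i];
    if (c->used + 1 >= TABLE_SIZE)    /* keep one slot empty */
        return 0;
    len = strlen(key) + 1;
    if ((c->keys[i] = malloc(len)) == NULL)
        return 0;
    memcpy(c->keys[i], key, len);
    c->counts[i] = 1;
    c->used++;
    return 1;
}

/* Current count for `key` (0 if never seen). */
static unsigned long counter_get(const struct counter *c, const char *key)
{
    size_t i = find_slot(c, key);
    return c->keys[i] != NULL ? c->counts[i] : 0;
}

/* Free all key copies and reset the table to empty. */
static void counter_free(struct counter *c)
{
    size_t i;
    for (i = 0; i < TABLE_SIZE; i++)
        free(c->keys[i]);
    memset(c, 0, sizeof *c);
}
```

The table must start zeroed (static storage or calloc()), and each extracted
field value would be passed to counter_incr(). Growing the table, or dumping
it to the database once per day, is exactly the sort of extra work Perl's
hashes and DBI already do for you.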

Regards,

        Shlomi Fish

> // Xiao lan
> 

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Best Introductory Programming Language - http://shlom.in/intro-lang

Chuck Norris read the entire English Wikipedia in 24 hours. Twice.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

