On 13 Jan 2006 23:17:05 -0800, [EMAIL PROTECTED] wrote:
>
>fynali wrote:
>> $ cat cleanup_ray.py
>> #!/usr/bin/python
>> import itertools
>>
>> b = set(file('/home/sajid/python/wip/stc/2/CBR0000333'))
>>
>> file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP0000333')))
>>
>> --
>> $ time ./cleanup_ray.py
>>
>> real 0m5.451s
>> user 0m4.496s
>> sys  0m0.428s
>>
>> (-: Damn! That saves a bit more time! Bravo!
>>
>> Thanks to you Raymond.
>
>Have you tried the explicit loop variant with psyco? My experience is
>that psyco is pretty good at optimizing for loops, which usually results
>in code that is faster than even the built-in map/filter variants.
>
>Though it would only be a 1 or 2 second difference (given what you
>already have), so it may not be important, but it could be fun.

OTOH, when you are dealing with large files and near-optimal simple
processing, you are likely to be comparing i/o-bound processes, meaning
that the differences you observe will be symptoms of OS and file system
performance more than of the algorithms.
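(For concreteness, the explicit-loop-plus-psyco variant being suggested
might look something like the untested sketch below; it reuses the paths
from the script above and assumes psyco is installed. psyco.full()
asks psyco to specialize every function it can.

    #!/usr/bin/python
    import psyco
    psyco.full()

    def cleanup(cbr_path, psp_path, out_path):
        # lines present in the CBR file are to be excluded
        b = set(file(cbr_path))
        out = file(out_path, 'w')
        for line in file(psp_path):
            if line not in b:
                out.write(line)
        out.close()

    cleanup('/home/sajid/python/wip/stc/2/CBR0000333',
            '/home/sajid/python/wip/stc/2/PSP0000333',
            'PSP-CBR.dat,ray')

Putting the loop inside a function matters: psyco, like CPython itself,
handles local variable access much faster than module-level globals.)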
An exception is when a slight variation in the algorithm causes a large
change in i/o performance, e.g. if it provokes physical seek and read
patterns of disk access that the OS/file_system and disk interface
hardware can't entirely optimize away with smart buffering etc. Not to
mention possible interactions with all the other things an OS may be
doing "simultaneously", switching between things that it accounts for
as real/user/sys.
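A quick sanity check for i/o-boundedness is to compare wall-clock time
against the CPU time the process actually consumed; os.times() reports
the user and system CPU used so far. A rough sketch (the variable names
are mine, not anything from the thread):

    import os, time

    t0 = time.time()
    u0, s0 = os.times()[:2]

    # ... run the filtering work here ...

    u1, s1 = os.times()[:2]
    # if real greatly exceeds user + sys, the run was i/o-bound
    print 'real %.2fs  user %.2fs  sys %.2fs' % (
        time.time() - t0, u1 - u0, s1 - s0)

In fynali's timing above, user + sys is close to real, which suggests
that run was still mostly CPU-bound rather than waiting on the disk.

Regards,
Bengt Richter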