How to remove subset from a file efficiently?

2006-01-12 Thread fynali
Hi all,

I have two files:

  - PSP320.dat (quite a large list of mobile numbers),
  - CBR319.dat (a subset of the above, a list of barred numbers)

# head PSP320.dat CBR319.dat
==> PSP320.dat <==
96653696338
96653766996
96654609431
96654722608
96654738074
96655697044
96655824738
96656190117
96656256762
96656263751

==> CBR319.dat <==
96651131135
96651131135
96651420412
96651730095
96652399117
96652399142
96652399142
96652399142
96652399160
96652399271

Objective: to remove the numbers present in barred-list from the
PSPfile.

$ ls -lh PSP320.dat CBR319.dat
...  56M Dec 28 19:41 PSP320.dat
... 8.6M Dec 28 19:40 CBR319.dat

$ wc -l PSP320.dat CBR319.dat
 4,462,603 PSP320.dat
   693,585 CBR319.dat

I wrote the following in python to do it:

#: c01:rmcommon.py
barredlist = open(r'/home/sjd/python/wip/CBR319.dat', 'r')
postlist = open(r'/home/sjd/python/wip/PSP320.dat', 'r')
outfile = open(r'/home/sjd/python/wip/PSP-CBR.dat', 'w')

# reading it all in one go, so as to avoid frequent disk accesses
# (assume machine has plenty memory)
barred = barredlist.readlines()
numbers = postlist.readlines()

# membership test against a list: a linear scan for every number
for number in numbers:
    if number in barred:
        pass
    else:
        outfile.write(number)

barredlist.close(); postlist.close(); outfile.close()
#:~

The above code simply takes too long to complete.  If I were to do a
diff -y PSP320.dat CBR319.dat, catch the '<' & clean it up with
sed -e 's/\([0-9]*\) * PSP-CBR.dat it takes <4 minutes to
complete.

I wrote the following in bash to do the same:

#!/bin/bash

ARGS=2

if [ $# -ne $ARGS ] # takes two arguments
then
    echo; echo "Usage: `basename $0` {PSPfile} {CBRfile}"
    echo; echo "eg.: `basename $0` PSP320.dat CBR319.dat"; echo;
    echo "NOTE: first argument: PSP file, second: CBR file";
    echo "  this script _does_ _no_ input validation!"
    exit 1
fi;

# fix prefix; cost: 12.587 secs
cat $1 | sed -e 's/^0*/966/' > $1.good
cat $2 | sed -e 's/^0*/966/' > $2.good

# sort/save files; for the 4,462,603 lines, cost: 36.589 secs
sort $1.good > $1.sorted
sort $2.good > $2.sorted

# diff -y {PSP} {CBR}, grab the ones in PSPfile; cost: 31.817 secs
diff -y $1.sorted $2.sorted | grep "<" > $1.filtered

# remove trailing junk [spaces & <]; cost: 1 min 3 secs
cat $1.filtered | sed -e 's/\([0-9]*\) * $1.cleaned

# remove intermediate files, good, sorted, filtered
rm -f *.good *.sorted *.filtered
#:~

...but strangely though, there's a discrepancy, the reason for which I
can't figure out!

Needless to say, I'm utterly new to python and my programming skills &
know-how are rudimentary.

Any help will be genuinely appreciated.
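
One hedged way to track down a discrepancy like this is to test two things the diff pipeline quietly assumes: that the barred list contains no duplicates, and that every barred number really occurs in the main list. The head of CBR319.dat above already shows repeated entries. The sketch below is only an illustration; the function name `describe_lists` and the sample data are made up, and it says nothing certain about the actual cause.

```python
from collections import Counter


def describe_lists(main_numbers, barred_numbers):
    """Summarise how a barred list relates to a main list of numbers."""
    main_set = set(main_numbers)
    barred_counts = Counter(barred_numbers)
    return {
        # entries that occur more than once in the barred list
        "barred_duplicates": sorted(
            n for n, c in barred_counts.items() if c > 1
        ),
        # barred entries that never appear in the main list
        "barred_not_in_main": sorted(
            n for n in barred_counts if n not in main_set
        ),
        # how many main-list entries survive the removal
        "remaining": sum(1 for n in main_numbers if n not in barred_counts),
    }


if __name__ == "__main__":
    # Made-up sample data, loosely modelled on the listings above
    main = ["96653696338", "96653766996", "96654609431"]
    barred = ["96651131135", "96651131135", "96653766996"]
    print(describe_lists(main, barred))
```

If either of the first two lists comes back non-empty, a side-by-side diff of the sorted files is not computing a plain set difference, which could explain differing line counts.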

--
fynali

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
The code is down to 5 lines!

#!/usr/bin/python

barred = set(open('/home/sajid/python/wip/CBR319.dat'))

postpaid_file = open('/home/sajid/python/wip/PSP320.dat')
outfile = open('/home/sajid/python/wip/PSP-CBR.dat', 'w')

outfile.writelines(number for number in postpaid_file if number not in barred)

postpaid_file.close(); outfile.close()

Awesome! (-:  Thanks a ton Fredrik, Steve.

$ time ./cleanup.py

real    0m11.048s
user    0m5.232s
sys     0m0.584s

But there still seems to be that discrepancy; will check and update back here.

Thank you all once again.
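
For readers on a current Python, the set-based idea can be sketched as a small function that works on any iterable of lines. This is a minimal illustration, not the exact code that was timed; the name `remove_barred` and the sample data are assumptions.

```python
def remove_barred(numbers, barred):
    """Yield each item of `numbers` that is not present in `barred`."""
    barred_set = set(barred)          # one pass to build the hash set
    for number in numbers:            # one pass over the large list
        if number not in barred_set:  # hash lookup instead of a list scan
            yield number


if __name__ == "__main__":
    # Made-up sample lines; note that the trailing newline is part of the key
    postpaid = ["96653696338\n", "96653766996\n", "96654609431\n"]
    barred = ["96653766996\n", "96653766996\n"]
    print(list(remove_barred(postpaid, barred)))
```

Because whole lines are compared, a final line without a trailing newline would not match its counterpart in the other file; stripping both sides first avoids that if it matters.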



Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
$ time fgrep -x -v -f CBR333 PSP333 > PSP-CBR.dat.fgrep

real    0m31.551s
user    0m16.841s
sys     0m0.912s

--
$ time ./cleanup.py

real    0m6.080s
user    0m4.836s
sys     0m0.408s

--
$ wc -l PSP-CBR.dat.fgrep PSP-CBR.dat.python
  3872421 PSP-CBR.dat.fgrep
  3872421 PSP-CBR.dat.python

Fantastic, at any rate the time is down from my initial ~4 min.!

Thank you Chris.  The fgrep approach is clean and to the point; and one
more reason to love the *nix approach to handling everyday problems.

Fredrik's set|dict approach in Python above gives me one more reason to
love Python.  And it is indeed fast, 5x!

Thank you all for all your help.
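
Equal line counts from wc -l show that both outputs have the same length, not that they hold the same lines. A hedged sketch of a stricter check follows; the helper name `same_lines` is made up, and it assumes both outputs preserve the input order, as the fgrep and Python versions should.

```python
from itertools import zip_longest


def same_lines(lines_a, lines_b):
    """Return True if both iterables yield exactly the same lines in order."""
    missing = object()  # marks the shorter input running out first
    return all(
        a == b
        for a, b in zip_longest(lines_a, lines_b, fillvalue=missing)
    )


if __name__ == "__main__":
    print(same_lines(["1\n", "2\n"], ["1\n", "2\n"]))
    # With real files (names taken from the wc -l listing above):
    # with open("PSP-CBR.dat.fgrep") as a, open("PSP-CBR.dat.python") as b:
    #     print(same_lines(a, b))
```

File objects can be passed directly, so neither output has to be loaded into memory in full.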

-- 
fynali



Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
$ cat cleanup_ray.py
#!/usr/bin/python
import itertools

b = set(file('/home/sajid/python/wip/stc/2/CBR333'))

file('PSP-CBR.dat,ray','w').writelines(itertools.ifilterfalse(b.__contains__,file('/home/sajid/python/wip/stc/2/PSP333')))

--
$ time ./cleanup_ray.py

real    0m5.451s
user    0m4.496s
sys     0m0.428s

(-: Damn!  That saves a bit more time!  Bravo!

Thanks to you Raymond.
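
The script above is Python 2 code: in Python 3 the `file()` builtin is gone and `itertools.ifilterfalse` is named `itertools.filterfalse`. A rough Python 3 equivalent of the same idea might look like the sketch below; the function name `filter_barred` and the sample data are made up for illustration.

```python
from itertools import filterfalse


def filter_barred(numbers, barred):
    """Lazily drop every item of `numbers` that is in `barred`."""
    barred_set = set(barred)
    # filterfalse keeps the items for which the predicate is false,
    # i.e. the items that are NOT in the barred set
    return filterfalse(barred_set.__contains__, numbers)


if __name__ == "__main__":
    # Made-up sample data
    print(list(filter_barred(["1\n", "2\n", "3\n"], ["2\n"])))
    # With real files, open() replaces file():
    # with open("PSP333") as src, open("out.dat", "w") as dst:
    #     dst.writelines(filter_barred(src, open("CBR333")))
```

Passing the bound method `__contains__` avoids a Python-level function call per line, which is presumably where the small saving over the generator expression comes from.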



Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
--
$ ./cleanup.py
Traceback (most recent call last):
  File "./cleanup.py", line 3, in ?
    import itertools
ImportError: No module named itertools

--
$ time ./cleanup.py
  File "./cleanup.py", line 8
    outfile.writelines(number for number in postpaid_file if number not in barred)
                                ^
SyntaxError: invalid syntax

The earlier results I posted were run on my workstation which has
Python 2.4.1,

$ uname -a && python -V
Linux sajid 2.6.13-15.7-smp #1 SMP Tue Nov 29 14:32:29 UTC 2005 i686 i686 i386 GNU/Linux
Python 2.4.1

but the server on which the actual processing will be done has an older
version )-:

$ uname -a && python -V
Linux cactus 2.4.21-20.ELsmp #1 SMP Wed Aug 18 20:46:40 EDT 2004 i686 i686 i386 GNU/Linux
Python 2.2.3

Is a rewrite possible of Raymond's or Fredrik's suggestions above which
will still give me the time saving made?

--
fynali



Re: How to remove subset from a file efficiently?

2006-01-13 Thread fynali
[bonono]
> Have you tried the explicit loop variant with psyco ?

Sure I wouldn't mind trying; can you suggest some code snippets along
the lines of which I should try...?

[fynali]
> Needless to say, I'm utterly new to python and my programming
> skills & know-how are rudimentary.

(-:

-- 
fynali



Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup.py
#!/usr/bin/python

postpaid_file = open('/home/oracle/stc/test/PSP333')
outfile = open('/home/oracle/stc/test/PSP-CBR.dat', 'w')

barred = {}

for number in open('/home/oracle/stc/test/CBR333'):
    barred[number] = None # just add it as a key

outfile.writelines([number for number in postpaid_file if number not in barred])

postpaid_file.close(); outfile.close()

--
$ time ./cleanup.py

real    0m31.007s
user    0m24.660s
sys     0m3.550s

Can we say that using generators & newer Python _is_ faster?
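
The 31 s figure comes from the server with Python 2.2.3 and the earlier ones from the workstation with 2.4.1, so the numbers mix hardware, interpreter version and code style. One hedged way to isolate the code-style part is to time both variants in the same interpreter on the same in-memory data. The sketch below uses the standard `timeit` module; the function names and test sizes are made up, and it needs a Python recent enough to have generator expressions.

```python
import timeit


def with_list(numbers, barred):
    """Filter using a list comprehension."""
    return [n for n in numbers if n not in barred]


def with_generator(numbers, barred):
    """Filter using a generator expression, materialised for comparison."""
    return list(n for n in numbers if n not in barred)


def time_variants(numbers, barred, repeat=3):
    """Return the best wall-clock time in seconds for each variant."""
    results = {}
    for func in (with_list, with_generator):
        timer = timeit.Timer(lambda: func(numbers, barred))
        # best of `repeat` single runs, to reduce noise from other processes
        results[func.__name__] = min(timer.repeat(repeat=repeat, number=1))
    return results


if __name__ == "__main__":
    # Made-up data: 200,000 numbers, every other one barred
    numbers = [str(i) for i in range(200000)]
    barred = set(str(i) for i in range(0, 200000, 2))
    print(time_variants(numbers, barred))
```

This measures only the filtering step; disk reads and writes, which `time` on the whole script includes, are left out on purpose.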

-- 
fynali



Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python

import psyco
psyco.full()

postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w')

barred = {}

for number in open('/home/sajid/python/wip/stc/2/CBR333'):
    barred[number] = None # just add it as a key

outfile.writelines([number for number in postpaid_file if number not in barred])

postpaid_file.close(); outfile.close()

--
$ time ./cleanup_use_psyco_and_list_compr.py

real    0m39.638s
user    0m5.532s
sys     0m0.868s

This was run on my machine (w/ Python 2.4.1), can't install psyco on
the actual server at the moment.

I guess using generators & newer Python is indeed faster|better.

-- 
fynali



Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python

import psyco
psyco.full()

postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w')

barred = {}

for number in open('/home/sajid/python/wip/stc/2/CBR333'):
    barred[number] = None # just add it as a key

for number in postpaid_file:
    if number not in barred: outfile.writelines(number)

postpaid_file.close(); outfile.close()

--
$ time ./cleanup_use_psyco_and_list_compr.py

real    0m24.293s
user    0m22.633s
sys     0m0.524s

Saves ~6 secs.

-- 
fynali



Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
Sorry, please read that as ~15 secs.



Re: How to remove subset from a file efficiently?

2006-01-14 Thread fynali
$ cat cleanup_use_psyco_and_list_compr.py
#!/usr/bin/python

#import psyco
#psyco.full()

postpaid_file = open('/home/sajid/python/wip/stc/2/PSP333')
outfile = open('/home/sajid/python/wip/stc/2/PSP-CBR.dat.psyco', 'w')

barred = {}

for number in open('/home/sajid/python/wip/stc/2/CBR333'):
    barred[number] = None # just add it as a key

for number in postpaid_file:
    if number not in barred: outfile.writelines(number)

postpaid_file.close(); outfile.close()

--
$ time ./cleanup_use_psyco_and_list_compr.py

real    0m22.587s
user    0m21.653s
sys     0m0.440s

Not using psyco is faster!

-- 
fynali



Re: how do "real" python programmers work?

2006-01-15 Thread fynali
Love it.

-- 
fynali



Re: Fredrik Lundh [was "Re: explicit self revisited"]

2006-11-13 Thread fynali
> You idiot.  Putting the word "official" in front of something doesn't
> mean it can't be FUD.  Especially when it is written by people such as
> yourself.  Have you not paid attention to anything happening in
> politics around the world during your lifetime?

Ridiculous boo-llshit!



Re: 2**2**2**2**2 wrong? Bug?

2007-07-10 Thread fynali

>
> 19729
>
> Did you count the 'L'?
>

(-:



Re: How do I remotely access Scheduled Tasks from Windows XP to Windows Server 2003?

2007-07-10 Thread fynali
On Jul 10, 4:51 am, kj7ny <[EMAIL PROTECTED]> wrote:
> On Jun 30, 10:55 am, "Roger Upole" <[EMAIL PROTECTED]> wrote:
>
> > "kj7ny" wrote:
> > > How can I access and manipulate Scheduled Tasks in Windows using
> > > Python?
>
> > > I have a Windows XP workstation running Python 2.4.4 using the
> > > win32all modules to control the windows services on multiple Windows
> > > 2003 servers.  It works great.
>
> > > However, I also need to remotely collect the settings for the
> > > scheduled tasks (on those same Windows 2003 servers) and then
> > > manipulate those task settings.
>
> > > At the very least, I need to find out which ones are enabled and then
> > > be able to disable and re-enable those tasks at will.  It would be
> > > better to be able to also detect the account each task runs as so that
> > > I could only disable selected tasks, but I'll take any help I can get.
>
> > > Thanks,
>
> > Pywin32 comes with a module that lets you do this, win32com.taskscheduler.
> > You can use PyITaskScheduler.SetTargetComputer to access tasks on remote
> > machines.
>
> > Roger
>
> I FINALLY found taskscheduler (with the help of your post).  I found
> it under
>
> ...\Python243\Lib\site-packages\win32comext\taskscheduler
>
> ... and, there seems to be a /test/ directory with some examples.
> Haven't tried them yet, but they should get me started.
>
> Thanks,

kj7ny, could you post your findings back here for the rest of us to learn from?

Thanks.

s|a fynali



Re: win32com ppt embedded object

2007-07-10 Thread fynali
On Jul 10, 8:40 pm, Lance Hoffmeyer <[EMAIL PROTECTED]> wrote:
> Hey all,
>
> I am trying to create some python code to edit embedded ppt slides and need 
> some help.
>
> import win32com.client
> from win32com.client import constants
> import re
> import codecs,win32com.client
> import time
> import datetime
> import win32com.client.dynamic
> ## VARIOUS VARIABLES TO SET ##
> path = "C:\temp/"
> ##
>
> PPT=win32com.client.Dispatch("PowerPoint.Application")
> WB=PPT.Presentations.Open(path + "File.ppt")
> PPT.Visible=1
> PPTSLIDE= 29
>
> for Z in WB.Slides(29).Shapes:
>     if (Z.Type== 7):
>         ZZ=Z.OLEFormat.Object
>         WSHEET = ZZ.Worksheets(1)
>         WSHEET.Range("A1").Value = .50
>         WSHEET.Range("A1").NumberFormat="0%"
>
> Gives error:
>
> Traceback (most recent call last):
>   File "P:\Burke\TRACKERS\Ortho-McNeil\04 2007, 04-10 WAVE 4\Automation\Document1.py", line 23, in ?
>     WSHEET = ZZ.Worksheets(1)
>   File "C:\Program Files\Python\lib\site-packages\win32com\client\dynamic.py", line 489, in __getattr__
>     raise AttributeError, "%s.%s" % (self._username_, attr)
> AttributeError: .Worksheets
>
> Tool completed with exit code 1
>
> Why is ZZ unknown and how do I correct this?
>
> Thanks in advance,
>
> Lance


"""
How do I know which methods and properties are available?
Good question. This is hard! You need to use the documentation with
the products, or possibly a COM browser. Note however that COM
browsers typically rely on these objects registering themselves in
certain ways, and many objects do not do this. You are just expected
to know.

The Python COM browser
PythonCOM comes with a basic COM browser that may show you the
information you need. Note that this package requires Pythonwin (ie,
the MFC GUI environment) to be installed for this to work.

There are far better COM browsers available - I tend to use the one
that comes with MSVC, or this one!

To run the browser, simply select it from the Pythonwin Tools menu, or
double-click on the file win32com\client\combrowse.py

"""

--
s|a fynali



How to programmatically insert pages into MDI.

2007-07-24 Thread fynali iladijas
Hi, this query is regarding automating page insertions in Microsoft
Document Imaging.

I have two sets of MDIs generated fortnightly: Invoices and their
corresponding Broadcast Certificates; about 150 of each.

My billing application can generate one big MDI with all 150 invoices
and another with all the Broadcast Certificates.  At the moment I take
the pages of, say invoice #1 and insert them into a new MDI (calling
it in001.mdi).  Then I grab the corresponding Broadcast-Certificate-
pages of invoice #1 and insert them into in001.mdi.  I proceed to
complete the rest the same way (quite a pain).

Once done, I print each inx.mdi, setting various printer options
such as binding direction, staple & hole-punch etc.

What I would like to automate is the coupling of an Invoice & its
corresponding Broadcast-Certificate-pages into a new appropriately
named MDI & then printing it in one step; iterating over all the
x until done.

My billing app can be set to generate each invoice & broadcast
certificate separately with a convenient naming convention to aid in
program logic (for eg. inx.mdi & bcx.mdi) where x
indicates corresponding invoice & respective broadcast certificates.
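
The pairing step of such an automation can be sketched independently of Microsoft Document Imaging itself. The snippet below only matches invoice and certificate files by their number; it assumes a hypothetical naming scheme of the form in001.mdi / bc001.mdi (the exact convention is not spelled out above) and does not attempt the merging or printing, which would need the application's own automation interface.

```python
import re

# Hypothetical naming convention: in001.mdi pairs with bc001.mdi
_NAME = re.compile(r"^(in|bc)(\d+)\.mdi$", re.IGNORECASE)


def pair_documents(filenames):
    """Return (pairs, unmatched): invoice/certificate pairs matched by number."""
    invoices, certificates = {}, {}
    for name in filenames:
        match = _NAME.match(name)
        if not match:
            continue  # ignore files outside the naming convention
        kind, number = match.group(1).lower(), match.group(2)
        if kind == "in":
            invoices[number] = name
        else:
            certificates[number] = name
    pairs = [
        (invoices[n], certificates[n])
        for n in sorted(invoices)
        if n in certificates
    ]
    unmatched = sorted(
        [invoices[n] for n in invoices if n not in certificates]
        + [certificates[n] for n in certificates if n not in invoices]
    )
    return pairs, unmatched


if __name__ == "__main__":
    # Made-up directory listing; os.listdir() would supply real names
    names = ["in001.mdi", "bc001.mdi", "in002.mdi", "bc003.mdi"]
    print(pair_documents(names))
```

Reporting the unmatched files separately makes it easy to spot an invoice that has no broadcast certificate before anything is sent to the printer.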

All help and advice will be most appreciated.

Thank you.

s|a fynali



Re: How to programmatically insert pages into MDI.

2007-07-28 Thread fynali iladijas
On Jul 24, 4:36 pm, fynali iladijas <[EMAIL PROTECTED]> wrote:
> Hi, this query is regarding automating page insertions in Microsoft
> Document Imaging.
>
> I have two sets of MDIs generated fortnightly: Invoices and their
> corresponding Broadcast Certificates; about 150 of each.
>
> My billing application can generate one big MDI with all 150 invoices
> and another with all the Broadcast Certificates.  At the moment I take
> the pages of, say invoice #1 and insert them into a new MDI (calling
> it in001.mdi).  Then I grab the corresponding Broadcast-Certificate-
> pages of invoice #1 and insert them into in001.mdi.  I proceed to
> complete the rest the same way (quite a pain).
>
> Once done, I print each inx.mdi, setting various printer options
> such as binding direction, staple & hole-punch etc.
>
> What I would like to automate is the coupling of an Invoice & its
> corresponding Broadcast-Certificate-pages into a new appropriately
> named MDI & then printing it into one step; iterating over all the
> x until done.
>
> My billing app can be set to generate each invoice & broadcast
> certificate separately with a convenient naming convention to aid in
> program logic (for eg. inx.mdi & bcx.mdi) where x
> indicates corresponding invoice & respective broadcast certificates.
>
> All help and advice will be most appreciated.
>
> Thank you.
>
> s|a fynali

)-:

--
s|a fynali
