parse text files in a directory?

2008-01-01 Thread jo3c
hi everybody
im a newbie in python, i have a question

how do u parse a bunch of text files in a directory?

directory: /dir
files: H20080101.txt, H20080102.txt, H20080103.txt, H20080104.txt,
H20080105.txt etc.

i already have a python script that reads a single text file and
inserts it into a postgres db.

is there any way i can do it in a batch? i have about 2000 txt
files.

thanks in advance

joe
-- 
http://mail.python.org/mailman/listinfo/python-list
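
A minimal sketch of one way to batch this, assuming the existing
single-file loader can be wrapped in a function that takes a path.
process_directory and its handler argument are hypothetical names, not
from the post; the H*.txt pattern is taken from the file names listed
above.

```python
import glob
import os


def process_directory(directory, handler, pattern="H*.txt"):
    """Call handler(path) once for every file in directory matching pattern.

    handler stands in for the existing single-file loader (hypothetical).
    Returns the list of paths that were processed, in sorted order.
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    for path in paths:
        handler(path)
    return paths
```

Because the file names embed the date as YYYYMMDD, sorting by name also
processes the files in date order.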


linecache and glob

2008-01-03 Thread jo3c
hi everyone happy new year!
im a newbie to python
i have a question
by using linecache and glob
how do i read a specific line from a file in a batch and then insert
it into database?

because it doesn't work! i can't use a glob wildcard with linecache

>>> import glob
>>> import linecache
>>> linecache.getline(glob.glob('/etc/*'), 4)

doesn't work

are there any better methods? thank you very much in advance

jo3c
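
A hedged sketch of why the call above fails and one way around it:
glob.glob returns a list of filenames, while linecache.getline expects a
single filename, so the call has to be made once per file.
line_from_each is a hypothetical helper name; clearing the cache after
each file is an assumption made here to keep memory use bounded.

```python
import glob
import linecache


def line_from_each(pattern, lineno):
    """Return {filename: text of line lineno} for every file matching pattern."""
    lines = {}
    for filename in sorted(glob.glob(pattern)):
        # getline returns '' rather than raising if the line does not exist
        lines[filename] = linecache.getline(filename, lineno)
        # linecache keeps whole files cached; drop them after each lookup
        linecache.clearcache()
    return lines
```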


Re: linecache and glob

2008-01-03 Thread jo3c
i have 2000 files with header and data
i need to get the date information from the header
then insert it into my database
i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
to get the date on line 4 in the txt file i use
linecache.getline('/mydata/myfile.txt', 4)

but if i use
linecache.getline(glob.glob('/mydata/*/*/*.txt'), 4) it won't work

i am running out of ideas

thanks in advance for any help

jo3c


Re: Python's great, in a word

2008-01-07 Thread jo3c
On Jan 7, 9:09 pm, [EMAIL PROTECTED] wrote:
> I'm a Java guy who's been doing Python for a month now and I'm
> convinced that
>
> 1) a multi-paradigm language is inherently better than a mono-paradigm
> language
>
> 2) Python writes like a talented figure skater skates.
>
> Would you Python old-timers try to agree on a word or two that
> completes:
>
> The best thing about Python is ___.
>
> Please, no laundry lists, just a word or two. I'm thinking "fluid" or
> "grace" but I'm not sure I've done enough to choose.

skimpythong!!


use fileinput to read a specific line

2008-01-07 Thread jo3c
hi everybody
im a newbie in python
i need to read line 4 from a header file
using linecache crashes my computer because it loads everything into
memory, and i am working on 2000 files of 8mb each

fileinput doesn't load the whole file into memory first
how do i use the fileinput module to read a specific line from a file?

for line in fileinput.FileInput('sample.txt'):
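
A minimal sketch of one way to do this with fileinput, using its
per-file line counter and nextfile() to skip the rest of each file once
the wanted line is found. nth_line_of_each is a hypothetical name; the
default of line 4 comes from the question above.

```python
import fileinput


def nth_line_of_each(paths, n=4):
    """Return [(filename, line n)] for every file that has at least n lines."""
    paths = list(paths)
    if not paths:
        # with no files given, fileinput would fall back to sys.argv / stdin
        return []
    found = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # filelineno() counts lines within the current file, starting at 1
            if stream.filelineno() == n:
                found.append((stream.filename(), line))
                # stop reading this file and move on to the next one
                stream.nextfile()
    return found
```

Files shorter than n lines are simply skipped, since fileinput moves on
to the next file when it reaches the end of the current one.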




Re: linecache and glob

2008-01-07 Thread jo3c
On Jan 4, 5:25 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> jo3c wrote:
> > i have 2000 files with header and data
> > i need to get the date information from the header
> > then insert it into my database
> > i am doing it in batch so i use glob.glob('/mydata/*/*/*.txt')
> > to get the date on line 4 in the txt file i use
> > linecache.getline('/mydata/myfile.txt', 4)
>
> > but if i use
> > linecache.getline(glob.glob('/mydata/*/*/*.txt'), 4) it won't work
>
> glob.glob returns a list of filenames, so you need to call getline once
> for each file in the list.
>
> but using linecache is absolutely the wrong tool for this; it's designed
> for *repeated* access to arbitrary lines in a file, so it keeps all the
> data in memory.  that is, all the lines, for all 2000 files.
>
> if the files are small, and you want to keep the code short, it's easier
> to just grab the file's content and use indexing on the resulting list:
>
>     for filename in glob.glob('/mydata/*/*/*.txt'):
>         line = list(open(filename))[4-1]
>         ... do something with line ...
>
> (note that line numbers usually start with 1, but Python's list indexing
> starts at 0).
>
> if the files might be large, use something like this instead:
>
>     for filename in glob.glob('/mydata/*/*/*.txt'):
>         f = open(filename)
>         # skip first three lines
>         f.readline(); f.readline(); f.readline()
>         # grab the line we want
>         line = f.readline()
>         ... do something with line ...
>
> 

thank you guys, i did hit a wall using linecache, due to large file
loading into memory.. i think this last solution works well for me
thanks
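
The large-file approach quoted above can also be written with
itertools.islice, which avoids hard-coding the number of readline()
calls. This is a sketch only; read_line is a hypothetical helper name.

```python
from itertools import islice


def read_line(filename, lineno):
    """Return line lineno (1-based) of filename, or None if the file is shorter."""
    with open(filename) as f:
        # islice stops iterating after lineno lines, so the rest of the
        # file is never walked through line by line
        return next(islice(f, lineno - 1, lineno), None)
```

In the glob loop this would be used as line = read_line(filename, 4).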


Re: use fileinput to read a specific line

2008-01-07 Thread jo3c
On Jan 8, 2:08 pm, "Russ P." <[EMAIL PROTECTED]> wrote:
> > Given that the OP is talking 2000 files to be processed, I think I'd
> > recommend explicit open() and close() calls to avoid having lots of I/O
> > structures floating around...
>
> Good point. I didn't think of that. It could also be done as follows:
>
> for fileN in files:
>
>     lnum = 0 # line number
>     input = open(fileN)
>
>     for line in input:
>         lnum += 1
>         if lnum >= 4: break
>
>     input.close()
>
>     # do something with "line"
>
> Six of one or half a dozen of the other, I suppose.

this is what i did using glob

import glob
for files in glob.glob('/*.txt'):
    x = open(files)
    x.readline()
    x.readline()
    x.readline()
    y = x.readline()
    # do something with y
    x.close()
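
On the point about explicit open() and close() calls: a with block gives
the same guarantee, and also closes the file if the processing step
raises. A minimal sketch of the loop above in that style; fourth_line
and fourth_lines are hypothetical names.

```python
import glob


def fourth_line(path):
    """Return the fourth line of path, or '' if the file has fewer lines."""
    with open(path) as f:
        # skip the first three lines
        for _ in range(3):
            f.readline()
        # readline() returns '' at end of file instead of raising
        return f.readline()


def fourth_lines(pattern):
    """Return [(path, fourth line)] for every file matching pattern."""
    return [(path, fourth_line(path)) for path in sorted(glob.glob(pattern))]
```

Only one file is open at any time, so 2000 files do not leave 2000 open
handles behind.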



windows active directory ldap output encoding

2008-07-08 Thread jo3c
Hi..
I'm trying to get some information out of a Windows Server 2003 Chinese
Active Directory system
so let's say the encoding is probably big5 or utf-8

what i'm doing is similar to ldapsearch in the shell, from my python
script using the python ldap module

the result is not the correct encoding..

i've looked in many places and tried many different encodings at the top
of the script (#coding=big5 etc.)
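
A hedged note with a sketch: a coding line at the top of a script only
tells Python how to read string literals in that source file; it does
not change the bytes a server returns. The escaped values in the output
of this post look like UTF-8 (for example '\xe6\xb1\x9f' is the UTF-8
encoding of 江, which matches the CN shown), so decoding each value as
UTF-8 is one thing to try. decode_values is a hypothetical helper that
works on a plain dict of byte strings; it is not part of the ldap
module.

```python
def decode_values(attrs, encoding="utf-8"):
    """Decode each attribute value as text, leaving undecodable values as bytes.

    attrs maps attribute names to lists of byte strings. Binary attributes
    such as objectGUID usually fail to decode and are kept as raw bytes,
    although a binary value could by chance be valid UTF-8.
    """
    decoded = {}
    for name, values in attrs.items():
        out = []
        for raw in values:
            try:
                out.append(raw.decode(encoding))
            except UnicodeDecodeError:
                out.append(raw)
        decoded[name] = out
    return decoded


# sample values copied from the output in this post
sample = {
    "sn": [b"\xe6\xb1\x9f"],
    "objectGUID": [b"\x13\xfa\xc2\xbb\x9e\xee|C\x9d\xa8_\xea]\xef\xc6\x90"],
}
result = decode_values(sample)
```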

below is the wrong encoding output ..   any help will be much
appreciated..

*** ldap://2134.localhost.com:389 - SimpleLDAPObject.set_option ((17,
3),{})
CN=江,OU=2134,DC=localhost,DC=com
{'accountExpires': ['9223372036854775807'],
 'badPasswordTime': ['128566014672343750'],
 'badPwdCount': ['0'],
 'cn': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
 'codePage': ['0'],
 'company': ['\xe8\x8f\xaf\xe8\x81\xaf\xe7\x94\x9f
\xe7\x89\xa9\xe7\xa7\x91\xe6\x8a\x80'],
 'countryCode': ['0'],
 'department': ['\xe7\x94\x9f\xe7\x89\xa9\xe7\xa7\x91\xe6\x8a
\x80\xe8\x99\x95'],
 'displayName': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
 'distinguishedName': ['CN=\xe6\xb1\x9f\xe6\x9f\x8f
\xe5\xa3\x95,OU=300\xe7\xa7\x91\xe6\x8a
\x80\xe8\x99\x95,DC=localhost,DC=com'],
 'givenName': ['\xe6\x9f\x8f\xe5\xa3\x95'],
 'homeMDB': ['CN=\xe4\xbf\xa1\xe7\xae\xb1\xe5\x84\xb2\xe5\xad
\x98\xe5\x8d\x80 (MAIL),CN=\xe9\xa0\x90\xe8\xa8\xad\xe5\x84\xb2\xe5\xad
\x98\xe7\xbe\xa4\xe7\xb5\x84,CN=InformationStore,CN=MAIL,CN=Servers,CN=
\xe9\xa0\x90\xe8\xa8\xad\xe7\xb3\xbb\xe7\xb5\xb1\xe7\xae
\xa1\xe7\x90\x86\xe7\xbe\xa4\xe7\xb5\x84,CN=Administrative
Groups,CN=localhost,CN=Microsoft
Exchange,CN=Services,CN=Configuration,DC=localhost,DC=com'],
 'homeMTA': ['CN=Microsoft MTA,CN=MAIL,CN=Servers,CN=
\xe9\xa0\x90\xe8\xa8\xad\xe7\xb3\xbb\xe7\xb5\xb1\xe7\xae
\xa1\xe7\x90\x86\xe7\xbe\xa4\xe7\xb5\x84,CN=Administrative
Groups,CN=localhost,CN=Microsoft
Exchange,CN=Services,CN=Configuration,DC=localhost,DC=com'],
 'instanceType': ['4'],
 'lastLogoff': ['0'],
 'lastLogon': ['128598965066718750'],
 'legacyExchangeDN': ['/o=localhost/ou=ExchangAdmin/cn=Recipients/
cn=joechiang'],
 'logonCount': ['33'],
 'mDBUseDefaults': ['TRUE'],
 'mail': ['[EMAIL PROTECTED]'],
 'mailNickname': ['joechiang'],
 'memberOf': ['CN=AllHQStaff,CN=Users,DC=localhost,DC=com'],
 'msExchALObjectVersion': ['60'],
 'msExchHomeServerName': ['/o=localhost/ou=ExchangAdmin/
cn=Configuration/cn=Servers/cn=MAIL'],
 'msExchMailboxGuid': ['2\x04\x116^\xfc%J\x87yi\xbdj^\x1bl'],
 'msExchMailboxSecurityDescriptor': ['\x01\x00\x04\x80x
\x00\x00\x00\x94\x00\x00\x00\x00\x00\x00\x00\x14\x00\x00\x00\x04\x00d
\x00\x01\x00\x00\x00\x00\x02\x14\x00\x03\x00\x02\x00\x01\x01\x00\x00\x00\x00\x00\x05\n
\x00\x00\x00a\x00n\x00x\x00/\x00C\x00N\x00=\x00C\x00o\x00n\x00f\x00i
\x00g\x00u\x00r\x00a\x00t\x00i\x00o\x00n
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x05\x00\x00\x00\x00\x00\x05\x15\x00\x00\x00\xd6\xb2i
\x1a^\xa7P\xcb\xday
\x88\xa9\xf4\x01\x00\x00\x01\x05\x00\x00\x00\x00\x00\x05\x15\x00\x00\x00\xd6\xb2i
\x1a^\xa7P\xcb\xday\x88\xa9\xf4\x01\x00\x00'],
 'msExchPoliciesIncluded': ['{C96E41C5-C5D5-411B-8672-1A3B6602437F},
{3B6813EC-CE89-42BA-9442-D87D4AA30DBC}',
'{C96E41C5-C5D5-411B-8672-1A3B6602437F},
{26491CFC-9E50-4857-861B-0CB8DF22B5D7}'],
 'msExchUserAccountControl': ['0'],
 'name': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
 'objectCategory':
['CN=Person,CN=Schema,CN=Configuration,DC=localhost,DC=com'],
 'objectClass': ['top', 'person', 'organizationalPerson', 'user'],
 'objectGUID': ['\x13\xfa\xc2\xbb\x9e\xee|C\x9d\xa8_\xea]\xef
\xc6\x90'],
 'objectSid':
['\x01\x05\x00\x00\x00\x00\x00\x05\x15\x00\x00\x00\xd6\xb2i\x1a^\xa7P
\xcb\xday\x88\xa9u\x0c\x00\x00'],
 'primaryGroupID': ['513'],
 'proxyAddresses':
['X400:c=TW;a= ;p=localhost;o=Exchange;s=joechiang;',
'SMTP:[EMAIL PROTECTED]'],
 'pwdLastSet': ['128587670396562500'],
 'sAMAccountName': ['joechiang'],
 'sAMAccountType': ['805306368'],
 'showInAddressBook': ['CN=\xe5\x85\xa8\xe5\x9f\x9f\xe9\x80\x9a
\xe8\xa8\x8a\xe6\xb8\x85\xe5\x96\xae,CN=All Global Address
Lists,CN=Address Lists Container,CN=localhost,CN=Microsoft
Exchange,CN=Services,CN=Configuration,DC=localhost,DC=com',
   'CN=\xe7\x87\x9f\xe9\x81\x8b\xe8\x99\x95,CN=All
Address Lists,CN=Address Lists Container,CN=localhost,CN=Microsoft
Exchange,CN=Services,CN=Configuration,DC=localhost,DC=com',
   'CN=\xe7\x94\x9f\xe7\x89\xa9\xe7\xa7\x91\xe6\x8a
\x80\xe8\x99\x95,CN=All Address Lists,CN=Address Lists
Container,CN=localhost,CN=Microsoft
Exchange,CN=Services,CN=Configuration,DC=localhost,DC=com',
   'CN=\xe6\x89\x80\xe6\x9c\x89\xe4\xbd\xbf
\xe7\x94\xa8\xe8\x80\x85,CN=All Address Lists,CN=Address Lists
Container,CN=localhost,CN=Microsoft
Exchange,CN=Services,CN=Configuration,DC=localhost,DC=com'],
 'sn': ['\xe6\xb1\x9f'],
 'textEncodedORAddress':
['c=TW;a= ;p=localhost;o=Exchange;s=joechiang;'],
 'uSNChanged': ['22943844'],
 'uSNCreated': ['2