RE Help

2007-09-21 Thread byte8bits
Not specific to Python, but it will be implemented in it... how do I
compile a RE to catch everything between two known values? Here's what
I've tried (but failed) to accomplish... the knowns here are START and
END:

data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
x = re.compile('START.END', re.DOTALL)

x.findall(data)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: RE Help

2007-09-21 Thread byte8bits
> You'll want to use a non-greedy match:
> x = re.compile(r"START(.*?)END", re.DOTALL)
> Otherwise the . will match END as well.

On Sep 21, 3:23 pm, Steve Holden <[EMAIL PROTECTED]> wrote:

> Only if there's a later END in the string, in which case the user's
> requirements will determine whether greedy matching is appropriate.
>
> regards
>   Steve

There will be lots of START END combinations in the data. This is more
accurate:

sfgdfg*START*dfhdgh*END*dfdgh*START*dfhfdgh*END*dfgsdh*START*sdfhfdhj*END*fdghfdj

The RE should extract the data between each pair of START and END.

Thanks!
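
A minimal sketch of the non-greedy pattern quoted above, applied to the sample string. It assumes Python 3 and treats the asterisks as literal characters in the data; the names data, lazy, and greedy are illustrative only.

```python
import re

data = ("sfgdfg*START*dfhdgh*END*dfdgh*START*dfhfdgh*END*dfgsdh"
        "*START*sdfhfdhj*END*fdghfdj")

# Non-greedy: .*? stops at the first END that follows each START,
# so findall returns one captured group per START/END pair.
lazy = re.compile(r"START(.*?)END", re.DOTALL)
pairs = lazy.findall(data)

# Greedy, for contrast: .* runs on to the last END in the string,
# so everything between the first START and the last END is one match.
greedy = re.compile(r"START(.*)END", re.DOTALL)
single = greedy.findall(data)

print(pairs)        # one entry per pair
print(len(single))  # a single, long match
```

With several START/END pairs in the data, the non-greedy form is the one that yields each pair separately.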





Converting numbers to unicode characters

2007-09-24 Thread byte8bits
Here's how I'm doing this right now. It's a bit slow, but I've just got
the code working. I was wondering if there is a more efficient way of
doing this... simple example from interactive Python:

>>> word = ''
>>> hexs = ['42', '72', '61', '64']
>>> for h in hexs:
...   char = unichr(int(h, 16))
...   word += char
...   print char
...
B
r
a
d
>>> print word
Brad


Each hex number is two digits. int() parses it, unichr() converts the
result to a character, and I append that character to the ones already
converted. In this way, I convert a string of hex numbers to ints, to
letters, to words.

Perhaps I'm doing it wrong... any tips?

Thanks,
Brad
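
Two possible ways to tighten the loop above, sketched for Python 3 (where chr() takes the place of unichr()). The names word_join and word_bytes are illustrative, and the second variant assumes every entry is exactly two hex digits, as stated in the post.

```python
hexs = ['42', '72', '61', '64']

# Build the word in one pass instead of repeated string concatenation.
word_join = ''.join(chr(int(h, 16)) for h in hexs)

# Alternative: decode all the hex digits at once. latin-1 maps each byte
# value 0-255 to the code point with the same number, which matches
# chr(int(h, 16)) for two-digit hex values.
word_bytes = bytes.fromhex(''.join(hexs)).decode('latin-1')

print(word_join)
print(word_bytes)
```

Whether either is measurably faster depends on the input size; timing both against the real data would settle it.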



Re: Script to extract text from PDF files

2007-09-25 Thread byte8bits
On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
> Googling for 'pdf to text python' and following the first link
> gives http://pybrary.net/pyPdf/

It doesn't work that well. I've tried it, and you should too... the
author even admits this:

extractText()

Locate all text drawing commands, in the order they are provided
in the content stream, and extract the text. This works well for some
PDF files, but poorly for others, depending on the generator used.
This will be refined in the future. Do not rely on the order of text
coming out of this function, as it will change if this function is
made more sophisticated. - source 
http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html



Re: comparing elements of a list with a string

2007-09-25 Thread byte8bits
On Sep 25, 11:39 am, Shriphani <[EMAIL PROTECTED]> wrote:

> If I have a string "fstab", and I want to list out the files in whose names
> the word fstab appears should I go about like this :
>
> def listAllbackups(file):
> list_of_files = os.listdir("/home/shriphani/backupdir")
> for element in list_of_files:
>  if element.find(file) != -1:
>  date = ###
>  time = 
>   return (date, time)

I would do something like this instead:

>>> for root, dirs, files in os.walk('.'):
...   for f in files:
...     if 'text' in f:
...       print f
...
gimp-text-tool
gimp-text-tool.presets
text.py~
textwrap.pyc
textwrap.py
...

You can append the output to a list and return that list if you want
to encapsulate this in a function.
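
A sketch of that encapsulation, written for Python 3. The function name find_files and its parameters are illustrative, not something from the thread.

```python
import os


def find_files(substring, top='.'):
    """Return paths of all files under `top` whose names contain `substring`."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if substring in name:
                # Keep the directory so the caller can open the file later.
                matches.append(os.path.join(root, name))
    return matches
```

For the quoted question this could be called as find_files("fstab", "/home/shriphani/backupdir"); the date and time would then still have to be derived from each returned path.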



Re: Script to extract text from PDF files

2007-09-26 Thread byte8bits
On Sep 25, 10:19 pm, Lawrence D'Oliveiro <[EMAIL PROTECTED]
central.gen.new_zealand> wrote:

> > Doesn't work that well...
>
> This is inherent in the nature of PDF: it's a page-description language, not
> a document-interchange language. Each text-drawing command can put a block
> of text anywhere on the page, so you have no idea, just from parsing the
> PDF content, how to join these blocks up into lines, paragraphs, columns
> etc.

So (I'm not being a wise guy) how does pdftotext do it so well? The
text I can extract from PDFs is extracted as it appears in the doc.
Although there are various ways to insert and encode text in PDFs,
it's also well documented in the PDF specifications
(http://www.adobe.com/devnet/pdf/pdf_reference.html). Going back to
pdftotext... it works well at extracting text from PDF. I'd like a
native Python library that does the same. This can be done. And, it
can be done in Python. I've made a small start, my hope was that
others would be interested in helping, but I can do it on my own
too... it'll just take a lot longer :)

Brad





Re: Script to extract text from PDF files

2007-09-26 Thread byte8bits
On Sep 26, 4:49 pm, Svenn Are Bjerkem <[EMAIL PROTECTED]>
wrote:

> I have downloaded this package and installed it and found that the
> text-extraction is more or less useless. Looking into the code and
> comparing with the PDF spec show a very early implementation of text
> extraction. Luckily it is possible to overwrite the textextraction
> method in the base class without having to fiddle with the original
> code. I tried to contact the developer to offer some help on
> implementing text extraction, but he didn't answer my emails.
> --
> Svenn

Well, feel free to send any ideas or help to me! It seems simple... Do
a binary read. Find 'stream' and 'endstream' sections.
zlib.decompress() all the streams. Find BT and ET markers (Begin Text
& End Text) and finally locate the parens within those and string the
text together. This works great on 3 out of 10 PDF documents, but my
main issue seems to be the zlib compressed streams. Some of them don't
seem to be FlateDecodeable (although they claim to be) or the header
is somehow incorrect. But, once I get a good stream and decompress it,
things are OK from that point on. Seriously, if you have ideas, please
let me know. I'll be glad to share what I've got so far.
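
The steps described above can be sketched roughly as follows, in Python 3. This is an illustration of the approach as stated, not the poster's actual code: the names extract_text, STREAM_RE, TEXT_BLOCK_RE, and LITERAL_RE are made up here, and the sketch ignores escaped parentheses, hex strings, font encodings, filters other than Flate, and the /Length entry that gives the true stream size.

```python
import re
import zlib

# Bytes between a "stream" keyword (plus end-of-line) and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects are bracketed by the BT and ET operators.
TEXT_BLOCK_RE = re.compile(rb"BT(.*?)ET", re.DOTALL)
# Literal strings are written in parentheses.
LITERAL_RE = re.compile(rb"\((.*?)\)", re.DOTALL)


def extract_text(pdf_bytes):
    """Concatenate the literal strings found in Flate-compressed streams."""
    pieces = []
    for raw in STREAM_RE.findall(pdf_bytes):
        try:
            # A decompression object tolerates bytes that follow the end of
            # the compressed data, such as the newline before "endstream".
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            continue  # not Flate data (or damaged); skip this stream
        for block in TEXT_BLOCK_RE.findall(content):
            for literal in LITERAL_RE.findall(block):
                # Assumption: treat string bytes as latin-1 characters.
                pieces.append(literal.decode("latin-1"))
    return "".join(pieces)
```

Skipping streams that raise zlib.error keeps the scan going, at the cost of silently dropping whatever text those streams held.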

Not many people seem to be interested. I'll stop adding to this
thread... I don't want to beat a dead horse. Anyone interested in
helping can contact me via email.

Thanks,

Brad



Re: unit testing

2007-10-05 Thread byte8bits
On Oct 5, 5:38 am, Craig Howard <[EMAIL PROTECTED]> wrote:
> Brad:
>
> If the program is more than 100 lines or is a critical system, I
> write a unit test. I hate asking myself, "Did I break something?"
> every time I decide to refactor a small section of code. For
> instance, I wrote an alarm system in Python for a water treatment
> plant. If the chlorine, pH, or turbidity are out of spec, an email
> message is sent to the plant operator's pager. Because of the nature
> of the alarm system, extensive field testing was out of the question.
> Unit testing was the only way to ensure it worked without disrupting
> the plant operation.
>
> Craig

Thanks to all for the opinions. Just to clarify, I have nothing
against testing. I like doing it. I catch a lot of bugs! I dislike the
formality of the unittest module. It's unyielding. It makes testing
difficult unless your code is written with testing in mind from the
start.

I maintain old code... code written a long time ago, before unittest
was popular. Getting unittest to work on that is difficult at best. So
we do informal testing ourselves. The end result is the same... bugs
are squashed before the code is placed into production. Many times, we
find bugs as soon as we write a test!

Thanks again for the advice.

Brad




Re: Finding Peoples' Names in Files

2007-10-11 Thread byte8bits
On Oct 11, 12:49 pm, Matimus <[EMAIL PROTECTED]> wrote:
> On Oct 11, 9:11 am, brad <[EMAIL PROTECTED]> wrote:
>
>
>
> > [EMAIL PROTECTED] wrote:
> > > However...how can you know it is a name...
>
> > OK, I admitted in my first post that it was a crazy question, but if one
> > could find an answer, one would be onto something. Maybe it's not a 100%
> > answerable question, but I would guess that it is an 80% answerable
> > question... I just don't know how... yet :)
>
> > Besides admitting that it's a crazy question, I should stop and explain
> > how it would be useful to me at least. Is a credit card number itself
> > valuable? I would think not. One can easily re and luhn check for credit
> > card numbers located in files with a great degree of accuracy, but a
> > number without a name is not very useful to me. So, if one could
> > associate names to luhn checked numbers automatically, then one would be
> > onto something. Or at least say, "hey, this file has luhn validated CCs
> > *AND* it seems to have people's names in it as well." Now then, I'd have
> > less to review or perhaps as much as I have now, but I could push the
> > files with numbers and names to the top of the list so that they would
> > be reviewed first.
>
> > Brad
>
> What the hell are you doing? Your post sounds to me like you have a
> huge amount of stolen, or at the very least misapprehended, data. Now
> you want to search it for credit card numbers and names so that you
> can use them.
>
> I am not cool with this! This is a public forum about a programming
> language. What makes you think that anybody in this forum will be cool
> with that. Perhaps you aren't doing anything illegal, but it sure is
> coming off that way. If you are doing something illegal I hope you get
> caught.
>
> At the very least, you might want to clarify why you are looking for
> such capability so that you don't get effectively black-listed (well,
> by me at least).
>
> Matt

Go have a beer and calm down a bit :) It's a legitimate purpose,
although it could be (and probably is being) used by bad guys right now.
My intent, as you can see from the links below, is to catch it before
the bad guys do.

http://filebox.vt.edu/users/rtilley/public/find_ccns/
http://filebox.vt.edu/users/rtilley/public/find_ssns/

Brad
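
For readers unfamiliar with the "re and luhn check" mentioned in the quoted post, here is a small Python 3 sketch. It is not taken from the linked scripts; the pattern CANDIDATE_RE and the function names are assumptions made for illustration, and the pattern is deliberately loose (13 to 19 digits with optional spaces or hyphens).

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_candidates(text):
    """Return digit runs in `text` that look like card numbers and pass Luhn."""
    return [m.group(0) for m in CANDIDATE_RE.finditer(text)
            if luhn_ok(m.group(0))]
```

A Luhn pass only says the digits are self-consistent; about one in ten random digit strings passes, so hits still need review.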





Re: Entering username & password automatically using urllib.urlopen

2007-10-14 Thread byte8bits
On Oct 13, 11:41 pm, rodrigo <[EMAIL PROTECTED]> wrote:
> I am trying to retrieve a password protected page using:
>
> get = urllib.urlopen('http://password.protected.url').read()
>
> While doing this interactively, I'm asked for  the username, then the
> password at the terminal.
> Is there any way to do this non-interactively? To hardcode the
> user/pass into the script so I can get the page automatically?
>
> (This is not a cracking attempt, I am trying to retrieve a page I have
> legitimate access to, just doing it automatically when certain
> conditions are met.)
>
> Thanks,
>
> Rodrigo

The pexpect module works nicely for automating tasks that normally
require user interaction.
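
As an alternative to driving the prompt with pexpect, and assuming the page uses HTTP basic authentication (the thread does not say), the credentials can be handed to the URL library directly. The sketch below uses Python 3's urllib.request; the function name build_auth_opener and the example user name and password are placeholders.

```python
import urllib.request


def build_auth_opener(url, username, password):
    """Return an opener that answers basic-auth challenges for `url`."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm at this URL".
    manager.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


# Hypothetical usage; nothing is fetched until open() is called:
# opener = build_auth_opener("http://password.protected.url", "user", "secret")
# page = opener.open("http://password.protected.url").read()
```

If the site uses a login form or another scheme instead of basic authentication, this will not help and a different approach is needed.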



Re: Python on imac

2007-10-14 Thread byte8bits
On Oct 14, 1:27 am, James Stroud <[EMAIL PROTECTED]> wrote:

> For OS X 10.4, wx has come as part of the stock python install. You may
> want to consider going that route if you develop exclusively for OS
> X--it will keep the size of your distribution down.
>
> James

wx works well on Macs... Linux and Windows too. I second this
suggestion.



Understanding tempfile.TemporaryFile

2007-12-27 Thread byte8bits
Wondering if someone would help me to better understand tempfile. I
attempt to create a tempfile, write to it, read it, but it is not
behaving as I expect. Any tips?

>>> x = tempfile.TemporaryFile()
>>> print x
<open file '<fdopen>', mode 'w+b' at 0xab364968>
>>> print x.read()

>>> print len(x.read())
0
>>> x.write("1234")
>>> print len(x.read())
0
>>> x.flush()
>>> print len(x.read())
0


Re: Understanding tempfile.TemporaryFile

2007-12-27 Thread byte8bits
On Dec 27, 10:12 pm, John Machin <[EMAIL PROTECTED]> wrote:

> Check out the seek method.

Ah yes... thank you:

>>> import tempfile
>>> x = tempfile.TemporaryFile()
>>> x.write("test")
>>> print x.read()

>>> x.seek(0)
>>> print x.read()
test
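
The reason seek() matters is that write() leaves the file position at the end of what was written, so a following read() starts there and finds nothing. A small Python 3 sketch that makes the position visible with tell() (variable names are illustrative; Python 3 needs bytes for the default binary mode):

```python
import tempfile

with tempfile.TemporaryFile() as tmp:
    tmp.write(b"test")
    pos_after_write = tmp.tell()  # position now sits after the 4 bytes
    at_end = tmp.read()           # reads from the end: nothing left
    tmp.seek(0)                   # rewind to the start of the file
    from_start = tmp.read()       # now the written bytes come back

print(pos_after_write, at_end, from_start)
```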




pipes python cgi and gnupg

2007-12-28 Thread byte8bits
I think this is more a GnuPG issue than a Python issue, but I wanted
to post it here as well in case others could offer suggestions:

I can do this from a python cgi script from a browser:

os.system("gpg --version > gpg.out")

However, I cannot do this from a browser:

os.system("echo %s | gpg --batch --passphrase-fd 0 -d %s > d.out"
% (passphrase, filename))

The output file is produced, but it's zero byte. I want the decrypted
file's content, but the pipe seems to mess things up. The script works
fine when executed from command line. The output file is produced as
expected. When executed by a browser, it does not work as expected...
only produces a zero byte output file. Any tips? I've googled a bit
and experimented for a few nights, still no go.

Thanks,
Brad

Here's the entire script:

#!/usr/local/bin/python

import cgi
import cgitb; cgitb.enable()
import os
import tempfile

print "Content-Type: text/html"
print
print "T"
print "H"

form = cgi.FieldStorage()
if not form.has_key("pass"):
   print "Enter password"

filename = "test.gpg"
passphrase = form.getvalue("pass").strip()
os.system("gpg --version > gpg.out")
os.system("echo %s | gpg --batch --passphrase-fd 0 --decrypt %s > d.out"
% (passphrase, filename))
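
One way to find out what gpg is complaining about is to skip the shell pipeline, feed the passphrase on stdin, and capture stderr. The Python 3 sketch below does that; the function names are illustrative, and the optional homedir argument reflects only a guess that gpg may not find a usable home directory when run as the web server's user.

```python
import subprocess


def build_gpg_command(filename, homedir=None):
    """Build the gpg argument list; no shell quoting is involved."""
    cmd = ["gpg", "--batch", "--passphrase-fd", "0"]
    if homedir is not None:
        # Assumption: point gpg at a keyring directory the CGI user can read.
        cmd += ["--homedir", homedir]
    cmd += ["--decrypt", filename]
    return cmd


def decrypt(filename, passphrase, homedir=None):
    """Run gpg and return (exit status, decrypted bytes, error output)."""
    proc = subprocess.run(
        build_gpg_command(filename, homedir),
        input=passphrase.encode() + b"\n",  # passphrase arrives on fd 0
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout, proc.stderr
```

Printing the returned error output into the CGI response (or a log) should show why the decrypted output is empty. Passing the passphrase through stdin also keeps it out of the shell command line. Note that GnuPG 2.1 and later additionally require --pinentry-mode loopback for --passphrase-fd to take effect.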


Python Frontend/GUI for C Program

2008-01-11 Thread byte8bits
I have a C program that works very well. However, being C it has no
GUI. Input and Output are stdin and stdout... works great from a
terminal. Just wondering, has anyone ever written a Python GUI for an
existing C program? Any notes or documentation available?

I have experience using wxPython from within Python apps and I like it
a lot for its cross-platform capabilities. I was hoping to use
wxPython for this as well.

Thanks,
Brad
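
One common pattern, sketched here for Python 3, is to leave the C program untouched and talk to it over its stdin and stdout from the GUI process. The class name LineProcess and the one-line-in, one-line-out protocol are assumptions made for illustration; DEMO_CHILD is a small Python stand-in for the C program so the sketch can run on its own.

```python
import subprocess
import sys


class LineProcess:
    """Wrap a child program that answers each input line with one output line."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,   # exchange str instead of bytes
            bufsize=1,   # line-buffer our side of the pipes
        )

    def ask(self, line):
        """Send one line to the child and return its one-line reply."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Signal end of input, then wait for the child to exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


# Stand-in for the C program: upper-cases each line it receives.
DEMO_CHILD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    print(line.strip().upper(), flush=True)\n",
]
```

A wxPython event handler could call ask() with the contents of a text field and display the reply. Two caveats: a C program usually block-buffers stdout when it is connected to a pipe, so it may need an fflush(stdout) after each reply, and a slow reply will freeze the GUI unless ask() runs in a worker thread.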