RE Help
Not specific to Python, but it will be implemented in it... how do I compile a RE to catch everything between two known values? Here's what I've tried (but failed) to accomplish... the knowns here are START and END:

data = "asdfasgSTARTpruyerfghdfjENDhfawrgbqfgsfgsdfg"
x = re.compile('START.END', re.DOTALL)
x.findall(data)

-- http://mail.python.org/mailman/listinfo/python-list
Re: RE Help
> You'll want to use a non-greedy match:
> x = re.compile(r"START(.*?)END", re.DOTALL)
> Otherwise the . will match END as well.

On Sep 21, 3:23 pm, Steve Holden <[EMAIL PROTECTED]> wrote:
> Only if there's a later END in the string, in which case the user's
> requirements will determine whether greedy matching is appropriate.
>
> regards
> Steve

There will be lots of START END combinations in the data. This is more accurate:

sfgdfg*START*dfhdgh*END*dfdgh*START*dfhfdgh*END*dfgsdh*START*sdfhfdhj*END*fdghfdj

The RE should extract the data between each pair of START and END. Thanks!
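For what it's worth, here is a small sketch of the non-greedy pattern from the quoted reply applied to the sample data above, next to a greedy pattern for comparison. It is written for Python 3, and the names `between_markers` and `greedy` are just made up for illustration. The asterisks are kept because they appear in the sample string as posted.

```python
import re

# Non-greedy: (.*?) stops at the first END after each START.
between_markers = re.compile(r"START(.*?)END", re.DOTALL)

# Greedy, for comparison: (.*) runs on to the last END in the string.
greedy = re.compile(r"START(.*)END", re.DOTALL)

data = ("sfgdfg*START*dfhdgh*END*dfdgh*START*dfhfdgh*END*"
        "dfgsdh*START*sdfhfdhj*END*fdghfdj")

matches = between_markers.findall(data)
greedy_matches = greedy.findall(data)

print(matches)              # three separate captures, one per START/END pair
print(len(greedy_matches))  # a single capture spanning everything in between
```

Because the pattern has one group, findall() returns only the captured text, not the START and END markers themselves.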
Converting numbers to unicode characters
Here's how I'm doing this right now. It's a bit slow. I've just got the code working. I was wondering if there is a more efficient way of doing this... simple example from interactive Python:

>>> word = ''
>>> hexs = ['42', '72', '61', '64']
>>> for h in hexs:
...     char = unichr(int(h, 16))
...     word += char
...     print char
...
B
r
a
d
>>> print word
Brad

Each hex_number is two digits. unichr converts that to a character that I append to previous ints that have been converted to chars. In this way, I convert a string of hex numbers to ints to letters, to words. Perhaps I'm doing it wrong... any tips?

Thanks,
Brad
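One possible tightening, sketched for Python 3 (where chr() plays the role that unichr() has in the session above): build all the characters first and join them once, instead of growing the string inside the loop. The function name `hex_to_text` is made up for this example.

```python
def hex_to_text(hex_values):
    """Turn a list of hex strings such as ['42', '72'] into a text string."""
    # int(h, 16) parses the hex digits; chr() maps the number to a character.
    # str.join builds the result in one pass instead of repeated +=.
    return "".join(chr(int(h, 16)) for h in hex_values)


hexs = ['42', '72', '61', '64']
word = hex_to_text(hexs)
print(word)
```

Whether this is measurably faster depends on the input size; for four values the difference will not be noticeable.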
Re: Script to extract text from PDF files
On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
> Googling for 'pdf to text python' and following the first link
> gives http://pybrary.net/pyPdf/

Doesn't work that well. I've tried it, you should too... the author even admits this:

extractText()
Locate all text drawing commands, in the order they are provided in the content stream, and extract the text. This works well for some PDF files, but poorly for others, depending on the generator used. This will be refined in the future. Do not rely on the order of text coming out of this function, as it will change if this function is made more sophisticated.

- source http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html
Re: comparing elements of a list with a string
On Sep 25, 11:39 am, Shriphani <[EMAIL PROTECTED]> wrote:
> If I have a string "fstab", and I want to list out the files in whose names
> the word fstab appears should I go about like this :
>
> def listAllbackups(file):
>     list_of_files = os.listdir("/home/shriphani/backupdir")
>     for element in list_of_files:
>         if element.find(file) != -1:
>             date = ###
>             time =
>             return (date, time)

I would do something like this instead:

>>> for root, dirs, files in os.walk('.'):
...     for f in files:
...         if 'text' in f:
...             print f
...
gimp-text-tool
gimp-text-tool.presets
text.py~
textwrap.pyc
textwrap.py
...

You can append the output to a list and return that list if you want to encapsulate this in a function.
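A rough sketch of that function version, in Python 3. The name `find_files_containing` and its parameters are invented here; it returns full paths rather than printing bare file names.

```python
import os


def find_files_containing(needle, top="."):
    """Return paths of all files under `top` whose name contains `needle`."""
    matches = []
    # os.walk yields (directory, subdirectories, files) for every directory
    # below `top`, so nested folders are searched as well.
    for root, dirs, files in os.walk(top):
        for name in files:
            if needle in name:
                matches.append(os.path.join(root, name))
    return matches
```

Something like find_files_containing("fstab", "/home/shriphani/backupdir") would then cover the case from the quoted question.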
Re: Script to extract text from PDF files
On Sep 25, 10:19 pm, Lawrence D'Oliveiro <[EMAIL PROTECTED] central.gen.new_zealand> wrote:
> > Doesn't work that well...
>
> This is inherent in the nature of PDF: it's a page-description language, not
> a document-interchange language. Each text-drawing command can put a block
> of text anywhere on the page, so you have no idea, just from parsing the
> PDF content, how to join these blocks up into lines, paragraphs, columns
> etc.

So (I'm not being a wise guy) how does pdftotext do it so well? The text I can extract from PDFs is extracted as it appears in the doc. Although there are various ways to insert and encode text in PDFs, it's also well documented in the PDF specifications (http://www.adobe.com/devnet/pdf/pdf_reference.html).

Going back to pdftotext... it works well at extracting text from PDF. I'd like a native Python library that does the same. This can be done. And, it can be done in Python. I've made a small start, my hope was that others would be interested in helping, but I can do it on my own too... it'll just take a lot longer :)

Brad
Re: Script to extract text from PDF files
On Sep 26, 4:49 pm, Svenn Are Bjerkem <[EMAIL PROTECTED]> wrote:
> I have downloaded this package and installed it and found that the
> text-extraction is more or less useless. Looking into the code and
> comparing with the PDF spec show a very early implementation of text
> extraction. Luckily it is possible to overwrite the textextraction
> method in the base class without having to fiddle with the original
> code. I tried to contact the developer to offer some help on
> implementing text extraction, but he didn't answer my emails.
> --
> Svenn

Well, feel free to send any ideas or help to me! It seems simple... Do a binary read. Find 'stream' and 'endstream' sections. zlib.decompress() all the streams. Find BT and ET markers (Begin Text & End Text) and finally locate the parens within those and string the text together.

This works great on 3 out of 10 PDF documents, but my main issue seems to be the zlib compressed streams. Some of them don't seem to be FlateDecodeable (although they claim to be) or the header is somehow incorrect. But, once I get a good stream and decompress it, things are OK from that point on.

Seriously, if you have ideas, please let me know. I'll be glad to share what I've got so far. Not many people seem to be interested. I'll stop adding to this thread... I don't want to beat a dead horse. Anyone interested in helping can contact me via email.

Thanks,
Brad
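The steps described above could be sketched roughly as follows, in Python 3. This is only an illustration of the idea, not the code from the post: the regular expressions and the name `extract_text` are assumptions, streams that fail to inflate are simply skipped, and the naive BT/ET and parenthesis matching ignores escaped parentheses, hex strings, font encodings and text positioning.

```python
import re
import zlib

# Everything between a 'stream' keyword and the next 'endstream'.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects sit between the BT (Begin Text) and ET (End Text) operators.
TEXT_BLOCK_RE = re.compile(rb"BT(.*?)ET", re.DOTALL)
# Literal strings are written in parentheses, e.g. (Hello) Tj.
PAREN_RE = re.compile(rb"\((.*?)\)")


def extract_text(pdf_bytes):
    """Very rough text extraction from the raw bytes of a PDF file."""
    pieces = []
    for raw in STREAM_RE.findall(pdf_bytes):
        try:
            # decompressobj() tolerates the end-of-line bytes that trail the
            # compressed data just before 'endstream'.
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            continue  # not a Flate stream we can read; skip it
        for block in TEXT_BLOCK_RE.findall(content):
            for literal in PAREN_RE.findall(block):
                pieces.append(literal.decode("latin-1"))
    return "".join(pieces)
```

Reading the file with open(path, "rb").read() and passing the result in covers the "binary read" step.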
Re: unit testing
On Oct 5, 5:38 am, Craig Howard <[EMAIL PROTECTED]> wrote:
> Brad:
>
> If the program is more than 100 lines or is a critical system, I
> write a unit test. I hate asking myself, "Did I break something?"
> every time I decide to refactor a small section of code. For
> instance, I wrote an alarm system in Python for a water treatment
> plant. If the chlorine, pH, or turbidity are out of spec, an email
> message is sent to the plant operator's pager. Because of the nature
> of the alarm system, extensive field testing was out of the question.
> Unit testing was the only way to ensure it worked without disrupting
> the plant operation.
>
> Craig

Thanks to all for the opinions. Just to clarify, I have nothing against testing. I like doing it. I catch a lot of bugs! I dislike the formality of the unittest module. It's unyielding. It makes testing difficult unless your code is written with testing in mind from the start.

I maintain old code... code written a long time ago, before unittest was popular. Getting unittest to work on that is difficult at best. So we do informal testing ourselves. The end result is the same... bugs are squashed before the code is placed into production. Many times, we find bugs as soon as we write a test!

Thanks again for the advice.

Brad
Re: Finding Peoples' Names in Files
On Oct 11, 12:49 pm, Matimus <[EMAIL PROTECTED]> wrote: > On Oct 11, 9:11 am, brad <[EMAIL PROTECTED]> wrote: > > > > > [EMAIL PROTECTED] wrote: > > > However...how can you know it is a name... > > > OK, I admitted in my first post that it was a crazy question, but if one > > could find an answer, one would be onto something. Maybe it's not a 100% > > answerable question, but I would guess that it is an 80% answerable > > question... I just don't know how... yet :) > > > Besides admitting that it's a crazy question, I should stop and explain > > how it would be useful to me at least. Is a credit card number itself > > valuable? I would think not. One can easily re and luhn check for credit > > card numbers located in files with a great degree of accuracy, but a > > number without a name is not very useful to me. So, if one could > > associate names to luhn checked numbers automatically, then one would be > > onto something. Or at least say, "hey, this file has luhn validated CCs > > *AND* it seems to have people's names in it as well." Now then, I'd have > > less to review or perhaps as much as I have now, but I could push the > > files with numbers and names to the top of the list so that they would > > be reviewed first. > > > Brad > > What the hell are you doing? Your post sounds to me like you have a > huge amount of stolen, or at the very least misapprehended, data. Now > you want to search it for credit card numbers and names so that you > can use them. > > I am not cool with this! This is a public forum about a programming > language. What makes you think that anybody in this forum will be cool > with that. Perhaps you aren't doing anything illegal, but it sure is > coming off that way. If you are doing something illegal I hope you get > caught. > > At the very least, you might want to clarify why you are looking for > such capability so that you don't get effectively black-listed (well, > by me at least). 
>
> Matt

Go have a beer and calm down a bit :) It's a legitimate purpose, although it could be (and probably is being) used by bad guys right now. My intent, as you can see from the links below, is to catch it before the bad guys do.

http://filebox.vt.edu/users/rtilley/public/find_ccns/
http://filebox.vt.edu/users/rtilley/public/find_ssns/

Brad
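For readers who haven't met the Luhn check mentioned in the quoted text, here is a minimal Python 3 sketch of the checksum. The function name `luhn_valid` is made up here, and this is not taken from the find_ccns code linked above.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces and dashes so '4111 1111 1111 1111' can be checked as-is.
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and subtract
    # 9 whenever the doubled value has two digits.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A passing checksum only means the digits are self-consistent, which is why a regex match plus this check still gives a candidate for review and not a confirmed card number.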
Re: Entering username & password automatically using urllib.urlopen
On Oct 13, 11:41 pm, rodrigo <[EMAIL PROTECTED]> wrote:
> I am trying to retrieve a password protected page using:
>
> get = urllib.urlopen('http://password.protected.url').read()
>
> While doing this interactively, I'm asked for the username, then the
> password at the terminal.
> Is there any way to do this non-interactively? To hardcode the user/
> pass into the script so I can get the page automatically?
>
> (This is not a cracking attempt, I am trying to retrieve a page I have
> legitimate access to, just doing it automatically when certain
> conditions are met.)
>
> Thanks,
>
> Rodrigo

The pexpect module works nicely for automating tasks that normally require user interaction.
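A different route from pexpect, sketched here in case the page is protected with plain HTTP Basic authentication (an assumption, the original question doesn't say): the Python 3 standard library can be given the credentials up front. The helper name `build_basic_auth_opener` and the URL and credentials in the comment are placeholders.

```python
import urllib.request


def build_basic_auth_opener(url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for `url`."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm at this URL.
    password_mgr.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Hypothetical usage (placeholder URL and credentials):
# opener = build_basic_auth_opener("http://password.protected.url", "user", "secret")
# page = opener.open("http://password.protected.url").read()
```

If the site uses a login form or cookies instead, this won't help and something else is needed.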
Re: Python on imac
On Oct 14, 1:27 am, James Stroud <[EMAIL PROTECTED]> wrote:
> For OS X 10.4, wx has come as part of the stock python install. You may
> want to consider going that route if you develop exclusively for OS
> X--it will keep the size of your distribution down.
>
> James

wx works well on Macs... Linux and Windows too. I second this suggestion.
Understanding tempfile.TemporaryFile
Wondering if someone would help me to better understand tempfile. I attempt to create a tempfile, write to it, read it, but it is not behaving as I expect. Any tips?

>>> x = tempfile.TemporaryFile()
>>> print x
<open file '<fdopen>', mode 'w+b' at 0xab364968>
>>> print x.read()

>>> print len(x.read())
0
>>> x.write("1234")
>>> print len(x.read())
0
>>> x.flush()
>>> print len(x.read())
0
Re: Understanding tempfile.TemporaryFile
On Dec 27, 10:12 pm, John Machin <[EMAIL PROTECTED]> wrote:
> Check out the seek method.

Ah yes... thank you:

>>> import tempfile
>>> x = tempfile.TemporaryFile()
>>> x.write("test")
>>> print x.read()

>>> x.seek(0)
>>> print x.read()
test
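The same thing as a small Python 3 sketch, to make the reason explicit: after write() the file position sits at the end of the data, so read() has nothing left to return until seek(0) moves the position back to the start. The function name `write_then_read` is made up; note that in Python 3 the default mode 'w+b' expects bytes.

```python
import tempfile


def write_then_read(payload):
    """Write bytes to a temporary file and read them back twice."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(payload)
        read_at_end = tmp.read()      # position is after the written data
        tmp.seek(0)                   # rewind to the beginning
        read_from_start = tmp.read()  # now the data comes back
    return read_at_end, read_from_start


print(write_then_read(b"test"))
```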
pipes python cgi and gnupg
I think this is more a GnuPG issue than a Python issue, but I wanted to post it here as well in case others could offer suggestions:

I can do this from a python cgi script from a browser:

os.system("gpg --version > gpg.out")

However, I cannot do this from a browser:

os.system("echo %s | gpg --batch --passphrase-fd 0 -d %s > d.out" % (passwd, filename))

The output file is produced, but it's zero byte. I want the decrypted file's content, but the pipe seems to mess things up.

The script works fine when executed from command line. The output file is produced as expected. When executed by a browser, it does not work as expected... only produces a zero byte output file. Any tips? I've googled a bit and experimented for a few nights, still no go.

Thanks,
Brad

Here's the entire script:

#!/usr/local/bin/python
import cgi
import cgitb; cgitb.enable()
import os
import tempfile

print "Content-Type: text/html"
print
print "T"
print "H"

form = cgi.FieldStorage()
if not form.has_key("pass"):
    print "Enter password"

filename = "test.gpg"
passwd = form.getvalue("pass").strip()
os.system("gpg --version > gpg.out")
os.system("echo %s | gpg --batch --passphrase-fd 0 --decrypt %s > d.out" % (passwd, filename))
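Not a confirmed fix for the zero-byte output, but one way to find out what gpg is complaining about: run it through subprocess instead of os.system, feed the passphrase on stdin without going through the shell, and capture stderr so the error text can be shown or logged. This is a Python 3 sketch; the names `build_gpg_command` and `decrypt` are made up, and the guess that the web server's environment differs from the login shell's is only a guess.

```python
import subprocess


def build_gpg_command(encrypted_path):
    """Argument list for a batch-mode decrypt reading the passphrase on fd 0."""
    return ["gpg", "--batch", "--passphrase-fd", "0", "--decrypt", encrypted_path]


def decrypt(encrypted_path, passphrase):
    """Run gpg and return (exit code, decrypted bytes, gpg's error output)."""
    result = subprocess.run(
        build_gpg_command(encrypted_path),
        input=(passphrase + "\n").encode(),  # sent to gpg's stdin, no shell involved
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,              # keep the diagnostics, don't lose them
    )
    return result.returncode, result.stdout, result.stderr
```

Printing the returned stderr in the CGI response (or writing it to a file) should show why gpg produced nothing when run under the web server. Passing an argument list also avoids interpolating the passphrase into a shell command line.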
Python Frontend/GUI for C Program
I have a C program that works very well. However, being C it has no GUI. Input and Output are stdin and stdout... works great from a terminal. Just wondering, has anyone ever written a Python GUI for an existing C program? Any notes or documentation available?

I have experience using wxPython from within Python apps and I like it a lot for its cross-platform capabilities. I was hoping to use wxPython for this as well.

Thanks,
Brad
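One common shape for this, sketched in Python 3 under the assumption that the C program reads all of its input from stdin, writes its result to stdout and then exits: wrap it in a function that the GUI calls. The name `run_cli` and the path `./mytool` in the comment are hypothetical.

```python
import subprocess


def run_cli(command, input_text, timeout=10):
    """Send `input_text` to the program's stdin and return what it prints."""
    completed = subprocess.run(
        command,
        input=input_text,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # work with str, not bytes
        timeout=timeout,      # don't hang the caller forever
    )
    return completed.stdout


# Hypothetical usage from a GUI event handler (e.g. a wxPython button click):
# output = run_cli(["./mytool"], text_from_input_field)
# ...then put `output` into a text control.
```

For a program that takes a long time, the call would normally go in a worker thread so the GUI stays responsive. A program that needs an ongoing back-and-forth dialogue would need subprocess.Popen with pipes instead.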