Overwrite single line of file

2008-12-05 Thread chrispoliquin
Hi, I have about 900 text files (about 2 GB of data) and I need to make some very specific changes to the last line of each file. I'm wondering if there is a way to just overwrite the last line of a file, or to replace the spots I want in place (I even know the exact positions of the characters I need to replace).
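A minimal sketch of one way to do this: seek to the end, scan backwards for the newline that starts the last line, truncate there, and write the replacement. The function name is ours, not from the thread.

```python
import os

def overwrite_last_line(path, new_line):
    """Replace the last line of a file in place by seeking backwards
    to the final newline and truncating (illustrative sketch)."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:                      # empty file: just write
            f.write(new_line.encode() + b"\n")
            return
        pos = end - 1
        f.seek(pos)
        if f.read(1) == b"\n":            # skip the trailing newline
            pos -= 1
        while pos > 0:
            f.seek(pos)
            if f.read(1) == b"\n":        # end of the previous line
                pos += 1
                break
            pos -= 1
        else:
            pos = 0                       # file has only one line
        f.seek(pos)
        f.truncate()
        f.write(new_line.encode() + b"\n")
```

Since the poster already knows the character positions, an even simpler option is `f.seek(offset)` followed by `f.write()` of a same-length replacement, which avoids rewriting the line at all (this only works if the replacement has the same byte length).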

Re: Case tagging and python

2008-07-31 Thread chrispoliquin
I second the idea of just using the islower(), isupper(), and istitle() methods. So, you could have a function - let's call it checkCase() - that returns a string with the tag you want... def checkCase(word): if word.islower(): tag = 'nocap' elif word.isupper(): tag = 'al…
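The truncated snippet above can be fleshed out along these lines; note that every tag name other than 'nocap' is a guess, since the original message cuts off:

```python
def check_case(word):
    """Return a case tag for a word (sketch; tag names after
    'nocap' are assumptions -- the original post is truncated)."""
    if word.islower():
        return 'nocap'        # all lowercase, as in the original
    elif word.isupper():
        return 'allcaps'      # assumed tag name
    elif word.istitle():
        return 'initcap'      # assumed tag name
    else:
        return 'mixed'        # assumed fallback for e.g. 'hELLo'
```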

find and replace with regular expressions

2008-07-31 Thread chrispoliquin
I am using regular expressions to search a string (always full sentences, maybe more than one sentence) for common abbreviations and remove the periods. I need to break the string into different sentences but split('.') doesn't solve the whole problem because of possible periods in the middle of a…
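A heuristic sketch of the approach described: strip the periods from a known abbreviation list first, then split on sentence-ending punctuation followed by whitespace. The abbreviation list here is a small placeholder, not from the thread.

```python
import re

# Small, assumed abbreviation list; extend for real text.
ABBREVS = ['Mr', 'Mrs', 'Dr', 'St', 'vs']

def strip_abbrev_periods(text):
    """Remove the trailing period from known abbreviations so a
    later split on sentence-ending punctuation works."""
    for abbr in ABBREVS:
        text = re.sub(r'\b' + re.escape(abbr) + r'\.', abbr, text)
    return text

def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace, after
    removing abbreviation periods (heuristic sketch)."""
    cleaned = strip_abbrev_periods(text)
    return re.split(r'(?<=[.!?])\s+', cleaned)
```

One known limitation of this approach: an abbreviation that genuinely ends a sentence (e.g. a trailing "etc.") loses its sentence boundary, so the abbreviation list and the split rule need tuning for real data.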

very large graph

2008-06-23 Thread chrispoliquin
I need to represent the hyperlinks between a large number of HTML files as a graph. My non-directed graph will have about 63,000 nodes and probably close to 500,000 edges. I have looked into igraph (http://cneurocvs.rmki.kfki.hu/igraph/doc/python/index.html) and networkX (https://networkx.la…
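With networkX, building an undirected link graph is a few lines; a graph of 63,000 nodes and 500,000 edges fits comfortably in memory with its dict-of-dicts representation. The file names below are placeholders standing in for the real HTML link data.

```python
import networkx as nx

def build_link_graph(link_pairs):
    """Build an undirected graph from (source, target) hyperlink
    pairs (sketch; input format is an assumption)."""
    g = nx.Graph()
    g.add_edges_from(link_pairs)
    return g

# Placeholder edges standing in for the real hyperlink data.
links = [("a.html", "b.html"), ("b.html", "c.html"), ("a.html", "c.html")]
g = build_link_graph(links)
```

For graphs this size, igraph (a C library with Python bindings) is typically faster and leaner, while networkX is pure Python and easier to work with; either handles half a million edges.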

Re: urllib (54, 'Connection reset by peer') error

2008-06-18 Thread chrispoliquin
Thanks for the help. The error handling worked to a certain extent but after a while the server does seem to stop responding to my requests. I have a list of about 7,000 links to pages I want to parse the HTML of (it's basically a web crawler) but after a certain number of urlretrieve() or urlope…
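A server that stops responding mid-crawl is often rate-limiting the client, so the usual fix is to pause between requests and retry failed ones with a growing delay. A sketch in today's Python 3 urllib.request (the 2008 thread used the old Python 2 urllib; retry counts and delays here are arbitrary):

```python
import time
import urllib.request
from urllib.error import URLError

def fetch_with_retry(url, retries=3, delay=2.0):
    """Fetch a URL, retrying on connection errors with a growing
    pause between attempts (sketch; parameters are assumptions).
    Spacing out requests also avoids hammering the server, which
    is often why it stops responding partway through a crawl."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except URLError:
            if attempt == retries - 1:
                raise                       # give up after the last try
            time.sleep(delay * (attempt + 1))  # simple linear backoff
```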

urllib (54, 'Connection reset by peer') error

2008-06-13 Thread chrispoliquin
Hi, I have a small Python script to fetch some pages from the internet. There are a lot of pages and I am looping through them and then downloading the page using urlretrieve() in the urllib module. The problem is that after 110 pages or so the script sort of hangs and then I get the following tr…
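One common remedy for a urlretrieve() loop that hangs after many pages is to call urlcleanup() periodically (urlretrieve keeps internal state between calls) and to sleep briefly between requests. A sketch using the modern urllib.request names; the cleanup interval and pause length are assumptions:

```python
import os
import time
import urllib.request

def fetch_all(urls, dest_dir, pause=0.5):
    """Download each URL to a numbered file in dest_dir, clearing
    urlretrieve's cached state every 50 pages and pausing between
    requests (sketch; interval and pause are assumptions)."""
    for i, url in enumerate(urls):
        dest = os.path.join(dest_dir, "page_%05d.html" % i)
        urllib.request.urlretrieve(url, dest)
        if i % 50 == 49:
            urllib.request.urlcleanup()  # drop cached connection state
        time.sleep(pause)                # be polite to the server
```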