Thanks for all the responses.
It looks like none of the BeautifulSoup objects have __del__ methods, so I
don't think that can be the problem.
To answer your other question, guppy was the best match I came up with when
looking for a memory profiler for Python (or more specifically "Heapy"):
http…
I am writing a screen scraping application using BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/
(which is fantastic, by the way).
I have an object that has two methods, each of which loads an HTML document and
scrapes out some information, putting strings from the HTML documents i…
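In outline, the scraping code looks something like this (a reconstruction, not the actual application; the URL handling and tag names are placeholders):

import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x

def scrape_titles(url):
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    # keep plain strings only, so nothing holds a reference into the parse tree
    return [str(tag.string) for tag in soup.findAll("h2") if tag.string]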
This may be an algorithmic question, but I'm trying to code it in
Python, so...
I have a list of pairwise regions, each with an integer start and end
and a float data point. There may be overlaps between the regions. I
want to resolve this into an ordered list with no overlapping
regions.
My init…
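To make the problem concrete, here is one simple-minded way to flatten the regions (a sketch that assumes overlapping values should be averaged; the real resolution rule may differ):

def flatten(regions):
    # regions are (start, end, value) tuples with start < end;
    # collect every boundary, then rebuild elementary intervals
    bounds = sorted(set(b for s, e, _ in regions for b in (s, e)))
    out = []
    for s, e in zip(bounds, bounds[1:]):
        covering = [v for rs, re, v in regions if rs < e and re > s]
        if covering:
            out.append((s, e, sum(covering) / len(covering)))
    return out

This is quadratic in the number of regions, so something smarter is needed at scale.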
>
> rows = fh.read().split()
> coords = numpy.array(map(int, rows[1::3]), dtype=int)
> points = numpy.array(map(float, rows[2::3]), dtype=float)
> chromio.writelines(map(chrommap.__getitem__, rows[::3]))
>
My original version is about 15 seconds. This version is about 9. The
chunks version posted…
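Filled out with reconstructed context (the file handle, chromosome map and output buffer are my guesses, not the original code), the quoted version amounts to:

import numpy
from cStringIO import StringIO

# hypothetical setup: each row of the file is "chromosome position value"
chrommap = {"chr1": "a", "chr2": "b", "chrX": "c"}  # one char per chromosome
chromio = StringIO()

fh = open("largefile.txt")
rows = fh.read().split()                # one slurp, one split
coords = numpy.array(map(int, rows[1::3]), dtype=int)       # every 3rd item from index 1
points = numpy.array(map(float, rows[2::3]), dtype=float)   # every 3rd item from index 2
chromio.writelines(map(chrommap.__getitem__, rows[::3]))    # chromosome names -> chars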
I'm trying to get to grips with the multiprocessing module, having
only used ParallelPython before.
Based on this example:
http://docs.python.org/library/multiprocessing.html#using-a-pool-of-workers
what happens if I want my "f" to take more than one argument? I want
to have a list of tuples of…
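The usual workaround (a sketch; f_star is just a hypothetical wrapper name) is to pass each tuple as a single argument and unpack it inside a helper, since Pool.map only hands one argument to the worker function:

from multiprocessing import Pool

def f(x, y):
    return x * y

def f_star(args):
    # Pool.map passes exactly one argument, so unpack the tuple here
    return f(*args)

if __name__ == "__main__":
    pool = Pool(processes=4)
    pairs = [(1, 2), (3, 4), (5, 6)]
    print pool.map(f_star, pairs)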
Thanks for your replies. Many apologies for not including the right
information first time around. More information is below.
I have tried running it just on the csv read:
import time
import csv
afile = "largefile.txt"
t0 = time.clock()
print "working at file", afile
reader = csv.reader(open(afile))
# completing the truncated snippet: drain the reader and report the time
for row in reader:
    pass
print "finished in", time.clock() - t0, "seconds"
I'm using the CSV library to process a large amount of data - 28
files, each of 130MB. Just reading in the data from one file and
filing it into very simple data structures (numpy arrays and a
cStringIO) takes around 10 seconds. If I just slurp one file into a
string, it only takes about a second…
I have a mod_python application that takes a POST file upload from a
form. It works fine from my machine, other machines in my office and
my home machine. It does not work from my boss's machine in a
different city - he gets "You don't have permission to access this on
this server".
In the logs, i…
I'm filing 160 million data points into a set of bins based on their
position. At the moment, this takes just over an hour using interval
trees. I would like to parallelise this to take advantage of my quad
core machine. I have some experience of Parallel Python, but PP seems
to only really work for…
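A sketch of the direction I have in mind, using multiprocessing rather than PP (the bin width and binning rule are placeholders):

from multiprocessing import Pool

BIN_WIDTH = 1000  # placeholder

def bin_chunk(positions):
    # count how many positions fall into each bin
    counts = {}
    for pos in positions:
        b = pos // BIN_WIDTH
        counts[b] = counts.get(b, 0) + 1
    return counts

def merge(partials):
    total = {}
    for part in partials:
        for b, n in part.items():
            total[b] = total.get(b, 0) + n
    return total

if __name__ == "__main__":
    positions = range(1000000)   # stand-in for the 160 million points
    ncores = 4
    chunks = [positions[i::ncores] for i in range(ncores)]
    pool = Pool(processes=ncores)
    counts = merge(pool.map(bin_chunk, chunks))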
In the end, I used a cStringIO object to store the chromosomes -
because there are only 23, I can use one character for each chromosome
and represent the whole lot with a giant string and a dictionary to
say what each character means. Then I used numpy arrays for the data
and coordinates. This sque…
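Concretely, the scheme works like this (a sketch with placeholder names and letters):

from cStringIO import StringIO

code_for = {"chr1": "a", "chr2": "b", "chrX": "c"}    # placeholder mapping
name_for = dict((v, k) for k, v in code_for.items())  # reverse lookup

buf = StringIO()
for chrom in ["chr1", "chr1", "chrX", "chr2"]:
    buf.write(code_for[chrom])

encoded = buf.getvalue()     # "aacb": one byte per data point
print name_for[encoded[2]]   # chromosome of the third point: chrX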
Thanks for all the replies.
First of all, can anybody recommend a good way to show memory usage? I
tried heapy, but couldn't make much sense of the output and it didn't
seem to change too much for different usages. Maybe I was just making
the h.heap() call in the wrong place. I also tried getrusage…
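For reference, the calls involved look like this (a sketch; where exactly the h.heap() call should go relative to the allocations is the part I was unsure about):

from guppy import hpy
import resource

h = hpy()
# ... build the data structures of interest here ...
print h.heap()   # live objects broken down by type

# peak resident set size; kilobytes on Linux, not populated on every system
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss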
I'm reading in some rather large files (28 files each of 130MB). Each
row is a genome coordinate (chromosome (string) and position (int))
and a data point (float). I want to read these into a list of
coordinates (each a tuple of (chromosome, position)) and a list of
data points.
This has taught me…
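The straightforward version reads something like this (a sketch; the filename and field order are assumptions):

coords = []
points = []
for line in open("largefile.txt"):     # filename is a placeholder
    chrom, pos, val = line.split()     # assumed field order
    coords.append((chrom, int(pos)))
    points.append(float(val))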
On 9 Feb, 12:24, Gerhard Häring wrote:
> http://objectmix.com/python/631346-parallel-python.html
>
Hmm. In fact, this doesn't seem to work for pp. When I run the code
below, it says everything is running on the one core.
import pp
import random
import time
from string import lowercase
ncpus = 3
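For comparison, a minimal self-contained pp round trip that should keep all three workers busy looks something like this (a sketch; the busy-loop task is hypothetical):

import time
import pp

def busy(seconds):
    # spin so that per-core usage is visible in a process monitor
    end = time.time() + seconds
    n = 0
    while time.time() < end:
        n += 1
    return n

ncpus = 3
job_server = pp.Server(ncpus)
jobs = [job_server.submit(busy, (10,), (), ("time",)) for _ in range(ncpus)]
print [job() for job in jobs]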
I'm building a pipeline involving a number of shell tools. In each
case, I create a temporary file using tempfile.mkstemp() and invoke a
command ("cmd < /tmp/tmpfile") on it using subprocess.Popen.
At the end of each section, I call close() on the file handles and use
os.remove() to delete them. Ev…
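Each section follows roughly this pattern (a sketch; "sometool" is a placeholder). Note that mkstemp() returns an already-open OS-level descriptor, which needs an explicit os.close():

import os
import subprocess
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, "input data\n")
    os.close(fd)  # mkstemp returns a raw descriptor: close it explicitly
    proc = subprocess.Popen("sometool < %s" % path, shell=True,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
finally:
    os.remove(path)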
On 9 Feb, 12:24, Gerhard Häring wrote:
> Looks like I have answered a similar question once, btw. ;-)
>
Ah, yes - thanks. I did Google for it, but obviously didn't have the
right search term.
Cheers,
Peter
Is there some way I can get at this information at run-time? I'd like
to use it to tag diagnostic output dumped during runs using Parallel
Python.
Peter
On 12 Jan, 15:33, mk wrote:
>
> Better use communicate() method:
>
Oh yes - it's right there in the documentation. That worked perfectly.
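For anyone finding this thread later, the working pattern is (a sketch; the arguments are placeholders):

import subprocess

# placeholder arguments; the real ipcress command line is longer
proc = subprocess.Popen(["/usr/bin/ipcress", "primers.ipcress"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()   # drains both pipes to EOF, no deadlock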
Many thanks,
Peter
I'm building a bioinformatics application using the ipcress tool:
http://www.ebi.ac.uk/~guy/exonerate/ipcress.man.html
I'm using subprocess.Popen to execute ipcress, which takes a group of
files full of DNA sequences and returns some analysis on them. Here's
a code fragment:
cmd = "/usr/bin/ipcr…
On 6 Jan, 23:31, Graham Dumpleton wrote:
> Thus, any changes to modules/packages installed on sys.path require a
> full restart of Apache to ensure they are loaded by all Apache child
> worker processes.
>
That will be it. I'm pulling in some libraries of my own from
elsewhere, which are still b…
Maybe this is an apache question, in which case apologies.
I am running mod_python 3.3.1-3 on apache 2.2.9-7. It works fine, but
I find that when I alter a source file during development, it
sometimes takes 5 seconds or so for the changes to be seen. This might
sound trivial, but when debugging te…
On 17 Dec, 20:33, "Chris Rebert" wrote:
> superclass = TraceablePointSet if tracing else PointSet
>
Perfect - many thanks. Good to know I'm absolved from evil, also ;)
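For reference, the pattern in question (a sketch; the class bodies are hypothetical):

tracing = True  # e.g. from a command-line flag

class PointSet(object):
    def add(self, point):
        pass  # hypothetical: store the point

class TraceablePointSet(PointSet):
    def add(self, point):
        print "adding", point  # record provenance before storing
        PointSet.add(self, point)

# pick the base class once, at class-definition time
superclass = TraceablePointSet if tracing else PointSet

class ExperimentPoints(superclass):
    pass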
Peter
This might be a pure OO question, but I'm doing it in Python so I'll
ask here.
I'm writing a number crunching bioinformatics application. Read lots
of numbers from files; merge, median and munge; draw plots. I've found
that the most critical part of this work is validation and
traceability - "wher…