mxODBC problems
I have just installed mxODBC on my x86_64 suse linux machine, where I use unixODBC for connection. Running queries from isql or DataManager works fine for the DSN that I am using. However, under mxODBC, I can get a connection object and a cursor object, but all attempts to execute even the simplest selects result in empty resultsets. Any ideas on what might be wrong?

>>> from mx.ODBC.unixODBC import *
>>> con = connect('Postgresql',user='username',password='passwd')
>>> cur = con.cursor()
>>> cur.execute('SELECT * FROM g_rif')
>>> rs = cur.execute('SELECT * FROM g_rif')
>>> rs
>>> cur.execute('SELECT * FROM g_rif').fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'fetchall'

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
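For reference, DB-API cursors are not required to return anything useful from execute(); the rows come from the cursor itself via fetchone()/fetchall(). A minimal sketch of that access pattern against the same DSN and table as above (it will not explain a genuinely empty table, but it rules out one source of confusion):

from mx.ODBC.unixODBC import connect

con = connect('Postgresql', user='username', password='passwd')
cur = con.cursor()
cur.execute('SELECT * FROM g_rif')   # return value is unspecified; may be None
rows = cur.fetchall()                # fetch from the cursor, not from execute()
print len(rows), 'rows'
cur.close()
con.close()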
odbc module for python
What are the alternatives for accessing an ODBC source from python (linux 64-bit, python 2.5)? It looks like mxODBC is the only one available?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
pyodbc on linux
I have read a couple of blogs suggesting that pyodbc is buildable under linux. I am running suse 10.2 on a 64-bit intel machine with unixODBC installed. Upon building, I get a slew of pretty horrid looking errors that make me wonder if this is supposed to work. Can anyone at least confirm that this is possible before I try to pursue things further?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
Re: Fast kNN from python
On Aug 14, 6:16 am, Janto Dreijer <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I am looking for a Python implementation or bindings to a library that
> can quickly find k-Nearest Neighbors given an arbitrary distance
> metric between objects. Specifically, I have an "edit distance"
> between objects that is written in Python.
>
> I haven't looked at the methods in detail but I think I'm looking for
> one of the data structures listed on http://en.wikipedia.org/wiki/Metric_trees
> (i.e. vp-trees, cover trees, m-trees or bk trees). But I might be
> wrong. An approximate kNN would also work.
>
> If there doesn't exist such an implementation yet, any advice on a
> library I can wrap myself would also be appreciated.
>
> Thanks!
> Janto

Have you looked at using Rpy and R? There are probably several knn implementations that then become accessible to you (although I haven't checked recently).

Sean
--
http://mail.python.org/mailman/listinfo/python-list
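For comparison with the tree-based structures mentioned above, a brute-force k-nearest-neighbors lookup with an arbitrary Python distance function is short enough to sketch directly; it is O(n) per query, but it gives a correctness baseline. The metric here is a placeholder (toy_dist is hypothetical), to be swapped for the real edit distance:

import heapq

def knn(query, candidates, k, dist):
    # brute force: score every candidate with the user-supplied metric
    return heapq.nsmallest(k, candidates, key=lambda c: dist(query, c))

def toy_dist(a, b):
    # placeholder metric; substitute the real edit distance here
    return abs(len(a) - len(b))

print knn('kitten', ['sitting', 'cat', 'mitten', 'knitting'], 2, toy_dist)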
Making a file-like object for manipulating a large file
This should be a relatively simple problem, but I haven't quite got the idea of how to go about it. I have a VERY large file that I would like to load a line at a time, do some manipulations on it, and then make it available as a file-like object for use as input to a database module (psycopg2) that wants a file-like object (with read and readline methods). I could write the manipulated file out to disk and then read it back in, but that seems wasteful. So, it seems like I need a buffer, a way to fill the buffer, and a way to have read and readline use the buffer. What I can't do is to load the ENTIRE file into a StringIO object, as the file is much too large. Any suggestions?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
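One possible shape for the buffer, as a minimal sketch: wrap the file in a class whose read() fills an internal string buffer by transforming one input line at a time. The transform argument is a placeholder for whatever per-line manipulation is needed, and the sketch assumes the consumer calls read() (as psycopg2's copy_from does) rather than interleaving read() and readline():

class TransformedFile(object):
    """Present a line-by-line transformed view of a large file as a file-like object."""
    def __init__(self, path, transform):
        self.fh = open(path)
        self.transform = transform
        self.buf = ''                     # transformed text not yet handed out

    def readline(self, size=-1):
        line = self.fh.readline()
        if not line:
            return ''                     # EOF convention for file-like objects
        return self.transform(line)

    def read(self, size=-1):
        while size < 0 or len(self.buf) < size:
            line = self.fh.readline()
            if not line:
                break                     # underlying file exhausted
            self.buf += self.transform(line)
        if size < 0:
            out, self.buf = self.buf, ''
        else:
            out, self.buf = self.buf[:size], self.buf[size:]
        return out                        # '' here signals EOF to the caller

# usage: cur.copy_from(TransformedFile('/tmp/big.txt', lambda l: l.upper()), 'sometable')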
Deferred jobs server as backend for web application
In the past, I have put together web applications that process tasks serially, either with short algorithms to manipulate user-submitted data or to return database queries. Now, however, I am faced with a long-running process being started by a web submission: I want to process an uploaded file. One way to do this is to simply start a process each time someone submits a job and then email when complete. Instead, I would like to submit the job to a persistent backend queue that can process the job, answer queries about its status while it is still running, and return results when it is done. I have looked at threadpool, which seems fine if I want to submit jobs from a single process (such as a Qt application), but it won't work directly for a web platform where I will likely have multiple threads/processes handling http requests. Any suggestions on how to go about this?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
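A minimal in-process sketch of the queue idea (all names hypothetical): jobs go onto a Queue with a generated id, a daemon worker thread drains the queue, and a status dictionary answers "is it done yet?" queries. For multiple web processes, the same structure would live in its own daemon and be reached over XML-RPC or a socket rather than imported directly:

import threading
import uuid
import Queue

jobs = Queue.Queue()
status = {}                            # job id -> 'queued', 'running', or the result

def submit(func, args):
    jobid = str(uuid.uuid4())
    status[jobid] = 'queued'
    jobs.put((jobid, func, args))
    return jobid                       # hand this back to the web client for polling

def worker():
    while True:
        jobid, func, args = jobs.get() # blocks until a job arrives
        status[jobid] = 'running'
        status[jobid] = func(*args)    # the stored result doubles as 'done'

t = threading.Thread(target=worker)
t.setDaemon(True)
t.start()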
Web Ontology Language (OWL) parsing
I would like to parse some OWL files, but I haven't dealt with OWL in python, or any other language for that matter. Some quick google searches do not turn up much in the way of possibilities for doing this in python. Any suggestions for available code or existing libraries?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
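One library worth trying (my suggestion, not something from the thread) is rdflib, which parses OWL's RDF/XML serialization into a graph of triples that can then be queried; exact import paths vary between rdflib versions, so treat this as a sketch:

from rdflib import Graph

g = Graph()
g.parse('ontology.owl')                # OWL files are RDF/XML underneath
print len(g), 'triples'
for subj, pred, obj in g:              # iterate over every (subject, predicate, object)
    print subj, pred, obj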
Async XMLRPC and job processing
I would like to set up a server that takes XMLRPC requests and processes them asynchronously. The XMLRPC server part is trivial in python. The job processing part is the part that I am having trouble with. I have been looking at how to use threadpool, but I can't see how to get that working. I would like to have the XMLRPC part of things do something like:

def method1(a, b, c):
    jobid = workRequest(long_method1, [a, b, c])
    return jobid

def method2(a, b, c):
    jobid = workRequest(long_method2, [a, b, c])
    return jobid

def long_method1(a, b, c):
    # do lots of heavy computation, etc.
    # store results in files in a given directory, etc.
    return result

for any number of methods. Again, pretty straightforward. However, I run into problems with the threadpool and xmlrpc server both waiting. In particular, if I do something like:

server = SimpleXMLRPCServer.SimpleXMLRPCServer(...)
server.serve_forever()

where can I tell the threadpool that I have set up to wait indefinitely? Both are blocking.

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
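One way to keep the two loops from fighting each other, sketched with the standard library instead of threadpool: serve_forever() blocks the main thread, while a daemon worker thread drains a Queue of submitted jobs; the XML-RPC methods only enqueue work and return a job id. Names and the port number are illustrative:

import threading
import uuid
import Queue
import SimpleXMLRPCServer

work = Queue.Queue()
results = {}

def worker():
    while True:
        jobid, func, args = work.get()
        results[jobid] = func(*args)          # heavy computation happens here

def long_method1(a, b, c):
    return a + b + c                          # stand-in for the real work

def method1(a, b, c):
    jobid = str(uuid.uuid4())
    work.put((jobid, long_method1, (a, b, c)))
    return jobid                              # returns to the XML-RPC client immediately

def check(jobid):
    return results.get(jobid, 'still running')

t = threading.Thread(target=worker)
t.setDaemon(True)                             # exits when the server process exits
t.start()

server = SimpleXMLRPCServer.SimpleXMLRPCServer(('localhost', 8000))
server.register_function(method1)
server.register_function(check)
server.serve_forever()                        # blocks; the worker keeps running alongside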
XML DOM, but in chunks
I have some very large XML files that are basically recordsets. I would like to access each record, one-at-a-time, and I particularly like the ElementTree library for accessing the data. Is there a way to have ElementTree read only one record of the data at a time? Alternatively, are there other ways that would allow one to parse out a record at a time and maintain some nice ways of accessing the elements within the record?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
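The usual answer for record-oriented XML of this kind is ElementTree's iterparse, clearing each record element after it has been handled so memory use stays flat; 'record' and 'name' below are placeholders for the real element names:

from xml.etree import cElementTree as ElementTree

def records(path, tag='record'):
    for event, elem in ElementTree.iterparse(path):   # default: 'end' events
        if elem.tag == tag:
            yield elem            # one fully parsed record, normal ElementTree access
            elem.clear()          # drop its children once the caller is done
            # (for truly huge files, also clear handled children off the root element)

for rec in records('big.xml'):
    print rec.findtext('name')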
Read from database, write to another database, simultaneously
I am working on a simple script to read from one database (oracle) and write to another (postgresql). I retrieve the data from oracle in chunks and drop the data to postgresql continuously. The author of one of the python database clients mentioned that using one thread to retrieve the data from the oracle database and another to insert the data into postgresql with something like a pipe between the two threads might make sense, keeping both IO streams busy. Any hints on how to get started?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
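A sketch of that arrangement with a bounded Queue standing in for the pipe; DB-API cursors are assumed on both sides, connection setup is omitted, and the table and column names are made up:

import threading
import Queue

CHUNK = 1000
pipe = Queue.Queue(maxsize=10)        # bounded, so the reader cannot run far ahead of the writer

def reader(ora_cur):
    ora_cur.execute('SELECT a, b, c FROM source_table')
    while True:
        rows = ora_cur.fetchmany(CHUNK)
        if not rows:
            break
        pipe.put(rows)                # blocks when the queue is full
    pipe.put(None)                    # sentinel: no more data

def writer(pg_con, pg_cur):
    while True:
        rows = pipe.get()
        if rows is None:
            break
        pg_cur.executemany('INSERT INTO dest_table (a, b, c) VALUES (%s, %s, %s)', rows)
    pg_con.commit()

# threading.Thread(target=reader, args=(ora_cur,)).start()
# writer(pg_con, pg_cur)              # run the writer in the main thread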
Re: Read from database, write to another database, simultaneously
On Jan 10, 9:27 pm, johnf <[EMAIL PROTECTED]> wrote:
> Bjoern Schliessmann wrote:
> > Sean Davis wrote:
>
> >> The author of one of the python database clients mentioned that
> >> using one thread to retrieve the data from the oracle database and
> >> another to insert the data into postgresql with something like a
> >> pipe between the two threads might make sense, keeping both IO
> >> streams busy.
>
> > IMHO he's wrong. Network interaction is quite slow compared with CPU
> > performance, so there's no gain (maybe even overhead due to thread
> > management and locking stuff). That's true even on multiprocessor
> > machines, not only because there's almost nothing to compute but
> > only IO traffic. CMIIW.
>
> > Using multiplexing, you'll get good results with simple code without
> > the danger of deadlocks. Have a look at asyncore (standard library)
> > or the Twisted framework -- personally, I prefer the latter.
>
> > Regards,
>
> Sean you can't win - everyone has a different idea! You need to explain
> that oracle has millions of records and it's possible to a pipe open to
> feed the Postgres side.
>
> One thing I didn't get - is this a one time transfer or something that is
> going to happen often.

Yes, some detail about the problem is definitely in order!

We have a collaborator that is going to maintain a large genome database that is a component of a postgresql database that we currently maintain. There are going to be consumers of the oracle data using both mysql and postgresql. The oracle database is LARGE, with around 100,000,000 rows spread over some 50-70 tables in multiple schemas. The idea is that as publicly available data (from various data sources on the web) become available, the oracle team will update the oracle database, doing all the parsing and necessary data cleanup of the raw data. We then want to be able to update postgres with these oracle data. So the process may be done only once per month on some tables, but as often as once a day on others.

As for the specifics, Oracle data is going to be coming in as a DB-API 2 cursor in manageable chunks (and at a relatively slow pace). On the postgres loading side, I wanted to use the psycopg2 copy_from function, which expects an open file-like object (that has read and readline functionality) and is quite fast for loading data. Note the disconnect here--Oracle is coming in in discrete chunks, while postgresql is looking for a file object. I solved this problem by creating a temporary file as an intermediary, but why wait for Oracle to finish dumping data when I can potentially be loading into postgres at the same time that the data is coming in?

So, I am actually looking for a solution to this problem that doesn't require an intermediate file and allows simultaneous reading and writing, with the caveat that the data cannot all be read into memory simultaneously, so it will need to be buffered. I hope that clarifies things.

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
Re: Read from database, write to another database, simultaneously
On Jan 11, 3:20 am, Laurent Pointal <[EMAIL PROTECTED]> wrote:
> Bjoern Schliessmann wrote:
> > Sean Davis wrote:
>
> >> The author of one of the python database clients mentioned that
> >> using one thread to retrieve the data from the oracle database and
> >> another to insert the data into postgresql with something like a
> >> pipe between the two threads might make sense, keeping both IO
> >> streams busy.
>
> > IMHO he's wrong. Network interaction is quite slow compared with CPU
> > performance, so there's no gain (maybe even overhead due to thread
> > management and locking stuff). That's true even on multiprocessor
> > machines, not only because there's almost nothing to compute but
> > only IO traffic. CMIIW.
>
> Not so sure, there is low CPU in the Python script, but there may be
> CPU+disk activity on the database sides [with cache management and other
> optimizations on disk access].
> So, with a reader thread and a writer thread, he can have a select on a
> database performed in parallel with an insert on the other database.
> After, he must know if the two databases use same disks, same
> controller, same host... or not.

Some more detail: The machine running the script is distinct from the Oracle machine, which is distinct from the Postgresql machine. So, CPU usage is low, and because of the independent machines on the database end, it is definitely possible to read from one database while writing to the other. That is the solution that I am looking for, and Dennis's post seems pretty close to what I need. I will have to use some kind of buffer. A Queue isn't quite right as it stands, as the data is coming in as records, but for postgres loading, a file-like stream is what I need, so there will need to be a wrapper around the Queue on the get() side. Or is there a better way to go about this detail? What seems to make sense to me is to stringify the incoming oracle data into some kind of buffer and then read on the postgresql side.

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
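A sketch of that wrapper: the producer thread stringifies each chunk of rows into tab-delimited text and put()s it on the Queue, with None as an end-of-data sentinel, and the consumer side exposes the read()/readline() interface that copy_from wants. This only illustrates the buffering and has not been exercised against psycopg2:

import Queue

class QueueFile(object):
    """File-like object fed by another thread through a Queue."""
    def __init__(self, q):
        self.q = q
        self.buf = ''
        self.done = False

    def _fill(self):
        chunk = self.q.get()              # blocks until the producer supplies data
        if chunk is None:
            self.done = True              # producer has finished
        else:
            self.buf += chunk

    def read(self, size=-1):
        while not self.done and (size < 0 or len(self.buf) < size):
            self._fill()
        if size < 0:
            out, self.buf = self.buf, ''
        else:
            out, self.buf = self.buf[:size], self.buf[size:]
        return out                        # '' means EOF

    def readline(self, size=-1):
        while '\n' not in self.buf and not self.done:
            self._fill()
        idx = self.buf.find('\n')
        if idx < 0:
            out, self.buf = self.buf, ''
        else:
            out, self.buf = self.buf[:idx + 1], self.buf[idx + 1:]
        return out

# producer thread: q.put('\t'.join(map(str, row)) + '\n') per row, then q.put(None)
# consumer:        pg_cur.copy_from(QueueFile(q), 'dest_table')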
Re: Graphs, bar charts, etc
On Feb 6, 7:57 am, Jan Danielsson <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I have some data in a postgresql table which I view through a web
> interface (the web interface is written in python -- using mod_python
> under apache 2.2). Now I would like to represent this data as graphs,
> bar charts, etc.
>
> I know about matplotlib, and it seemed like exactly what I was
> looking for. I tried importing it in my script, but it gave me some
> error about a home directory not being writable. I'm not sure I like the
> idea of it require to be able to write somewhere. Am I using it wrong?
>
> Is there something else I can use which can produce graphs easily?
>
> --
> Kind regards,
> Jan Danielsson

You might want to look at RPy (http://rpy.sourceforge.net), an interface to R (statistical programming environment).

Sean
--
http://mail.python.org/mailman/listinfo/python-list
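On the matplotlib error itself (a workaround I have seen suggested elsewhere, not part of this thread's answer): under mod_python the apache user usually has no writable home directory, so matplotlib cannot create its config/cache directory. Pointing MPLCONFIGDIR at a writable location and selecting the non-interactive Agg backend before pylab/pyplot is imported is usually enough; a sketch:

import os
os.environ['MPLCONFIGDIR'] = '/tmp'       # any directory the apache user can write to
import matplotlib
matplotlib.use('Agg')                     # render to image files; no display needed
from matplotlib import pyplot

pyplot.bar(range(5), [3, 7, 2, 5, 9])
pyplot.savefig('/tmp/chart.png')          # then serve this PNG from the web app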
Telnet versus telnetlib
I have used command-line telnet to log in to a host, paste into the window a small XML file, and then ^] to leave the window, and quit. This results in a page (described by the XML file) being printed on a printer. When I do an analogous process using telnetlib, I get no debug output, and most importantly, when I send the XML file to the host, I get no printed page. Unfortunately, I do not have access to the host to do troubleshooting there, so I have to "feel" my way around. Any suggestions on what might be going wrong?

Thanks,
Sean

In [1]: import telnetlib
In [2]: tn=telnetlib.Telnet()
In [3]: tn.set_debuglevel(1)
In [4]: tn.open('labmatr',56423)
In [12]: tn.write("""
   :
   :
   : F3B85FCE-55CF-4541-80EB-D1450377F7E0
   : BP10004 0701
   :
   : """)
Telnet(labmatr,56423): send ' \n\n\nF3B85FCE-55CF-4541-80EB-D1450377F7E0 \nBP10004 0701\n\n'
In [13]: tn.write("\n")
Telnet(labmatr,56423): send '\n'
In [14]: tn.close()
--
http://mail.python.org/mailman/listinfo/python-list
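Two things worth checking, sketched below as a guess rather than a known fix for this host: interactive telnet sends CRLF line endings and lets you see the server's replies, while telnetlib's write() sends exactly the bytes given and never reads anything unless asked. Reading the response (even just for debugging) and sending \r\n line endings may make the difference; page.xml is a hypothetical file holding the XML payload:

import telnetlib

tn = telnetlib.Telnet('labmatr', 56423)
tn.set_debuglevel(1)
payload = open('page.xml').read()
tn.write(payload.replace('\n', '\r\n'))   # telnet convention is CRLF line endings
tn.write('\r\n')
print tn.read_until('\n', 5)              # see whatever the host says back (5 s timeout)
tn.close()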
Line segments, overlap, and bits
I am working with genomic data. Basically, it consists of many tuples of (start,end) on a line. I would like to convert these tuples of (start,end) to a string of bits where a bit is 1 if it is covered by any of the regions described by the (start,end) tuples and 0 if it is not. I then want to do set operations on multiple bit strings (AND, OR, NOT, etc.). Any suggestions on how to (1) set up the bit string and (2) operate on 1 or more of them? Java has a BitSet class that keeps this kind of thing pretty clean and high-level, but I haven't seen anything like it for python.

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
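One way to get BitSet-like behavior, sketched with numpy boolean arrays (this assumes numpy is an acceptable dependency and that coordinates are 0-based with end-exclusive intervals):

import numpy

def coverage(intervals, length):
    # True at every position covered by at least one (start, end) interval
    mask = numpy.zeros(length, dtype=bool)
    for start, end in intervals:
        mask[start:end] = True
    return mask

a = coverage([(2, 5), (10, 20)], 30)
b = coverage([(4, 12)], 30)

both = a & b                       # AND
either = a | b                     # OR
only_a = a & ~b                    # positions in a but not in b
print numpy.flatnonzero(both)      # indices covered by both sets of intervals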
Numpy array to gzip file
I have a set of numpy arrays which I would like to save to a gzip file. Here is an example without gzip:

b=numpy.ones(100,dtype=numpy.uint8)
a=numpy.zeros(100,dtype=numpy.uint8)
fd = file('test.dat','wb')
a.tofile(fd)
b.tofile(fd)
fd.close()

This works fine. However, this does not:

fd = gzip.open('test.dat','wb')
a.tofile(fd)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: first argument must be a string or open file

In the bigger picture, I want to be able to write multiple numpy arrays with some metadata to a binary file for very fast reading, and these arrays are pretty compressible (strings of small integers), so I can probably benefit in speed and file size by gzipping.

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
Re: Numpy array to gzip file
On Jun 11, 12:42 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> On Jun 11, 9:17 am, Sean Davis <[EMAIL PROTECTED]> wrote:
>
> > I have a set of numpy arrays which I would like to save to a gzip
> > file. Here is an example without gzip:
>
> > b=numpy.ones(100,dtype=numpy.uint8)
> > a=numpy.zeros(100,dtype=numpy.uint8)
> > fd = file('test.dat','wb')
> > a.tofile(fd)
> > b.tofile(fd)
> > fd.close()
>
> > This works fine. However, this does not:
>
> > fd = gzip.open('test.dat','wb')
> > a.tofile(fd)
>
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > IOError: first argument must be a string or open file
>
> > In the bigger picture, I want to be able to write multiple numpy
> > arrays with some metadata to a binary file for very fast reading, and
> > these arrays are pretty compressible (strings of small integers), so I
> > can probably benefit in speed and file size by gzipping.
>
> > Thanks,
> > Sean
>
> Use
>     fd.write(a)

That seems to work fine. Just to add to the answer a bit, one can then use:

b=numpy.frombuffer(fd.read(),dtype=numpy.uint8)

to get the array back as a numpy uint8 array.

Thanks for the help.
Sean
--
http://mail.python.org/mailman/listinfo/python-list
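Putting the thread together, a round-trip sketch using the sizes and dtype from the example above; tostring() produces the same raw bytes tofile() would have written, and reading back just requires splitting the concatenated buffer:

import gzip
import numpy

a = numpy.zeros(100, dtype=numpy.uint8)
b = numpy.ones(100, dtype=numpy.uint8)

fd = gzip.open('test.dat.gz', 'wb')
fd.write(a.tostring())             # raw bytes of the array, gzip-compressed on the way out
fd.write(b.tostring())
fd.close()

fd = gzip.open('test.dat.gz', 'rb')
data = numpy.frombuffer(fd.read(), dtype=numpy.uint8)
fd.close()
a2, b2 = data[:100], data[100:]    # split the two concatenated arrays back apart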
Importing module with name given as a variable
What is the "best practice" for importing an arbitrary module given that the name is stored in a variable? The context is a simple web application with URL dispatching to a module and function. I know of __import__(), the imp module, and exec. For each of these, is there a way to make them work "just like" the normal import call? Thanks, Sean -- http://mail.python.org/mailman/listinfo/python-list
Re: running python as a daemon
On Sep 5, 5:55 pm, peyman <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have a Natural Language Processing (NLP) code written in python that
> reads into memory a large training file and then given a sentence tags
> it, using the training data. I want to put this NLP code on a server
> which handles all incoming client http requests via PHP. What I want
> to do is to provide the python NLP program as a service to any other
> PHP/Java/Ruby process request. So the mapping is
>
> http -> apache -> PHP/Java/Ruby/... -> Python NLP

Why not use a simple CGI script or wsgi application? You could make the service online and interactive and with the same application and code make an XMLRPC web service. So, things would look more like:

http -> apache -> Python (running NLP and serving requests)

You can use apache to proxy requests to any one of a dozen or so python-based webservers. You could also use mod_wsgi to interface with a wsgi application.

Sean

> I can not provide this service as a shell script because of the
> inefficiencies of having to load into memory a large training data to
> every service request. So what I want to do is to provide the NLP
> service to other application processes as a python daemon
>
> http -> apache -> PHP/Java/Ruby/... -> Python Daemon -> Python NLP
>
> The daemon loads into memory the training data once. then every
> service request event invokes the appropriate NLP code in the python
> program. I've tried to use play around with twisted but am not making
> too much of a headway. What I've done in the NLP code is to do this:
>
> # filename: NLP.py
> def parse(sentence):
>     structure=getentities(sentence)
>     print 'subject: \t',' '.join(structure[0])
>     print 'predicate: \t',' '.join(structure[1])
>     print 'TE: \t\t',' '.join(structure[2])
>
> class TLogicClass():
>     def __init__(self,prop):
>         return parse(prop)
>
> then in the daemon code done this
>
> # filename: dameon.py
> from twisted.application import service
> import NLP
>
> application=service.Application("nlp")
> ParseService=NLP.TLogicClass("time flies like an arrow")
> ParseService.setServiceParent(application)
>
> but I get the following error when I run twistd -y daemon.py
>
> >> Failed to load application: TLogicClass instance has no attribute
> >> 'setServiceParent'
>
> I suspect I need to put twisted in the NLP.py but I don't know what I
> need to do in order to get what I want to do which is:
>
> to load into memory only once a large training data that can then be
> queried by another (non-python) event based process (I need "reactor"
> class?).
>
> thank you in advance for your help
--
http://mail.python.org/mailman/listinfo/python-list
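A sketch of the "load once, serve many" idea using only the standard library; NLP and getentities() are the poster's own code and are simply assumed to be importable here. The training data is loaded once when the daemon starts, and every later request reuses it:

import SimpleXMLRPCServer
import NLP                                    # loads the large training file once, at start-up

def parse(sentence):
    return NLP.getentities(sentence)          # return the structure instead of printing it

server = SimpleXMLRPCServer.SimpleXMLRPCServer(('localhost', 8001))
server.register_function(parse)
server.serve_forever()                        # training data stays resident between requests

# Any PHP/Java/Ruby XML-RPC client (or apache acting as a proxy) can then call parse();
# from Python it would look like:
#   import xmlrpclib
#   print xmlrpclib.ServerProxy('http://localhost:8001').parse('time flies like an arrow')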
emulating read and readline methods
I have a large file that I would like to transform and then feed to a function (psycopg2 copy_from) that expects a file-like object (needs read and readline methods).

I have a class like so:

class GeneInfo():
    def __init__(self):
        #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline() #deal with header line

    def _read(self,n=1):
        for line in self.fh:
            if line=='':
                break
            line=line.strip()
            line=re.sub("\t-","\t",line)
            rowvals = line.split("\t")
            yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"

    def readline(self,n=1):
        return self._read().next()

    def read(self,n=1):
        return self._read().next()

    def close(self):
        self.fh.close()

and I use it like so:

a=GeneInfo()
cur.copy_from(a,"gene_info")
a.close()

It works well except that the end of file is not caught by copy_from. I get errors like:

psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error during .read() call
CONTEXT: COPY gene_info, line 1000: ""

for a 1000 line test file. Any ideas what is going on?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
Re: emulating read and readline methods
On Sep 10, 7:54 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Sep 11, 8:01 am, MRAB <[EMAIL PROTECTED]> wrote:
>
> > On Sep 10, 6:59 pm, Sean Davis <[EMAIL PROTECTED]> wrote:
>
> > > I have a large file that I would like to transform and then feed to a
> > > function (psycopg2 copy_from) that expects a file-like object (needs
> > > read and readline methods).
>
> > > I have a class like so:
>
> > > class GeneInfo():
> > >     def __init__(self):
> > >         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
> > >         self.fh = gzip.open("/tmp/gene_info.gz")
> > >         self.fh.readline() #deal with header line
>
> > >     def _read(self,n=1):
> > >         for line in self.fh:
> > >             if line=='':
> > >                 break
> > >             line=line.strip()
> > >             line=re.sub("\t-","\t",line)
> > >             rowvals = line.split("\t")
> > >             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
>
> > >     def readline(self,n=1):
> > >         return self._read().next()
>
> > >     def read(self,n=1):
> > >         return self._read().next()
>
> > Each time readline() and read() call self._read() they are creating a
> > new generator. They then get one value from the newly-created
> > generator and then discard that generator. What you should do is
> > create the generator in __init__ and then use it in readline() and
> > read().
>
> > >     def close(self):
> > >         self.fh.close()
>
> > > and I use it like so:
>
> > > a=GeneInfo()
> > > cur.copy_from(a,"gene_info")
> > > a.close()
>
> > > It works well except that the end of file is not caught by copy_from.
> > > I get errors like:
>
> > > psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
> > > during .read() call
> > > CONTEXT: COPY gene_info, line 1000: ""
>
> > > for a 1000 line test file. Any ideas what is going on?
>
> > I wonder whether it's expecting readline() and read() to return an
> > empty string at the end of the file instead of raising StopIteration.
>
> Don't wonder; ReadTheFantasticManual:
>
> read([size])
>     ... An empty string is returned when EOF is encountered
> immediately. ...
>
> readline([size])
>     ... An empty string is returned only when EOF is encountered
> immediately.

Thanks. This was indeed my problem--not reading the manual closely enough. And the points about the iterator being re-instantiated were also right on point. Interestingly, in this case, the code was working because read() and readline() were still returning the next line each time, since the file handle was being read one line at a time.

Sean
--
http://mail.python.org/mailman/listinfo/python-list
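Pulling the two fixes together, the class from the original post might look like the sketch below: the generator is created once in __init__, and read()/readline() return an empty string at EOF instead of letting StopIteration escape. (As noted in the thread, read() ignores its size argument and hands back one transformed line per call, which copy_from tolerates.)

import gzip
import re

class GeneInfo(object):
    def __init__(self):
        self.fh = gzip.open('/tmp/gene_info.gz')
        self.fh.readline()                    # skip the header line
        self._lines = self._read()            # one generator, created once

    def _read(self):
        for line in self.fh:
            line = re.sub('\t-', '\t', line.strip())
            rowvals = line.split('\t')
            yield '\t'.join([rowvals[i] for i in
                             [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]]) + '\n'

    def readline(self, size=-1):
        try:
            return self._lines.next()
        except StopIteration:
            return ''                         # file-like EOF convention

    read = readline                           # one line per read() call, as discussed above

    def close(self):
        self.fh.close()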
lxml and adding a stylesheet
I have an xml document and simply need to add an xml-stylesheet to it. I am using lxml to parse the xml document and then would like to insert the xml-stylesheet tag using the etree api. Any suggestions?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
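One way to do this with lxml (a sketch; the href value and filenames are placeholders): build an xml-stylesheet processing instruction and attach it in front of the root element with addprevious(), then serialize the tree:

from lxml import etree

doc = etree.parse('input.xml')
pi = etree.ProcessingInstruction(
    'xml-stylesheet', 'type="text/xsl" href="style.xsl"')
doc.getroot().addprevious(pi)         # places the PI before the root element

doc.write('output.xml', xml_declaration=True, encoding='UTF-8')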
Re: Detecting the first time I open/append to a file
On Sep 23, 2:02 pm, [EMAIL PROTECTED] wrote:
> I have a simulation that runs many times with different parameters,
> and I want to aggregate the output into a single file with one rub: I
> want a header to be written only the first time. My program looks a
> bit like this:
>
> def main():
>     for param in range(10):
>         simulate(param)
>
> def simulate(parameter):
>     'Lots of code followed by:
>     with open(summaryFn, 'ab') as f:
>         writer = csv.writer(f)
>         writer.writerow(header)
>         writer.writerow(Sigma)
>
> If I can sense that the file is being created in the first iteration,
> I can then use an if statement to decide whether or not I need to
> write the header. Question: how can I tell if the file is being
> created or if this its the first iteration? It's unrealistic to test
> the value of the parameter as in the real problem, there are many
> nested loops in main, and the bounds on the loop indices may change.

You could use os.path.exists() to check if the file is there. However, the file could have been left over from a previous execution, etc. What might make sense is to open the file only once, store the file handle, and write to that throughout the execution.

Sean
--
http://mail.python.org/mailman/listinfo/python-list
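A sketch of the os.path.exists() variant, with the caveat above (a file left over from an earlier run will suppress the header) still applying; summaryFn, header, and Sigma are the names from the quoted code, passed in here so the snippet stands alone:

import os
import csv

def append_row(summaryFn, header, Sigma):
    write_header = not os.path.exists(summaryFn)   # no file yet means this is the first iteration
    f = open(summaryFn, 'ab')
    writer = csv.writer(f)
    if write_header:
        writer.writerow(header)
    writer.writerow(Sigma)
    f.close()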
including pygments code_block directive in rst2* from docutils
I would like to simply extend the rst2* scripts bundled with docutils to include a code_block directive. I have found a number of sites that discuss the topic, but I guess I am new enough to docutils to still be wondering how to make it actually happen. I'm looking to convert a single .rst file to both html and pdf (via latex). Any suggestions?

Thanks,
Sean
--
http://mail.python.org/mailman/listinfo/python-list
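For the HTML side, a sketch of the registration step, in the style of the pygments rst-directive example that several of those sites reproduce (the directive name is illustrative, and the LaTeX/PDF path would additionally need a LatexFormatter plus the matching style preamble):

from docutils import nodes
from docutils.core import publish_cmdline
from docutils.parsers.rst import directives
from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter

def code_block(name, arguments, options, content, lineno,
               content_offset, block_text, state, state_machine):
    # body of a ".. code_block:: language" block rendered as highlighted HTML
    lexer = get_lexer_by_name(arguments[0])
    html = highlight(u'\n'.join(content), lexer, HtmlFormatter())
    return [nodes.raw('', html, format='html')]

code_block.arguments = (1, 0, 0)      # one required argument: the language name
code_block.content = 1

directives.register_directive('code_block', code_block)

# With the directive registered, a thin wrapper does what rst2html.py does:
publish_cmdline(writer_name='html')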