Re: Fuzzy matching of postal addresses
Andrew,

> Basically, I have two databases containing lists of postal addresses and
> need to look for matching addresses in the two databases. More
> precisely, for each address in database A I want to find a single
> matching address in database B.

What percent of addresses in A have a unique corresponding address in B? (i.e. how many addresses will have some match in B?)

This is a standard document retrieval task. Whole books could be written about the topic. (In fact, many have been.) I suggest you don't waste your time trying to solve this problem from scratch, and instead capitalize on the effort of others. Hence, my proposal is pretty simple:

1. Normalize the case and punctuation of the text (e.g. convert it all to uppercase and strip punctuation), since they are uninformative and, at best, a confounding variable.
2. Use a free information retrieval package to find matches, e.g. LEMUR: http://www-2.cs.cmu.edu/~lemur/

In this case, a "document" is an address in Database B. A "query" is an address in Database A. (Alternatively, you could switch A and B to see if that affects accuracy.)

Good luck.

Joseph

-- http://mail.python.org/mailman/listinfo/python-list
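To make the retrieval framing concrete, here is a toy sketch of the two steps above: normalize each address, then rank database B's addresses against a query address from A by TF-IDF cosine similarity. It is an illustration only, not LEMUR's API; a real IR package adds better tokenization, weighting schemes and indexing that scale far beyond this linear scan. The function names and sample addresses are hypothetical.

```python
import math
import re
from collections import Counter


def normalize(address):
    """Uppercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^\w\s]", " ", address.upper()).split()


def tfidf_vector(tokens, idf):
    """Unit-length TF-IDF vector as a dict; tokens unseen in B get weight 0."""
    weights = {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def build_index(documents):
    """Compute smoothed IDF over database B and one vector per address."""
    tokenized = [normalize(d) for d in documents]
    n = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    return [tfidf_vector(tokens, idf) for tokens in tokenized], idf


def best_match(query, documents, vectors, idf):
    """Return (address, cosine score) of the address in B closest to query."""
    q = tfidf_vector(normalize(query), idf)
    scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in vectors]
    i = max(range(len(documents)), key=lambda k: scores[k])
    return documents[i], scores[i]
```

For example, `best_match("12 MAIN STREET SPRINGFIELD", b, *build_index(b))` with `b = ["12 Main St., Springfield", "12 Elm St, Springfield"]` picks the Main St. entry. A threshold on the best score (to be tuned on real data) is one way to decide that an address in A has no counterpart in B.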
Safest manner to extend search path for modules?
Hi,

What is the safest way to extend the module search path, minimizing the likelihood of shooting oneself in the foot?

The system (which includes scripts and their shared modules) may be checked out in several different locations, but a script in a particular checked-out version of the system should only use modules from that checkout location.

For example, if the system contains a directory scripts/foo/ where all the scripts are housed, and scripts/modules/ where all the modules are housed, then is it correct for each script in scripts/foo/ to begin with:

    import sys, os.path
    sys.path.append(os.path.join(sys.path[0], "../modules"))

If so, is there a cleaner way of doing this than including the above text in all scripts?

Thanks,
Joseph
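One possible tightening of the snippet above, sketched in modern Python and not taken from any reply in the thread: derive the directory from `__file__` rather than `sys.path[0]` (which is the script's directory when run as a script, but an empty string in an interactive session), normalize the path, and insert it at the front of `sys.path` so this checkout's modules take precedence over same-named modules installed elsewhere. The helper name `add_checkout_modules` is hypothetical.

```python
import os
import sys


def add_checkout_modules(script_file, relative="../modules"):
    """Put <directory of script_file>/<relative> at the front of sys.path.

    Returns the directory. If it is already on sys.path, nothing is added,
    so calling this twice does not create a duplicate entry.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    modules_dir = os.path.normpath(os.path.join(script_dir, relative))
    if modules_dir not in sys.path:
        # Front of the path: this checkout's modules shadow any
        # same-named modules installed elsewhere.
        sys.path.insert(0, modules_dir)
    return modules_dir
```

Each script would call `add_checkout_modules(__file__)` before importing from scripts/modules/. The helper itself still has to live in each script (it cannot be imported before the path is set), so setting `PYTHONPATH` per checkout is another way to avoid the repetition.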
Python code for 2.5 and 2.4?
I was given code that was written for Python 2.5 and uses simple functions like 'all' which are not present in 2.4. I want to make the code 2.4 compatible. What is the best way to do this? If I define a function 'all', won't I break 2.5 compatibility?

Thanks,
Joseph
Re: Python code for 2.5 and 2.4?
On 25 Feb, 16:02, Robert Kern <[EMAIL PROTECTED]> wrote:
> Joseph Turian wrote:
> > I was given code that was written for python 2.5, and uses simple
> > functions like 'all' which are not present in 2.4
>
> > I want to make the code 2.4 compatible. What is the best way to do
> > this?
>
> If it's a single file, put something like the following code near the top. If
> you have multiple modules, put it into a separate module, say compatibility.py,
> and change the other modules to import these functions from there.
>
> import sys
> if sys.version_info[:2] < (2,5):
>     def all(*args):
>         ...
>     def any(*args):
>         ...
> else:
>     # Only bother with this else clause and the __all__ line if you are putting
>     # this in a separate file.
>     import __builtin__
>     all = __builtin__.all
>     any = __builtin__.any
>
> __all__ = ['all', 'any']
>
> > If I define function 'all', then won't I break 2.5 compatibility?
>
> No. Defining a function named the same thing as a builtin function will not
> break anything. You just wouldn't be using the efficient implementation already
> in Python 2.5. Using the if: else: suite above lets you have both at the expense
> of some clunkiness.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
> that is made terrible by our own mad attempt to interpret it as though it had
> an underlying truth."
>   -- Umberto Eco

This is what I was looking for. Thanks!
Python 2.5 adoption
How widely adopted is Python 2.5? We are doing some development, and have a choice to make:

a) Use all the 2.5 features we want.
b) Maintain backwards compatibility with 2.4.

So I guess the question is, does anyone have a sense of what percent of Python users don't have 2.5?

Thanks,
Joseph
Re: Python 2.5 adoption
Basically, we're planning on releasing it as open-source, and don't want to alienate a large percentage of potential users.
Property in derived class
If a base class defines a property, it is difficult to override its get and set functions in a derived class: the property holds the function object it was created with (early binding), whereas an overriding method in the derived class is only found by late, per-instance lookup. This was previously discussed:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/e13a1bd46b858dc8/9d32049aad12e1c1?lnk=gst#9d32049aad12e1c1

Could someone demonstrate how to implement the proposed solutions that allow the property to be declared in the abstract base class and refer to a get function which is only implemented in derived classes?

Thanks,
Joseph
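One common idiom, not necessarily the recipe pointed to later in this thread, is to have the property call the accessors by name through small lambdas. The method is then looked up on the instance at call time, so a derived class's override is used. The class and method names here are hypothetical.

```python
class Base(object):
    """Abstract base: declares the property, defers the accessors."""

    def _get_value(self):
        raise NotImplementedError("subclasses must implement _get_value")

    def _set_value(self, value):
        raise NotImplementedError("subclasses must implement _set_value")

    # The lambdas look up self._get_value / self._set_value each time the
    # property is used (late binding), so subclass overrides are picked up.
    value = property(lambda self: self._get_value(),
                     lambda self, v: self._set_value(v))


class Derived(Base):
    def __init__(self):
        self._value = 0

    def _get_value(self):
        return self._value

    def _set_value(self, value):
        self._value = value
```

By contrast, `value = property(_get_value, _set_value)` in `Base` would capture the base-class functions and always raise `NotImplementedError`, even on a `Derived` instance.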
Find more than one error at once
Is it possible to coax python to find more than one error at once?

Thanks,
Joseph
Re: Property in derived class
On May 9, 9:05 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
> Using the overridable property recipe [1],
> [1]http://infinitesque.net/articles/2005/enhancing%20Python's%20property...

Thanks, this is a great solution!

Joseph
Re: Find more than one error at once
On May 10, 8:13 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> What kind of errors? Syntax-errors? Then use one of the python source
> code analyzers, such as pylint or pychecker.

Great!
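A small standard-library illustration of the syntax-error case, in modern Python 3: the compiler stops at the first syntax error in a file, so on its own it reports at most one per file, but a script can at least check every file in a tree in one pass instead of stopping at the first broken file. Analyzers such as pylint go further and report many kinds of problems at once. The function names are hypothetical.

```python
import os


def check_source(source, filename="<string>"):
    """Return (filename, lineno, message) for a syntax error, else None.

    compile() only parses and compiles; the code is never executed.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return (filename, exc.lineno, exc.msg)
    return None


def find_syntax_errors(root):
    """Check every .py file under root; report the first syntax error in each."""
    errors = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with open(path) as f:
                    result = check_source(f.read(), path)
                if result is not None:
                    errors.append(result)
    return errors
```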
Importing two module variables and having one clobber the other?
Can I simulate the behavior of "from foo import *" using imp.load_module()?

Here's the functionality I need: We allow the user to define parameter values, which are imported and can be accessed directly as variables within Python. These are defined in "parameters.py".

More specifically, let's say the user creates "dir/parameters.py" and "dir/subdir/parameters.py" and then invokes the program from within dir/subdir. We first import the values from dir/parameters.py and then import the values from dir/subdir/parameters.py, potentially clobbering any values from the first import.

How can I achieve this functionality? Here's what I have in mind:
* Find every directory from os.environ["HOME"] through os.getcwd().
* Find all such directories in which parameters.py exists.
* Run "from parameters import *" for each, starting from the highest-level directory through the current directory.

Thanks!
Joseph
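A minimal sketch of the merge step, assuming the directories are already known and ordered from highest level to current. It uses `runpy.run_path` (available since Python 2.7) in place of `imp.load_module`: each file is executed and its top-level names are merged into one dict, so later files clobber earlier values. The function name `load_parameters` is hypothetical.

```python
import os
import runpy


def load_parameters(directories, filename="parameters.py"):
    """Merge top-level names from each directory's parameters file.

    directories should run from the highest level down to the current
    directory; later files clobber earlier values. As with
    "from parameters import *" when no __all__ is defined, names
    starting with an underscore are skipped.
    """
    merged = {}
    for directory in directories:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            # Executes the file and returns its resulting global namespace.
            namespace = runpy.run_path(path)
            merged.update((name, value) for name, value in namespace.items()
                          if not name.startswith("_"))
    return merged
```

Combined with the `superdirs()` function from the follow-up post, something like `globals().update(load_parameters(superdirs(os.getcwd())))` would make the values available as plain variables. Unlike a real `import *`, this sketch ignores `__all__`.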
Re: Importing two module variables and having one clobber the other?
Fredrik Lundh wrote:
> if you prefer to use a "parameters.value" syntax, you can wrap the resulting
> dictionary in a class.

That sounds good. How do I do that?

> I assume "from" means "beneath" and "getcwd" means "walk" ?

Actually:

    import os

    def superdirs(d):
        """Return [HOME, ..., d]: d and each parent directory up to HOME."""
        home = os.environ["HOME"]
        lst = [d]
        while d != home:
            (d, tl) = os.path.split(d)
            if d == lst[-1]:
                # Reached the filesystem root without passing HOME; stop
                # instead of looping forever on "/".
                break
            lst.append(d)
        lst.reverse()
        return lst

Joseph
Wrap a dictionary in a class?
In another thread, it was recommended that I wrap a dictionary in a class. How do I do so?

Joseph

That thread: http://groups.google.com/group/comp.lang.python/browse_frm/thread/9a0fbdca450469a1/b18455aa8dbceb8a?q=turian&rnum=1#b18455aa8dbceb8a
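A minimal sketch of the wrapping suggested in that thread, assuming the goal is attribute-style access (`parameters.value`) to a dictionary of values. The class name `Parameters` is hypothetical; on Python 3.3 and later, the standard library's `types.SimpleNamespace(**values)` does much the same job.

```python
class Parameters(object):
    """Attribute-style access to a dict: params.rate instead of params["rate"]."""

    def __init__(self, values):
        # Copy the entries into the instance's attribute dictionary,
        # so ordinary attribute lookup finds them.
        self.__dict__.update(values)

    def __repr__(self):
        items = ", ".join("%s=%r" % kv for kv in sorted(self.__dict__.items()))
        return "Parameters(%s)" % items
```

For example, `Parameters({"rate": 0.5}).rate` gives `0.5`. Keys that are not valid identifiers can still be reached with `getattr(params, key)`.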
SimpleXMLRPCServer clobbering sys.stderr? (2.5.2)
I was having a mysterious problem with SimpleXMLRPCServer. (I am using Python 2.5.2.) The request handlers were sometimes failing without any error message in the log output.

What I discovered was perplexing. I had some 'print' statements in the handlers that, assuming the request would be handled, would print just fine. When I switched to 'print >> sys.stderr', the request handlers would just fail completely, and not produce the sys.stderr output that I desired.

It seems that SimpleXMLRPCServer is clobbering stderr in some bizarre and silent-error-causing way. I can't really find any documentation or explanation of this phenomenon. Could someone please illuminate it for me?

Best,
Joseph
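One reason handler failures can look silent: SimpleXMLRPCServer catches an exception raised by a registered function and sends it back to the client as an XML-RPC fault, rather than printing a traceback on the server. A debugging aid, sketched in modern Python and not a fix for the stderr behaviour itself, is to wrap each handler so any exception is logged with its traceback before propagating. The logger name and the decorator name are hypothetical, and `logging` needs a handler configured (for example with `logging.basicConfig`) for the messages to appear anywhere useful.

```python
import functools
import logging

log = logging.getLogger("xmlrpc.handlers")


def log_exceptions(func):
    """Wrap an XML-RPC handler so failures are logged with a traceback.

    The exception is re-raised, so the client still receives a fault.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # log.exception records the message plus the current traceback.
            log.exception("handler %s failed", func.__name__)
            raise
    return wrapper

# In the server's startup code, something like (file name hypothetical):
#   logging.basicConfig(filename="server.log", level=logging.DEBUG)
```

Registration would then look like `server.register_function(log_exceptions(my_handler), "my_handler")`, where `my_handler` stands for one of the request handlers.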
Re: SimpleXMLRPCServer clobbering sys.stderr? (2.5.2)
> > I was having a mysterious problem with SimpleXMLRPCServer. (I am using
> > Python 2.5.2)
>
> I'd start updating Python to the latest 2.5 release: 2.5.4

Ubuntu doesn't have 2.5.4. :(

> > The request handlers were sometimes failing without any error message
> > to the log output.
>
> > What I discovered was perplexing.
> > I had some 'print' statements in the handlers that, assuming the
> > request would be handled, would print just fine. When I switched to
> > 'print >> sys.stderr', the request handlers would just fail
> > completely, and not make the sys.stderr output that I desired.
>
> Perhaps you need to flush the file also? sys.stderr.flush()

Flushing will not help, because it will be too late. It is almost as if, when I write to sys.stderr, the XML-RPC connection just hangs up or something like that.

> XMLRPCServer doesn't reassign or alter sys.stderr, just uses it in the
> log_message method. I'd look in some other place...

Here's what I see:
* If I use logging to write the output, I don't see any output in the server log, but the client gets correct results.
* If I use sys.stderr to write the output, I don't see any output in the server log AND the client gets INcorrect results.
* If I use sys.stdout to write the output, I DO see the output in the server log AND the client gets correct results.

Why does only sys.stdout work within XMLRPCServer registered methods?

Thanks,
Joseph
Re: SimpleXMLRPCServer clobbering sys.stderr? (2.5.2)
> Here's what I see:
> * If I use logging to write the output, I don't see any output in the
>   server log, but the client gets correct results.
> * If I use sys.stderr to write the output, I don't see any output in
>   the server log AND the client gets INcorrect results.
> * If I use sys.stdout to write the output, I DO see the output in the
>   server log AND the client gets correct results.

Oh, one more thing. If I write to sys.stdout and then issue:

    sys.stdout.flush()

it appears that the handler aborts at that point and gets no further into the method. Why would this happen?

Thanks,
Joseph
Re: handling very large dictionaries
You could also try using a key-value store. I am using pytc, a Python API for Tokyo Cabinet. It seems to figure out quite nicely when to go to disk and when to use memory, but I have not done extensive tests.

Here is some example code for using pytc:
http://github.com/turian/pytc-example/tree/master

Joseph

On Jun 28, 7:13 pm, mclovin wrote:
> Hello all,
>
> I need to have a dictionary of about 8 gigs (well the data it is
> processing is around 4gb). so naturally i am running into memory
> errors.
>
> So i looked around and found bsddb which acts like a dictionary object
> only offloads the data from the RAM to the HDD, however that only
> supports strings.
>
> my dictionaries hold my own class objects
>
> Is there something like it that is more flexible?
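The objection to bsddb above was that it stores only strings. For comparison with pytc, the standard library's `shelve` module provides a disk-backed, dict-like object whose values can be any picklable object, including instances of your own module-level classes; keys must still be strings. A minimal sketch with hypothetical helper names; its performance at the 8 GB scale is untested here.

```python
import shelve


def build_store(path, items):
    """Write (str key, picklable value) pairs to a disk-backed shelf."""
    db = shelve.open(path)  # flag "c" (the default) creates the store if needed
    try:
        for key, value in items:
            db[key] = value  # value is pickled and written to disk
    finally:
        db.close()


def lookup(path, key):
    """Read one value back from the shelf without loading the others."""
    db = shelve.open(path, flag="r")
    try:
        return db[key]
    finally:
        db.close()
```

With the default `writeback=False`, mutating an object fetched from the shelf does not change what is stored; assign it back with `db[key] = obj` after modifying it.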
Abort SimpleXMLRPCServer request prematurely?
With SimpleXMLRPCServer, if the server is taking too long, how can I use the client to kill the request and have the server abort prematurely?

Thanks,
Joseph
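A partial answer, sketched for Python 3's `xmlrpc.client` (the module was called `xmlrpclib` in Python 2): the client can stop waiting after a timeout, but that alone does not stop the server, which keeps running the handler until it finishes. Making the server abort would need cooperation on its side, for example a handler that checks a deadline as it works; that part is not sketched here. The function name is hypothetical, and `socket.setdefaulttimeout` is process-wide, so this is not safe to use from several threads at once.

```python
import socket
import xmlrpc.client


def call_with_timeout(url, method, args, timeout):
    """Call an XML-RPC method, giving up if no reply arrives in time.

    Raises socket.timeout (an OSError subclass) when the timeout expires.
    Only the client stops waiting; the server's handler keeps running.
    """
    previous = socket.getdefaulttimeout()
    # Sockets created while this is set inherit the timeout, which
    # covers the connection the proxy opens for this call.
    socket.setdefaulttimeout(timeout)
    try:
        with xmlrpc.client.ServerProxy(url) as proxy:
            return getattr(proxy, method)(*args)
    finally:
        socket.setdefaulttimeout(previous)
```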