Re: Fuzzy matching of postal addresses

2005-01-23 Thread Joseph Turian
Andrew,

> Basically, I have two databases containing lists of postal addresses
> and need to look for matching addresses in the two databases. More
> precisely, for each address in database A I want to find a single
> matching address in database B.

What percent of addresses in A have a unique corresponding address in
B? (i.e. how many addresses will have some match in B?)

This is a standard document retrieval task. Whole books could be
written about the topic. (In fact, many have been.)

I suggest you don't waste your time trying to solve this problem from
scratch, and instead capitalize on the effort of others. Hence, my
proposal is pretty simple:
1. Regularize the case and punctuation of the text (e.g. convert it all
to uppercase), since these are uninformative and, at best, a confounding
variable.
2. Use a free information retrieval package to find matches.
e.g. LEMUR: http://www-2.cs.cmu.edu/~lemur/

In this case, a "document" is an address in Database B. A "query" is an
address in Database A. (Alternately, you could switch A and B to see if
that affects accuracy.)
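
As a rough illustration of the two steps above, and not a substitute for a
real retrieval package such as LEMUR, here is a standard-library sketch that
normalizes case and punctuation and then ranks the addresses in B by TF-IDF
cosine similarity to a query from A. The function names (normalize, weigh,
cosine, best_match) are made up for this example.

```python
import math
import re
from collections import Counter


def normalize(address):
    """Uppercase the text, turn punctuation into spaces, and split into tokens."""
    return re.sub(r"[^A-Z0-9]+", " ", address.upper()).split()


def weigh(tokens, idf, default_idf):
    """Build a TF-IDF weight vector (as a dict) for one token list."""
    counts = Counter(tokens)
    return dict((tok, tf * idf.get(tok, default_idf)) for tok, tf in counts.items())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def best_match(query, documents):
    """Return (index, score) of the document most similar to the query."""
    token_lists = [normalize(doc) for doc in documents]
    n = len(documents)
    # Document frequency: in how many addresses does each token occur?
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    # Smoothed inverse document frequency, so rare tokens count for more.
    idf = dict((tok, math.log((1.0 + n) / (1.0 + df)) + 1.0)
               for tok, df in doc_freq.items())
    default_idf = math.log(1.0 + n) + 1.0  # weight for tokens unseen in B
    q = weigh(normalize(query), idf, default_idf)
    scores = [cosine(q, weigh(toks, idf, default_idf)) for toks in token_lists]
    best = max(range(n), key=scores.__getitem__)
    return best, scores[best]
```

Each address in A would be passed as the query, with the list of addresses
from B as the documents. A score threshold for "no match" would still have
to be chosen by looking at real data.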

Good luck.

   Joseph

-- 
http://mail.python.org/mailman/listinfo/python-list


Safest manner to extend search path for modules?

2005-07-25 Thread Joseph Turian
Hi,

What is the safest manner to extend search path for modules, minimizing
the likelihood of shooting oneself in the foot?

The system (which includes scripts and their shared modules) may be
checked out in several different locations, but a script in a
particular checked-out version of the system should only use modules
from that checkout location.

e.g. if the system contains a directory scripts/foo/ where all the
scripts are housed, and scripts/modules/ where all the modules are
housed, then is it correct for each script in scripts/foo/ to begin
with:
   import sys, os.path
   sys.path.append(os.path.join(sys.path[0], "../modules"))

If so, is there a cleaner way of doing this than including the above
text in all scripts?
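
One possible variant, sketched under the assumption that the layout is
exactly scripts/foo/ and scripts/modules/: resolve the path from the script's
own file, make it absolute, and insert it at the front of sys.path so that
this checkout's modules shadow any other copy. The helper names below are
hypothetical.

```python
import os
import sys


def checkout_modules_dir(script_file):
    """Return the absolute path of ../modules relative to the given script."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(script_dir, os.pardir, "modules"))


def add_checkout_modules(script_file):
    """Put this checkout's modules directory at the front of sys.path, once."""
    modules_dir = checkout_modules_dir(script_file)
    if modules_dir not in sys.path:
        sys.path.insert(0, modules_dir)
    return modules_dir
```

A script would call add_checkout_modules(__file__) before importing any
shared module. The helper itself would have to live next to the scripts
(in scripts/foo/), since that directory is already on sys.path.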

Thanks,

Joseph



Python code for 2.5 and 2.4?

2008-02-25 Thread Joseph Turian
I was given code that was written for Python 2.5 and uses simple
functions like 'all' which are not present in 2.4.

I want to make the code 2.4 compatible. What is the best way to do
this?
If I define a function 'all', then won't I break 2.5 compatibility?

Thanks,
  Joseph


Re: Python code for 2.5 and 2.4?

2008-02-25 Thread Joseph Turian
On 25 Feb, 16:02, Robert Kern <[EMAIL PROTECTED]> wrote:
> Joseph Turian wrote:
> > I was given code that was written for python 2.5, and uses simple
> > functions like 'all' which are not present in 2.4
>
> > I want to make the code 2.4 compatible. What is the best way to do
> > this?
>
> If it's a single file, put something like the following code near the top. If
> you have multiple modules, put it into a separate module, say compatibility.py,
> and change the other modules to import these functions from there.
>
> import sys
> if sys.version_info[:2] < (2,5):
>     def all(*args):
>         ...
>     def any(*args):
>         ...
> else:
>     # Only bother with this else clause and the __all__ line if you are putting
>     # this in a separate file.
>     import __builtin__
>     all = __builtin__.all
>     any = __builtin__.any
>
> __all__ = ['all', 'any']
>
> > If I define function 'all', then won't I break 2.5 compatibility?
>
> No. Defining a function named the same thing as a builtin function will not
> break anything. You just wouldn't be using the efficient implementation already
> in Python 2.5. Using the if: else: suite above lets you have both at the expense
> of some clunkiness.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
>   that is made terrible by our own mad attempt to interpret it as though it had
>   an underlying truth."
>    -- Umberto Eco

This is what I was looking for. Thanks!


Python 2.5 adoption

2008-04-18 Thread Joseph Turian
How widely adopted is python 2.5?

We are doing some development, and have a choice to make:
a) Use all the 2.5 features we want.
b) Maintain backwards compatibility with 2.4.

So I guess the question is, does anyone have a sense of what percent
of python users don't have 2.5?

Thanks,
   Joseph


Re: Python 2.5 adoption

2008-04-18 Thread Joseph Turian
Basically, we're planning on releasing it as open-source, and don't
want to alienate a large percentage of potential users.


Property in derived class

2008-05-09 Thread Joseph Turian
If I have a property in a base class, it is difficult to override
the get and set functions in a derived class: the property's function
object is bound early, whereas the overridden method is bound late.
This was previously discussed:
   
http://groups.google.com/group/comp.lang.python/browse_thread/thread/e13a1bd46b858dc8/9d32049aad12e1c1?lnk=gst#9d32049aad12e1c1

Could someone demonstrate how to implement the proposed solutions that
allow the property to be declared in the abstract base class, and
refer to a get function which is only implemented in derived classes?
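
A minimal sketch of one common approach, which is not necessarily the one
proposed in the linked thread: have the property call the getter by name
through the instance, so the lookup happens at access time. The class and
method names are hypothetical.

```python
class Base(object):
    def _get_value(self):
        raise NotImplementedError("derived classes must implement _get_value")

    # The lambda looks up _get_value on the instance each time the property
    # is read, so an override in a derived class is the one that runs.
    value = property(lambda self: self._get_value())


class Derived(Base):
    def _get_value(self):
        return 42
```

With this, Derived().value goes through Derived._get_value, while reading
Base().value raises NotImplementedError.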

Thanks,
  Joseph


Find more than one error at once

2008-05-09 Thread Joseph Turian
Is it possible to coax python to find more than one error at once?

Thanks,
  Joseph


Re: Property in derived class

2008-05-10 Thread Joseph Turian
On May 9, 9:05 pm, George Sakkis <[EMAIL PROTECTED]> wrote:
> Using the overridable property recipe [1],
> [1]http://infinitesque.net/articles/2005/enhancing%20Python's%20property...

Thanks, this is a great solution!

  Joseph


Re: Find more than one error at once

2008-05-10 Thread Joseph Turian
On May 10, 8:13 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:

> What kind of errors? Syntax-errors? Then use one of the python source
> code analyzers, such as pylint or pychecker.

Great!


Importing two module variables and having one clobber the other?

2006-03-21 Thread Joseph Turian
Can I simulate the behavior of "from foo import *" using
imp.load_module()?

Here's the functionality I need:

We allow the user to define parameter values, which are imported and
can be accessed directly as variables within Python. These are defined
in "parameters.py".

More specifically, let's say the user creates "dir/parameters.py" and
"dir/subdir/parameters.py" and then invokes the program from within
dir/subdir. We first import the values from dir/parameters.py and then
import the values from dir/subdir/parameters.py, potentially
clobbering any values from the first import.

How can I achieve this functionality?
Here's what I have in mind:
* Find every directory from os.environ["HOME"] through os.getcwd()
* Find all such directories in which parameters.py exists.
* "from parameters import *", starting from the highest-level directory
through the current directory.
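
The steps above could be sketched roughly as follows. Note that this executes
each parameters.py into a shared dictionary with exec rather than going
through imp.load_module or a real "from parameters import *". The function
names and the start/stop arguments are assumptions made for the example.

```python
import os


def superdirs(start, stop):
    """List the directories from stop down to start, outermost first."""
    start, stop = os.path.abspath(start), os.path.abspath(stop)
    dirs = [start]
    while dirs[-1] != stop:
        parent = os.path.dirname(dirs[-1])
        if parent == dirs[-1]:  # hit the filesystem root without meeting stop
            break
        dirs.append(parent)
    dirs.reverse()
    return dirs


def load_parameters(start, stop, filename="parameters.py"):
    """Execute every parameters file between stop and start into one dict."""
    values = {}
    for directory in superdirs(start, stop):
        path = os.path.join(directory, filename)
        if os.path.exists(path):
            with open(path) as handle:
                source = handle.read()
            # Files in deeper directories run later and clobber earlier names.
            exec(compile(source, path, "exec"), values)
    values.pop("__builtins__", None)  # added by exec, not a user parameter
    return values
```

Calling load_parameters(os.getcwd(), os.environ["HOME"]) would give a
dictionary of the merged values. Calling globals().update() with that
dictionary would make the values visible as plain variables.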

Thanks!
   Joseph



Re: Importing two module variables and having one clobber the other?

2006-03-21 Thread Joseph Turian

Fredrik Lundh wrote:

> if you prefer to use a "parameters.value" syntax, you can wrap the resulting
> dictionary in a class.

That sounds good. How do I do that?

> I assume "from" means "beneath" and "getcwd" means "walk" ?

Actually:
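import os

def superdirs(d):
  # List d and each of its parent directories up to $HOME, outermost first.
  lst = [d]
  # Stop at $HOME, or at the filesystem root if d is not below $HOME.
  while d != os.environ["HOME"] and d != os.path.dirname(d):
    (d, tl) = os.path.split(d)
    lst += [d]
  lst.reverse()
  return lst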


   Joseph



Wrap a dictionary in a class?

2006-03-22 Thread Joseph Turian
In another thread, it was recommended that I wrap a dictionary in a
class.
How do I do so?
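
A minimal sketch of what I understand by that, assuming the goal is the
"parameters.value" syntax. The class name Parameters is hypothetical.

```python
class Parameters(object):
    """Expose the keys of a dictionary as attributes."""

    def __init__(self, values):
        # Write through __dict__ so that __getattr__ is never needed here.
        self.__dict__["_values"] = dict(values)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_values"][name]
        except KeyError:
            raise AttributeError(name)
```

For example, Parameters({"rate": 0.1}).rate evaluates to 0.1, and an
unknown name raises AttributeError as it would for a normal object.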

   Joseph

that thread:
http://groups.google.com/group/comp.lang.python/browse_frm/thread/9a0fbdca450469a1/b18455aa8dbceb8a?q=turian&rnum=1#b18455aa8dbceb8a



SimpleXMLRPCServer clobbering sys.stderr? (2.5.2)

2009-10-17 Thread Joseph Turian
I was having a mysterious problem with SimpleXMLRPCServer. (I am using
Python 2.5.2)
The request handlers were sometimes failing without any error message
to the log output.

What I discovered was perplexing.
I had some 'print' statements in the handlers that, assuming the
request would be handled, would print just fine. When I switched to
'print >> sys.stderr', the request handlers would just fail
completely, and not produce the sys.stderr output that I desired.

It seems that SimpleXMLRPCServer is clobbering stderr in some bizarre
and silent-error-causing way.
I can't really find any documentation or explanation of this
phenomenon.

Could someone please illuminate it for me?

Best,
   Joseph


Re: SimpleXMLRPCServer clobbering sys.stderr? (2.5.2)

2009-10-18 Thread Joseph Turian
> > I was having a mysterious problem with SimpleXMLRPCServer. (I am using
> > Python 2.5.2)
>
> I'd start updating Python to the latest 2.5 release: 2.5.4

Ubuntu doesn't have 2.5.4. :(

> > The request handlers were sometimes failing without any error message
> > to the log output.
>
> > What I discovered was perplexing.
> > I had some 'print' statements in the handlers that, assuming the
> > request would be handled, would print just fine. When I switched to
> > 'print >> sys.stderr', the request handlers would just fail
> > completely, and not make the sys.stderr output that I desired.
>
> Perhaps you need to flush the file also? sys.stderr.flush()

Flushing will not help, because it will be too late.
It is almost like when I write to sys.stderr, the XMLRPC connection
just hangs up or something like that.

> XMLRPCServer doesn't reassign or alter sys.stderr, just uses it in the
> log_message method. I'd look in some other place...

Here's what I see:
* If I use logging to write the output, I don't see any output in the
server log, but the client gets correct results.
* If I use sys.stderr to write the output, I don't see any output in
the server log AND the client gets INcorrect results.
* If I use sys.stdout to write the output, I DO see output in the
server log AND the client gets correct results.

Why does only sys.stdout work within XMLRPCServer registered methods?

Thanks,
   Joseph


Re: SimpleXMLRPCServer clobbering sys.stderr? (2.5.2)

2009-10-18 Thread Joseph Turian

> Here's what I see:
> * If I use logging to write the output, I don't see any output in the
> server log, but the client gets correct results.
> * If I use sys.stderr to write the output, I don't see any output in
> the server log AND the client gets INcorrect results.
> * If I use sys.stdout to write the output, I DO see output in the
> server log AND the client gets correct results.

Oh, one more thing.

If I write to sys.stdout and then issue:
  sys.stdout.flush()
It appears that the handler aborts at that point and gets no further
into the method.
Why would this happen?

Thanks,
   Joseph


Re: handeling very large dictionaries

2009-06-28 Thread Joseph Turian
You could also try using a key-value store.
I am using pytc, a Python API for Tokyo Cabinet. It seems to figure
out quite nicely when to go to disk, and when to use memory. But I
have not done extensive tests.

Here is some example code for using pytc:
  http://github.com/turian/pytc-example/tree/master
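
For comparison, here is a minimal sketch of the same dictionary-on-disk idea
using the standard-library shelve module instead of pytc. It is a swapped-in
technique, and I have not compared the two on data of this size. The function
name and the stored record are made up for illustration.

```python
import os
import shelve


def store_and_reload(directory):
    """Write one record to an on-disk shelf, reopen it, and read it back."""
    path = os.path.join(directory, "records")

    store = shelve.open(path)  # dict-like object backed by a file on disk
    try:
        # Keys must be strings; values may be any picklable Python object.
        store["alpha"] = {"name": "alpha", "count": 3}
    finally:
        store.close()

    store = shelve.open(path)
    try:
        return store["alpha"]["count"]
    finally:
        store.close()
```

Values are pickled on the way in and unpickled on the way out, so instances
of your own classes can be stored as long as they are picklable.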

 Joseph

On Jun 28, 7:13 pm, mclovin  wrote:
> Hello all,
>
> I need to have a dictionary of about 8 gigs (well the data it is
> processing is around 4gb). so naturally i am running into memory
> errors.
>
> So i looked around and found bsddb which acts like a dictionary object
> only offloads the data from the RAM to the HDD, however that only
> supports strings.
>
> my dictionaries hold my own class objects
>
> Is there something like it that is more flexible?



Abort SimpleXMLRPCServer request prematurely?

2009-06-29 Thread Joseph Turian
With SimpleXMLRPCServer, if the server is taking too long, how can I
use the client to kill the request and have the server abort
prematurely?

Thanks,

  Joseph