Help beautify ugly heuristic code

2004-12-08 Thread Stuart D. Gathman
I have a function that recognizes PTR records for dynamic IPs.  There is
no hard and fast rule for this - every ISP does it differently, and may
change their policy at any time, and use different conventions in
different places.  Nevertheless, it is useful to apply stricter
authentication standards to incoming email when the PTR for the IP
indicates a dynamic IP (namely, the PTR record is ignored since it doesn't
mean anything except to the ISP).  This is because Windoze Zombies are the
favorite platform of spammers.

Here is the very ugly code so far.  It offends me to look at it, but
I haven't had any better ideas.  I have lots of test data from mail logs.

# examples we don't yet recognize:
#
# 1Cust65.tnt4.atl4.da.uu.net at ('67.192.40.65', 4588)
# 1Cust200.tnt8.bne1.da.uu.net at ('203.61.67.200', 4144)
# 1Cust141.tnt30.rtm1.nld.da.uu.net at ('213.116.154.141', 2036)
# user64.net2045.mo.sprint-hsd.net at ('67.77.185.64', 3901)
# wiley-268-8196.roadrunner.nf.net at ('205.251.174.46', 4810)
# 221.fib163.satnet.net at ('200.69.163.221', 3301)
# cpc2-ches1-4-0-cust8.lutn.cable.ntl.com at ('80.4.105.8', 61099)
# user239.res.openband.net at ('65.246.82.239', 1392)
# xdsl-2449.zgora.dialog.net.pl at ('81.168.237.145', 1238)
# spr1-runc1-4-0-cust25.bagu.broadband.ntl.com at ('80.5.10.25', 1684)
# user-0c6s7hv.cable.mindspring.com at ('24.110.30.63', 3720)
# user-0c8hvet.cable.mindspring.com at ('24.136.253.221', 4529)
# user-0cdf5j8.cable.mindspring.com at ('24.215.150.104', 3783)
# mmds-dhcp-11-143.plateautel.net at ('63.99.131.143', 4858)
# ca-santaanahub-cuda3-c6b-134.anhmca.adelphia.net at ('68.67.152.134', 62047)
# cbl-sd-02-79.aster.com.do at ('200.88.62.79', 4153)
# h105n6c2o912.bredband.skanova.com at ('213.67.33.105', 3259)

import re

ip3 = re.compile('([0-9]{1,3})[.x-]([0-9]{1,3})[.x-]([0-9]{1,3})')
rehmac = re.compile(
 'h[0-9a-f]{12}[.]|pcp[0-9]{6,10}pcs[.]|no-reverse|S[0-9a-f]{16}[.][a-z]{2}[.]'
)

def is_dynip(host,addr):
  """Return True if hostname is for a dynamic ip.
  Examples:

  >>> is_dynip('post3.fabulousdealz.com','69.60.99.112')
  False
  >>> is_dynip('adsl-69-208-201-177.dsl.emhril.ameritech.net','69.208.201.177')
  True
  >>> is_dynip('[1.2.3.4]','1.2.3.4')
  True
  """
  if host.startswith('[') and host.endswith(']'):
    return True
  if addr:
    if host.find(addr) >= 0: return True
    a = addr.split('.')
    ia = map(int,a)
    m = ip3.search(host)
    if m:
      g = map(int,m.groups())
      if g == ia[1:] or g == ia[:3]: return True
      if g[0] == ia[3] and g[1:] == ia[:2]: return True
      g.reverse()
      if g == ia[1:] or g == ia[:3]: return True
    if rehmac.search(host): return True
    if host.find("%s." % '-'.join(a[2:])) >= 0: return True
    if host.find("w%s." % '-'.join(a[:2])) >= 0: return True
    if host.find("dsl%s-" % '-'.join(a[:2])) >= 0: return True
    if host.find(''.join(a[:3])) >= 0: return True
    if host.find(''.join(a[1:])) >= 0: return True
    x = "%02x%02x%02x%02x" % tuple(ia)
    if host.lower().find(x) >= 0: return True
    z = [n.zfill(3) for n in a]
    if host.find('-'.join(z)) >= 0: return True
    if host.find("-%s." % '-'.join(z[2:])) >= 0: return True
    if host.find("%s." % ''.join(z[2:])) >= 0: return True
    if host.find(''.join(z)) >= 0: return True
    a.reverse()
    if host.find("%s." % '-'.join(a[:2])) >= 0: return True
    if host.find("%s." % '.'.join(a[:2])) >= 0: return True
    # Parenthesized so both '.adsl.' and '.dial-up.' require the octet match;
    # without the parentheses Python groups this as (A and B) or C.
    if host.find("%s." % a[0]) >= 0 and \
      (host.find('.adsl.') > 0 or host.find('.dial-up.') > 0): return True
  return False

if __name__ == '__main__':
  import fileinput
  for ln in fileinput.input():
    a = ln.split()
    if len(a) == 2:
      ip,host = a
      if host.startswith('[') and host.endswith(']'):
        continue  # no PTR
      if is_dynip(host,ip):
        print ip,host
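One way to tame the long chain of host.find() tests is to make it table-driven: compute the candidate substrings once from the address, then test them in a loop.  A minimal sketch in Python 3, assuming the same transformations as the substring tests above (it does not cover the ip3 regex, rehmac, or the '.adsl.'/'.dial-up.' rule); ip_variants and has_ip_variant are hypothetical names.

```python
def ip_variants(addr):
    """Return substrings derived from a dotted quad that may appear in its PTR name.

    Mirrors the case-sensitive substring tests in is_dynip().
    """
    a = addr.split('.')
    z = [n.zfill(3) for n in a]   # zero-filled octets, e.g. '007'
    r = a[::-1]                   # octets in reverse order
    return [
        addr,
        '-'.join(a[2:]) + '.',
        'w' + '-'.join(a[:2]) + '.',
        'dsl' + '-'.join(a[:2]) + '-',
        ''.join(a[:3]),
        ''.join(a[1:]),
        '-'.join(z),
        '-' + '-'.join(z[2:]) + '.',
        ''.join(z[2:]) + '.',
        ''.join(z),
        '-'.join(r[:2]) + '.',
        '.'.join(r[:2]) + '.',
    ]


def has_ip_variant(host, addr):
    """True if host contains the hex encoding or any decimal variant of addr."""
    hexip = '%02x%02x%02x%02x' % tuple(int(n) for n in addr.split('.'))
    if hexip in host.lower():
        return True
    return any(v in host for v in ip_variants(addr))
```

Adding a new ISP convention then means adding one line to the list rather than another if statement.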
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help beautify ugly heuristic code

2004-12-08 Thread Stuart D. Gathman
On Wed, 08 Dec 2004 18:00:06 -0500, Mitja wrote:

> On Wed, 08 Dec 2004 16:09:43 -0500, Stuart D. Gathman <[EMAIL PROTECTED]>
> wrote:
> 
>> I have a function that recognizes PTR records for dynamic IPs Here
>> is the very ugly code so far.
>> ...
>> # examples we don't yet recognize:
>> ...
> 
> This doesn't help much; post example of all the possible patterns you
> have to match (kind of like the docstring at the beginning, only more
> elaborate), otherwise it's hard to know what kind of code you're trying
> to implement.

This is a heuristic, so there is no exhaustive list or hard rule.
However, I have posted 23K+ examples at http://bmsi.com/python/dynip.samp
with DYN appended for examples which the current algorithm classifies
as dynamic.

Here are the last 20 (which my subjective judgement says are correct):

65.112.76.15    usfshlxmx01.myreg.net
201.128.108.41  dsl-201-128-108-41.prod-infinitum.com.mx DYN
206.221.177.128 mail128.tanthillyingyang.com
68.234.254.147  68-234-254-147.stmnca.adelphia.net DYN
63.110.30.30    mx81.goingwiththe-flow.info
62.178.226.189  chello062178226189.14.15.vie.surfer.at DYN
80.179.107.85   80.179.107.85.ispeednet.net DYN
200.204.68.52   200-204-68-52.dsl.telesp.net.br DYN
12.203.156.234  12-203-156-234.client.insightBB.com DYN
200.83.68.217   CM-lconC1-68-217.cm.vtr.net DYN
81.57.115.43    pauguste-3-81-57-115-43.fbx.proxad.net DYN
64.151.91.225   sv2a.entertainmentnewsclips.com
64.62.197.31    teenfreeway.sparklist.com
201.9.136.235   201009136235.user.veloxzone.com.br DYN
66.63.187.91    91.asandox.com
83.69.188.198   st11h07.ptambre.com
66.192.199.217  66-192-199-217.pyramidcollection.org DYN
69.40.166.49    h49.166.40.69.ip.alltel.net DYN
203.89.206.62   smtp.immigrationexpert.ca
80.143.79.97    p508F4F61.dip0.t-ipconnect.de DYN
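To compare a candidate classifier against a file in this format (address, PTR name, optional trailing DYN), something like the following would do.  A sketch assuming whitespace-separated fields; parse_sample and agreement are hypothetical helper names.

```python
def parse_sample(line):
    """Split 'ip host [DYN]' into (ip, host, labelled_dynamic); None if malformed."""
    fields = line.split()
    if len(fields) == 2:
        return fields[0], fields[1], False
    if len(fields) == 3 and fields[2] == 'DYN':
        return fields[0], fields[1], True
    return None


def agreement(classify, lines):
    """Return (matches, total, disagreements) for classify(host, ip) vs. the DYN labels."""
    matches = total = 0
    disagreements = []
    for line in lines:
        rec = parse_sample(line)
        if rec is None:
            continue
        ip, host, labelled = rec
        total += 1
        if bool(classify(host, ip)) == labelled:
            matches += 1
        else:
            disagreements.append((ip, host, labelled))
    return matches, total, disagreements
```

Since the labels come from the current algorithm rather than from ground truth, the disagreements are the lines worth inspecting by hand.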


Re: Help beautify ugly heuristic code

2004-12-08 Thread Stuart D. Gathman
On Wed, 08 Dec 2004 18:39:15 -0500, Lonnie Princehouse wrote:

> Regular expressions.
> 
> It takes a while to craft the expressions, but this will be more
> elegant, more extensible, and considerably faster to compute (matching
> compiled re's is fast).

I'm already doing that with the rehmac regex.  I like your idea for making
it more readable, though.  Looking for permutations of the IP address
gives much more bang per line of code than most host-only regexes,
since it is ISP-independent.  At least one ISP uses Roman numerals to encode
the IP for their dynamic addresses!  I tried matching a custom regex
computed from the IP, but compiling the regex for each test was too slow.
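For reference, a regex computed from the IP might look like the sketch below.  What it covers is an assumption (three consecutive octets, forward or reversed, optionally zero-filled, separated by '.', '-' or nothing), and ip_pattern is a hypothetical name.  Because the pattern differs for every address, each new address costs a compile; Python's re module does cache recently compiled patterns, but a stream of mostly distinct addresses will mostly miss that cache.

```python
import re


def ip_pattern(addr):
    """Compile a regex matching 3 consecutive octets of addr inside a hostname.

    Octets may appear forward or reversed, optionally zero-filled to 3 digits,
    separated by '.', '-' or nothing.
    """
    octets = addr.split('.')

    def octet(s):
        # '7' -> '0{0,2}7', so '7', '07' and '007' all match
        return '0{0,%d}%s' % (3 - len(s), s)

    rev = octets[::-1]
    runs = (octets[:3], octets[1:], rev[:3], rev[1:])
    alts = ['[.-]?'.join(octet(s) for s in run) for run in runs]
    return re.compile('|'.join(alts))
```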

I could keep adding more patterns, but I was hoping for a tool that
"learns" from a database of preclassified examples how to recognize the
pattern.  And the resulting data would be reasonably compact.  I don't ask
for much, do I?  A Bayesian classifier would need too large a database, I
think.  I've seen neural nets do amazing things with only 100 or so
neurons - a small weight database. But they are slow in software.

I have posted 10K preclassified (by current algorithm) examples here:
http://bmsi.com/python/dynip.samp


Re: Help beautify ugly heuristic code

2004-12-08 Thread Stuart D. Gathman
On Wed, 08 Dec 2004 19:52:53 -0500, Lonnie Princehouse wrote:

> I don't think a Bayesian classifier is going to be very helpful here,
> unless you have tens of thousands of examples to feed it, or unless it

We do have tens of thousands of examples to feed it.

> The series of if host.find(...) lines in is_dynip() is equivalent to a
> regular expression, but much more expensive to execute because of all

It is not equivalent, because the patterns are based on the IP address.
As I mentioned before, I tried building a custom regex from the IP for
each test - but compiling the regex is way too slow to be done for each
test.

> For IP addresses, you really just need a mechanism to filter blocks of
> IP addresses.  It might be easiest to first convert them into hex and
> then make liberal use of [0-f] in regular expressions.

The point of the ip address is *not* to recognize ip addresses.  The
point is to look for transformations of the ip address in the hostname.
This gives a *huge* bang for the buck.  I have been working on this
problem for a while.  If the hostname has a transformation of the ip
address - it is (almost certainly) a dynamic address.  The ISPs are very
creative in their transformations, using the parts of the ip in various
orders and encoding in hex, base64, decimal with or without zerofill, and
even roman numerals.
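The per-octet encodings can be generated mechanically.  A minimal sketch, assuming lowercase Roman numerals and two-digit hex per octet (real ISPs may differ); base64 is left out because how an ISP would apply it to an address isn't specified here.  roman, octet_forms and count_octets_present are hypothetical helpers.

```python
ROMAN = [(1000, 'm'), (900, 'cm'), (500, 'd'), (400, 'cd'), (100, 'c'),
         (90, 'xc'), (50, 'l'), (40, 'xl'), (10, 'x'), (9, 'ix'),
         (5, 'v'), (4, 'iv'), (1, 'i')]


def roman(n):
    """Lowercase Roman numeral for n >= 1 (octet 0 has no Roman form)."""
    out = []
    for value, numeral in ROMAN:
        while n >= value:
            out.append(numeral)
            n -= value
    return ''.join(out)


def octet_forms(n):
    """Spellings of one IP octet: decimal, zero-filled, hex and Roman."""
    forms = {str(n), '%03d' % n, '%02x' % n}
    if n:
        forms.add(roman(n))
    return forms


def count_octets_present(host, addr):
    """How many of addr's four octets appear in host in at least one form."""
    h = host.lower()
    return sum(any(f in h for f in octet_forms(int(o))) for o in addr.split('.'))
```

Short forms such as 'x' or '10' turn up in plenty of unrelated names, so a count like this only means something when most of the octets are present.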

The regex engine is just not powerful enough to handle parameterized
regexes (that I know of).


Re: Creating Fixed Length Records

2004-12-08 Thread Stuart D. Gathman
On Wed, 08 Dec 2004 17:29:19 -0600, Greg Lindstrom wrote:


> One thought I had, which might lead to an addition to the language, was 
> to use the struct module.  If I could feed the pack method a format 
> string then a tuple of values (instead of individual values), then I 
> could create the format string once, then pass it a tuple with the 
> values for that record.  Just a thought. 

That language feature has been around since the early days.  It was
originally spelled apply(f, args), and is now spelled f(*args).

>>> import struct
>>> fmt = '8s6s2s'
>>> v = ('foo','bar','AA')
>>> struct.pack(fmt,*v)
'foo\x00\x00\x00\x00\x00bar\x00\x00\x00AA'
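In Python 3 the 's' format takes bytes, and struct.Struct lets the format be compiled once and reused for every record, which is close to what was asked for.  A minimal sketch; the field widths are the ones from the example above, and records is made-up data.

```python
import struct

# Compile the record layout once: 8-, 6- and 2-byte fields, NUL-padded.
RECORD = struct.Struct('8s6s2s')

records = [(b'foo', b'bar', b'AA'), (b'spam', b'eggs', b'BB')]

# Unpack each tuple into the positional arguments of pack().
packed = b''.join(RECORD.pack(*rec) for rec in records)
```

pack() truncates fields longer than their width, and unpack() returns the NUL padding as part of each field, so .rstrip(b'\x00') is usually needed when reading records back.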

-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.



Re: Help beautify ugly heuristic code

2004-12-10 Thread Stuart D. Gathman
On Thu, 09 Dec 2004 00:01:36 -0800, Lonnie Princehouse wrote:

> I believe you can still do this with only compiling a regex once and
> then performing a few substitutions on the hostname.

Cool idea.  Convert IP matches to fixed patterns before matching a fixed
regex.  The leftovers, like Shaw Cable (which uses the MAC address of the
cable modem instead of the IP), can still be handled with regex patterns.

I had an idea last night to compile 254 regexes, one for each possible
last IP byte - but I think your idea is better.
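The substitution idea can be sketched as: replace every run of digits whose value equals one of the address's octets with a placeholder token, then apply one precompiled pattern.  Assumptions: decimal octets only (hex and other encodings would need their own substitutions), the name is lowercased first so the uppercase token N cannot collide with letters in the hostname, and the fixed pattern requires two tokens joined by '.', '-' or 'x' (the separators ip3 uses).  normalize and looks_dynamic are hypothetical names.

```python
import re

DIGIT_RUN = re.compile(r'\d+')
# Compiled once: two or more IP-derived tokens joined by '.', '-' or 'x'.
IP_TOKENS = re.compile(r'N(?:[.x-]N)+')


def normalize(host, addr):
    """Lowercase host and replace digit runs equal to an octet of addr with 'N'."""
    octets = {int(o) for o in addr.split('.')}

    def token(m):
        return 'N' if int(m.group()) in octets else m.group()

    return DIGIT_RUN.sub(token, host.lower())


def looks_dynamic(host, addr):
    return IP_TOKENS.search(normalize(host, addr)) is not None
```

Leading zeros come for free because int('007') == 7; packed forms with no separators, such as 001002003004, are not handled.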

Mitja suggested a scoring system reminiscent of SpamAssassin.

That gives me a few things to try.



Re: Help beautify ugly heuristic code

2004-12-10 Thread Stuart D. Gathman
On Fri, 10 Dec 2004 22:03:20 +, JanC wrote:

> Stuart D. Gathman schreef:
> 
>> I have a function that recognizes PTR records for dynamic IPs.  There

> Did you also think about ISPs that use such a PTR record for both dynamic 
> and fixed IPs?

There seems to be a lot of misunderstanding about this.  I am not blocking
anyone's mail because they have a dynamic-looking PTR.  I simply don't
accept such a PTR as MTA authentication.  You see, MTAs *SHOULD* provide a
fully qualified domain name as their HELO name, one that resolves to the IP of the
MTA.  Sadly, however, many internet facing MTAs don't do this, but I
accept a meaningful PTR as a substitute.  I also waive the requirement for
MTA authentication if the MAIL FROM has an SPF record
(http://spf.pobox.com).

So, if your MTA complies with RFC HELO recommendations, you'll have no
trouble sending me mail. You can even use a dynamic IP with a dynamic DNS
service.

I *do* block PTR names of "." or "localhost".  I would like to block all
single word HELO names - but there are too many clueless mail admins out
there.  People seem to be unsure of what to send for HELO.
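The HELO test described above (a dotted name that resolves back to the connecting IP) can be sketched with the standard resolver.  This is an illustration, not the actual milter code; helo_resolves_to is a hypothetical name, the resolver is a parameter so it can be swapped out in tests, and only IPv4 A records are consulted.

```python
import socket


def helo_resolves_to(helo, ip, resolve=socket.gethostbyname_ex):
    """True if helo is a dotted name whose A records include ip."""
    name = helo.strip().rstrip('.')
    if '.' not in name or name.startswith('['):
        return False   # single-word name, or an address literal such as [1.2.3.4]
    try:
        _, _, addrs = resolve(name)
    except (OSError, UnicodeError):   # gaierror/herror are OSError subclasses
        return False
    return ip in addrs
```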



Re: sorting with expensive compares?

2005-12-28 Thread Stuart D. Gathman
On Sat, 24 Dec 2005 15:47:17 +1100, Steven D'Aprano wrote:

> On Fri, 23 Dec 2005 17:10:22 +, Dan Stromberg wrote:
>
>> I'm treating each file as a potentially very large string, and "sorting
>> the strings".
> 
> Which is a very strange thing to do, but I'll assume you have a good
> reason for doing so.

I believe what the original poster wants to do is eliminate duplicate
content from a collection of ogg/whatever files with different names. 
E.g., he has a python script that goes out and collects all the free music
it can find on the web.  The same song may appear on many sites under
different names, and he wants only one copy of a given song.

In any case, as others have pointed out, sorting by MD5 is sufficient
except in cases far less probable than hardware failure - and deliberate
collisions.  E.g., the RIAA creates collision pairs of MP3 files where one
member carries a freely redistributable license, and the other a "copy
this and we'll sue your ass off" license in an effort to trap the unwary.
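Grouping files by digest, rather than sorting on contents, looks roughly like this.  A sketch using hashlib.md5 as in the post (a stronger hash such as SHA-256 costs little more and resists deliberate collisions); files are read in chunks so large files need not fit in memory, and file_digest and duplicate_groups are hypothetical names.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hex MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(paths):
    """Return lists of paths whose contents have the same MD5 digest."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(p)
    return [group for group in by_digest.values() if len(group) > 1]
```

If a byte-for-byte guarantee is wanted, the few files in each group can then be compared directly, e.g. with filecmp.cmp(a, b, shallow=False).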



Re: Spiritual Programming (OT, but Python-inspired)

2006-01-03 Thread Stuart D. Gathman
On Mon, 02 Jan 2006 19:05:04 -0500, Steven D'Aprano wrote:


> I don't dare ask where your evidence for this hypothesis is, but I will
> ask what are your reasons for imagining this? What is the chain of
> thought that leads from:
> 
> Step 1: We live in a temporal world.
> 
> to:
> 
> Step N: Our ghost/soul must therefore live in a timeless state.

I can throw in some historical evidence against (assuming you accept the
Gospels as historical, that is - at least they are documents).  Christian
doctrine paraphrased in the programming mindset is that this temporal
world will be rebooted - destroyed and replaced with a "new heavens and
new earth".  The new earth will have time, but is purged of all evil.  The
goal of Christian practice is to cooperate with God as He cleans the
wickedness out of our souls so that we can inhabit the new creation.  The
cleaning experience is not always pleasant.  Taking a hard objective look
at the "goodness" of your behaviour can be humbling and embarrassing.


Re: inline function call

2006-01-04 Thread Stuart D. Gathman
On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:

> I'm googeling since some time, but can't find an answer - maybe because 
> the answer is 'No!'.
> 
> Can I call a function in python inline, so that the python byte compiler 
> does actually call the function, but sort of inserts it where the inline 
> call is made? Therefore avoiding the function all overhead.

In standard python, the answer is no.  The reason is that all python
functions are effectively "virtual", and you don't know *which* version to
inline.
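A small demonstration of why the compiler cannot inline: a call site compiles to a run-time lookup of the name, so rebinding the name changes what every existing caller invokes.  The function names are made up for illustration.

```python
import dis


def helper(x):
    return x + 1


def caller(x):
    # Compiles to a global-name lookup of 'helper' on every call,
    # not a copy of helper's body.
    return helper(x) * 2


before = caller(1)   # uses the original helper: (1 + 1) * 2


def helper(x):       # rebind the global name at run time
    return x + 100


after = caller(1)    # the same caller now uses the new helper: (1 + 100) * 2

if __name__ == '__main__':
    print(before, after)
    dis.dis(caller)  # shows a LOAD_GLOBAL of helper rather than inlined code
```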

HOWEVER, there is a slick product called Psyco:

http://psyco.sourceforge.net/

which gets around this by creating multiple versions of functions which
contain inlined (or compiled) code.  For instance, if foo(a,b) is often
called with a and b of int type, then a special version of foo is compiled
that is equivalent, performance-wise, to foo(int a, int b).  Dynamically
finding the correct version of foo at runtime is no slower than normal
dynamic calls, so the result is a very fast foo function.  The only
tradeoff is that every specialized version of foo eats memory.  Psyco
provides controls allowing you to specialize only those functions that
need it after profiling your application.



Re: Is 'everything' a refrence or isn't it?

2006-01-04 Thread Stuart D. Gathman
On Wed, 04 Jan 2006 10:54:17 -0800, KraftDiner wrote:

> I was under the assumption that everything in python was a refrence...
> 
> so if I code this:
> lst = [1,2,3]
> for i in lst:
>if i==2:
>   i = 4
> print lst
> 
> I though the contents of lst would be modified.. (After reading that
> 'everything' is a refrence.)
> ...
> Have I misunderstood something?

It might help to do a translation to equivalent C:

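#include <stdio.h>
#define NLST 3

int main(void) {
  int _i1 = 1;
  int _i2 = 2;
  int _i3 = 3;
  int _i4 = 4;
  int *lst[NLST] = { &_i1, &_i2, &_i3 };
  int _idx;   /* internal iterator */
  for (_idx = 0; _idx < NLST; ++_idx) {
    int *i = lst[_idx];
    if (*i == _i2)
      i = &_i4;   /* rebinds the local pointer i; lst[_idx] is unchanged */
  }
  for (_idx = 0; _idx < NLST; ++_idx)
    printf("%d\n", *lst[_idx]);   /* prints 1, 2, 3 */
  return 0;
}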
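The same point in Python: the loop variable is just a name bound to each element in turn, so assigning to it rebinds the name and leaves the list alone.  To change the list, assign through the list itself, for example by index.  A minimal sketch:

```python
lst = [1, 2, 3]

for i in lst:
    if i == 2:
        i = 4          # rebinds the name i only; lst is unchanged

unchanged = list(lst)

for idx, value in enumerate(lst):
    if value == 2:
        lst[idx] = 4   # stores a new reference into the list slot

print(unchanged, lst)
```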



Re: Help beautify ugly heuristic code

2006-07-24 Thread Stuart D. Gathman
On Thu, 09 Dec 2004 00:01:36 -0800, Lonnie Princehouse wrote:
 
> I believe you can still do this with only compiling a regex once and
> then performing a few substitutions on the hostname.

That is an interesting idea.  Convert IP matches to fixed patterns, and
*then* match the regex.  I think I would convert hex matches to the same
pattern as decimal (and roman numeral).  How would you handle zero fill?

1.2.3.4 001002003004foo.isp.com

An idea I had last night is to precompile 254 regexes - one for each of
the possible last ip bytes.  However, your idea is cleaner - except, how
would it handle ip bytes that are the same: 1.2.2.2
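For the zero-fill question, one option is a pre-pass that replaces whole-address packed forms before any per-octet substitution, so 001002003004 becomes a single token instead of one long digit string.  A sketch under assumptions: only the zero-filled decimal and two-digit hex packings are handled, the name is lowercased first, and packed_forms and mark_packed are hypothetical names.  Repeated bytes such as 1.2.2.2 are not a problem if octets are matched by value (a set) rather than by position.

```python
def packed_forms(addr):
    """Whole-address encodings with no separators: zero-filled decimal and hex."""
    octets = tuple(int(o) for o in addr.split('.'))
    return ['%03d%03d%03d%03d' % octets, '%02x%02x%02x%02x' % octets]


def mark_packed(host, addr, token='NNNN'):
    """Lowercase host, then replace each packed form of addr with a token."""
    h = host.lower()
    for form in packed_forms(addr):
        h = h.replace(form, token)
    return h
```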

Mitja has proposed a scoring system reminiscent of SpamAssassin.

This gives me a few things to try.



smtplib timeout

2006-07-25 Thread Stuart D. Gathman
I am doing SMTP callbacks in a Python milter
(http://pymilter.sourceforge.net) using the smtplib module.  For some
spammer MXes, it takes days (!) before smtplib.sendmail will return. 
Since the spammer connects to us every few seconds, this quickly leads to
a problem :-)

I need to set a time limit for the operation of
smtplib.sendmail.  It has to be thread based, because pymilter uses
libmilter, which is thread based.  There are some cookbook recipes that
run a function in a new thread and call Thread.join(timeout).  This
doesn't help, because although the calling thread gets a nice timeout
exception, the thread running the function continues to run.  In fact, the
problem gets worse, because even more threads are created.
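Rather than a watchdog thread, the socket itself can be given a timeout: smtplib.SMTP accepts a timeout argument (Python 2.6 and later), which bounds each blocking socket operation rather than the whole transaction.  A sketch; send_with_timeout is a hypothetical wrapper and the 30-second default is arbitrary.

```python
import smtplib


def send_with_timeout(mx, sender, rcpts, msg, port=25, timeout=30.0):
    """Try to deliver msg via mx; return False instead of hanging on a slow peer.

    timeout bounds each socket operation, so a peer that trickles bytes
    slowly can still keep the session alive longer than timeout in total.
    """
    try:
        with smtplib.SMTP(mx, port, timeout=timeout) as smtp:
            smtp.sendmail(sender, rcpts, msg)
        return True
    except OSError:
        # socket timeouts and smtplib.SMTPException are OSError subclasses
        # in Python 3.4 and later
        return False
```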



Re: smtplib timeout

2006-07-25 Thread Stuart D. Gathman
On Tue, 25 Jul 2006 09:21:40 -0700, Alan Kennedy wrote:

> [Stuart D. Gathman]
>> I need to set a timelimit for the operation of
>> smtplib.sendmail.  It has to be thread based, because pymilter uses
>> libmilter which is thread based.
> 
> Have you tried setting a default socket timeout, which applies to all
> socket operations?

Does this apply to all threads, is it inherited when creating threads, or
does each thread need to specify it separately?
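A quick experiment answers this for a given interpreter.  In CPython the value set by socket.setdefaulttimeout() is a single interpreter-wide default, read whenever a new socket object is created, in any thread; sockets that already exist are not changed.  A sketch:

```python
import socket
import threading

socket.setdefaulttimeout(5.0)   # interpreter-wide default for new sockets

seen = {}


def make_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    seen['worker'] = s.gettimeout()   # picks up the default set in the main thread
    s.close()


t = threading.Thread(target=make_socket)
t.start()
t.join()

socket.setdefaulttimeout(None)  # restore the blocking default
print(seen)
```

Because the default is global, setting it in a milter affects every socket the process creates afterwards, including those opened by other libraries.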



Re: NFS server

2006-11-27 Thread Stuart D. Gathman
On Fri, 24 Nov 2006 05:00:53 -0800, srj wrote:

> i wish to develop an NFS server usin python from scratch( some wise guy
> told me i'ts easy!).
> can i get any kinda tutorial for this??
> 
> any suggestions on how 2 begin?

NFS is an RPC-based protocol.  The first step is to be able to do
SunRPC/ONC RPC.  Python has an 'xdrlib' module, which implements XDR, the
encoding used to marshal and unmarshal parameters in SunRPC.  If allowed
under "from scratch", you could wrap the C RPC library for Python - it
handles retries and other low-level stuff.  You could look at Remote Tea
for Java, and translate it to Python to get a pure Python ONC RPC library.
Or you could look for such a package already written (a quick search
didn't reveal any).
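XDR itself is simple enough to sketch with struct: big-endian 32-bit integers, and variable-length data prefixed by its length and padded with zero bytes to a multiple of four (RFC 4506).  Below is a minimal encoder plus an ONC RPC call header (RFC 5531) for the NFS NULL procedure (program 100003, procedure 0) with AUTH_NONE credentials.  It is a sketch under those assumptions, not a complete RPC library: over TCP each message would also need a record-marking header.  xdr_uint, xdr_opaque and rpc_call_header are hypothetical names.  (Note that xdrlib was deprecated in Python 3.11 and removed in 3.13.)

```python
import struct

CALL = 0            # msg_type CALL
RPC_VERSION = 2     # ONC RPC protocol version
AUTH_NONE = 0       # null authentication flavor
NFS_PROGRAM = 100003


def xdr_uint(n):
    """Unsigned 32-bit integer, big-endian."""
    return struct.pack('>I', n)


def xdr_opaque(data):
    """Variable-length opaque: length, bytes, zero padding to a 4-byte boundary."""
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b'\x00' * pad


def rpc_call_header(xid, prog, vers, proc):
    """RPC call header with AUTH_NONE credential and verifier."""
    auth_none = xdr_uint(AUTH_NONE) + xdr_opaque(b'')
    return (xdr_uint(xid) + xdr_uint(CALL) + xdr_uint(RPC_VERSION)
            + xdr_uint(prog) + xdr_uint(vers) + xdr_uint(proc)
            + auth_none + auth_none)


# NFSv3 NULL call: no arguments follow the header.
null_call = rpc_call_header(xid=1, prog=NFS_PROGRAM, vers=3, proc=0)
```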

Once you have rpc, then it is "just" a matter of having your python server
implement the set of calls specified for NFS.  BTW, apparently python was
used for quickly building test rigs while developing NFS v4.

Having a framework for python NFS server could be useful - think custom
filesystem.  Although a python binding for fuse + C NFS server would
be more general (use locally as well as remotely).



Driver selection

2006-12-08 Thread Stuart D. Gathman
The pyspf package [http://cheeseshop.python.org/pypi/pyspf/] can use
either pydns, or dnspython.  The pyspf module has a simple driver
function, DNSLookup(), that defaults to the pydns version.  It can be
assigned to a dnspython version, or to a test driver for in memory DNS. 
Or you can modify the source to "from drivermodule import DNSLookup".

What is the friendliest way to make this configurable?  Currently, users
are modifying the source to supply the desired driver.  Yuck.  I would
like to supply several drivers, and have a simple way to select one at
installation or run time.
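One common pattern is a module-level hook plus a small registry: callers select a driver by name or pass a lookup function directly, and the default is imported lazily on first use.  A sketch; the module names in _DRIVERS and the DNSLookup(name, qtype) signature are assumptions for illustration, not pyspf's actual layout.

```python
import importlib

# Hypothetical mapping of driver names to modules that define DNSLookup(name, qtype).
_DRIVERS = {
    'pydns': 'spf_pydns_driver',
    'dnspython': 'spf_dnspython_driver',
}
_DEFAULT = 'pydns'
_lookup = None   # the active driver function


def set_driver(driver):
    """Select a driver by registered name, or install a lookup callable directly."""
    global _lookup
    if callable(driver):
        _lookup = driver
    elif driver in _DRIVERS:
        _lookup = importlib.import_module(_DRIVERS[driver]).DNSLookup
    else:
        raise ValueError('Unknown DNS driver: %r' % (driver,))


def DNSLookup(name, qtype):
    """Dispatch to the active driver, loading the default on first use."""
    if _lookup is None:
        set_driver(_DEFAULT)
    return _lookup(name, qtype)
```

If this lived in the spf module, an application would call set_driver('dnspython') once at startup, and a test harness could pass an in-memory lookup function instead.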



Re: Driver selection

2006-12-09 Thread Stuart D. Gathman
On Fri, 08 Dec 2006 21:35:41 -0800, Gabriel Genellina wrote:

> On 9 dic, 00:53, "Stuart D. Gathman" <[EMAIL PROTECTED]> wrote:

>> Or you can modify the source to "from drivermodule import DNSLookup".
>>
>> What is the friendliest way to make this configurable?  Currently, users
>> are modifying the source to supply the desired driver.  Yuck.  I would
>> like to supply several drivers, and have a simple way to select one at
>> installation or run time.
> 
> You can store the user's choice in a configuration file (like the ones
> supported by ConfigParser).
> Then you can put all the logic to select and import the right function
> in a separate module, and export the DNSLookup name; make all the other
> parts of the application say "from mymodule import DNSLookup"

pyspf is a library, not an application.  It would be nasty to make it have
a config file.  We already have "from pydns_driver import DNSLookup" in
the pyspf module.  If only users could import *for* a module from
*outside* the module.  I suppose you can do something like this:

app.py:

import spf
# select dnspython driver for spf module
from spf.dnspython_driver import DNSLookup
spf.DNSLookup = DNSLookup
del DNSLookup

...

That is ugly.  I'm looking for better ideas.  Is there a clean way to make
a setdriver function?  Used like this, say:

app.py:

import spf
spf.set_driver('dnspython')
...

Can a function replace itself?  For instance, in spf.py, could DNSLookup
do this to provide a default:

def set_driver(d):
  if d == 'pydns':
    from pydns_driver import DNSLookup
  elif d == 'dnspython':
    from dnspython_driver import DNSLookup
  else: raise Exception('Unknown DNS driver')

def DNSLookup(name,t):
  from pydns_driver import DNSLookup
  return DNSLookup(name,t)



Re: Automatic debugging of copy by reference errors?

2006-12-09 Thread Stuart D. Gathman
On Sat, 09 Dec 2006 05:58:22 -0800, Niels L Ellegaard wrote:

> I wanted  a each object to know whether or not it was being referred to
> by a living object, and I wanted to warn the user whenever he tried to
> change an object that was being refered to by a living object.  As far
> as I can see the garbage collector module would allow to do some of
> this, but one would still have to edit the assignment operators of each
> of the standard data structures:

I think what you want is a namespace that requires each object to have
exactly one reference - the namespace.  Of course, additional references
will be created during evaluation of expressions.  So the best you can do
is provide a function that checks reference counts for a namespace when
called, and warns about objects with multiple references.  If that could
be called for every statement (i.e. not during expression evaluation -
something like C language "sequence points"), it would probably catch the
type of error you are looking for. Checking such a thing efficiently would
require deep changes to the interpreter.
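A cheap approximation of such a check, using object identity instead of reference counts (which vary between Python versions and implementations), is to look for mutable objects bound to more than one name in a namespace.  A sketch; it only sees top-level bindings, not references held inside other containers, and aliased_names is a hypothetical name.

```python
from collections import defaultdict

MUTABLE = (list, dict, set, bytearray)


def aliased_names(namespace, types=MUTABLE):
    """Return sorted groups of names in namespace that refer to the same mutable object."""
    by_id = defaultdict(list)
    for name, obj in namespace.items():
        if isinstance(obj, types):
            by_id[id(obj)].append(name)
    return sorted(sorted(names) for names in by_id.values() if len(names) > 1)
```

Calling it on a namespace such as globals() at chosen points gives a rough version of the per-statement check, without interpreter changes.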

The better approach is to revel in the ease with which data can be
referenced rather than copied.  I'm not sure it's worth turning python
into fortran - even for selected namespaces.



Package vs. module

2006-12-16 Thread Stuart D. Gathman
On Mon, 11 Dec 2006 20:59:27 -0300, Gabriel Genellina wrote:

> The above code *almost* works, but DNSLookup is a local name inside 
> the function. Use the global statement.
> As an example, see how getpass.py (in the standard library) manages 
> the various getpass implementations.

Ok, I have a working package:

spf/
  __init__.py
  pyspf.py
  pydns.py
  dnspython.py

__init__.py:
from pyspf import *
from pyspf import __author__,__email__,__version__ 

def set_dns_driver(f):
  global DNSLookup
  DNSLookup = f
  pyspf.DNSLookup = f

def DNSLookup(name,qtype,strict=True):
  import pydns
  return DNSLookup(name,qtype,strict)

set_dns_driver(DNSLookup)

Importing a driver module activates that driver.
For instance, in pydns.py:

import DNS  # http://pydns.sourceforge.net
import spf

...
def DNSLookup(...):
  ...

spf.set_dns_driver(DNSLookup)

NOW, this is all very nice and modular. BUT, the original module was a
single file, which could be run as a script as well as imported as a
module. The script features provided useful command line functionality.
(Using if __name__ == '__main__':).  Now that 'spf' is a package, the
command line feature is gone!  Even using -m, I get:

python2.4 -m spf
python2.4: module spf has no associated file

Looking at getpass.py as advised, I see they put all the drivers in the
module.  I could do that with spf.py, I suppose.  But I like how with the
package, the driver code is not loaded unless needed.

One other idea I had was an arrangement like this:

SPF/
  pydns.py
  dnspython.py

spf.py

This would keep the module as a single file usable from the command line,
but still make driver available as separately loaded modules.

So which of the three options,

1) single file module with all drivers, ala getpass
2) package that cannot be run directly from command line
3) single file module with associated driver package

is the most pythonic?
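Later Pythons remove part of the dilemma: since Python 2.7 (and 3.1), "python -m package" runs the package's __main__.py, so option 2 can keep its command line.  The sketch below builds a throwaway package in a temporary directory to show the mechanism; spfdemo and its main() are hypothetical stand-ins for the real package.

```python
import os
import subprocess
import sys
import tempfile

INIT = '''\
def main(argv):
    print('spfdemo called with', argv)
    return 0
'''

MAIN = '''\
import sys
from spfdemo import main
sys.exit(main(sys.argv[1:]))
'''


def make_package(root):
    """Create root/spfdemo/ with __init__.py and __main__.py."""
    pkg = os.path.join(root, 'spfdemo')
    os.mkdir(pkg)
    for name, text in (('__init__.py', INIT), ('__main__.py', MAIN)):
        with open(os.path.join(pkg, name), 'w') as f:
            f.write(text)
    return pkg


def run_package(root, *args):
    """Equivalent of: cd root && python -m spfdemo ARGS"""
    return subprocess.run([sys.executable, '-m', 'spfdemo', *args],
                          cwd=root, capture_output=True, text=True)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        make_package(root)
        print(run_package(root, '-v').stdout)
```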




Re: I'm looking for a pythonic red-black tree...

2006-12-16 Thread Stuart D. Gathman
On Fri, 15 Dec 2006 01:20:34 +, Just Another Victim of the Ambient
Morality wrote:

> I need a red-black tree in Python and I was wondering if there was one 
> built in or if there's a good implementation out there.  Something that, 
> lets face it, does whatever the C++ std::map<> allows you to do...

Are you really looking specifically for a red-black tree, or do you want a
container where iterators return values in sorted order?  For instance, my
favorite sorted container is the skip-list:

 * inherently thread safe even during container updates
 * updates as fast as searches - significantly faster than red-black tree
 * search as fast as trees on average - but probabilistic (like a hashtable)
 * sequential access as fast as a linked list
 * Uses 33% less memory than binary trees (like red-black tree)
 * in general, performs like a hashtable, but sorted

Java example: http://bmsi.com/java/SkipList.java
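For comparison, a pure-Python skip list that iterates in key order is fairly short.  This is a sketch of the textbook algorithm (random levels with p = 0.5, capped at 16), not a port of the Java version linked above; it has no locking, so it is not thread safe, and being pure Python it will be much slower than the built-in dict.

```python
import random


class _Node:
    __slots__ = ('key', 'value', 'forward')

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level   # next node at each level


class SkipList:
    """Sorted mapping; keys must be mutually comparable."""
    MAX_LEVEL = 16
    P = 0.5

    def __init__(self, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._head = _Node(None, None, self.MAX_LEVEL)   # sentinel
        self._level = 1
        self._len = 0

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key):
        """Rightmost node with node.key < key, at every level."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def __setitem__(self, key, value):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value          # existing key: replace value
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level):         # splice in at each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        self._len += 1

    def __getitem__(self, key):
        node = self._predecessors(key)[0].forward[0]
        if node is None or node.key != key:
            raise KeyError(key)
        return node.value

    def __contains__(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def __len__(self):
        return self._len

    def items(self):
        """Yield (key, value) pairs in ascending key order."""
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```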

In Python, the performance of a pure Python container will be
disappointing unless it leverages a native container.  For many
applications, you can use a dict and sort when iterating (using heapq to
amortize the sorting).
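The dict-plus-heapq approach might look like this: heapify is linear, and each item then costs one heappop, so a caller that stops early pays only for the items it consumed.  A sketch; iter_sorted_items is a hypothetical name.

```python
import heapq


def iter_sorted_items(d):
    """Yield d's (key, value) pairs in ascending key order, sorting lazily.

    Dict keys are unique, so tuple comparison never falls through to values.
    """
    heap = list(d.items())
    heapq.heapify(heap)             # O(n)
    while heap:
        yield heapq.heappop(heap)   # O(log n) per item taken
```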

If I ever get the time, I would seriously consider trying to modify Python
to replace the built-in dict with a skip-list algorithm and compare the
performance.  Failing that, an extension module implementing a sorted
container of some description would be useful.



Debugging SocketServer.ThreadingTCPServer

2007-01-15 Thread Stuart D. Gathman
I have a ThreadingTCPServer application (pygossip, part of 
http://sourceforge.net/projects/pymilter).  It mostly runs well, but
occasionally goes into a loop.  How can I get a stack trace of running
threads to figure out where the loop is?  Is there some equivalent of
sending SIGQUIT to Java to get a thread dump?  If needed, I can import pdb
and set options at startup, but there needs to be some external way of
triggering the dump since I can't reproduce it at will.
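One way to get a Java-style thread dump on demand is a signal handler that walks sys._current_frames().  A sketch for Unix (SIGQUIT does not exist on Windows); format_thread_dump and install_dump_handler are hypothetical names.  Python-level handlers only run in the main thread between bytecodes, so on Python 3.3 and later the standard library's faulthandler.register(signal.SIGQUIT, all_threads=True), which works at the C level, may be the more robust choice.

```python
import signal
import sys
import threading
import traceback


def format_thread_dump():
    """Return the current stack of every thread, Java thread-dump style."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append('Thread %s (%d):\n' % (names.get(ident, '?'), ident))
        parts.extend(traceback.format_stack(frame))
        parts.append('\n')
    return ''.join(parts)


def install_dump_handler(signum=None, stream=None):
    """Write a thread dump to stream (default stderr) when signum (default SIGQUIT) arrives."""
    if signum is None:
        signum = signal.SIGQUIT

    def handler(sig, frame):
        out = stream if stream is not None else sys.stderr
        out.write(format_thread_dump())
        out.flush()

    signal.signal(signum, handler)
```

After install_dump_handler() is called at startup from the main thread, "kill -QUIT <pid>" prints the dump without stopping the server.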



Re: Best way to document Python code...

2007-01-22 Thread Stuart D. Gathman
On Mon, 22 Jan 2007 20:40:57 +, Adonis Vargas wrote:

> But a quick look at pydoc (not to be confused with epydoc) 
> which is part of the standard library allows you to generate 
> documentation in HTML format, and/or serve it over web with its built-in 
> HTTP server.
> 
> pydoc: http://docs.python.org/lib/module-pydoc.html
> 
> Hope this helps.
> 
> Adonis

The HTML generated by pydoc doesn't link to standard modules properly. 
They are generated as relative links.  So it can't be used without
modification for generating docs for a web page about a python package. 
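A rough post-processing workaround is sketched below. It assumes that pydoc's HTML uses links of the form href="module.html" (as its module and class links do). It also assumes that docs.python.org has a page for each standard module at library/<module>.html, which holds for most modules but not all. The function name is hypothetical. Only links to modules listed in stdlib_names are rewritten, and that set defaults to sys.stdlib_module_names on Python 3.10 and later.

```python
import re
import sys

# pydoc's HTML links modules and classes relatively, e.g. href="os.html"
# or href="socketserver.html#TCPServer".
_REL_LINK = re.compile(r'href="([A-Za-z_][\w.]*)\.html(#[^"]*)?"')


def absolutize_stdlib_links(html, stdlib_names=None,
                            base="https://docs.python.org/3/library/"):
    """Point relative links to standard modules at docs.python.org.

    Links to any module whose top-level package is not in stdlib_names
    (for example, your own package) are left untouched.
    """
    if stdlib_names is None:
        # sys.stdlib_module_names exists on Python 3.10+; older versions
        # get an empty set, so nothing is rewritten.
        stdlib_names = getattr(sys, "stdlib_module_names", frozenset())

    def fix(match):
        module, fragment = match.group(1), match.group(2) or ""
        if module.split(".")[0] in stdlib_names:
            return 'href="%s%s.html%s"' % (base, module, fragment)
        return match.group(0)

    return _REL_LINK.sub(fix, html)
```

Run it over each file that pydoc.writedoc() produces before publishing. Fragments such as #TCPServer are kept as-is and may not match the anchors that docs.python.org actually uses.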

I'm struggling with the same issue.  Coding Python is so much easier than
Java.  However documenting Java is so much easier than Python.  Just
include doc comments, run javadoc, and voila!

-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to document Python code...

2007-01-22 Thread Stuart D. Gathman
On Mon, 22 Jan 2007 17:35:18 -0500, Stuart D. Gathman wrote:

> The HTML generated by pydoc doesn't link to standard modules properly. 
> They are generated as relative links.  So it can't be used without
> modification for generating docs for a web page about a python package. 
> 
> I'm struggling with the same issue.  Coding Python is so much easier than
> Java.  However documenting Java is so much easier than Python.  Just
> include doc comments, run javadoc, and voila!

Wow!  I just tried epydoc, and it is every bit as easy as javadoc and
with similar output. Too bad it isn't standard.  But the comments and
docstrings it parses work fine with pydoc also.
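For reference, a docstring in epydoc's epytext markup looks roughly like the one below; the function itself is hypothetical. @param, @type, @return, @rtype and @raise are epytext fields, and C{...} marks inline code. pydoc shows these fields as plain text, which is why the same docstrings serve both tools.

```python
import pydoc


def parse_port(text, default=25):
    """Parse a TCP port number, falling back to a default.

    @param text: the string to parse, e.g. C{"8025"}
    @type text: str
    @param default: port returned when C{text} is empty
    @type default: int
    @return: the port number
    @rtype: int
    @raise ValueError: if C{text} is not a number in 1..65535
    """
    if not text:
        return default
    port = int(text)  # int() itself raises ValueError for non-numbers
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port


if __name__ == "__main__":
    # pydoc.plain() strips the overstrike bold that render_doc() emits.
    print(pydoc.plain(pydoc.render_doc(parse_port)))
```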

-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-- 
http://mail.python.org/mailman/listinfo/python-list


doctest problem with null byte

2007-01-25 Thread Stuart D. Gathman
I am trying to create a doctest test case for the following:

def quote_value(s):
    """Quote the value for a key-value pair in Received-SPF header field
    if needed.  No quoting needed for a dot-atom value.

    >>> quote_value(r'abc\def')
    '"abcdef"'
    >>> quote_value('abc..def')
    '"abc..def"'
    >>> quote_value('')
    '""'
    >>> quote_value('-all\x00')
    '"-allx00"'
    ...

However, doctest does *not* like the null byte in the example (yes, this
happens with real life input):
**********************************************************************
File "/home/stuart/pyspf/spf.py", line 1453, in spf.quote_value
Failed example:
    quote_value('-all')
Exception raised:
    Traceback (most recent call last):
      File "/var/tmp/python2.4-2.4.4c1-root/usr/lib/python2.4/doctest.py", line 1248, in __run
        compileflags, 1) in test.globs
    TypeError: compile() expected string without null bytes
**********************************************************************

How can I construct a test case for this?
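The usual fix is to keep the NUL byte out of the docstring itself. In a normal (non-raw) docstring, \x00 is turned into a real NUL character before doctest ever sees it, and doctest then passes that text to compile(). A raw docstring (r"""...""") leaves the four characters \x00 in the example source, and so does writing \\x00 in an ordinary one. With either fix, the NUL appears only when the example actually runs. A sketch with a hypothetical helper:

```python
import doctest


def escape_controls(s):
    r"""Hypothetical helper: replace ASCII control characters with \xNN escapes.

    Because this docstring is raw, the first example's source contains the
    four characters \x00, not a NUL byte, so doctest can compile it.

    >>> escape_controls('-all\x00')
    '-all\\x00'
    >>> escape_controls('abc..def')
    'abc..def'
    """
    return ''.join(
        '\\x%02x' % ord(c) if c < ' ' or c == '\x7f' else c
        for c in s
    )


if __name__ == "__main__":
    doctest.run_docstring_examples(escape_controls, globals(), verbose=True)
```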

-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Debugging SocketServer.ThreadingTCPServer

2007-02-03 Thread Stuart D. Gathman
On Tue, 16 Jan 2007 09:11:38 -0500, Jean-Paul Calderone wrote:

> On Tue, 16 Jan 2007 00:23:35 -0500, "Stuart D. Gathman"
> <[EMAIL PROTECTED]> wrote:
>>I have a ThreadingTCPServer application (pygossip, part of
>>http://sourceforge.net/projects/pymilter).  It mostly runs well, but
>>occasionally goes into a loop.  How can I get a stack trace of running
>>threads to figure out where the loop is?  Is there some equivalent of
>>sending SIGQUIT to Java to get a thread dump?  If needed, I can import
>>pdb and set options at startup, but there needs to be some external way
>>of triggering the dump since I can't reproduce it at will.
> 
> Grab the gdbinit out of Python SVN Misc/ directory.  Apply this patch:
> 
>   http://jcalderone.livejournal.com/28224.html
> 
> Attach to the process using gdb.  Make sure you have debugging symbols
> in your build of Python.  Run 'thread apply all pystack'.

Did this.  gdb displays the main thread fine (waiting on accept(), duh).  But
gdb goes into a loop displaying the first worker thread.  There are no
extension modules other than the "batteries included" ones; in this
application, I believe, only _socket is used (i.e. it is a pure Python server).

I will try for a C stack trace next time it loops.

Also, the looping server needs kill -9.  SIGTERM and SIGINT won't stop it.
And after it dies with SIGKILL, the port is still in use for 5 minutes or
so (and the server cannot be restarted).
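The port staying busy after SIGKILL is most likely the kernel holding the old connections in TIME_WAIT. TCPServer has an allow_reuse_address class attribute that sets SO_REUSEADDR before bind(), which normally lets a restarted server bind anyway. ThreadingMixIn's daemon_threads attribute stops a stuck worker thread from keeping the interpreter alive once the main thread has exited, so it may help in the SIGINT case. The sketch below uses Python 3 spelling (socketserver), and a hypothetical echo handler stands in for the real one:

```python
import socketserver


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler standing in for the real one: echo one line."""

    def handle(self):
        self.wfile.write(self.rfile.readline())


class ReusableThreadingTCPServer(socketserver.ThreadingTCPServer):
    # TCPServer.server_bind() sets SO_REUSEADDR when this is true, so a
    # restarted server can bind while old connections sit in TIME_WAIT.
    allow_reuse_address = True
    # Worker threads are daemons: a wedged handler thread will not keep
    # the interpreter alive after the main thread exits (e.g. on Ctrl-C).
    daemon_threads = True


if __name__ == "__main__":
    # Port 8000 is an arbitrary example.
    with ReusableThreadingTCPServer(("127.0.0.1", 8000), EchoHandler) as server:
        server.serve_forever()
```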

This is really making me appreciate Java.

-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Vim scripting with python

2007-02-03 Thread Stuart D. Gathman
On Sat, 03 Feb 2007 05:02:54 -0800, Tool69 wrote:

> Does anyone have any advice, and more generally, how to script Vim with
> Python ?

:py import sys
:py print sys.version
:help :py

> I know I can put some python functions inside my vimrc file like
> this :
> 
> function! My_function()
> python << EOF
> import vim, string
> ...blablabla
> EOF
> endfunction
> 
> but I would like to use external ".py" files.

:py import myfile

Use :py inside your vimrc - don't run python externally.
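The module below illustrates the external-file route. Its name, myfile, matches the example above; the function names are invented. The editing logic stays in plain Python that can be tested outside Vim. Vim's vim module exists only inside Vim's embedded interpreter, so it is imported only within the entry point. Recent Vims built with +python3 add the python3 subdirectory of each 'runtimepath' entry to sys.path, so saving the file as ~/.vim/python3/myfile.py should make ":py3 import myfile" work. On older setups, append the directory to sys.path yourself.

```python
"""Hypothetical helper module for Vim, saved as ~/.vim/python3/myfile.py.

Example vimrc lines (assuming Vim built with +python3):

    py3 import myfile
    command! StripTrailing py3 myfile.strip_current_buffer()
"""


def strip_trailing(lines):
    """Return a copy of lines with trailing whitespace removed.

    Pure Python, so it can be unit-tested without Vim.
    """
    return [line.rstrip() for line in lines]


def strip_current_buffer():
    """Vim entry point: strip trailing whitespace in the current buffer."""
    import vim  # only importable inside Vim's embedded Python

    buf = vim.current.buffer
    buf[:] = strip_trailing(buf[:])
```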
-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is Python Right for Me?

2007-02-03 Thread Stuart D. Gathman
On Fri, 02 Feb 2007 15:09:20 -0500, Mister Newbie wrote:

> I want to make small, 2D games. I have no programming experience. Is
> Python a good choice?

Definitely.  I teach a class for 7th to 12th grade where I use this
tutorial to introduce programming:

http://www.livewires.org.uk/python/
http://www.livewires.org.uk/python/pdfsheets.html

As an adult, just skip rapidly through the elementary material.  The final
module (Games sheets) walks you through creating three 2D games with pygame!

-- 
  Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-- 
http://mail.python.org/mailman/listinfo/python-list