Help beautify ugly heuristic code
I have a function that recognizes PTR records for dynamic IPs. There is no hard and fast rule for this - every ISP does it differently, and may change their policy at any time, and use different conventions in different places. Nevertheless, it is useful to apply stricter authentication standards to incoming email when the PTR for the IP indicates a dynamic IP (namely, the PTR record is ignored since it doesn't mean anything except to the ISP). This is because Windoze Zombies are the favorite platform of spammers.

Here is the very ugly code so far. It offends me to look at it, but I haven't had any better ideas. I have lots of test data from mail logs.

# examples we don't yet recognize:
#
# 1Cust65.tnt4.atl4.da.uu.net at ('67.192.40.65', 4588)
# 1Cust200.tnt8.bne1.da.uu.net at ('203.61.67.200', 4144)
# 1Cust141.tnt30.rtm1.nld.da.uu.net at ('213.116.154.141', 2036)
# user64.net2045.mo.sprint-hsd.net at ('67.77.185.64', 3901)
# wiley-268-8196.roadrunner.nf.net at ('205.251.174.46', 4810)
# 221.fib163.satnet.net at ('200.69.163.221', 3301)
# cpc2-ches1-4-0-cust8.lutn.cable.ntl.com at ('80.4.105.8', 61099)
# user239.res.openband.net at ('65.246.82.239', 1392)
# xdsl-2449.zgora.dialog.net.pl at ('81.168.237.145', 1238)
# spr1-runc1-4-0-cust25.bagu.broadband.ntl.com at ('80.5.10.25', 1684)
# user-0c6s7hv.cable.mindspring.com at ('24.110.30.63', 3720)
# user-0c8hvet.cable.mindspring.com at ('24.136.253.221', 4529)
# user-0cdf5j8.cable.mindspring.com at ('24.215.150.104', 3783)
# mmds-dhcp-11-143.plateautel.net at ('63.99.131.143', 4858)
# ca-santaanahub-cuda3-c6b-134.anhmca.adelphia.net at ('68.67.152.134', 62047)
# cbl-sd-02-79.aster.com.do at ('200.88.62.79', 4153)
# h105n6c2o912.bredband.skanova.com at ('213.67.33.105', 3259)

import re

ip3 = re.compile('([0-9]{1,3})[.x-]([0-9]{1,3})[.x-]([0-9]{1,3})')
rehmac = re.compile(
 'h[0-9a-f]{12}[.]|pcp[0-9]{6,10}pcs[.]|no-reverse|S[0-9a-f]{16}[.][a-z]{2}[.]'
)

def is_dynip(host,addr):
  """Return True if hostname is for a dynamic ip.
  Examples:
  >>> is_dynip('post3.fabulousdealz.com','69.60.99.112')
  False
  >>> is_dynip('adsl-69-208-201-177.dsl.emhril.ameritech.net','69.208.201.177')
  True
  >>> is_dynip('[1.2.3.4]','1.2.3.4')
  True
  """
  if host.startswith('[') and host.endswith(']'):
    return True
  if addr:
    if host.find(addr) >= 0: return True
    a = addr.split('.')
    ia = map(int,a)
    m = ip3.search(host)
    if m:
      g = map(int,m.groups())
      if g == ia[1:] or g == ia[:3]: return True
      if g[0] == ia[3] and g[1:] == ia[:2]: return True
      g.reverse()
      if g == ia[1:] or g == ia[:3]: return True
    if rehmac.search(host): return True
    if host.find("%s." % '-'.join(a[2:])) >= 0: return True
    if host.find("w%s." % '-'.join(a[:2])) >= 0: return True
    if host.find("dsl%s-" % '-'.join(a[:2])) >= 0: return True
    if host.find(''.join(a[:3])) >= 0: return True
    if host.find(''.join(a[1:])) >= 0: return True
    x = "%02x%02x%02x%02x" % tuple(ia)
    if host.lower().find(x) >= 0: return True
    z = [n.zfill(3) for n in a]
    if host.find('-'.join(z)) >= 0: return True
    if host.find("-%s." % '-'.join(z[2:])) >= 0: return True
    if host.find("%s." % ''.join(z[2:])) >= 0: return True
    if host.find(''.join(z)) >= 0: return True
    a.reverse()
    if host.find("%s." % '-'.join(a[:2])) >= 0: return True
    if host.find("%s." % '.'.join(a[:2])) >= 0: return True
    if host.find("%s." % a[0]) >= 0 and \
      host.find('.adsl.') > 0 or host.find('.dial-up.') > 0: return True
  return False

if __name__ == '__main__':
  import fileinput
  for ln in fileinput.input():
    a = ln.split()
    if len(a) == 2:
      ip,host = a
      if host.startswith('[') and host.endswith(']'):
        continue # no PTR
      if is_dynip(host,ip):
        print ip,host

-- http://mail.python.org/mailman/listinfo/python-list
Re: Help beautify ugly heuristic code
On Wed, 08 Dec 2004 18:00:06 -0500, Mitja wrote:

> On Wed, 08 Dec 2004 16:09:43 -0500, Stuart D. Gathman <[EMAIL PROTECTED]>
> wrote:
>
>> I have a function that recognizes PTR records for dynamic IPs Here
>> is the very ugly code so far.
>> ...
>> # examples we don't yet recognize:
>> ...
>
> This doesn't help much; post example of all the possible patterns you
> have to match (kind of like the docstring at the beginning, only more
> elaborate), otherwise it's hard to know what kind of code you're trying
> to implement.

This is a heuristic, so there is no exhaustive list or hard rule. However, I have posted 23K+ examples at http://bmsi.com/python/dynip.samp with DYN appended for examples which the current algorithm classifies as dynamic. Here are the last 20 (which my subjective judgement says are correct):

65.112.76.15    usfshlxmx01.myreg.net
201.128.108.41  dsl-201-128-108-41.prod-infinitum.com.mx DYN
206.221.177.128 mail128.tanthillyingyang.com
68.234.254.147  68-234-254-147.stmnca.adelphia.net DYN
63.110.30.30    mx81.goingwiththe-flow.info
62.178.226.189  chello062178226189.14.15.vie.surfer.at DYN
80.179.107.85   80.179.107.85.ispeednet.net DYN
200.204.68.52   200-204-68-52.dsl.telesp.net.br DYN
12.203.156.234  12-203-156-234.client.insightBB.com DYN
200.83.68.217   CM-lconC1-68-217.cm.vtr.net DYN
81.57.115.43    pauguste-3-81-57-115-43.fbx.proxad.net DYN
64.151.91.225   sv2a.entertainmentnewsclips.com
64.62.197.31    teenfreeway.sparklist.com
201.9.136.235   201009136235.user.veloxzone.com.br DYN
66.63.187.91    91.asandox.com
83.69.188.198   st11h07.ptambre.com
66.192.199.217  66-192-199-217.pyramidcollection.org DYN
69.40.166.49    h49.166.40.69.ip.alltel.net DYN
203.89.206.62   smtp.immigrationexpert.ca
80.143.79.97    p508F4F61.dip0.t-ipconnect.de DYN
Re: Help beautify ugly heuristic code
On Wed, 08 Dec 2004 18:39:15 -0500, Lonnie Princehouse wrote:

> Regular expressions.
>
> It takes a while to craft the expressions, but this will be more
> elegant, more extensible, and considerably faster to compute (matching
> compiled re's is fast).

I'm already doing that with the rehmac regex. I like your idea for making it more readable, though. Looking for permutations of the IP address gives much more bang for the line of code than most host only regexes since it is ISP independent. At least one ISP uses roman numerals to code the IP for their dynamic addresses! I tried matching a custom regex computed from the IP, but compiling the regex for each test was too slow.

I could keep adding more patterns, but I was hoping for a tool that "learns" from a database of preclassified examples how to recognize the pattern. And the resulting data would be reasonably compact. I don't ask for much, do I?

A Bayesian classifier would have too big of a database, I think. I've seen neural nets do amazing things with only 100 or so neurons - a small weight database. But they are slow in software.

I have posted 10K preclassified (by current algorithm) examples here:

http://bmsi.com/python/dynip.samp
Re: Help beautify ugly heuristic code
On Wed, 08 Dec 2004 19:52:53 -0500, Lonnie Princehouse wrote:

> I don't think a Bayesian classifier is going to be very helpful here,
> unless you have tens of thousands of examples to feed it, or unless it

We do have tens of thousands of examples to feed it.

> The series of if host.find(...) lines in is_dynip() is equivalent to a
> regular expression, but much more expensive to execute because of all

It is not equivalent, because the patterns are based on the IP address. As I mentioned before, I tried building a custom regex from the IP for each test - but compiling the regex is way too slow to be done for each test.

> For IP addresses, you really just need a mechanism to filter blocks of
> IP addresses. It might be easiest to first convert them into hex and
> then make liberal use of [0-f] in regular expressions.

The point of the ip address is *not* to recognize ip addresses. The point is to look for transformations of the ip address in the hostname. This gives a *huge* bang for the buck. I have been working on this problem for a while. If the hostname has a transformation of the ip address - it is (almost certainly) a dynamic address. The ISPs are very creative in their transformations, using the parts of the ip in various orders and encoding in hex, base64, decimal with or without zerofill, and even roman numerals.

The regex engine is just not powerful enough to handle parameterized regexes (that I know of).
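To make "transformations of the ip address" concrete, here is a rough sketch that builds a handful of encodings of the address and looks for any of them in the hostname. The names ip_encodings and embeds_ip are made up for this illustration, it is written for a current Python rather than 2.x, and it only covers whole-address encodings (dotted, dashed, run together, zero-filled, reversed, hex). The partial matches, base64 and roman numerals mentioned above are not attempted.

```python
def ip_encodings(addr):
    """Return a set of strings an ISP might embed in a PTR name for addr.

    Illustrative only: a small sample of whole-address encodings.
    """
    parts = addr.split('.')
    nums = [int(p) for p in parts]
    zfilled = [p.zfill(3) for p in parts]
    found = set()
    # decimal, with and without zero fill, forward and reversed
    for seq in (parts, parts[::-1], zfilled, zfilled[::-1]):
        for sep in ('.', '-', ''):
            found.add(sep.join(seq))
    # all four bytes as eight hex digits
    found.add('%02x%02x%02x%02x' % tuple(nums))
    return found


def embeds_ip(host, addr):
    """True if any of the sample encodings of addr occurs in host."""
    host = host.lower()
    return any(enc in host for enc in ip_encodings(addr))
```

Run against a few of the samples posted earlier in the thread, this flags p508F4F61.dip0.t-ipconnect.de (hex) and 201009136235.user.veloxzone.com.br (zero-filled) and leaves smtp.immigrationexpert.ca alone.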
Re: Creating Fixed Length Records
On Wed, 08 Dec 2004 17:29:19 -0600, Greg Lindstrom wrote:

> One thought I had, which might lead to an addition to the language, was
> to use the struct module. If I could feed the pack method a format
> string then a tuple of values (instead of individual values), then I
> could create the format string once, then pass it a tuple with the
> values for that record. Just a thought.

The language extension has been around since the early days. It was originally called 'apply'. But now is called '*'.

>>> import struct
>>> fmt = '8s6s2s'
>>> v = ('foo','bar','AA')
>>> struct.pack(fmt,*v)
'foo\x00\x00\x00\x00\x00bar\x00\x00\x00AA'

-- 
Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
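For anyone trying this on a current Python 3 interpreter: the same '*' unpacking works, but the 's' format codes expect bytes rather than str. A minimal version of the session above, with the variable names record and packed chosen just for this example:

```python
import struct

fmt = '8s6s2s'                      # three fixed-width fields: 8, 6 and 2 bytes
record = (b'foo', b'bar', b'AA')    # one tuple per record

# '*' spreads the tuple into separate arguments; short values are
# padded with null bytes up to the field width.
packed = struct.pack(fmt, *record)
```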
Re: Help beautify ugly heuristic code
On Thu, 09 Dec 2004 00:01:36 -0800, Lonnie Princehouse wrote:

> I believe you can still do this with only compiling a regex once and
> then performing a few substitutions on the hostname.

Cool idea. Convert ip matches to fixed patterns before matching a fixed regex. The leftovers like shaw cable (which has the MAC address of the cable modem instead of the IP) can still be handled with regex patterns.

I had an idea last night to compile 254 regexes, one for each possible last IP byte - but I think your idea is better. Mitja suggested a scoring system reminiscent of SpamAssassin. That gives me a few things to try.
Re: Help beautify ugly heuristic code
On Fri, 10 Dec 2004 22:03:20 +, JanC wrote:

> Stuart D. Gathman wrote:
>
>> I have a function that recognizes PTR records for dynamic IPs. There
>
> Did you also think about ISPs that use such a PTR record for both dynamic
> and fixed IPs?

There seems to be a lot of misunderstanding about this. I am not blocking anyone's mail because they have a dynamic looking PTR. I simply don't accept such a PTR as MTA authentication. You see, MTAs *SHOULD* provide a fully qualified domain as their HELO name which resolves to the IP of the MTA. Sadly, however, many internet facing MTAs don't do this, but I accept a meaningful PTR as a substitute. I also waive the requirement for MTA authentication if the MAIL FROM has an SPF record (http://spf.pobox.com).

So, if your MTA complies with RFC HELO recommendations, you'll have no trouble sending me mail. You can even use a dynamic IP with a dynamic DNS service.

I *do* block PTR names of "." or "localhost". I would like to block all single word HELO names - but there are too many clueless mail admins out there. People seem to be unsure of what to send for HELO.
Re: sorting with expensive compares?
On Sat, 24 Dec 2005 15:47:17 +1100, Steven D'Aprano wrote:

> On Fri, 23 Dec 2005 17:10:22 +, Dan Stromberg wrote:
>
>> I'm treating each file as a potentially very large string, and "sorting
>> the strings".
>
> Which is a very strange thing to do, but I'll assume you have a good
> reason for doing so.

I believe what the original poster wants to do is eliminate duplicate content from a collection of ogg/whatever files with different names. E.g., he has a python script that goes out and collects all the free music it can find on the web. The same song may appear on many sites under different names, and he wants only one copy of a given song.

In any case, as others have pointed out, sorting by MD5 is sufficient except in cases far less probable than hardware failure - and deliberate collisions. E.g., the RIAA creates collision pairs of MP3 files where one member carries a freely redistributable license, and the other a "copy this and we'll sue your ass off" license in an effort to trap the unwary.
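A small sketch of the digest approach, for illustration. It groups files by content hash instead of sorting whole files against each other. Note that it uses SHA-256 from hashlib rather than MD5, on the assumption that deliberate collisions are a concern; the function names file_digest and find_duplicates are invented for this example.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit in memory whole."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk_size), b''):
            h.update(block)
    return h.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents hash to the same digest."""
    by_digest = {}
    for path in paths:
        by_digest.setdefault(file_digest(path), []).append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Each file is read exactly once, so the cost is linear in the total size of the collection no matter how many files share a long common prefix.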
Re: Spiritual Programming (OT, but Python-inspired)
On Mon, 02 Jan 2006 19:05:04 -0500, Steven D'Aprano wrote:

> I don't dare ask where your evidence for this hypothesis is, but I will
> ask what are your reasons for imagining this? What is the chain of
> thought that leads from:
>
> Step 1: We live in a temporal world.
>
> to:
>
> Step N: Our ghost/soul must therefore live in a timeless state.

I can throw in some historical evidence against (assuming you accept the Gospels as historical, that is - at least they are documents). Christian doctrine paraphrased in the programming mindset is that this temporal world will be rebooted - destroyed and replaced with a "new heavens and new earth". The new earth will have time, but is purged of all evil. The goal of Christian practice is to cooperate with God as He cleans the wickedness out of our souls so that we can inhabit the new creation.

The cleaning experience is not always pleasant. Taking a hard objective look at the "goodness" of your behaviour can be humbling and embarrassing.
Re: inline function call
On Wed, 04 Jan 2006 13:18:32 +0100, Riko Wichmann wrote:

> I'm googeling since some time, but can't find an answer - maybe because
> the answer is 'No!'.
>
> Can I call a function in python inline, so that the python byte compiler
> does actually call the function, but sort of inserts it where the inline
> call is made? Therefore avoiding the function all overhead.

In standard python, the answer is no. The reason is that all python functions are effectively "virtual", and you don't know *which* version to inline.

HOWEVER, there is a slick product called Psyco:

http://psyco.sourceforge.net/

which gets around this by creating multiple versions of functions which contain inlined (or compiled) code. For instance, if foo(a,b) is often called with a and b of int type, then a special version of foo is compiled that is equivalent performance wise to foo(int a,int b). Dynamically finding the correct version of foo at runtime is no slower than normal dynamic calls, so the result is a very fast foo function. The only tradeoff is that every specialized version of foo eats memory. Psyco provides controls allowing you to specialize only those functions that need it after profiling your application.
Re: Is 'everything' a refrence or isn't it?
On Wed, 04 Jan 2006 10:54:17 -0800, KraftDiner wrote:

> I was under the assumption that everything in python was a refrence...
>
> so if I code this:
> lst = [1,2,3]
> for i in lst:
>    if i==2:
>       i = 4
> print lst
>
> I though the contents of lst would be modified.. (After reading that
> 'everything' is a refrence.)
> ...
> Have I misunderstood something?

It might help to do a translation to equivalent C:

#define NLST 3

int _i1 = 1;
int _i2 = 2;
int _i3 = 3;
int _i4 = 4;
int* lst[NLST] = { &_i1,&_i2,&_i3 };
int _idx; /* internal iterator */
for (_idx = 0; _idx < NLST; ++_idx) {
  int *i = lst[_idx];
  if (*i == _i2)
    i = &_i4;   /* repoints the local i only; lst[_idx] is untouched */
}
for (_idx = 0; _idx < NLST; ++_idx)
  printf("%d\n",*lst[_idx]);
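The same point can be shown directly in Python. This short sketch (variable names chosen for the example) contrasts rebinding the loop name with storing a new reference into the list slot:

```python
lst = [1, 2, 3]

# Rebinding: 'i = 4' makes the name i refer to another object.
# The list still holds its original references.
for i in lst:
    if i == 2:
        i = 4
after_rebinding = list(lst)

# Mutation: assigning to lst[idx] replaces the reference stored
# in the list itself, which is what the original poster expected.
for idx, value in enumerate(lst):
    if value == 2:
        lst[idx] = 4
after_mutation = list(lst)
```

So everything is a reference, but assignment to a bare name changes which object the name refers to, never the object it used to refer to.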
Re: Help beautify ugly heuristic code
On Thu, 09 Dec 2004 00:01:36 -0800, Lonnie Princehouse wrote:

> I believe you can still do this with only compiling a regex once and
> then performing a few substitutions on the hostname.

That is an interesting idea. Convert ip matches to fixed patterns, and *then* match the regex. I think I would convert hex matches to the same pattern as decimal (and roman numeral). How would you handle zero fill?

1.2.3.4 001002003004foo.isp.com

An idea I had last night is to precompile 254 regexes - one for each of the possible last ip bytes. However, your idea is cleaner - except, how would it handle ip bytes that are the same:

1.2.2.2

Mitja has proposed a scoring system reminiscent of SpamAssassin. This gives me a few things to try.
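One possible reading of the substitution idea, sketched here as an assumption rather than what Lonnie had in mind: replace every run of digits that equals one of the four IP bytes with a '#' marker, then match a single precompiled regex against the masked hostname. Comparing by integer value takes care of zero fill, and using one marker for all four bytes sidesteps the repeated-byte question (at the cost of ignoring byte order). The names mask_ip_bytes, looks_dynamic and THREE_BYTES are invented for this sketch, and hex or roman numeral encodings are not handled.

```python
import re

DIGIT_RUN = re.compile(r'[0-9]+')
# Compiled once: three masked IP bytes in a row, optionally separated.
THREE_BYTES = re.compile(r'#[.x-]?#[.x-]?#')


def mask_ip_bytes(host, addr):
    """Replace digit runs in host that encode a byte of addr with '#'."""
    octets = set(int(p) for p in addr.split('.'))

    def repl(match):
        run = match.group(0)
        # Zero-filled bytes run together, e.g. 001002003004.
        if len(run) in (9, 12):
            chunks = [run[i:i + 3] for i in range(0, len(run), 3)]
            if all(int(c) in octets for c in chunks):
                return '#' * len(chunks)
        # int() makes '069' and '69' compare equal.
        if int(run) in octets:
            return '#'
        return run

    return DIGIT_RUN.sub(repl, host)


def looks_dynamic(host, addr):
    """True if the masked hostname contains three IP bytes in a row."""
    return THREE_BYTES.search(mask_ip_bytes(host, addr)) is not None
```

With this, 001002003004foo.isp.com for 1.2.3.4 masks to ####foo.isp.com, and a host like h2-2-2.isp.com for 1.2.2.2 masks to h#-#-#.isp.com, so both match the one fixed regex.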
smtplib timeout
I am doing SMTP callbacks in a Python milter (http://pymilter.sourceforge.net) using the smtplib module. For some spammer MXes, it takes days (!) before smtplib.sendmail will return. Since the spammer connects to us every few seconds, this quickly leads to a problem :-)

I need to set a time limit for the operation of smtplib.sendmail. It has to be thread based, because pymilter uses libmilter which is thread based. There are some cookbook recipes which run a function in a new thread and call Thread.join(timeout). This doesn't help, because although the calling thread gets a nice timeout exception, the thread running the function continues to run. In fact, the problem is worse, because even more threads are created.
Re: smtplib timeout
On Tue, 25 Jul 2006 09:21:40 -0700, Alan Kennedy wrote:

> [Stuart D. Gathman]
>> I need to set a timelimit for the operation of
>> smtplib.sendmail. It has to be thread based, because pymilter uses
>> libmilter which is thread based.
>
> Have you tried setting a default socket timeout, which applies to all
> socket operations?

Does this apply to all threads, is it inherited when creating threads, or does each thread need to specify it separately?
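As far as I can tell from the socket module documentation, socket.setdefaulttimeout() sets a single module-wide default that every socket created afterwards picks up, whichever thread creates it. A quick way to check is to create a socket in a worker thread and read its timeout back. The 30 second value and the names worker and seen are just for this experiment.

```python
import socket
import threading

socket.setdefaulttimeout(30.0)   # set once, e.g. at milter startup

seen = []


def worker():
    # A socket created in another thread, after the default was set.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    seen.append(s.gettimeout())
    s.close()


t = threading.Thread(target=worker)
t.start()
t.join()
```

Since smtplib creates its socket when it connects, a default set before the callback should then make a stalled spammer MX raise socket.timeout instead of blocking for days. Note the timeout applies per socket operation, not to the sendmail call as a whole.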
Re: NFS server
On Fri, 24 Nov 2006 05:00:53 -0800, srj wrote:

> i wish to develop an NFS server usin python from scratch( some wise guy
> told me i'ts easy!).
> can i get any kinda tutorial for this??
>
> any suggestions on how 2 begin?

NFS is an RPC based protocol. The first step is to be able to do SunRPC/ONCRPC. Python has an 'xdrlib' module, which handles XDR, the encoding used to marshal and unmarshal parameters in SunRPC. If allowed under "from scratch", you could wrap the C rpc lib for python - it handles retries and other low level stuff. You could look at "Remote Tea" for Java, and translate to python to get a pure python oncrpc lib. Or you could look for such a package already written (a quick search didn't reveal any).

Once you have rpc, then it is "just" a matter of having your python server implement the set of calls specified for NFS.

BTW, apparently python was used for quickly building test rigs while developing NFS v4.

Having a framework for python NFS server could be useful - think custom filesystem. Although a python binding for fuse + C NFS server would be more general (use locally as well as remotely).
Driver selection
The pyspf package [http://cheeseshop.python.org/pypi/pyspf/] can use either pydns, or dnspython. The pyspf module has a simple driver function, DNSLookup(), that defaults to the pydns version. It can be assigned to a dnspython version, or to a test driver for in memory DNS. Or you can modify the source to "from drivermodule import DNSLookup".

What is the friendliest way to make this configurable? Currently, users are modifying the source to supply the desired driver. Yuck. I would like to supply several drivers, and have a simple way to select one at installation or run time.
Re: Driver selection
On Fri, 08 Dec 2006 21:35:41 -0800, Gabriel Genellina wrote:

> On 9 dic, 00:53, "Stuart D. Gathman" <[EMAIL PROTECTED]> wrote:
>> Or you can modify the source to "from drivermodule import DNSLookup".
>>
>> What is the friendliest way to make this configurable? Currently, users
>> are modifying the source to supply the desired driver. Yuck. I would
>> like to supply several drivers, and have a simple way to select one at
>> installation or run time.
>
> You can store the user's choice in a configuration file (like the ones
> supported by ConfigParser).
> Then you can put all the logic to select and import the right function
> in a separate module, and export the DNSLookup name; make all the other
> parts of the application say "from mymodule import DNSLookup"

pyspf is a library, not an application. It would be nasty to make it have a config file. We already have "from pydns_driver import DNSLookup" in the pyspf module. If only users could import *for* a module from *outside* the module. I suppose you can do something like this:

app.py:

import spf
# select dnspython driver for spf module
from spf.dnspython_driver import DNSLookup
spf.DNSLookup = DNSLookup
del DNSLookup
...

That is ugly. I'm looking for better ideas. Is there a clean way to make a setdriver function? Used like this, say:

app.py:

import spf
spf.set_driver('dnspython')
...

Can a function replace itself? For instance, in spf.py, could DNSLookup do this to provide a default:

def set_driver(d):
  if d == 'pydns':
    from pydns_driver import DNSLookup
  elif d == 'dnspython':
    from dnspython_driver import DNSLookup
  else: raise Exception('Unknown DNS driver')

def DNSLookup(name,t):
  from pydns_driver import DNSLookup
  return DNSLookup(name,t)
Re: Automatic debugging of copy by reference errors?
On Sat, 09 Dec 2006 05:58:22 -0800, Niels L Ellegaard wrote:

> I wanted a each object to know whether or not it was being referred to
> by a living object, and I wanted to warn the user whenever he tried to
> change an object that was being refered to by a living object. As far
> as I can see the garbage collector module would allow to do some of
> this, but one would still have to edit the assignment operators of each
> of the standard data structures:

I think what you want is a namespace that requires each object to have exactly one reference - the namespace. Of course, additional references will be created during evaluation of expressions. So the best you can do is provide a function that checks reference counts for a namespace when called, and warns about objects with multiple references. If that could be called for every statement (i.e. not during expression evaluation - something like C language "sequence points"), it would probably catch the type of error you are looking for. Checking such a thing efficiently would require deep changes to the interpreter.

The better approach is to revel in the ease with which data can be referenced rather than copied. I'm not sure it's worth turning python into fortran - even for selected namespaces.
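A rough sketch of such a checking function, with several caveats. It relies on sys.getrefcount, which is CPython specific, and it calibrates against a probe object measured through the same code path instead of assuming a particular baseline count. It is only meaningful for mutable containers such as lists, since the interpreter shares small integers and interned strings freely. The names shared_names and _refcounts are invented here.

```python
import sys


def _refcounts(namespace):
    # Every object goes through this same expression, so the temporary
    # references held by the loop and the call are identical for all.
    return dict((name, sys.getrefcount(obj))
                for name, obj in namespace.items())


def shared_names(namespace):
    """Names in namespace whose objects have references beyond the one
    held by the namespace itself (CPython only, approximate)."""
    # A fresh list referenced only by its dict gives the baseline count.
    baseline = _refcounts({'probe': []})['probe']
    counts = _refcounts(namespace)
    return sorted(name for name, count in counts.items() if count > baseline)
```

Called between statements on a dict used as the guarded namespace, this reports which entries are aliased, from inside or outside the namespace, which is the "warn about objects with multiple references" check described above.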
Package vs. module
On Mon, 11 Dec 2006 20:59:27 -0300, Gabriel Genellina wrote:

> The above code *almost* works, but DNSLookup is a local name inside
> the function. Use the global statement.
> As an example, see how getpass.py (in the standard library) manages
> the various getpass implementations.

Ok, I have a working package:

spf/
  __init__.py
  pyspf.py
  pydns.py
  dnspython.py

__init__.py:

from pyspf import *
from pyspf import __author__,__email__,__version__

def set_dns_driver(f):
  global DNSLookup
  DNSLookup = f
  pyspf.DNSLookup = f

def DNSLookup(name,qtype,strict=True):
  import pydns
  return DNSLookup(name,qtype,strict)

set_dns_driver(DNSLookup)

Importing a driver module activates that driver. For instance, in pydns.py:

import DNS    # http://pydns.sourceforge.net
import spf

...
def DNSLookup(...):
  ...

spf.set_dns_driver(DNSLookup)

NOW, this is all very nice and modular. BUT, the original module was a single file, which could be run as a script as well as imported as a module. The script features provided useful command line functionality. (Using if __name__ == '__main__':). Now that 'spf' is a package, the command line feature is gone! Even using -m, I get:

python2.4 -m spf
python2.4: module spf has no associated file

Looking at getpass.py as advised, I see they put all the drivers in the module. I could do that with spf.py, I suppose. But I like how with the package, the driver code is not loaded unless needed.

One other idea I had was an arrangement like this:

SPF/
  pydns.py
  dnspython.py

spf.py

This would keep the module as a single file usable from the command line, but still make the drivers available as separately loaded modules.

So which of the three options,

1) single file module with all drivers, ala getpass
2) package that cannot be run directly from command line
3) single file module with associated driver package

is the most pythonic?
Re: I'm looking for a pythonic red-black tree...
On Fri, 15 Dec 2006 01:20:34 +, Just Another Victim of the Ambient Morality wrote:

> I need a red-black tree in Python and I was wondering if there was one
> built in or if there's a good implementation out there. Something that,
> lets face it, does whatever the C++ std::map<> allows you to do...

Are you really looking specifically for a red-black tree, or do you want a container where iterators return values in sorted order? For instance, my favorite sorted container is the skip-list:

* inherently thread safe even during container updates
* updates as fast as searches - significantly faster than red-black tree
* search as fast as trees on average - but probabilistic (like hashtable)
* sequential access as fast as a linked list
* Uses 33% less memory than binary trees (like red-black tree)
* in general, performs like a hashtable, but sorted

Java example: http://bmsi.com/java/SkipList.java

In Python, the performance of a pure Python container will be disappointing unless it leverages a native container. For many applications, you can use a dict and sort when iterating (using heapq to amortize the sorting).

If I ever get the time, I would seriously consider trying to modify Python to replace the built-in dict with a skip-list algorithm and compare the performance. Failing that, an extension module implementing a sorted container of some description would be useful.
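The "dict plus heapq" suggestion could look something like this minimal sketch. The helper name iter_sorted is made up for the example. The keys are heapified once and popped lazily, so a caller that only needs the first few items in order does not pay for a full sort.

```python
import heapq


def iter_sorted(d):
    """Yield (key, value) pairs of dict d in ascending key order."""
    heap = list(d)            # copy of the keys
    heapq.heapify(heap)       # O(n) setup
    while heap:
        key = heapq.heappop(heap)   # O(log n) per item actually consumed
        yield key, d[key]
```

This snapshots the keys when iteration starts. Keys added later are not seen, and deleting a key before it is reached would raise KeyError, which is one of the things a real sorted container handles for you.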
Debugging SocketServer.ThreadingTCPServer
I have a ThreadingTCPServer application (pygossip, part of http://sourceforge.net/projects/pymilter). It mostly runs well, but occasionally goes into a loop. How can I get a stack trace of running threads to figure out where the loop is? Is there some equivalent of sending SIGQUIT to Java to get a thread dump? If needed, I can import pdb and set options at startup, but there needs to be some external way of triggering the dump since I can't reproduce it at will.
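One approach that may work, sketched under a few assumptions: sys._current_frames() (available from Python 2.5, so not on 2.4) returns the current stack frame of every thread, and a signal handler installed at startup can print them on demand. SIGUSR1 is used here as the trigger, which assumes a Unix platform. The names dump_threads and install_dump_handler are invented for this sketch.

```python
import sys
import threading
import traceback


def dump_threads(out=None):
    """Write a stack trace for every running thread to out (stderr)."""
    if out is None:
        out = sys.stderr
    names = dict((t.ident, t.name) for t in threading.enumerate())
    for ident, frame in sys._current_frames().items():
        out.write('Thread %s (%s):\n' % (ident, names.get(ident, '?')))
        traceback.print_stack(frame, file=out)
        out.write('\n')


def install_dump_handler(signum=None):
    """Dump all thread stacks when the process receives signum."""
    import signal
    if signum is None:
        signum = signal.SIGUSR1      # Unix only
    signal.signal(signum, lambda sig, frm: dump_threads())
```

After calling install_dump_handler() at startup, "kill -USR1 <pid>" from a shell prints the stacks. Python runs signal handlers in the main thread between bytecodes, so this helps when a worker thread is looping in Python code while the main thread sits in accept(). On Python 3, faulthandler.register(signal.SIGUSR1, all_threads=True) does a similar job from the standard library.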
Re: Best way to document Python code...
On Mon, 22 Jan 2007 20:40:57 +, Adonis Vargas wrote:

> But a quick look at pydoc (not to be confused with epydoc)
> which is part of the standard library allows you to generate
> documentation in HTML format, and/or serve it over web with its built-in
> HTTP server.
>
> pydoc: http://docs.python.org/lib/module-pydoc.html
>
> Hope this helps.
>
> Adonis

The HTML generated by pydoc doesn't link to standard modules properly. They are generated as relative links. So it can't be used without modification for generating docs for a web page about a python package.

I'm struggling with the same issue. Coding Python is so much easier than Java. However documenting Java is so much easier than Python. Just include doc comments, run javadoc, and voila!
Re: Best way to document Python code...
On Mon, 22 Jan 2007 17:35:18 -0500, Stuart D. Gathman wrote:

> The HTML generated by pydoc doesn't link to standard modules properly.
> They are generated as relative links. So it can't be used without
> modification for generating docs for a web page about a python package.
>
> I'm struggling with the same issue. Coding Python is so much easier than
> Java. However documenting Java is so much easier than Python. Just
> include doc comments, run javadoc, and voila!

Wow! I just tried epydoc, and it is every bit as easy as javadoc and with similar output. Too bad it isn't standard. But the comments and docstrings it parses work fine with pydoc also.
doctest problem with null byte
I am trying to create a doctest test case for the following:

    def quote_value(s):
      """Quote the value for a key-value pair in Received-SPF header
      field if needed. No quoting needed for a dot-atom value.

      >>> quote_value(r'abc\def')
      '"abcdef"'

      >>> quote_value('abc..def')
      '"abc..def"'

      >>> quote_value('')
      '""'

      >>> quote_value('-all\x00')
      '"-allx00"'
      ...

However, doctest does *not* like the null byte in the example (yes, this
happens with real-life input):

    **
    File "/home/stuart/pyspf/spf.py", line 1453, in spf.quote_value
    Failed example:
        quote_value('-all')
    Exception raised:
        Traceback (most recent call last):
          File "/var/tmp/python2.4-2.4.4c1-root/usr/lib/python2.4/doctest.py", line 1248, in __run
            compileflags, 1) in test.globs
        TypeError: compile() expected string without null bytes
    **

How can I construct a test case for this?
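One likely cause: in an ordinary (non-raw) docstring, Python turns \x00 into a real null byte before doctest ever sees the example, so doctest tries to compile source text containing a null. Writing the docstring as a raw string (or doubling the backslash) keeps the escape as four plain characters, and the null byte is only created when the example itself runs. A minimal sketch, using a made-up stand-in function strip_nulls() and a small hypothetical helper count_failures() rather than the real quote_value:

```python
import doctest


def strip_nulls(s):
    r"""Remove null bytes from s (hypothetical stand-in for quote_value).

    The docstring is a raw string, so doctest sees the text \x00 rather
    than an actual null byte.

    >>> strip_nulls('-all\x00')
    '-all'
    """
    return s.replace('\x00', '')


def count_failures(func):
    """Run the doctest examples in func's docstring; return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed


print(count_failures(strip_nulls))
```

Note that inside a raw docstring every backslash is literal, so the expected-output lines of the other examples have to be written with that in mind.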
Re: Debugging SocketServer.ThreadingTCPServer
On Tue, 16 Jan 2007 09:11:38 -0500, Jean-Paul Calderone wrote:

> On Tue, 16 Jan 2007 00:23:35 -0500, "Stuart D. Gathman"
> <[EMAIL PROTECTED]> wrote:
>>I have a ThreadingTCPServer application (pygossip, part of
>>http://sourceforge.net/projects/pymilter). It mostly runs well, but
>>occasionally goes into a loop. How can I get a stack trace of running
>>threads to figure out where the loop is? Is there some equivalent of
>>sending SIGQUIT to Java to get a thread dump? If needed, I can import
>>pdb and set options at startup, but there needs to be some external way
>>of triggering the dump since I can't reproduce it at will.
>
> Grab the gdbinit out of Python SVN Misc/ directory. Apply this patch:
>
> http://jcalderone.livejournal.com/28224.html
>
> Attach to the process using gdb. Make sure you have debugging symbols
> in your build of Python. Run 'thread apply all pystack'.

Did this. gdb displays the main thread fine (waiting on accept(), duh).
But gdb goes into a loop displaying the first worker thread. There are
no extension modules other than the batteries-included ones. In this
application, I believe, only _socket. (I.e. a pure Python server.)

I will try for a C stack trace next time it loops.

Also, the looping server needs kill -9. SIGTERM and SIGINT won't stop it.
And after it dies with SIGKILL, the port is still in use for 5 minutes or
so (and the server cannot be restarted).

This is really making me appreciate Java.
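For comparison with the gdb route, here is a rough sketch of a SIGQUIT-style dump done from inside Python. It is a different technique from the one suggested above: it relies on sys._current_frames(), which only exists in Python 2.5 and later, and the names format_thread_stacks() and install_dump_handler() are made up for the example. It can only help if the main thread still gets a chance to run Python code, since that is where Python-level signal handlers execute.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the current Python stack of every thread as one string."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "?")))
        lines.extend(traceback.format_stack(frame))
    return "\n".join(lines)


def install_dump_handler(signum=None):
    """Dump all thread stacks to stderr when signum arrives.

    Must be called from the main thread. Defaults to SIGUSR1, which is
    not available on Windows.
    """
    if signum is None:
        signum = signal.SIGUSR1

    def handler(sig, frame):
        sys.stderr.write(format_thread_stacks() + "\n")

    signal.signal(signum, handler)
```

Calling install_dump_handler() once at startup and then running kill -USR1 on the server's pid should write one traceback per thread to stderr.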
Re: Vim scripting with python
On Sat, 03 Feb 2007 05:02:54 -0800, Tool69 wrote:

> Does anyone have any advice, and more generally how to script Vim with
> Python ?

    :py import sys
    :py print sys.version
    :help :py

> I know I can put some python functions inside my vimrc file like
> this :
>
> function! My_function()
> python << EOF
> import vim, string
> ...blablabla
> EOF
> endfunction
>
> but I would like to use external ".py" files.

    :py import myfile

Use :py inside your vimrc - don't run python externally.
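As a sketch of what such an external file might look like: the module name myfile comes from the command above, while the functions shout() and shout_buffer() are invented for illustration. The vim module is only importable inside Vim's embedded interpreter, so the import is kept inside the function that needs it.

```python
# myfile.py - hypothetical module, loaded from Vim with ":py import myfile"


def shout(lines):
    """Return a copy of lines with every line upper-cased."""
    return [line.upper() for line in lines]


def shout_buffer():
    """Upper-case every line of the current Vim buffer."""
    import vim  # only available when running inside Vim
    buf = vim.current.buffer
    for i, line in enumerate(shout(buf[:])):
        buf[i] = line
```

Inside Vim this would be used as ":py import myfile" followed by ":py myfile.shout_buffer()", assuming the directory containing myfile.py is on sys.path (it can be appended with :py as well). Keeping the real work in plain functions like shout() also makes the file testable outside Vim.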
Re: Is Python Right for Me?
On Fri, 02 Feb 2007 15:09:20 -0500, Mister Newbie wrote:

> I want to make small, 2D games. I have no programming experience. Is
> Python a good choice?

Definitely. I teach a class for 7th to 12th grade where I use this
tutorial to introduce programming:

http://www.livewires.org.uk/python/
http://www.livewires.org.uk/python/pdfsheets.html

As an adult, just skip rapidly through the elementary material. The
final module (Games sheets) walks you through creating three 2D games
with pygame!