Re: [RELEASED] Python 2.7.5
On 5/15/2013 9:19 PM, Benjamin Peterson wrote:
> It is my greatest pleasure to announce the release of Python 2.7.5.
>
> 2.7.5 is the latest maintenance release in the Python 2.7 series.

Thanks very much. It's important that Python 2.x be maintained. 3.x is
a different language, with different libraries, and lots of things that
still don't work. Many old applications will never be converted.

                John Nagle
Re: Why has python3 been created as a separate language when there is still python2.7?
On 6/25/2012 1:36 AM, Stefan Behnel wrote:
> gmspro, 24.06.2012 05:46:
>> Why has python3 been created as a separate language when there is
>> still python2.7?
>
> The intention of Py3 was to deliberately break backwards compatibility
> in order to clean up the language. The situation is not as bad as you
> seem to think; a huge amount of packages have been ported to Python 3
> already and/or work happily with both language dialects.

The syntax changes in Python 3 are a minor issue for serious
programmers. The big headaches come from packages that aren't being
ported to Python 3 at all. In some cases, there's a replacement package
from another author that performs the same function, but has a
different API. Switching packages involves debugging some new package
with, probably, one developer and a tiny user community.

The Python 3 to MySQL connection is still a mess. The original
developer of MySQLdb doesn't want to support Python 3. There's
"pymysql", but it hasn't been updated since 2010 and has a long list of
unfixed bugs. There was a "MySQL-python-1.2.3-py3k" port by a third
party, but the domain that hosted it
("http://www.elecmor.mooo.com/python/MySQL-python-1.2.3-py3k.zip") is
dead. There's MySQL for Python 3
(https://github.com/davispuh/MySQL-for-Python-3), but it doesn't work
on Windows. MySQL Connector (https://code.launchpad.net/myconnpy)
hasn't been updated in a while, but at least has some users. OurSQL has
a different API than MySQLdb, and isn't quite ready for prime time yet.

That's why I'm still on Python 2.7.

                John Nagle
Re: PySerial could not open port COM4: [Error 5] Access is denied - please help
On 6/26/2012 9:12 PM, Adam wrote:
> Host OS: Ubuntu 10.04 LTS
> Guest OS: Windows XP Pro SP3
>
> I am able to open port COM4 with a terminal emulator. So, what can
> cause PySerial to generate the following error?
>
> C:\Wattcher>python wattcher.py
> Traceback (most recent call last):
>   File "wattcher.py", line 56, in <module>
>     ser.open()
>   File "C:\Python25\Lib\site-packages\serial\serialwin32.py", line 56, in open
>     raise SerialException("could not open port %s: %s" % (self.portstr, ctypes.WinError()))
> serial.serialutil.SerialException: could not open port COM4: [Error 5] Access is denied.

Are you trying to access serial ports from a virtual machine? Which
virtual machine environment? Xen? VMware? QEMU? VirtualBox? I wouldn't
expect that to work in most of those.

What is "COM4", anyway? Few machines today actually have four serial
ports. Is some device emulating a serial port?

                John Nagle
Re: when "normal" parallel computations in CPython will be implemented at last?
On 7/1/2012 10:51 AM, dmitrey wrote:
> hi all, is there any information about upcoming availability of
> parallel computations in CPython without modules like
> multiprocessing? I mean something like parallel "for" loops, or, at
> least, something without forking with copying huge amounts of RAM each
> time, and the possibility to involve unpicklable data (vfork would be
> ok, but AFAIK it doesn't work with CPython due to the GIL). AFAIK in
> PyPy some progress has been made
> (http://morepypy.blogspot.com/2012/06/stm-with-threads.html).
> Thank you in advance, D.

It would be "un-Pythonic" to have real concurrency in Python. You
wouldn't be able to patch code running in one thread from another
thread. Some of the dynamic features of Python would break.

If you want fine-grained concurrency, you need controlled isolation
between concurrent tasks, so they interact only at well-defined points.
That's un-Pythonic.

                John Nagle
Re: simpler increment of time values?
On 7/4/2012 5:29 PM, Vlastimil Brom wrote:
> Hi all, I'd like to ask about the possibilities to do some basic
> manipulation on timestamps - such as incrementing a given time
> (hour.minute - string) by some minutes. A very basic notion of "time"
> is assumed, i.e. dateless, timezone-unaware, DST-less etc. I first
> thought it would be possible to just add a timedelta to a time object,
> but it doesn't seem to be the case.

That's correct. A datetime.time object is a time within a day. A
datetime.date object is a date without a time. A datetime.datetime
object contains both. You can add a datetime.timedelta object to a
datetime.datetime object, which will yield a datetime.datetime object.

You can also call time.time() and get the number of seconds since the
epoch (usually 1970-01-01 00:00:00 UTC). That's just a number, and you
can do arithmetic on it.

Adding a datetime.timedelta to a datetime.time isn't that useful. It
would have to raise ValueError if the result crossed a day boundary.

                John Nagle
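[A minimal sketch of the usual workaround for the dateless case:
attach the time to a throwaway date, do the arithmetic on the
resulting datetime, and discard the date again. The "HH.MM" string
format is taken from the question; wrap-around at midnight comes for
free.]

    import datetime

    def add_minutes(hhmm, minutes):
        # Attach the time to a dummy date, shift it, drop the date.
        t = datetime.datetime.strptime(hhmm, "%H.%M")
        t += datetime.timedelta(minutes=minutes)
        return t.strftime("%H.%M")

    print(add_minutes("23.45", 30))   # "00.15" - wraps past midnight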
Re: Socket code not executing properly in a thread (Windows)
On 7/8/2012 3:55 AM, Andrew D'Angelo wrote:
> Hi, I've been writing an IRC chatbot that can relay messages it
> receives as an SMS.

We have no idea what IRC module you're using.

> As it stands, I can retrieve and parse SMSs from Google Voice perfectly

The Google Voice code you have probably won't work once you have
enough messages stored that Google Voice returns them on multiple
pages. You have to read all the pages. If there's any significant
amount of traffic, the completed messages have to be moved or deleted,
or each polling cycle returns more data than the last one.

Google Voice isn't a very good SMS gateway. I used to use it, but
switched to Twilio (which costs, but works) two years ago.

                John Nagle
Re: How to safely maintain a status file
On 7/8/2012 2:52 PM, Christian Heimes wrote:
> You are contradicting yourself. Either the OS is providing a fully
> atomic rename or it doesn't. All POSIX-compatible OSes provide an
> atomic rename that renames the file atomically or fails without
> losing the target side. On a POSIX OS it doesn't matter if the target
> exists.

Rename on some file system types (particularly NFS) may not be atomic.

> You don't need locks or any other fancy stuff. You just need to make
> sure that you flush the data and metadata correctly to the disk and
> force a re-write of the directory inode, too. It's a standard pattern
> on POSIX platforms and well documented in e.g. the maildir RFC. You
> can use the same pattern on Windows, but it doesn't work as well.

That's because you're using the wrong approach. See how to use
ReplaceFile under Win32:
http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

Renaming files is the wrong way to synchronize a crawler. Use a
database that has ACID properties, such as SQLite. Far fewer I/O
operations are required for small updates. It's not the 1980s any more.

I use a MySQL database to synchronize multiple processes which crawl
web sites. The tables of past activity are InnoDB tables, which support
transactions. The table of what's going on right now is a MEMORY table.
If the database crashes, the past activity is recovered cleanly, the
MEMORY table comes back empty, and all the crawler processes lose their
database connections, abort, and are restarted. This allows multiple
servers to coordinate through one database.

                John Nagle
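[A minimal sketch of the SQLite approach with the stdlib's sqlite3
module; the file name, table layout, and state strings are invented for
illustration. Each transaction commits atomically, so a crash never
leaves a half-written status file.]

    import sqlite3

    conn = sqlite3.connect("crawler_status.db")
    conn.execute("CREATE TABLE IF NOT EXISTS status"
                 " (url TEXT PRIMARY KEY, state TEXT)")

    def set_status(url, state):
        # The connection as context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT OR REPLACE INTO status (url, state)"
                         " VALUES (?, ?)", (url, state))

    set_status("http://example.com/", "done")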
Re: Implicit conversion to boolean in if and while statements
On 7/15/2012 1:34 AM, Andrew Berg wrote:
> This has probably been discussed before, but why is there an implicit
> conversion to a boolean in if and while statements?
>
>     if not None:
>         print('hi')
>
> prints 'hi' since bool(None) is False. If this was discussed in a PEP,
> I would like a link to it. There are so many PEPs, and I wouldn't know
> which ones to look through. Converting 0 and 1 to False and True seems
> reasonable, but I don't see the point in converting other arbitrary
> values.

Because Boolean types were an afterthought in Python. See PEP 285. If a
language starts out with a Boolean type, it tends towards
Pascal/Ada/Java semantics in this area. If a language backs into
needing a Boolean type, as Python and C did, it tends to have the
somewhat weird semantics of a language which can't quite decide what's
a Boolean. C and C++ have the same problem, for exactly the same
reason - boolean types were an afterthought there, too.

                John Nagle
Re: On-topic: alternate Python implementations
On 8/4/2012 7:19 PM, Steven D'Aprano wrote:
> On Sat, 04 Aug 2012 18:38:33 -0700, Paul Rubin wrote:
>> Steven D'Aprano writes:
>>> Runtime optimizations that target the common case, but fall back to
>>> unoptimized code in the rare cases that the optimization doesn't
>>> apply, offer the opportunity of big speedups for most code at the
>>> cost of trivial slowdowns when you do something unusual.
>>
>> The problem is you can't always tell if the unusual case is being
>> exercised without an expensive dynamic check, which in some cases
>> must be repeated in every iteration of a critical inner loop, even
>> though it turns out that the program never actually uses the unusual
>> case.

There are other approaches. PyPy uses two interpreters and a JIT
compiler to handle the hard cases. When code does something unexpected
to other code, the backup interpreter is used to get control out of the
trouble spot so that the JIT compiler can then recompile the code. (I
think; I've read the paper but haven't looked at the internals.) This
is hard to implement and hard to get right.

                John Nagle
Re: python 6 compilation failure on RHEL
On 8/20/2012 2:50 PM, Emile van Sebille wrote:
> On 8/20/2012 1:55 PM Walter Hurry said...
>> On Mon, 20 Aug 2012 12:19:23 -0700, Emile van Sebille wrote:
>>> Package dependencies. If the OP intends to install a package that
>>> doesn't support other than 2.6, you install 2.6.
>>
>> It would be a pretty poor third party package which specified Python
>> 2.6 exactly, rather than (say) "Python 2.6 or later, but not Python 3"

After a thread of clueless replies, it's clear that nobody responding
actually read the build log. Here's the problem:

    Failed to find the necessary bits to build these modules:
    bsddb185 dl imageop sunaudiodev

What's wrong is that the Python 2.6 build script is looking for some
antiquated packages that aren't in a current RHEL. Those need to be
turned off.

This is a known problem (see
http://pythonstarter.blogspot.com/2010/08/bsddb185-sunaudiodev-python-26-ubuntu.html),
but, unfortunately, the site with the patch for it
(http://www.lysium.de/sw/python2.6-disable-old-modules.patch) is no
longer in existence. But someone archived it on Google Code, at
http://code.google.com/p/google-earth-enterprise-compliance/source/browse/trunk/googleclient/geo/earth_enterprise/src/third_party/python/python2.6-disable-old-modules.patch
so if you apply that patch to the setup.py file for Python 2.6, that
ought to help.

You might be better off building Python 2.7, but you asked about 2.6.

                John Nagle
Parsing ISO date/time strings - where did the parser go?
In Python 2.7, I want to parse standard ISO date/time strings such as

    2012-09-09T18:00:00-07:00

into Python "datetime" objects. The "datetime" object offers an output
method, datetimeobj.isoformat(), but not an input parser. There ought
to be a classmethod datetime.fromisoformat(s), but there isn't. I'd
like to avoid adding a dependency on a third-party module like
"dateutil".

The "Working with time" section of the Python wiki is so ancient it
predates "datetime", and says so. There's an iso8601 module on PyPI,
but it's abandoned; it hasn't been updated since 2007 and has many
outstanding issues. There are mentions of "xml.utils.iso8601.parse" in
various places, but the "xml" module that comes with Python 2.7 doesn't
have xml.utils. http://www.seehuhn.de/pages/pdate says: "Unfortunately
there is no easy way to parse full ISO 8601 dates using the Python
standard library." It looks like this was taken out of "xml" at some
point, but not moved into "datetime".

                John Nagle
Re: Parsing ISO date/time strings - where did the parser go?
On 9/6/2012 12:51 PM, Paul Rubin wrote:
> John Nagle writes:
>> There's an iso8601 module on PyPI, but it's abandoned; it hasn't been
>> updated since 2007 and has many outstanding issues.
>
> Hmm, I have some code that uses ISO date/time strings and just checked
> to see how I did it, and it looks like it uses iso8601-0.1.4-py2.6.egg.
> I don't remember downloading that module (I must have done it and
> forgotten). I'm not sure what its outstanding issues are, as it works
> ok in the limited way I use it.
>
> I agree that this functionality ought to be in the stdlib.

Yes, it should. There's no shortage of implementations. PyPI has four.
Each has some defect. PyPI offers:

    iso8601      0.1.4   Simple module to parse ISO 8601 dates
    iso8601.py   0.1dev  Parse utilities for iso8601 encoding.
    iso8601plus  0.1.6   Simple module to parse ISO 8601 dates
    zc.iso8601   0.2.0   ISO 8601 utility functions

Unlike CPAN, PyPI has no quality control.

Looking at the first one, it's in Google Code:
http://code.google.com/p/pyiso8601/source/browse/trunk/iso8601/iso8601.py
The first bug is at line 67. For a timestamp with a "Z" at the end, the
offset should always be zero, regardless of the default timezone. See
"http://en.wikipedia.org/wiki/ISO_8601". The code uses the default time
zone in that case, which is wrong. So don't call that code with your
local time zone as the default; it will return bad times.

Looking at the second one, it's on GitHub:
https://github.com/accellion/iso8601.py/blob/master/iso8601.py
Giant regular expressions! The code to handle the offset is present,
but it doesn't make the datetime object a timezone-aware object. It
returns a naive object in UTC.

The third one is at https://github.com/jimklo/pyiso8601plus
This is a fork of the first one, because the first one is abandonware.
The bug in the first one, mentioned above, isn't fixed. However, if a
time zone is present, it does return an "aware" datetime object.

The fourth one is the Zope version. This brings in the pytz module,
which brings in the Olson database of named time zones and their
historical conversion data. None of that information is used, or
necessary, to parse ISO dates and times. Somebody just wanted the
pytz.fixedOffset() function, which does something datetime already
does.

(For all the people who keep saying "use strptime": that doesn't handle
time zone offsets at all.)

                John Nagle
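[For reference, a minimal sketch of a parser for the common offset
forms under Python 2.7. It deliberately ignores fractional seconds and
the more exotic ISO 8601 shapes, and it treats "Z" as UTC rather than
the default zone, avoiding the bug described above. The FixedOffset
class is needed because 2.7 has no built-in fixed-offset tzinfo.]

    import re
    import datetime

    class FixedOffset(datetime.tzinfo):
        # Fixed offset in minutes east of UTC.
        def __init__(self, minutes):
            self._offset = datetime.timedelta(minutes=minutes)
        def utcoffset(self, dt):
            return self._offset
        def dst(self, dt):
            return datetime.timedelta(0)

    _ISO_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
                         r"(Z|[+-]\d{2}:\d{2})$")

    def parse_iso(s):
        m = _ISO_RE.match(s)
        if m is None:
            raise ValueError("unsupported ISO 8601 timestamp: %r" % s)
        y, mo, d, h, mi, sec = [int(g) for g in m.groups()[:6]]
        tz = m.group(7)
        if tz == "Z":              # "Z" always means UTC, never local time
            offset = 0
        else:
            sign = 1 if tz[0] == "+" else -1
            offset = sign * (int(tz[1:3]) * 60 + int(tz[4:6]))
        return datetime.datetime(y, mo, d, h, mi, sec,
                                 tzinfo=FixedOffset(offset))

    print(parse_iso("2012-09-09T18:00:00-07:00"))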
Re: Parsing ISO date/time strings - where did the parser go?
On 9/8/2012 5:20 PM, John Gleeson wrote:
> On 2012-09-06, at 2:34 PM, John Nagle wrote:
>> Yes, it should. There's no shortage of implementations. PyPI has
>> four. Each has some defect.
>
> Here are three more on PyPI you can try:
>
>     iso-8601  0.2.3  Flexible ISO 8601 parser...
>     PySO8601  0.1.7  PySO8601 aims to parse any ISO 8601 date...
>     isodate   0.4.8  An ISO 8601 date/time/duration parser and formatter
>
> All three have been updated this year.

There's another one inside feedparser, and there used to be one in the
xml module.

Filed issue 15873: "datetime" cannot parse ISO 8601 dates and times.
http://bugs.python.org/issue15873

This really should be handled in the standard library, instead of
everybody rolling their own, badly. Especially since in Python 3.x
there's finally a useful "tzinfo" subclass for fixed time zone offsets.
That provides a way to directly represent ISO 8601 date/time strings
with offsets as "time zone aware" datetime objects.

                John Nagle
Re: search google with python
On 1/25/2012 8:38 AM, Jerry Hill wrote:
> On Wed, Jan 25, 2012 at 5:36 AM, Tracubik wrote:
>> thanks a lot but it says it's deprecated, is there a replacement?
>> Anyway it'll be useful for me to study json, thanks :)
>
> I don't believe Google is particularly supportive of allowing
> third parties (like us) to use their search infrastructure. All of
> the search-related APIs they used to provide are slowly going away
> and not being replaced, as far as I can tell.

True. The Google SOAP API disappeared years ago. The AJAX search widget
was very restrictive, and is now at end of life (no new users). "Google
Custom Search" only lets you search specific sites. The Bing API comes
with limitations on what you can do with the results. The Yahoo search
API went away, replaced by the Yahoo BOSS API. Then that was replaced
by a pay-per-search interface. Blekko has an API, but you have to ask
to use it.

                John Nagle
Re: Where to put data
On 1/25/2012 9:26 AM, bvdp wrote:
> I'm having a disagreement with a buddy on the packaging of a program
> we're doing in Python. It's got a number of modules and a large
> number of library files. The library stuff is data, not code.

How much data? Megabytes? Gigabytes? I have some modules which contain
nothing but big constants, written by a program in Python format.

                John Nagle
Re: Killing threads, and os.system()
On 1/31/2012 8:04 AM, Dennis Lee Bieber wrote:
> ({muse: who do we have to kill to persuade OS designers to incorporate
> something like the Amiga ARexx "rexxport" system})

QNX, a real-time microkernel OS which looks like POSIX to applications,
actually got interprocess communication right. It has to; everything in
QNX is done by interprocess communication, including all I/O. File
systems and drivers are ordinary programs. The kernel just handles
message passing, CPU dispatching, and timers.

QNX's message passing looks more like a subroutine call than an I/O
operation, and this has important implications for efficient CPU
dispatching. Any QNX system call that can block is really a message
pass. Message passes can be given a timeout, and they can be canceled
from another thread. The "system call" then returns with an error
status. This provides a way to keep threads from getting "stuck" in a
system call.

(Unfortunately QNX, which survived as a separate company for decades,
sold out to Harman (car audio) a few years ago. They had no clue what
to do with an OS. They sold it to Research In Motion, the BlackBerry
company, which is in the process of tanking.)

Python's thread model is unusually dumb. You can't send signals to
other threads, you can't force an exception in another thread, and I
won't even get into the appalling mess around the Global Interpreter
Lock. This has forced the use of subprocesses where, in other
languages, you'd use threads. Of course, you load a new copy of the
interpreter in each subprocess, so this bloats memory usage.

                John Nagle
Re: MySQLdb not allowing hyphen
On 2/5/2012 2:46 PM, Chris Rebert wrote:
> On Sun, Feb 5, 2012 at 2:41 PM, Emeka wrote:
>> Hello All, I noticed that MySQLdb not allowing the character may be a
>> way to prevent injection attacks. I have something like below:
>>
>>     "insert into reviews(message, title) values('%s', '%s')" % (
>>         "We don't know where to go", "We can't wait till morrow")
>>
>>     ProgrammingError(1064, "You have an error in your SQL syntax;
>>     check the manual that corresponds to your MySQL server version
>>     for the right syntax to use near 't know where to go...
>>
>> How do I work around this error?
>
> Don't use raw SQL strings in the first place. Use a proper
> parameterized query, e.g.:
>
>     cursor.execute("insert into reviews(message, title) values (%s, %s)",
>                    ("We don't know where to go", "We can't wait till morrow"))

Yes. You are doing it wrong. Do NOT use the "%" operator when putting
SQL queries together. Let "cursor.execute" fill them in. It knows how
to escape special characters in the input fields, which will fix your
bug and prevent SQL injection.

                John Nagle
Re: changing sys.path
On 2/1/2012 8:15 AM, Andrea Crotti wrote:
> So suppose I want to modify sys.path on the fly before running some
> code which imports from one of the modules added. At run time I do
>
>     sys.path.extend(paths_to_add)
>
> but it still doesn't work and I get an import error.

Do "import sys" first.

                John Nagle
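[The working order of operations, for reference; the path and module
name below are placeholders, not names from the thread.]

    import sys
    sys.path.insert(0, "/path/to/extra/modules")  # placeholder path
    import extra_module   # hypothetical module; import AFTER the path change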
Re: Common LISP-style closures with Python
On 2/3/2012 4:27 PM, Antti J Ylikoski wrote:
> In the Python textbooks that I have read, it is usually not mentioned
> that we can very easily program Common LISP-style closures with
> Python. It is done as follows:

Most dynamic languages have closures. Even Perl and JavaScript have
closures. JavaScript really needs them, because the "callback"
orientation of JavaScript means you often need to package up state and
pass it into a callback. It really has very little to do with
functional programming.

If you want to see a different style of closure, check out Rust,
Mozilla's new language. Rust doesn't have the "spaghetti stack" needed
to implement closures, so it has more limited closure semantics. It's
more like some of the C add-ons for closures, but sounder.

                John Nagle
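[The pattern under discussion - a factory function whose inner function
captures and updates enclosing state - looks like this in Python 2,
where the one-element list is the standard workaround for the missing
"nonlocal"; the counter example is illustrative, not from the thread.]

    def make_counter(start=0):
        count = [start]          # mutable cell shared with the closure
        def counter():
            count[0] += 1
            return count[0]
        return counter

    c = make_counter()
    print(c())   # 1
    print(c())   # 2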
Re: frozendict
On 2/10/2012 10:14 AM, Nathan Rice wrote:
> Let's also not forget that knowing an object is immutable lets you do
> a lot of optimizations; it can be inlined, it is safe to convert to a
> contiguous block of memory and stuff in cache, etc. If you know the
> input to a function is guaranteed to be frozen you can just go crazy.
> Being able to freeze(anyobject) seems like a pretty clear win. Whether
> or not it is pythonic is debatable. I'd argue if the meaning of
> pythonic in some context is limiting, we should consider updating the
> term rather than being dogmatic.

A real justification for the ability to make anything immutable is to
make it safely shareable between threads. If it's immutable, it doesn't
have to be locked for access. Mozilla's new "Rust" language takes
advantage of this. Take a look at Rust's concurrency semantics. They've
made some progress.

                John Nagle
Re: frozendict
On 2/10/2012 9:52 PM, 8 Dihedral wrote:
> On Saturday, February 11, 2012 at 2:57:34 AM UTC+8, John Nagle wrote:
>> A real justification for the ability to make anything immutable is to
>> make it safely shareable between threads. [...]
>
> Let's model the system as an asynchronous set of objects with multiple
> threads performing operations on objects as in the above.

I'd argue for a concurrency system where everything is either
immutable, unshared, synchronized, or owned by a synchronized object.
This eliminates almost all explicit locking. Python's use of
immutability has potential in that direction, but Python doesn't do
anything with that concept.

                John Nagle
Re: Looking for PyPi 2.0...
On 2/8/2012 9:47 AM, Chris Rebert wrote:
> On Wed, Feb 8, 2012 at 8:54 AM, Nathan Rice wrote:
>> As a user:
>> * Finding the right module in PyPI is a pain because there is
>>   limited, low-quality semantic information, and there is no code
>>   indexing.

CPAN does it right. They host the code (PyPI is just a collection of
links). They have packaging standards (PyPI does not). CPAN tends not
to be full of low-quality modules that do roughly the same thing. If
you want to find a Python module, Google is more useful than PyPI.

                John Nagle
Re: Script randomly exits for seemingly no reason with strange traceback
On 2/4/2012 12:43 PM, Chris Angelico wrote:
> On Sun, Feb 5, 2012 at 3:32 AM, Andrew Berg wrote:
>> On 2/3/2012 9:15 PM, Chris Angelico wrote:
>>> Do you call on potentially-buggy external modules?
>> It imports one module that does little more than define a few simple
>> functions. There's certainly no (intentional) interpreter hackery at
>> work.
> Are you doing a conditional import, one that takes place after load
> time?

If you do an import within a function or class, it is executed when the
code around it executes. If you import a file with a syntax error
during execution, you could get the error message you're getting.

                John Nagle
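[A tiny illustration of the failure mode; "broken_helper" is a
hypothetical module containing a syntax error, not a module from the
thread.]

    def handler():
        import broken_helper   # executed on first call, not at startup
        broken_helper.run()

    # The main script starts fine; the SyntaxError/ImportError only
    # surfaces when handler() first runs, which can look like the
    # program dying at a random point in mid-execution.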
"Decoding unicode is not supported" in unusual situation
I'm getting

    line 79, in tounicode
        return(unicode(s, errors='replace'))
    TypeError: decoding Unicode is not supported

from this, under Python 2.7:

    def tounicode(s) :
        if type(s) == unicode :
            return(s)
        return(unicode(s, errors='replace'))

That would seem to be impossible. But it's not. "s" is generated from
the "suds" SOAP client. The documentation for "suds" says:

    "Suds leverages python meta programming to provide an intuitive API
    for consuming web services. Runtime objectification of types
    defined in the WSDL is provided without class generation."

I think that somewhere in "suds", they subclass the "unicode" type.
That's almost too cute. The proper test is

    isinstance(s, unicode)

                John Nagle
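[With the subclass-safe test, the function becomes:]

    def tounicode(s):
        if isinstance(s, unicode):   # also catches unicode subclasses
            return s
        return unicode(s, errors='replace')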
Re: "Decoding unicode is not supported" in unusual situation
On 3/7/2012 3:42 AM, Steven D'Aprano wrote:
> I *think* he is complaining that some other library -- suds? -- has a
> broken test for Unicode, by using:
>
>     if type(s) is unicode: ...
>
> instead of
>
>     if isinstance(s, unicode): ...
>
> Consequently, when the library passes a unicode *subclass* to the
> tounicode function, the "type() is unicode" test fails. That's a bad
> bug.

No, that was my bug. The library bug, if any, is that you can't apply

    unicode(s, errors='replace')

to a Unicode string. TypeError("Decoding unicode is not supported") is
raised. However, unicode(s) will accept Unicode input. The Python
documentation
("http://docs.python.org/library/functions.html#unicode") does not
mention this. It is therefore necessary to check the type before
calling "unicode", or catch the undocumented TypeError exception
afterward.

                John Nagle
Re: "Decoding unicode is not supported" in unusual situation
On 3/7/2012 6:18 PM, Ben Finney wrote:
> Steven D'Aprano writes:
>> On Thu, 08 Mar 2012 08:48:58 +1100, Ben Finney wrote:
>>> I think that's a Python bug. If the latter succeeds as a no-op, the
>>> former should also succeed as a no-op. Neither should ever get any
>>> errors when ‘s’ is a ‘unicode’ object already.
>>
>> No. The semantics of the unicode function (technically: a type
>> constructor) are well-defined, and there are two distinct behaviours:

Right. The real problem is that Python 2.7 doesn't have distinct "str"
and "bytes" types: type(bytes()) returns "str". A "str" is assumed to
be ASCII 0..127, but that's not enforced. "bytes" and "str" should have
been distinct types, but that would have broken much old code. If they
were distinct, then constructors could distinguish between string type
conversion (which requires no encoding information) and byte stream
decoding.

So it's possible to get junk characters in a "str", and they won't
convert to Unicode. I've had this happen with databases which were
supposed to be ASCII, but occasionally a non-ASCII character would slip
through.

This is all different in Python 3.x, where "str" is Unicode and "bytes"
really is a distinct type.

                John Nagle
PyUSB available for current versions of Windows?
I want to enumerate the available USB devices. All I really need is the
serial number of the USB devices available to PySerial. (When you plug
in a USB device on Windows, it's assigned the next available COM port
number. On a reboot, the numbers are reassigned. So if you have
multiple USB serial ports, there's a problem.)

PyUSB can supposedly do this, but the documentation is misleading. It
makes a big point of being "100% Python", but that's because it's just
glue code to a platform-specific "back end" provided by someone else.
There's an old Windows back end at
"http://www.craftedge.com/products/libusb.html", but it was written for
Windows XP, and can supposedly be run in "compatibility mode" on
Windows Vista. On current versions of Windows, who knows? It's not open
source, and it comes from someone who sells paper-cutting machines for
crafters.

There's another Windows back end at
https://sourceforge.net/apps/trac/libusb-win32/wiki but it involves
installing a low-level driver in Windows. I especially like the
instruction "Close all applications which use USB devices before
installing." Does this include the keyboard and mouse? They also warn
"The device driver can not be easily removed from the system."

                John Nagle
Re: "Decoding unicode is not supported" in unusual situation
On 3/8/2012 2:58 PM, Prasad, Ramit wrote:
>> Right. The real problem is that Python 2.7 doesn't have distinct
>> "str" and "bytes" types. [...] So it's possible to get junk
>> characters in a "str", and they won't convert to Unicode.
>
> bytes and str are just aliases for each other.

That's true in Python 2.7, but not in 3.x. From 2.6 forward, "bytes"
and "str" were slowly being separated. See PEP 358.

Some of the problems in Python 2.7 come from this ambiguity. Logically,
"unicode" of a "str" should be a simple type conversion from ASCII to
Unicode, while "unicode" of a "bytes" should require an encoding. But
because of the bytes/str ambiguity in Python 2.6/2.7, the behavior
couldn't be type-based.

                John Nagle
Re: "Decoding unicode is not supported" in unusual situation
On 3/9/2012 4:57 PM, Steven D'Aprano wrote:
> On Fri, 09 Mar 2012 10:11:58 -0800, John Nagle wrote:
> [...]
>
> This demonstrates a gross confusion about both Unicode and Python.
> John, I honestly don't mean to be rude here, but if you actually
> believe that (rather than merely expressing yourself poorly), then it
> seems to me that you are desperately misinformed about Unicode and are
> working on the basis of some serious misapprehensions about the nature
> of strings.
>
> In Python 2.6/2.7, there is no ambiguity between str/bytes. The two
> names are aliases for each other. The older name, "str", is a
> misnomer, since it *actually* refers to bytes (and always has, all the
> way back to the earliest days of Python). At best, it could be read as
> "byte string" or "8-bit string", but the emphasis should always be on
> the *bytes*.

There's an inherent ambiguity in that "bytes" and "str" are really the
same type in Python 2.6/2.7. That's a hack for backwards compatibility,
and it goes away in 3.x. The notes for PEP 358 admit this.

It's implicit in allowing unicode(s) with no encoding, on type "str",
that there is an assumption that s is ASCII. Arguably, "unicode()"
should have required an encoding in all cases. Or "str" and "bytes"
should have been made separate types in Python 2.7, in which case
unicode() of a str would be a safe ASCII-to-Unicode translation, and
unicode() of a bytes object would require an encoding. But that would
break too much old code. So we have an ambiguity and a hack.

"While Python 2 also has a unicode string type, the fundamental
ambiguity of the core string type, coupled with Python 2's default
behavior of supporting automatic coercion from 8-bit strings to unicode
objects when the two are combined, often leads to UnicodeErrors" - PEP
404

                John Nagle
html5lib not thread safe. Is the Python SAX library thread-safe?
"html5lib" is apparently not thread safe. (see "http://code.google.com/p/html5lib/issues/detail?id=189";) Looking at the code, I've only found about three problems. They're all the usual "cached in a global without locking" bug. A few locks would fix that. But html5lib calls the XML SAX parser. Is that thread-safe? Or is there more trouble down at the bottom? (I run a multi-threaded web crawler, and currently use BeautifulSoup, which is thread safe, although dated. I'm looking at converting to html5lib.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: html5lib not thread safe. Is the Python SAX library thread-safe?
On 3/11/2012 2:45 PM, Cameron Simpson wrote:
> On 11Mar2012 13:30, John Nagle wrote:
> | "html5lib" is apparently not thread safe.
> | [...]
> | (I run a multi-threaded web crawler, and currently use
> | BeautifulSoup, which is thread safe, although dated. I'm looking at
> | converting to html5lib.)
>
> IIRC, BeautifulSoup4 may do that for you:
> http://www.crummy.com/software/BeautifulSoup/bs4/doc/
> http://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser
> "Beautiful Soup 4 uses html.parser by default, but you can plug in
> lxml or html5lib and use that instead."

I want to use HTML5 standard parsing of bad HTML. (HTML5 formally
defines how to parse bad comments, for example.) I currently have a
modified version of BeautifulSoup that's more robust than the standard
one, but it doesn't handle errors the same way browsers do.

                John Nagle
Re: html5lib not thread safe. Is the Python SAX library thread-safe?
On 3/12/2012 3:05 AM, Stefan Behnel wrote:
> John Nagle, 11.03.2012 21:30:
>> "html5lib" is apparently not thread safe. [...]
>
> You may also consider moving to lxml. BeautifulSoup supports it as a
> parser backend these days, so you wouldn't even have to rewrite your
> code to use it. And performance-wise, well ...
> http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
>
> Stefan

I want to move to html5lib because it handles HTML errors as specified
by the HTML5 spec, which is what all newer browsers do. The HTML5 spec
actually specifies, in great detail, how to parse common errors in
HTML. It's amusing seeing that formalized. Malformed comments ( <-
instead of <-- ) are now handled in a standard way, for example.

So I'm trying to get html5parser fixed for thread safety.

                John Nagle
Re: are int, float, long, double, side-effects of computer engineering?
On 3/7/2012 2:02 PM, Russ P. wrote:
> On Mar 6, 7:25 pm, rusi wrote:
>> On Mar 6, 6:11 am, Xah Lee wrote:
>
> I might add that Mathematica is designed mainly for symbolic
> computation, whereas IEEE floating point numbers are intended for
> numerical computation. Those are two very different endeavors. I
> played with Mathematica a bit several years ago, and I know it can do
> numerical computation too. I wonder if it resorts to IEEE floating
> point numbers when it does.

Mathematica has, for some computations, algorithms to determine the
precision of results. This is different from trying to do
infinite-precision arithmetic, which doesn't help as soon as you get to
trig functions. It's about bounding the error.

It's possible to do bounded arithmetic, where you carry along an upper
and lower bound on each number. The problem is what to do about
comparisons. Comparisons between bounded numbers are ambiguous when the
ranges overlap. Algorithms have to be designed to deal with that.
Mathematica has such algorithms for some operations, especially
numerical integration.

It's a very real issue. I had to deal with this when I was writing the
first "ragdoll physics" system that worked right, back in the 1990s.
Everybody else's systems blew up on the hard cases; mine just slowed
down. Correct integration over a force function that's changing over 18
orders of magnitude is difficult, but quite possible. (Here it is, from
1997: "http://www.youtube.com/watch?v=5lHqEwk7YHs". A test with a heavy
object: "http://www.youtube.com/watch?v=-DaWIHc1VLY". Most physics
engines don't do heavy objects well. Everything looks too light. We
call this the "boink problem.")

                John Nagle
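[A toy sketch of that carry-the-bounds scheme - ordinary interval
arithmetic, not Mathematica's actual algorithm. Note how comparison has
to fail when the ranges overlap, which is exactly the design problem
described above.]

    class Interval(object):
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        def __mul__(self, other):
            p = [a * b for a in (self.lo, self.hi)
                       for b in (other.lo, other.hi)]
            return Interval(min(p), max(p))
        def __lt__(self, other):
            if self.hi < other.lo:
                return True
            if other.hi < self.lo:
                return False
            raise ValueError("ambiguous comparison: intervals overlap")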
Re: Enchancement suggestion for argparse: intuit type from default
On 3/13/2012 2:08 PM, Roy Smith wrote:
> Using argparse, if I write:
>
>     parser.add_argument('--foo', default=100)
>
> it seems like it should be able to intuit that the type of foo should
> be int (i.e. type(default)) without my having to write:
>
>     parser.add_argument('--foo', type=int, default=100)
>
> Does this seem like a reasonable enhancement to argparse?

default=None presents some problems.

                John Nagle
Does anyone actually use PyPy in production?
Does anyone run PyPy in production?

                John Nagle
Re: Programming D. E. Knuth in Python with the Deterministic Finite Automaton construct
On 3/17/2012 9:31 AM, Antti J Ylikoski wrote:
> On 17.3.2012 17:47, Roy Smith wrote:
>> In article, Antti J Ylikoski wrote:
>>> I came across the problem of which would be the clearest way to
>>> program such algorithms with a programming language such as Python,
>>> which has no GOTO statement.
>>
>> Oh, my, I can't even begin to get my head around all the nested
>> conditionals. And that for a nearly trivial machine with only 5
>> states. Down this path lies madness.

Right. Few programs should be written as state machines. As a means of
rewriting Knuth's algorithms, it's inappropriate.

Some should be, though. LALR(1) parsers, such as what YACC and Bison
generate, are state machines. They're huge collections of nested switch
statements.

Python doesn't have a "switch" or "case" statement, which is surprising
for a language that loves dictionary lookups. You can create a dict
full of function names and lambdas, but it's clunky looking.

                John Nagle
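[The dict-dispatch idiom he's alluding to, as a toy two-state word
scanner; the states and transition functions are invented for
illustration.]

    def in_space(ch):
        return "in_word" if ch.isalnum() else "in_space"

    def in_word(ch):
        return "in_word" if ch.isalnum() else "in_space"

    TRANSITIONS = {"in_space": in_space, "in_word": in_word}

    state, words = "in_space", 0
    for ch in "two words here":
        new_state = TRANSITIONS[state](ch)
        if state == "in_space" and new_state == "in_word":
            words += 1                # count word starts
        state = new_state
    print(words)                      # 3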
Re: urllib.urlretrieve never returns???
On 3/17/2012 9:34 AM, Chris Angelico wrote:
> 2012/3/18 Laszlo Nagy:
>> In the latter case, "log.txt" only contains "#1" and nothing else. If
>> I look at pythonw.exe from task manager, then it shows +1 thread
>> every time I click the button, and "#1" is appended to the file.
>
> Does it fail to retrieve on all URLs, or only on some of them?

Running a web crawler, I've seen some pathological cases. There are a
very few sites that emit data very, very slowly but don't time out
because they are making progress. There are also some sites where
attempting to negotiate an SSL connection results in the SSL protocol
reaching a point where the host end is supposed to finish the
handshake, but it doesn't.

The odds are against this being the problem. I see problems like that
in maybe 1 in 100,000 URLs.

                John Nagle
Re: Fetching data from a HTML file
On 3/23/2012 10:12 PM, Jon Clements wrote:
> ROBOT Framework

Would people please stop using robotic names for things that aren't
robots? Thank you.

                John Nagle
Re: "convert" string to bytes without changing data (encoding)
On 3/28/2012 10:43 AM, Peter Daum wrote:
> On 2012-03-28 12:42, Heiko Wundram wrote:
>> On 28.03.2012 11:43, Peter Daum wrote:
>
> The longer story of my question is: I am new to python (obviously),
> and since I am not familiar with either one, I thought it would be
> advisable to go for python 3.x. The biggest problem that I am facing
> is that I am often dealing with data that is basically text, but it
> can contain 8-bit bytes. In this case, I can not safely assume any
> given encoding, but I actually also don't need to know - for my
> purposes, it would be perfectly good enough to deal with the ascii
> portions and keep anything else unchanged.

So why let the data get into a "str" type at all? Do everything end to
end with "bytes" or "bytearray" types.

                John Nagle
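[A minimal sketch of that bytes-only pipeline in Python 3; "mixed.dat"
is a hypothetical input file. ASCII bytes are transformed and every
non-ASCII byte passes through untouched, so no encoding ever has to be
guessed.]

    with open("mixed.dat", "rb") as f:   # binary mode: no decoding
        data = f.read()                  # a bytes object

    data = data.replace(b"\r\n", b"\n")  # byte-level edit, encoding-agnostic
    # Upper-case ASCII letters only; bytes outside 0..127 pass through.
    data = bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in data)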
Will MySQL ever be supported for Python 3.x?
The MySQLdb entry on SourceForge
(http://sourceforge.net/projects/mysql-python/) still says the last
supported version of Python is 2.6. PyPI says the last supported
version is Python 2.5. The last download is from 2007.

I realize there are unsupported fourth-party versions from other
sources (http://www.lfd.uci.edu/~gohlke/pythonlibs/), but those are
just blind builds; they haven't been debugged.

MySQL Connector (http://forge.mysql.com/projects/project.php?id=302) is
still pre-alpha.

                John Nagle
Re: Will MySQL ever be supported for Python 3.x?
On 3/30/2012 2:32 PM, Irmen de Jong wrote:
> Try Oursql instead: http://packages.python.org/oursql/
> "oursql is a new set of MySQL bindings for python 2.4+, including
> python 3.x"

It's not even close to being compatible with existing code. Every SQL
statement has to be rewritten, with the parameters expressed
differently. It's a good approach, but very incompatible.

                John Nagle
getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
Some versions of CentOS 6 seem to have a potential getaddrinfo exploit.
To test, try this from a command line:

    ping example

If it fails, good. If it returns pings from "example.com", bad. The
getaddrinfo code is adding ".com" to the domain. If that returns pings,
please try

    ping noexample.com

There is no "noexample.com" domain in DNS. This should time out. But if
you get ping replies from a CNET site, let me know. Some
implementations try "noexample.com", get an NXDOMAIN error, and try
again, adding ".com". This results in a ping of "noexample.com.com".
"com.com" is a real domain, run by a unit of CBS, and they have their
DNS set up to catch all subdomains and divert them to, inevitably, an
ad-oriented junk search page. (You can view the junk page at
"http://slimeball.com.com". Replace "slimeball" with anything else you
like; it will still resolve.)

If you find a case where "ping noexample.com" returns a reply, then try
it in Python:

    import socket
    socket.getaddrinfo("noexample.com", 80)

That should return an error. If it returns the IP address of CNET's ad
server, there's trouble.

This isn't a problem with the upstream DNS. Usually, this sort of thing
means you're using some sleazy upstream DNS provider like Comcast.
That's not the case here. "host" and "nslookup" aren't confused. Only
programs that use getaddrinfo, like "ping", "wget", and Python, have
this ".com"-appending behavior.

Incidentally, if you try "noexample.net", there's no problem, because
the owner of "net.com" hasn't set up their DNS to exploit this. And, of
course, it has nothing to do with browser toolbars. This is at a much
lower level.

If you can make this happen, report back the CentOS version and the
library version, please.

                John Nagle
Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 3/31/2012 9:26 PM, Owen Jacobson wrote:
> On 2012-03-31 22:58:45 +0000, John Nagle said:
>> Some versions of CentOS 6 seem to have a potential getaddrinfo
>> exploit. To test, try this from a command line:
>>
>>     ping example
>>
>> If it fails, good. If it returns pings from "example.com", bad. The
>> getaddrinfo code is adding ".com" to the domain.
>
> There is insufficient information in your diagnosis to make that
> conclusion. For example: what network configuration services (DHCP
> clients and whatnot, along with various desktop-mode configuration
> tools and services) are running? What kernel and libc versions are
> you running? What are the contents of /etc/nsswitch.conf? Of
> /etc/resolv.conf (particularly, the 'search' entries)? What do
> /etc/hosts, LDAP, NIS+, or other hostname services say about the names
> you're resolving? Does a freestanding C program that directly calls
> getaddrinfo and that runs in a known-good loader environment exhibit
> the same surprises?
>
> Name resolution is not so simple that you can conclude "getaddrinfo is
> misbehaving" from the behaviour of ping, or of your Python sample,
> alone. In any case, this seems more appropriate for a Linux or a
> CentOS newsgroup/mailing list than a Python one. Please do not reply
> to this post in comp.lang.python.
>
> -o

I expected that some noob would have a reply like that. A more detailed
discussion appears here:
http://serverfault.com/questions/341383/possible-nxdomain-hijacking

                John Nagle
Re: [OT] getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 4/1/2012 9:26 AM, Michael Torrie wrote:
> On 03/31/2012 04:58 PM, John Nagle wrote:
>> If you can make this happen, report back the CentOS version and the
>> library version, please.
>
> CentOS release 6.2 (Final)
> glibc-2.12-1.47.el6_2.9.x86_64
>
> "example" does not ping. "example.com" does not resolve to
> "example.com.com". Removed all "search" and "domain" entries from
> /etc/resolv.conf.

It's a design bug in glibc. I just submitted a bug report:
http://sourceware.org/bugzilla/show_bug.cgi?id=13935

It only appears if you have a machine with a two-component domain name
ending in ".com" as the actual machine name. Most hosting services
generate some long arbitrary name as the primary name, but I happen to
have a server set up as "companyname.com".

The default rule for looking up domains in glibc is that the "domain"
is everything after the FIRST ".". Failed lookups are retried with that
"domain" appended. The idea, back in the 1980s, was that if you're on
"foo.bigcompany.com" and look up "bar", it's looked up as
"bar.bigcompany.com". This idea backfires when the actual hostname only
has two components, and the search just appends ".com". There is a
"com.com" domain, and this gets them traffic. They exploit this to send
you (where else?) to an ad-heavy page. Try "python.com.com", for
example, and you'll get an ad for a Java database.

The workaround in Python is to add the AI_CANONNAME flag to getaddrinfo
calls, then check that the returned domain name matches the one put in.

Good case:

    >>> s = "python.org"
    >>> socket.getaddrinfo(s, 80, 0, 0, 0, socket.AI_CANONNAME)
    [(2, 1, 6, 'python.org', ('82.94.164.162', 80)),
     (2, 2, 17, '', ('82.94.164.162', 80)),
     (2, 3, 0, '', ('82.94.164.162', 80)),
     (10, 1, 6, '', ('2001:888:2000:d::a2', 80, 0, 0)),
     (10, 2, 17, '', ('2001:888:2000:d::a2', 80, 0, 0)),
     (10, 3, 0, '', ('2001:888:2000:d::a2', 80, 0, 0))]

Bad case:

    >>> s = "noexample.com"
    >>> socket.getaddrinfo(s, 80, 0, 0, 0, socket.AI_CANONNAME)
    [(2, 1, 6, 'phx1-ss-2-lb.cnet.com', ('64.30.224.112', 80)),
     (2, 2, 17, '', ('64.30.224.112', 80)),
     (2, 3, 0, '', ('64.30.224.112', 80))]

Note that what went in isn't what came back. getaddrinfo has been
pwned. Again, you only get this if you're on a machine whose primary
host name is "something.com", with exactly two components ending in
".com".

                John Nagle
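[A sketch of that check as a wrapper; the rejection logic is
illustrative only, and - as the follow-up below notes - it misfires on
domains whose canonical name legitimately differs, such as CDN-hosted
sites.]

    import socket

    def checked_getaddrinfo(host, port):
        results = socket.getaddrinfo(host, port, 0, 0, 0,
                                     socket.AI_CANONNAME)
        canon = results[0][3]   # canonical name of the first entry
        if canon and canon != host:
            raise socket.gaierror("suspected NXDOMAIN hijack: %r -> %r"
                                  % (host, canon))
        return results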
Re: Will MySQL ever be supported for Python 3.x?
On 3/31/2012 10:54 PM, Tim Roberts wrote:
> John Nagle wrote:
>> On 3/30/2012 2:32 PM, Irmen de Jong wrote:
>>> Try Oursql instead: http://packages.python.org/oursql/
>>> "oursql is a new set of MySQL bindings for python 2.4+, including
>>> python 3.x"
>>
>> It's not even close to being compatible with existing code. Every SQL
>> statement has to be rewritten, with the parameters expressed
>> differently. It's a good approach, but very incompatible.
>
> Those changes can be automated, given an adequate editor. "Oursql" is
> a far better product than the primitive MySQLdb wrapper. It is worth
> the trouble.

It's an interesting approach. As it matures, and a few big sites use
it, it will become worth looking at.

The emphasis on server-side buffering seems strange. Are there
benchmarks indicating this is worth doing? Does it keep transactions
locked longer? This bug report
(https://answers.launchpad.net/oursql/+question/191256) indicates a
performance problem. I'd expect server-side buffering to slow things
down. Usually, you want to drain results out of the server as fast as
possible, then close out the command, releasing server resources and
locks.

                John Nagle
Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 4/1/2012 1:41 PM, John Nagle wrote:
> The workaround in Python is to add the AI_CANONNAME flag to
> getaddrinfo calls, then check that the returned domain name matches
> the one put in.

That workaround won't work for some domains. For example:

    >>> socket.getaddrinfo(s, "http", 0, 0, socket.SOL_TCP,
    ...                    socket.AI_CANONNAME)
    [(2, 1, 6, 'orig-10005.themarker.cotcdn.net', ('208.93.137.80', 80))]

Nor will adding options to /etc/resolv.conf work well, because that
file is overwritten by some system administration programs. I may have
to bring in "dnspython" to get a reliable DNS lookup.

                John Nagle
Re: Best way to structure data for efficient searching
On 3/28/2012 11:39 AM, larry.mart...@gmail.com wrote:
> I have the following use case: I have a set of data that contains 3
> fields: K1, K2, and a timestamp. There are duplicates in the data set,
> and they all have to be processed. Then I have another set of data
> with 4 fields: K3, K4, K5, and a timestamp. There are also duplicates
> in that data set, and they also all have to be processed. I need to
> find all the items in the second data set where K1==K3 and K2==K4 and
> the 2 timestamps are within 20 seconds of each other. I have this
> working, but the way I did it seems very inefficient - I simply put
> the data in 2 arrays (as tuples) and then walked through the entire
> second data set once for each item in the first data set, looking for
> matches. Is there a better, more efficient way I could have done this?

How big are the data sets? Millions of entries? Billions? Trillions?
Will all the data fit in memory, or will this need files or a database?

In memory, it's not hard. First, decide which data set is smaller. That
one gets a dictionary keyed by K1 or K3, with each entry being a list
of tuples. Then go through the other data set linearly.

You can also sort one data set by K1, the other by K3, and match. Then
take the matches, sort by K2 and K4, and match again. Sort the
remaining matches by timestamp and pull the ones within the threshold.

Or you can load all the data into a database with a query optimizer,
like MySQL, and let it figure out, based on the index sizes, how to do
the join.

All of these approaches are roughly O(N log N), which beats the O(N^2)
approach you have now.

                John Nagle
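[A sketch of the dictionary approach, assuming the tuple layouts from
the question and numeric timestamps in seconds.]

    from collections import defaultdict

    def find_matches(set1, set2, window=20):
        # Index the first data set by its composite key.
        index = defaultdict(list)
        for k1, k2, ts in set1:
            index[(k1, k2)].append(ts)
        # One linear pass over the second data set.
        matches = []
        for k3, k4, k5, ts in set2:
            for ts1 in index.get((k3, k4), ()):
                if abs(ts - ts1) <= window:
                    matches.append((k3, k4, k5, ts))
                    break
        return matches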
Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit
On 4/2/2012 6:53 PM, John Nagle wrote:
> It's a design bug in glibc. I just submitted a bug report:
> http://sourceware.org/bugzilla/show_bug.cgi?id=13935

The same bug is in "dnspython". Submitted a bug report there, too:
https://github.com/rthalley/dnspython/issues/6

                John Nagle
Re: Python Gotcha's?
On 4/4/2012 3:34 PM, Miki Tebeka wrote:
> Greetings, I'm going to give a "Python Gotchas" talk at work. If you
> have an interesting/common "gotcha" (warts/dark corners ...), please
> share. (Note that I went over http://wiki.python.org/moin/PythonWarts
> already.) Thanks, -- Miki

A few Python "gotchas":

1. Nobody is really in charge of third-party packages. In the Perl
   world, there's a central repository, CPAN, and quality control.
   Python's PyPI is just a collection of links. Many major packages are
   maintained by one person, and if they lose interest, the package
   dies.

2. C extensions are closely tied to the exact version of CPython
   you're using, and finding a properly built version may be difficult.

3. "eggs". The "distutils" system has certain assumptions built into
   it about where things go, and tends to fail in obscure ways. There's
   no uniform way to distribute a package.

4. The syntax for expression-IF is just weird.

5. "+" as concatenation. This leads to strange numerical semantics,
   such as (1,2) + (3,4) being (1,2,3,4). But for "numarray" arrays,
   "+" does addition. What does a mixed-mode expression of a numarray
   and a tuple do? Guess.

6. It's really hard to tell what's messing with the attributes of a
   class, since anything can store into anything. This creates
   debugging problems.

7. Multiple inheritance is a mess. Especially "super".

8. Using attributes as dictionaries can backfire. The syntax of
   attributes is limited. So turning XML or HTML structures into Python
   objects creates problems.

9. Opening a URL can result in an unexpected prompt on standard input
   if the URL has authentication. This can stall servers.

10. Some libraries aren't thread-safe. Guess which ones.

11. Python 3 isn't upward compatible with Python 2.

                John Nagle
Re: Python Gotcha's?
On 4/8/2012 10:55 AM, Miki Tebeka wrote:
>> 9. Opening a URL can result in an unexpected prompt on standard input
>> if the URL has authentication. This can stall servers.
>
> Can you give an example? I don't think anything in the standard
> library does that.

It's in "urllib". See http://docs.python.org/library/urllib.html

    "When performing basic authentication, a FancyURLopener instance
    calls its prompt_user_passwd() method. The default implementation
    asks the users for the required information on the controlling
    terminal. A subclass may override this method to support more
    appropriate behavior if needed."

A related "gotcha" is knowing that "urllib" sucks and you should use
"urllib2".

                John Nagle
Re: Donald E. Knuth in Python, cont'd
On 4/11/2012 6:03 AM, Antti J Ylikoski wrote: I wrote about a straightforward way to program D. E. Knuth in Python, and received an excellent communication about programming Deterministic Finite Automata (Finite State Machines) in Python. The following stems from my Knuth in Python programming exercises, according to that very good communication. (By Roy Smith.) I'm in the process of delving carefully into Knuth's brilliant and voluminous work The Art of Computer Programming, Parts 1--3 plus the Fascicles in Part 4 -- the back cover of Part 1 reads: "If you think you're a really good programmer -- read [Knuth's] Art of Computer Programming... You should definitely send me a résumé if you can read the whole thing." -- Bill Gates. (Microsoft may in the future receive some e-mail from me.)

You don't need those books as much as you used to. You don't have to write collections, hash tables, and sorts much any more. Those are solved problems and there are good libraries. Most of the basics are built into Python.

Serious programmers should read those books, much as they should read von Neumann's "First Draft of a Report on the EDVAC", for background on how things work down at the bottom. But they're no longer essential desk references for most programmers.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: python module development workflow
On 4/11/2012 1:04 PM, Miki Tebeka wrote: Could any expert suggest an authoritative and complete guide for developing python modules? Thanks!

I'd start with http://docs.python.org/distutils/index.html

Make sure that

    python setup.py build
    python setup.py install

works. Don't use the "rotten egg" distribution system. (http://packages.python.org/distribute/easy_install.html)

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
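A minimal setup.py along the lines of the distutils guide above (all metadata values here are placeholders):

    from distutils.core import setup

    setup(
        name='mymodule',
        version='0.1',
        py_modules=['mymodule'],
        description='One-line description of the module.',
    )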
Re: Deep merge two dicts?
On 4/12/2012 10:41 AM, Roy Smith wrote: Is there a simple way to deep merge two dicts? I'm looking for Perl's Hash::Merge (http://search.cpan.org/~dmuey/Hash-Merge-0.12/Merge.pm) in Python.

    def dmerge(a, b):
        # Recursively merge dict b into dict a, in place.
        for k, bv in b.items():
            av = a.get(k)
            if isinstance(av, dict) and isinstance(bv, dict):
                dmerge(av, bv)     # merge nested dicts in place
            else:
                a[k] = bv          # otherwise the value from b wins

-- http://mail.python.org/mailman/listinfo/python-list
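A quick check of the merge, with illustrative data:

    a = {'x': {'p': 1}, 'y': 2}
    b = {'x': {'q': 3}}
    dmerge(a, b)
    print(a)   # {'x': {'p': 1, 'q': 3}, 'y': 2}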
Re: why () is () and [] is [] work in other way?
On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote: On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote: I believe it says somewhere in the Python docs that it's undefined and implementation-dependent whether two identical expressions have the same identity when the result of each is immutable

Bad design. Where "is" is ill-defined, it should raise ValueError.

A worse example, one which is very implementation-dependent: http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers

    >>> a = 256
    >>> b = 256
    >>> a is b
    True           # this is an expected result
    >>> a = 257
    >>> b = 257
    >>> a is b
    False

Operator "is" should be an error between immutables unless one is a built-in constant. ("True" and "False" should be made hard constants, like "None". You can't assign to None, but you can assign to True, usually with unwanted results. It's not clear why True and False weren't locked down when None was.)

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: global vars across modules
On 4/22/2012 12:39 PM, mambokn...@gmail.com wrote: Question: How can I access the global 'a' in file_2 without resorting to the whole name 'file_1.a'?

Actually, it's better to use the fully qualified name "file_1.a". Using "import *" brings in everything in the other module, which often results in a name clash. Just do

    import file_1

and, if desired,

    localnamefora = file_1.a

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/22/2012 3:17 PM, John Roth wrote: On Sunday, April 22, 2012 1:43:36 PM UTC-6, John Nagle wrote: On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote: On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote: I believe it says somewhere in the Python docs that it's undefined and implementation-dependent whether two identical expressions have the same identity when the result of each is immutable

Bad design. Where "is" is ill-defined, it should raise ValueError. A worse example, one which is very implementation-dependent: http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers

    a = 256
    b = 256
    a is b
    True           # this is an expected result
    a = 257
    b = 257
    a is b
    False

Operator "is" should be an error between immutables unless one is a built-in constant. ("True" and "False" should be made hard constants, like "None". You can't assign to None, but you can assign to True, usually with unwanted results. It's not clear why True and False weren't locked down when None was.) John Nagle

Three points. First, since there's no obvious way of telling whether an arbitrary user-created object is immutable, trying to make "is" fail in that case would be a major change to the language. If a program fails because such a comparison becomes invalid, it was broken anyway.

The idea was borrowed from LISP, which has both "eq" (pointer equality) and "equal" (compared equality). It made somewhat more sense in the early days of LISP, when the underlying representation of everything was well defined.

Second: the definition of "is" states that it determines whether two objects are the same object; this has nothing to do with mutability or immutability. The id([]) == id([]) thing is a place where CPython's implementation is showing through. It won't work that way in any implementation that uses garbage collection and object compaction. I think Jython does it that way; I'm not sure about either IronPython or PyPy.

That represents a flaw in the language design - the unexpected exposure of an implementation dependency.

Third: True and False are reserved names and cannot be assigned to in the 3.x series. They weren't locked down in the 2.x series when they were introduced because of backward compatibility.

That's one of the standard language designer fuckups. Somebody starts out thinking that 0 and 1 don't have to be distinguished from False and True. When they discover that they do, the backwards compatibility sucks. C still suffers from this.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/22/2012 9:34 PM, Steven D'Aprano wrote: On Sun, 22 Apr 2012 12:43:36 -0700, John Nagle wrote: On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote: On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote: I believe it says somewhere in the Python docs that it's undefined and implementation-dependent whether two identical expressions have the same identity when the result of each is immutable Bad design. Where "is" is ill-defined, it should raise ValueError. "is" is never ill-defined. "is" always, without exception, returns True if the two operands are the same object, and False if they are not. This is literally the simplest operator in Python. John, you've been using Python for long enough that you should know this. I can only guess that you are trolling, although I can't imagine why. Because the language definition should not be what CPython does. As PyPy advances, we need to move beyond that. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/25/2012 5:01 PM, Steven D'Aprano wrote: On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote: Though, maybe it's better to use a different keyword than 'is' though, due to the plain English connotations of the term; I like 'sameobj' personally, for whatever little it matters. Really, I think taking away the 'is' operator altogether is better, so the only way to test identity is: id(x) == id(y) Four reasons why that's a bad idea: 1) The "is" operator is fast, because it can be implemented directly by the interpreter as a simple pointer comparison (or equivalent). This assumes that everything is, internally, an object. In CPython, that's the case, because Python is a naive interpreter and everything, including numbers, is "boxed". That's not true of PyPy or Shed Skin. So does "is" have to force the creation of a temporary boxed object? The concept of "object" vs. the implementation of objects is one reason you don't necessarily want to expose the implementation. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/26/2012 4:45 AM, Adam Skutt wrote: On Apr 26, 1:48 am, John Nagle wrote: On 4/25/2012 5:01 PM, Steven D'Aprano wrote: On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote: Though, maybe it's better to use a different keyword than 'is' though, due to the plain English connotations of the term; I like 'sameobj' personally, for whatever little it matters. Really, I think taking away the 'is' operator altogether is better, so the only way to test identity is: id(x) == id(y) Four reasons why that's a bad idea: 1) The "is" operator is fast, because it can be implemented directly by the interpreter as a simple pointer comparison (or equivalent). This assumes that everything is, internally, an object. In CPython, that's the case, because Python is a naive interpreter and everything, including numbers, is "boxed". That's not true of PyPy or Shed Skin. So does "is" have to force the creation of a temporary boxed object? That's what C# does AFAIK. Java defines '==' as value comparison for primitives and '==' as identity comparison for objects, but I don't exactly know how one would do that in Python. I would suggest that "is" raise ValueError for the ambiguous cases. If both operands are immutable, "is" should raise ValueError. That's the case where the internal representation of immutables shows through. If this breaks a program, it was broken anyway. It will catch bad comparisons like if x is 1000 : ... which is implementation dependent. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
CPython thread starvation
I have a multi-threaded CPython program, which has up to four threads. One thread is simply a wait loop monitoring the other three and waiting for them to finish, so it can give them more work to do. When the work threads, which read web pages and then parse them, are compute-bound, I've had the monitoring thread starved of CPU time for as long as 120 seconds. It's sleeping for 0.5 seconds, then checking on the other threads and for new work to do, so the monitoring thread isn't using much compute time.

I know that the CPython thread dispatcher sucks, but I didn't realize it sucked that bad. Is there a preference for running threads at the head of the list (like UNIX, circa 1979) or something like that?

(And yes, I know about "multiprocessing". These threads are already in one of several service processes. I don't want to launch even more copies of the Python interpreter. The threads are usually I/O bound, but when they hit unusually long web pages, they go compute-bound during parsing.)

Setting "sys.setcheckinterval" from the default to 1 seems to have little effect. This is on Windows 7.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython thread starvation
On 4/27/2012 6:25 PM, Adam Skutt wrote: On Apr 27, 2:54 pm, John Nagle wrote: I have a multi-threaded CPython program, which has up to four threads. One thread is simply a wait loop monitoring the other three and waiting for them to finish, so it can give them more work to do. When the work threads, which read web pages and then parse them, are compute-bound, I've had the monitoring thread starved of CPU time for as long as 120 seconds. How exactly are you determining that this is the case? Found the problem. The threads, after doing their compute intensive work of examining pages, stored some URLs they'd found. The code that stored them looked them up with "getaddrinfo()", and did this while a lock was set. On CentOS, "getaddrinfo()" at the glibc level doesn't always cache locally (ref https://bugzilla.redhat.com/show_bug.cgi?id=576801). Python doesn't cache either. So huge numbers of DNS requests were being made. For some pages being scanned, many of the domains required accessing a rather slow DNS server. The combination of thousands of instances of the same domain, a slow DNS server, and no caching slowed the crawler down severely. Added a local cache in the program to prevent this. Performance much improved. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
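A hedged sketch of that kind of per-process lookup cache (not the crawler's actual code; names are illustrative):

    import socket

    _addr_cache = {}    # (host, port) -> getaddrinfo result

    def cached_getaddrinfo(host, port=80):
        # One DNS lookup per distinct host for the life of the process.
        key = (host, port)
        if key not in _addr_cache:
            _addr_cache[key] = socket.getaddrinfo(host, port)
        return _addr_cache[key]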
Re: CPython thread starvation
On 4/27/2012 9:20 PM, Paul Rubin wrote: John Nagle writes: The code that stored them looked them up with "getaddrinfo()", and did this while a lock was set. Don't do that!! Added a local cache in the program to prevent this. Performance much improved. Better to release the lock while the getaddrinfo is running, if you can.

I may do that to prevent the stall. But the real problem was all those DNS requests. Parallelizing them wouldn't help much when it took hours to grind through them all.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython thread starvation
On 4/27/2012 9:55 PM, Paul Rubin wrote: John Nagle writes: I may do that to prevent the stall. But the real problem was all those DNS requests. Parallelizing them wouldn't help much when it took hours to grind through them all. True dat. But building a DNS cache into the application seems like a kludge. Unless the number of requests is insane, running a caching nameserver on the local box seems cleaner.

I know. When I have a bit more time, I'll figure out why CentOS 5 and Webmin didn't set up a caching DNS resolver by default.

Sometimes the number of requests IS insane. When the system hits a page with a thousand links, it has to resolve all of them. (Beyond a thousand links, we classify it as link spam and stop. The record so far is a page with over 10,000 links.)

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: CPython thread starvation
On 4/28/2012 1:04 PM, Paul Rubin wrote: Roy Smith writes: I agree that application-level name cacheing is "wrong", but sometimes doing it the wrong way just makes sense. I could whip up a simple cacheing wrapper around getaddrinfo() in 5 minutes. Depending on the environment (both technology and bureaucracy), getting a cacheing nameserver installed might take anywhere from 5 minutes to a few days to ... IMHO this really isn't one of those times. The in-app wrapper would only be usable to just that process, and we already know that the OP has multiple processes running the same app on the same machine. They would benefit from being able to share the cache, so now your wrapper gets more complicated. If it's not a nameserver then it's something that fills in for one. And then, since the application appears to be a large scale web spider, it probably wants to run on a cluster, and the cache should be shared across all the machines. So you really probably want an industrial strength nameserver with a big persistent cache, and maybe a smaller local cache because of high locality when crawling specific sites, etc. Each process is analyzing one web site, and has its own cache. Once the site is analyzed, which usually takes about a minute, the cache disappears. Multiple threads are reading multiple pages from the web site during that time. A local cache is enough to fix the huge overhead problem of doing a DNS lookup for every link found. One site with a vast number of links took over 10 hours to analyze before this fix; now it takes about four minutes. That solved the problem. We can probably get an additional minor performance boost with a real local DNS daemon, and will probably configure one. We recently changed servers from Red Hat to CentOS, and management from CPanel to Webmin. Before the change, we had a local DNS daemon with cacheing, so we didn't have this problem. Webmin's defaults tend to be on the minimal side. The DNS information is used mostly to help decide whether two URLs actually point to the same IP address, as part of deciding whether a link is on-site or off-site. Most of those links will never be read. We're not crawling the entire site, just looking at likely pages to find the name and address of the business behind the site. (It's part of our "Know who you're dealing with" system, SiteTruth.) John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: why () is () and [] is [] work in other way?
On 4/28/2012 4:47 AM, Kiuhnm wrote: On 4/27/2012 17:39, Adam Skutt wrote: On Apr 27, 8:07 am, Kiuhnm wrote: Useful... maybe, conceptually sound... no. Conceptually, NaN is the class of all elements which are not numbers, therefore NaN = NaN. NaN isn't really the class of all elements which aren't numbers. NaN is the result of a few specific IEEE 754 operations that cannot be computed, like 0/0, and for which there's no other reasonable substitute (e.g., infinity) for practical applications. In the real world, if we were doing the math with pen and paper, we'd stop as soon as we hit such an error. Equality is simply not defined for the operations that can produce NaN, because we don't know how to perform those computations. So no, it doesn't conceptually follow that NaN = NaN; what conceptually follows is the operation is undefined because NaN causes a halt. Mathematics is more than arithmetic with real numbers. We can use FP too (we actually do that!). We can say that NaN = NaN but that's just an exception we're willing to make. We shouldn't say that the equivalence relation rules shouldn't be followed just because *sometimes* we break them. This is what programming languages ought to do if NaN is compared to anything other than a (floating-point) number: disallow the operation in the first place or toss an exception.

If you do a signaling floating point comparison on IEEE floating point numbers, you do get an exception. On some FPUs, though, signaling operations are slower. On superscalar CPUs, exact floating point exceptions are tough to implement. They are done right on x86 machines, mostly for backwards compatibility. This requires an elaborate "retirement unit" to unwind the state of the CPU after a floating point exception. DEC Alphas didn't have that; SPARC and MIPS machines varied by model. ARM machines in their better modes do have that. Most game console FPUs do not have a full IEEE implementation.

Proper language support for floating point exceptions varies with the platform. Microsoft C++ on Windows does support getting it right. (I had to deal with this once in a physics engine, where an overflow or a NaN merely indicated that a shorter time step was required.) But even there, it's an OS exception, like a signal, not a language-level exception. Other than Ada, which requires it, few languages handle such exceptions as language level exceptions.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating a directory structure and modifying files automatically in Python
On 4/30/2012 8:19 AM, deltaquat...@gmail.com wrote: Hi, I would like to automate the following task under Linux. I need to create a set of directories such as 075 095 100 125 The directory names may be read from a text file foobar, which also contains a number corresponding to each dir, like this: 075 1.818 095 2.181 100 2.579 125 3.019 In each directory I must copy a text file input.in. This file contains two lines which need to be edited: Learn how to use a database. Creating and managing a big collection of directories to handle small data items is the wrong approach to data storage. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
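A hedged sketch of that advice, using sqlite3 from the standard library to hold the per-run parameters instead of a directory tree (the database and table names are illustrative; "foobar" is the file from the question):

    import sqlite3

    conn = sqlite3.connect("runs.db")
    conn.execute("CREATE TABLE IF NOT EXISTS params (name TEXT PRIMARY KEY, value REAL)")
    with open("foobar") as f:
        for line in f:
            name, value = line.split()
            conn.execute("INSERT OR REPLACE INTO params VALUES (?, ?)",
                         (name, float(value)))
    conn.commit()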
Re: Python SOAP library
On 5/2/2012 8:35 AM, Alec Taylor wrote: What's the best SOAP library for Python? I am creating an API converter which will be serialising to/from a variety of sources, including REST and SOAP. Relevant parsing is XML [incl. SOAP] and JSON. Would you recommend: http://code.google.com/p/soapbox/ Or suggest another? Thanks for all information, Are you implementing the client or the server? Python "Suds" is a good client-side library. It's strict SOAP; you must have a WSDL file, and the XML queries and replies must verify against the WSDL file. https://fedorahosted.org/suds/ John Nagle -- http://mail.python.org/mailman/listinfo/python-list
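A minimal Suds client sketch (the WSDL URL is a placeholder, and SomeOperation is a hypothetical method name taken from whatever the WSDL declares):

    from suds.client import Client

    client = Client("http://example.com/service?wsdl")   # placeholder WSDL URL
    print(client)                     # lists the services and methods in the WSDL
    result = client.service.SomeOperation("arg")   # hypothetical operation name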
"
An HTML page for a major site (http://www.chase.com) has some incorrect HTML. It contains
Re: key/value store optimized for disk storage
On 5/4/2012 12:14 AM, Steve Howell wrote: On May 3, 11:59 pm, Paul Rubin wrote: Steve Howell writes:

    compressor = zlib.compressobj()
    s = compressor.compress("foobar")
    s += compressor.flush(zlib.Z_SYNC_FLUSH)
    s_start = s
    compressor2 = compressor.copy()

That's awful. There's no point in compressing six characters with zlib. Zlib has a minimum overhead of 11 bytes. You just made the data bigger.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating a directory structure and modifying files automatically in Python
On 5/6/2012 9:59 PM, Paul Rubin wrote: Javier writes: Or not... Using directories may be a way to do rapid prototyping, and check quickly how things are going internally, without needing to resort to complex database interfaces. dbm and shelve are extremely simple to use. Using the file system for a million item db is ridiculous even for prototyping. Right. Steve Bellovin wrote that back when UNIX didn't have any database programs, let alone free ones. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating a directory structure and modifying files automatically in Python
On 5/7/2012 9:09 PM, Steve Howell wrote: On May 7, 8:46 pm, John Nagle wrote: On 5/6/2012 9:59 PM, Paul Rubin wrote: Javierwrites: Or not... Using directories may be a way to do rapid prototyping, and check quickly how things are going internally, without needing to resort to complex database interfaces. dbm and shelve are extremely simple to use. Using the file system for a million item db is ridiculous even for prototyping. Right. Steve Bellovin wrote that back when UNIX didn't have any database programs, let alone free ones. It's kind of sad that the Unix file system doesn't serve as an effective key-value store at any kind of nontrivial scale. It would simplify a lot of programming if filenames were keys and file contents were values. You don't want to go there in a file system. Some people I know tried that around 1970. "A bit is a file. An ordered collection of files is a file". Didn't work out. There are file models other than the UNIX one. Many older systems had file versioning. Tandem built their file system on top of their distributed, redundant database system. There are backup systems where the name of the file is its hash, allowing elimination of duplicates. Most of the "free online storage" sites do that. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: serial module
On 5/22/2012 8:42 AM, Grant Edwards wrote: On 2012-05-22, Albert van der Horst wrote: It is anybody's guess what they do in USB. They do exactly what they're supposed to regardless of what sort of bus is used to connect the CPU and the UART (ISA, PCI, PCI-express, USB, Ethernet, etc.).

If a device is registered as /dev/ttyUSBnn, one would hope that the Linux USB insertion event handler, which assigns that name, determined that the device was a serial port emulator. Unfortunately, the USB standard device classes (http://www.usb.org/developers/defined_class) don't have "serial port emulator" as a standardized device. So there's more variation in this area than in keyboards, mice, or storage devices.

The best answer is probably that it depends on the whim of whoever implements the USB device. It does not depend on anybody's whim. The meaning of those parameters is well-defined. Certainly this stuff is system dependent, No, it isn't.

It is, a little. There's a problem with the way Linux does serial ports. The only speeds allowed are the ones nailed into the kernel as named constants. This is a holdover from UNIX, which is a holdover from DEC PDP-11 serial hardware circa mid 1970s, which had 14 standard baud rates encoded in 4 bits. Really. In the Windows world, the actual baud rate is passed to the driver. Serial ports on the original IBM PC were loaded with a clock rate, so DOS worked that way.

This only matters if you need non-standard baud rates. I've had to deal with that twice, for a SICK LMS LIDAR (1,000,000 baud) and 1930s Teletype machines (45.45 baud). If you need non-standard speeds, see this: http://www.aetherltd.com/connectingusb.html

If 19,200 baud is enough for you, don't worry about it.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
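For ordinary rates, pyserial makes this trivial (a hedged sketch; the device path is a placeholder). The non-standard rates discussed above need the platform tricks at the aetherltd.com link.

    import serial

    # A standard rate works the same way on Linux and Windows.
    port = serial.Serial("/dev/ttyUSB0", baudrate=19200, timeout=1)
    port.write(b"hello\r\n")
    print(port.readline())
    port.close()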
Re: serial module
On 5/22/2012 2:07 PM, Paul Rubin wrote: John Nagle writes: If a device is registered as /dev/ttyUSBnn, one would hope that the Linux USB insertion event handler, which assigns that name, determined that the device was a serial port emulator. Unfortunately, the USB standard device classes (http://www.usb.org/developers/defined_class) don't have "serial port emulator" as a standardized device. So there's more variation in this area than in keyboards, mice, or storage devices. Hmm, I've been using USB-to-serial adapters and so far they've worked just fine. I plug the USB end of adapter into a Ubuntu box, see /dev/ttyUSB* appear, plug the serial end into the external serial device, and just use pyserial like with an actual serial port. I didn't realize there were issues with this.

There are. See "http://wiki.debian.org/usbserial". Because there's no standard USB class for such devices, the specific vendor ID/product ID pair has to be known to the OS. In Linux, there's a file of these, but not all USB to serial adapters are in it. In Windows, there tends to be a vendor-provided driver for each brand of USB to serial converter. This all would have been much simpler if the USB Consortium had defined a USB class for these devices, as they did for keyboards, mice, etc.

However, this is not the original poster's problem.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: escaping/encoding/formatting in python
On 4/5/2012 10:10 PM, Steve Howell wrote: On Apr 5, 9:59 pm, rusi wrote: On Apr 6, 6:56 am, Steve Howell wrote: You've one-upped me with 2-to-the-N backspace escaping.

Early attempts at UNIX word processing, "nroff" and "troff", suffered from that problem, due to a badly designed macro system.

A question in language design is whether to escape or quote. Do you write

    "X = %d" % (n,)

or

    "X = " + str(n)

In general, for anything but output formatting, the second scales better. Regular expressions have a bad case of the first.

For a quoted alternative to regular expression syntax, see SNOBOL or Icon. SNOBOL allows naming patterns, and those patterns can then be used as components of other patterns. SNOBOL is obsolete, but that approach produced much more readable code.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Email Id Verification
On 5/24/2012 5:32 AM, niks wrote: Hello everyone.. I am new to asp.net... I want to use Regular Expression validator in Email id verification.. Can anyone tell me how to use this and what is the meaning of this \w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)* Not a Python question. It matches anything that looks like a mail user name followed by an @ followed by anything that looks more or less like a domain name. The domain name must contain at least one ".", and cannot end with a ".", which is not strictly correct but usually works. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
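Though the question is about ASP.NET's validator, the pattern itself can be tried from Python (a hedged sketch; the test addresses are made up, and the $ anchor is added here for the test):

    import re

    pattern = re.compile(r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$")
    for addr in ("john.doe@example.com", "bad@@example..com"):
        print("%s -> %s" % (addr, bool(pattern.match(addr))))   # True, then False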
Re: sqlite INSERT performance
On 5/30/2012 6:57 PM, duncan smith wrote: Hello, I have been attempting to speed up some code by using an sqlite database, but I'm not getting the performance gains I expected. SQLite is a "lite" database. It's good for data that's read a lot and not changed much. It's good for small data files. It's so-so for large database loads. It's terrible for a heavy load of simultaneous updates from multiple processes. However, wrapping the inserts into a transaction with BEGIN and COMMIT may help. If you have 67 columns in a table, you may be approaching the problem incorrectly. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
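A minimal sketch of the BEGIN/COMMIT suggestion: batch the INSERTs in one transaction instead of paying a commit per row (table and data names are illustrative):

    import sqlite3

    conn = sqlite3.connect("test.db")
    conn.execute("CREATE TABLE IF NOT EXISTS t (a INTEGER, b TEXT)")
    rows = [(i, "row %d" % i) for i in range(100000)]
    with conn:    # one BEGIN ... COMMIT around all the INSERTs
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)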
Internationalized domain names not working with URLopen
I'm trying to open http://пример.испытание with urllib2.urlopen(s1) in Python 2.7 on Windows 7. This produces a Unicode exception:

    >>> s1
    u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
    >>> fd = urllib2.urlopen(s1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\python27\lib\urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "C:\python27\lib\urllib2.py", line 394, in open
        response = self._open(req, data)
      File "C:\python27\lib\urllib2.py", line 412, in _open
        '_open', req)
      File "C:\python27\lib\urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "C:\python27\lib\urllib2.py", line 1199, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "C:\python27\lib\urllib2.py", line 1168, in do_open
        h.request(req.get_method(), req.get_selector(), req.data, headers)
      File "C:\python27\lib\httplib.py", line 955, in request
        self._send_request(method, url, body, headers)
      File "C:\python27\lib\httplib.py", line 988, in _send_request
        self.putheader(hdr, value)
      File "C:\python27\lib\httplib.py", line 935, in putheader
        hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

The HTTP library is trying to put the URL in the header as ASCII. Why isn't "urllib2" handling that? What does "urllib2" want? Percent escapes? Punycode?

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Internationalized domain names not working with URLopen
On 6/12/2012 11:42 PM, Andrew Berg wrote: On 6/13/2012 1:17 AM, John Nagle wrote: What does "urllib2" want? Percent escapes? Punycode? Looks like Punycode is the correct answer: https://en.wikipedia.org/wiki/Internationalized_domain_name#ToASCII_and_ToUnicode I haven't tried it, though.

This is Python bug #9679: http://bugs.python.org/issue9679

It's been open for years, and the maintainers offer elaborate excuses for not fixing the problem. The socket module accepts Unicode domains, as does httplib. But urllib2, which is a front end to both, is still broken. It's failing when it constructs the HTTP headers. Domains in HTTP headers have to be in punycode.

The code on stackoverflow doesn't really work right. Only the domain part of a URL should be converted to punycode. Path, port, and query parameters need to be converted to percent-encoding. (Unclear if urllib2 or httplib does this already. The documentation doesn't say.) While HTTP content can be in various character sets, the headers are currently required to be ASCII only, since the header has to be processed to determine the character code. (http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0155.html)

Here's a workaround, for the domain part only:

    #
    #   idnaurlworkaround  --  workaround for Python defect 9679
    #
    import urlparse
    import encodings.idna

    PYTHONDEFECT9679FIXED = False       # Python defect #9679 - change when fixed

    def idnaurlworkaround(url):
        """
        Convert a URL to a form the currently broken urllib2 will
        accept. Converts the domain to "punycode" if necessary.
        This is a workaround for Python defect #9679.
        """
        if PYTHONDEFECT9679FIXED:       # if defect fixed
            return(url)                 # use unmodified URL
        url = unicode(url)              # force to Unicode
        (scheme, accesshost, path, params, query, fragment) = urlparse.urlparse(url)    # parse URL
        if scheme == '' and accesshost == '' and path != '':   # bare domain
            accesshost = path           # use path as access host
            path = ''                   # no path
        labels = accesshost.split('.')  # split domain into sections ("labels")
        labels = [encodings.idna.ToASCII(w) for w in labels]   # convert each label to punycode if necessary
        accesshost = '.'.join(labels)   # reassemble domain
        url = urlparse.urlunparse((scheme, accesshost, path, params, query, fragment))  # reassemble URL
        return(url)                     # return complete URL with punycode domain

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: feedparser hanging after I/O error
On 6/2/2011 4:40 AM, xDog Walker wrote: On Wednesday 2011 June 01 10:34, John Nagle wrote: I have a program which uses "feedparser". It occasionally hangs when the network connection has been lost, and remains hung after the network connection is restored. My solution is to download the feed file using wget, then hand that file to feedparser. feedparser will also hang forever on a url if the server doesn't serve. Then you don't get the poll optimization, where feedparser sends the token to indicate that it's already seen version N. This is for a program that's constantly polling RSS feeds and fetching changes. Feedparser is good for that, until the network fails temporarily. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
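A hedged sketch of the "poll optimization" being described: feedparser can resend the server's validators so an unchanged feed returns 304 with no body (the feed URL is a placeholder):

    import feedparser

    d = feedparser.parse("http://example.com/feed.rss")
    # On the next poll, resend the ETag / Last-Modified values:
    d2 = feedparser.parse("http://example.com/feed.rss",
                          etag=d.get("etag"), modified=d.get("modified"))
    if d2.get("status") == 304:
        print("feed unchanged; nothing to parse")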
Re: Function declarations ?
On 6/12/2011 12:38 PM, Andre Majorel wrote: On 2011-06-10, Asen Bozhilov wrote: Andre Majorel wrote: Is there a way to keep the definitions of the high-level functions at the top of the source? I don't see a way to declare a function in Python. Languages with variable and function declarations usually use hoisted environment. Hoisted? With a pulley and a cable?

There are languages with definitions and in which the compiler looks ahead. FORTRAN, for example. Python doesn't work that way. Nor do C and the languages derived from it, because the syntax is context-dependent. (In C++, "A b;" is ambiguous until after the declaration of A. In Pascal-derived languages, you write "var b: A;", which is parseable before you know what A is. So declarations don't have to be in dependency order.)

None of this is relevant to Python, but that's what "hoisted" means in this context.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: those darn exceptions
On 6/21/2011 2:51 PM, Chris Torek wrote: On Tue, 21 Jun 2011 01:43:39 +0000, Chris Torek wrote: But how can I know a priori that os.kill() could raise OverflowError in the first place?

If you passed an integer that was at some time a valid PID to "os.kill()", and OverflowError was raised, I'd consider that a bug in "os.kill()". Only OSError, or some subclass thereof, should be raised for a possibly-valid PID. If you passed some unreasonably large number, that would be a legitimate reason for an OverflowError.

That's for parameter errors, though; it shouldn't happen for environment errors. That's a strong distinction. If something can raise an exception because the environment external to the process has a problem, the exception should be an EnvironmentError or a subclass thereof. This maintains a separation between bugs (which usually should cause termination or fairly drastic recovery action) and normal external events (which have to be routinely handled.)

It's quite possible to get an OSError on "os.kill()" for a number of legitimate reasons. The target process may have exited since the PID was obtained, for example.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
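A hedged sketch of the distinction being drawn, for the possibly-valid-PID case (POSIX errno semantics assumed):

    import errno
    import os

    def process_alive(pid):
        # Environment-level failures are routine; anything else is a bug.
        try:
            os.kill(pid, 0)              # signal 0: existence check only
        except OSError as e:
            if e.errno == errno.ESRCH:   # PID no longer exists
                return False
            if e.errno == errno.EPERM:   # exists, but owned by another user
                return True
            raise                        # unexpected: let it propagate
        return True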
Re: How to import data from MySQL db into excel sheet
On 6/2/2011 5:11 AM, hisan wrote: Please let me know how can i import my sql data of multiple rows and columns into an excel sheet. here i need to adjust the column width based on the data that sits in the column

You're asking in the wrong forum. Try the MySQL forum or an Excel forum.

For a one-off job, use the MySQL Workbench, do a SELECT, click on the floppy disk icon, and export a CSV (comma-separated value) file, which Excel will import. It's possible to link Excel directly to an SQL database; see the Excel documentation.

On a server, you can SELECT ... INTO OUTFILE and get a CSV file that way, but the file is created on the machine where the database is running, not the client machine.

You can write a Python program to SELECT from the database and use the CSV module to create a CSV file, but as a one-off, it's not necessary.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
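A hedged sketch of that Python route (the connection parameters, table, and file names are placeholders):

    import csv
    import MySQLdb

    db = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
    cur = db.cursor()
    cur.execute("SELECT * FROM mytable")
    with open("mytable.csv", "wb") as f:        # Python 2: csv wants binary mode
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])   # header row
        writer.writerows(cur.fetchall())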
Re: What Programing Language are the Largest Website Written In?
On 7/12/2011 4:54 AM, Xah Lee wrote: > Then, this question piqued me, even i tried to not waste my time. But > it overpowered me before i resisted, becuase i quickly spend 15 min to > write this list (with help of Google): > > 1 Google ◇ Java > 2 Facebook ◇ PHP > 3 YouTube ◇ Python > 4 Yahoo! ◇ PHP > 5 blogger.com ◇ Java > 6 baidu.com ◇ C/C++. perl/python/ruby > 7 Wikipedia ◇ PHP Aargh. Much misinformation. First, most of the heavy machinery of Google is written in C++. Some user-facing stuff is written in Java, and some scripting is done in Python. Google is starting to use Go internally, but they're not saying much about where. Facebook is PHP on the user-facing side, but there's heavy inter-server communication and caching, mostly in C++. The original user interface for YouTube, before Google bought it, was in Python. But it's since been rewritten. All the stuff that actually handles video is, of course in C/C++. The load of handling the video dwarfs the user interface load. Wikipedia is indeed written in PHP. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Is there a way to customise math.sqrt(x) for some x?
On 7/16/2011 2:14 AM, Chris Angelico wrote: On Sat, Jul 16, 2011 at 6:35 PM, Steven D'Aprano wrote: I have a custom object that customises the usual maths functions and operators, such as addition, multiplication, math.ceil etc. Is there a way to also customise math.sqrt? I don't think there is, but I may have missed something. Only thing I can think of is: import math math.sqrt=lambda(x) x.__sqrt__(x) if x.whatever else math.sqrt(x) I don't suppose there's a lambda version of try/catch? ChrisA Why use a lambda? Just use a def. A lambda with an "if" is un-Pythonic. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
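The def version of that monkey-patch might look like this (a hedged sketch; the __sqrt__ hook is invented for illustration, not a real Python special method):

    import math

    _original_sqrt = math.sqrt

    def sqrt(x):
        # Dispatch to the object's own hook if it has one.
        if hasattr(x, "__sqrt__"):
            return x.__sqrt__()
        return _original_sqrt(x)

    math.sqrt = sqrt     # monkey-patch so math.sqrt(custom_obj) dispatches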
Re: I am fed up with Python GUI toolkits...
On 7/19/2011 7:34 PM, Andrew Berg wrote: There's PyGUI, which, at a glance, fits with what you want. Looks like it uses OpenGL and native GUI facilities. http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/ It has quite a few external dependencies, though (different dependencies for each platform, so it requires a lot to be cross-platform).

It still uses Tcl/Tk stuff, which is un-Pythonic.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Complex sort on big files
On 8/6/2011 10:53 AM, sturlamolden wrote: On Aug 1, 5:33 pm, aliman wrote: I've read the recipe at [1] and understand that the way to sort a large file is to break it into chunks, sort each chunk and write sorted chunks to disk, then use heapq.merge to combine the chunks as you read them. Or just memory map the file (mmap.mmap) and do an inline .sort() on the bytearray (Python 3.2). With Python 2.7, use e.g. numpy.memmap instead. If the file is large, use 64-bit Python. You don't have to process the file in chunks as the operating system will take care of those details. Sturla No, no, no. If the file is too big to fit in memory, trying to page it will just cause thrashing as the file pages in and out from disk. The UNIX sort program is probably good enough. There are better approaches, if you have many gigabytes to sort, (see Syncsort, which is a commercial product) but few people need them. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
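For reference, the chunk-and-merge recipe under discussion fits in a few lines (a hedged sketch for line-oriented text files; file names are parameters, not from the thread):

    import heapq
    import tempfile

    def external_sort(infile, outfile, chunk_bytes=1000000):
        # Sort chunks that fit in memory, spill each to a temp file,
        # then merge the sorted runs with heapq.merge.
        chunks = []
        with open(infile) as f:
            while True:
                lines = f.readlines(chunk_bytes)   # about chunk_bytes worth
                if not lines:
                    break
                lines.sort()
                tmp = tempfile.TemporaryFile(mode="w+")
                tmp.writelines(lines)
                tmp.seek(0)
                chunks.append(tmp)
        with open(outfile, "w") as out:
            out.writelines(heapq.merge(*chunks))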
Re: 'Use-Once' Variables and Linear Objects
On 8/2/2011 7:19 AM, Neal Becker wrote: I thought this was an interesting article http://www.pipeline.com/~hbaker1/Use1Var.html

Single-use was something of a dead end in programming. Single assignment, where you can only set a variable when you create it, is more useful. Single assignment is comparable to functional programming, but without the deeply nested syntax. Functional programs are trees, while single-assignment programs are directed acyclic graphs. The difference is that you can fan-out results, while in a functional language, you can only fan in.

This fits well with Python, where you can write things like

    def fn(x):
        (a, b, c) = fn1()
        return(fn2(a) + fn3(b)*c)

"const" is often used in C and C++ to indicate single-assignment usage. But C/C++ doesn't have multiple return values, so the concept isn't as useful as it is in Python.

Optimizing compilers usually recognize variable lifetimes, and so they create single-assignment variables internally when possible. This is a win for register and stack allocation, and for fine-grain parallelism on machines which support it. Since Python isn't very optimizable, this is mostly a curiosity.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: try... except with unknown error types
On 8/19/2011 1:24 PM, John Gordon wrote: In <4e4ec405$0$29994$c3e8da3$54964...@news.astraweb.com> Steven D'Aprano writes: You can catch all exceptions by catching the base class Exception: Except that is nearly always poor advice, because it catches too much: it hides bugs in code, as well as things which should be caught. You should always catch the absolute minimum you need to catch.

Right. When in doubt, catch EnvironmentError. That means something external to the program, at the OS or network level, has a problem. "Exception" covers errors which are program bugs, like references to undefined class members.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
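A small illustration of that rule of thumb (urllib2's URLError is an IOError subclass, hence an EnvironmentError; the URL handling is a sketch):

    import urllib2

    def fetch(url):
        try:
            return urllib2.urlopen(url).read()
        except EnvironmentError as e:   # DNS failure, refused connection, HTTP error
            print("fetch failed: %s" % e)
            return None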
Re: Error when deleting and reimporting subpackages
On 8/22/2011 11:51 AM, Matthew Brett wrote: Hi, I recently ran into this behavior: import sys import apkg.subpkg del sys.modules['apkg'] import apkg.subpkg as subpkg Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'subpkg' where 'apkg' and 'subpkg' comprise empty __init__.py files to simplify the example. It appears then, that importing a subpackage, then deleting the containing package from sys.modules, orphans the subpackage in an unfixable state. I ran into this because the nose testing framework does exactly this kind of thing when loading test modules, causing some very confusing errors and failures. Is this behavior expected? It's undefined behavior. You're dealing with CPython implementation semantics, not Python language semantics. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: try... except with unknown error types
On 8/21/2011 5:30 PM, Steven D'Aprano wrote: Chris Angelico wrote: A new and surprising mode of network failure would be indicated by a new subclass of IOError or EnvironmentError. /s/would/should/ I don't see why you expect this, when *existing* network-related failures aren't:

    >>> import socket
    >>> issubclass(socket.error, EnvironmentError)
    False

(Fortunately that specific example is fixed in Python 3.)

I think I reported that some years ago. There were some other errors in the URL and SSL area that weren't subclasses of EnvironmentError. It's also possible to get UnicodeError from URL operations.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
SSL module needs issuer information
The SSL module still doesn't return much information from the certificate. SSLSocket.getpeercert only returns a few basic items about the certificate subject. You can't retrieve issuer information, and you can't get the extensions needed to check if a cert is an EV cert. With the latest flaps about phony cert issuers, it's worth having issuer info available. It was available in the old M2Crypto module, but not in the current Python SSL module. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
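What is available today can be seen in a few lines (a hedged Python 2 sketch; the CA bundle path varies by system):

    import socket
    import ssl

    sock = ssl.wrap_socket(socket.socket(),
                           cert_reqs=ssl.CERT_REQUIRED,
                           ca_certs="/etc/ssl/certs/ca-certificates.crt")  # path varies
    sock.connect(("www.python.org", 443))
    print(sock.getpeercert())   # subject and validity dates only; no issuer, no extensions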
Did MySQL support ever make it to Python 3.x?
Is there Python 3.x support for MySQL yet? MySQLdb's page still says "Python versions 2.3-2.6 are supported.": https://sourceforge.net/projects/mysql-python/ There's PyMySQL, which is pure Python, but it's at version 0.4. There's good progress there, but it's not being used heavily yet, and users are reporting bugs like "broken pipe" errors. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: how to read the last line of a huge file???
On 3/5/2011 10:21 AM, tkp...@hotmail.com wrote: Question: how do I use f.tell() to identify if an offset is legal or illegal?

Read backwards in binary mode, byte by byte, until you reach a byte which is, in binary, either 0xxxxxxx (an ASCII byte) or 11xxxxxx (the first byte of a multi-byte sequence). You are then at the beginning of an ASCII or UTF-8 character. You can copy the bytes forward from there into an array of bytes, then apply the appropriate codec. This is also what you do if skipping ahead in a UTF-8 file, to get in sync.

Reading the last line or lines is easier. Read backwards in binary until you hit an LF or CR, both of which are the same in ASCII and UTF-8. Copy the bytes forward from that point into an array of bytes, then apply the appropriate codec.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
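A minimal sketch of that backwards scan for the last line of an ASCII or UTF-8 file:

    def last_line(filename):
        with open(filename, "rb") as f:
            f.seek(0, 2)                 # seek to end of file
            pos = f.tell() - 1
            if pos < 0:
                return ""
            while pos > 0:               # skip trailing CR/LF bytes
                f.seek(pos)
                if f.read(1) not in b"\r\n":
                    break
                pos -= 1
            while pos > 0:               # scan back to the previous line break
                f.seek(pos - 1)
                if f.read(1) in b"\r\n":
                    break
                pos -= 1
            f.seek(pos)
            return f.read().decode("utf-8").rstrip("\r\n")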
Re: Extending dict (dict's) to allow for multidimensional dictionary
On 3/5/2011 12:05 PM, Paul Rubin wrote: Ravi writes: I can extend dictionary to allow for the my own special look-up tables. However now I want to be able to define multidimensional dictionary which supports look-up like this: d[1]['abc'][40] = 'dummy' Why do that anyway? You can use a tuple as a subscript: d[1,'abc',40] = 'dummy' Also, at some point, it's time to use a database. If you find yourself writing those "dictionaries" to files, or trying to look up everything with "abc" in the second subscript, a database is appropriate. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: having both dynamic and static variables
On 3/2/2011 9:27 PM, Steven D'Aprano wrote: On Wed, 02 Mar 2011 19:45:16 -0800, Yingjie Lan wrote: Hi everyone, Variables in Python are resolved dynamically at runtime, which comes at a performance cost. However, a lot of times we don't need that feature. Variables can be determined at compile time, which should boost up speed. [...] This is a very promising approach taken by a number of projects.

It's worth having some syntax for constants. I'd suggest using "let":

    let PI = 3.1415926535897932384626433832795028841971693993751

I'd propose the following semantics:

1. "let" creates an object whose binding is unchangeable. This is effectively a constant, provided that the value is immutable. A compiler may treat such variables as constants for optimization purposes.

2. Assignment to a variable created with "let" produces an error at compile time or run time.

3. Names bound with "let" have the same scope as any other name created in the same context. Function-local "let" variables are permitted.

4. It is an error to use "let" on a name explicitly made "global", because that would allow access to the variable before it was initialized.

This is close to the semantics of "const" in C/C++, except that there's no notion of a const parameter. "let" allows the usual optimizations - constant folding, hoisting out of loops, compile time arithmetic, unboxing, etc. Ordinarily, Python compilers have to assume that any variable can be changed at any time from another thread, requiring worst-case code for everything.

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: having both dynamic and static variables
On 3/5/2011 7:46 PM, Corey Richardson wrote: On 03/05/2011 10:23 PM, MRAB wrote: Having a fixed binding could be useful elsewhere, for example, with function definitions: [..] fixed PI = 3.1415926535897932384626433832795028841971693993751 fixed def squared(x): return x * x This question spawns from my ignorance: When would a functions definition change? What is the difference between a dynamic function and a fixed function? All functions in Python can be replaced dynamically. While they're running. From another thread. Really. Implementing this is either inefficient, with a lookup for every use (CPython) or really, really complicated, involving just-in-time compilers, invalidation, recompilation, and a backup interpreter for when things get ugly (PyPy). John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing module in async db query
On 3/8/2011 3:34 PM, Philip Semanchuk wrote: On Mar 8, 2011, at 3:25 PM, Sheng wrote: This looks like a tornado problem, but trust me, it is almost all about the mechanism of multiprocessing module. [snip] So the workflow is like this, get() --> fork a subprocess to process the query request in async_func() -> when async_func() returns, callback_func uses the return result of async_func as the input argument, and send the query result to the client. So the problem is the the query result as the result of sql_command might be too big to store them all in the memory, which in our case is stored in the variable "data". Can I send return from the async method early, say immediately after the query returns with the first result set, then stream the results to the browser. In other words, can async_func somehow notify callback_func to prepare receiving the data before async_func actually returns? Hi Sheng, Have you looked at multiprocessing.Queue objects? Make sure that, having made a request of the database, you quickly read all the results. Until you finish the transaction, the database has locks set, and other transactions may stall. "Streaming" out to a network connection while still reading from the database is undesirable. If you're doing really big SELECTs, consider using LIMIT and OFFSET in SQL to break them up into smaller bites. Especially if the user is paging through the results. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
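A hedged sketch of the LIMIT/OFFSET pattern (the table, columns, and the MySQLdb-style %s paramstyle are assumptions, not from the thread):

    def fetch_page(cur, page, page_size=100):
        # Each call is a short, self-contained query, so no locks are
        # held while a previous page streams to the client.
        cur.execute("SELECT id, name FROM items"
                    " ORDER BY id LIMIT %s OFFSET %s",
                    (page_size, page * page_size))
        return cur.fetchall()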
Re: Passing Functions
On 3/11/2011 5:49 AM, yoro wrote: I've found the error, I had to type in:

    for node in nodeTable:
        if node != 0 and Node.visited == False:

That's just your first error. (Also, you shouldn't have anything but Node items in nodeTable, so you don't need the "node != 0".) The biggest problem is at

    # Values to assign to each node
    class Node:
        distFromSource = infinity
        previous = invalid_node
        visited = False

Those are variables of the entire class. Every instance of Node shares the same variables. You need

    class Node:
        def __init__(self):
            self.distFromSource = infinity
            self.previous = invalid_node
            self.visited = False

John Nagle -- http://mail.python.org/mailman/listinfo/python-list
Re: Compile time evaluation of dictionaries
On 3/10/2011 8:23 AM, Gerald Britton wrote: Today I noticed that an expression like this: "one:%(one)s two:%(two)s" % {"one": "is the loneliest number", "two": "can be as bad as one"} could be evaluated at compile time, but is not: CPython barely evaluates anything at compile time. John Nagle -- http://mail.python.org/mailman/listinfo/python-list
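The dis module makes it easy to check what does get folded (a small sketch; the exact output varies by CPython version):

    import dis

    dis.dis(lambda: 2 * 3)       # the peephole optimizer folds this to LOAD_CONST 6
    dis.dis(lambda: "one:%(one)s" % {"one": "x"})   # dict built and % applied at runtime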
Re: Creating a very simple revision system for photos in python
On 3/11/2011 6:56 AM, Thomas W wrote: I`m thinking about creating a very simple revision system for photos in python, something like bazaar, mercurial or git, but for photos. The problem is that handling large binary files compared to plain text files are quite different. Has anybody done something like this or have any thoughts about it, I`d be very grateful. If something like mercurial or git could be used and/or extended/customized that would be even better. Alienbrain (http://www.alienbrain.com/) does this. That's what game companies use for revision control, where data includes images, motion capture files, game levels, and music, as well as code. There's also Autodesk Vault, which does a similar job for engineering data. One key to doing this well is the ability to talk about a group of revisions across multiple files as an entity, without having to be the owner of those files. You need to say what goes into a build of a game, or a revision of a manufactured product. You also need really good tools to show the differences between revisions. John Nagle -- http://mail.python.org/mailman/listinfo/python-list