Re: [RELEASED] Python 2.7.5

2013-06-03 Thread John Nagle
On 5/15/2013 9:19 PM, Benjamin Peterson wrote:
> It is my greatest pleasure to announce the release of Python 2.7.5.
> 
> 2.7.5 is the latest maintenance release in the Python 2.7 series.

Thanks very much.  It's important that Python 2.x be maintained.

3.x is a different language, with different libraries, and lots of
things that still don't work.  Many old applications will never
be converted.

    John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why has python3 been created as a seperate language where there is still python2.7 ?

2012-06-26 Thread John Nagle

On 6/25/2012 1:36 AM, Stefan Behnel wrote:

gmspro, 24.06.2012 05:46:

Why has python3 been created as a seperate language where there is still 
python2.7 ?



The intention of Py3 was to deliberately break backwards compatibility in
order to clean up the language. The situation is not as bad as you seem to
think, a huge amount of packages have been ported to Python 3 already
and/or work happily with both language dialects.


The syntax changes in Python 3 are a minor issue for
serious programmers.  The big headaches come from packages that
aren't being ported to Python 3 at all.  In some cases, there's
a replacement package from another author that performs the
same function, but has a different API.  Switching packages
involves debugging some new package with, probably, one
developer and a tiny user community.

The Python 3 to MySQL connection is still a mess.
The original developer of MySQLdb doesn't want to support
Python 3.  There's "pymysql", but it hasn't been updated
since 2010 and has a long list of unfixed bugs.
There was a "MySQL-python-1.2.3-py3k" port by a third party,
but the domain that hosted it 
("http://www.elecmor.mooo.com/python/MySQL-python-1.2.3-py3k.zip";) is 
dead.  There's

MySQL for Python 3 (https://github.com/davispuh/MySQL-for-Python-3)
but it doesn't work on Windows.  MySQL Connector
(https://code.launchpad.net/myconnpy) hasn't been updated in a
while, but at least has some users.  OurSQL has a different
API than MySQLdb, and isn't quite ready for prime time yet.

    That's why I'm still on Python 2.7.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: PySerial could not open port COM4: [Error 5] Access is denied - please help

2012-06-26 Thread John Nagle

On 6/26/2012 9:12 PM, Adam wrote:

Host OS:Ubuntu 10.04 LTS
Guest OS:Windows XP Pro SP3


I am able to open port COM4 with Terminal emulator.

So, what can cause PySerial to generate the following error ...

C:\Wattcher>python wattcher.py
Traceback (most recent call last):
   File "wattcher.py", line 56, in 
 ser.open()
   File "C:\Python25\Lib\site-packages\serial\serialwin32.py", line 56, in
open
 raise SerialException("could not open port %s: %s" % (self.portstr,
ctypes.WinError()))
serial.serialutil.SerialException: could not open port COM4: [Error 5]
Access is denied.


Are you trying to access serial ports from a virtual machine?
Which virtual machine environment?  Xen?  VMware? QEmu?  VirtualBox?
I wouldn't expect that to work in most of those.

What is "COM4", anyway?   Few machines today actually have four
serial ports.  Is some device emulating a serial port?

John Nagle


--
http://mail.python.org/mailman/listinfo/python-list


Re: when "normal" parallel computations in CPython will be implemented at last?

2012-07-02 Thread John Nagle

On 7/1/2012 10:51 AM, dmitrey wrote:

hi all,
are there any information about upcoming availability of parallel
computations in CPython without modules like  multiprocessing? I mean
something like parallel "for" loops, or, at least, something without
forking with copying huge amounts of RAM each time and possibility to
involve unpicklable data (vfork would be ok, but AFAIK it doesn't work
with CPython due to GIL).

AFAIK in PyPy some progress have been done (
http://morepypy.blogspot.com/2012/06/stm-with-threads.html )

Thank you in advance, D.



   It would be "un-Pythonic" to have real concurrency in Python.
You wouldn't be able to patch code running in one thread from
another thread.  Some of the dynamic features of Python
would break.   If you want fine-grained concurrency, you need
controlled isolation between concurrent tasks, so they interact
only at well-defined points.  That's un-Pythonic.

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: simpler increment of time values?

2012-07-05 Thread John Nagle

On 7/4/2012 5:29 PM, Vlastimil Brom wrote:

Hi all,
I'd like to ask about the possibilities to do some basic manipulation
on timestamps - such as incrementing a given time (hour.minute -
string) by some minutes.
Very basic notion of "time" is assumed, i.e. dateless,
timezone-unaware, DST-less etc.
I first thought, it would be possible to just add a timedelta to a
time object, but, it doesn't seem to be the case.


   That's correct.  A datetime.time object is a time within a day.
A datetime.date object is a date without a time.  A datetime.datetime
object contains both.

  You can add a datetime.timedelta object to a datetime.datetime
object, which will yield a datetime.datetime object.

  You can also call time.time(), and get the number of seconds
since the epoch (usually 1970-01-01 00:00:00 UTC). That's just
a number, and you can do arithmetic on that.

  Adding a datetime.time to a datetime.timedelta isn't that
useful.  It would have to raise a ValueError if the result
crossed a day boundary.
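
The workable combinations described above can be sketched briefly.  The
`add_minutes` helper is mine, not from the post: it works around the
missing `time + timedelta` operation by attaching an arbitrary dummy date,
doing the arithmetic on the resulting datetime, and discarding the date
again (so the result wraps past midnight instead of raising).

```python
from datetime import datetime, date, time, timedelta

# datetime + timedelta works directly and crosses day boundaries cleanly.
dt = datetime(2012, 7, 4, 23, 50) + timedelta(minutes=25)
assert dt == datetime(2012, 7, 5, 0, 15)

# A bare datetime.time cannot be added to a timedelta, so combine it
# with a throwaway date first, then drop the date from the result.
def add_minutes(t, minutes):
    anchor = datetime.combine(date(2000, 1, 1), t)  # arbitrary dummy date
    return (anchor + timedelta(minutes=minutes)).time()
```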

    John Nagle


--
http://mail.python.org/mailman/listinfo/python-list


Re: Socket code not executing properly in a thread (Windows)

2012-07-07 Thread John Nagle

On 7/8/2012 3:55 AM, Andrew D'Angelo wrote:

Hi, I've been writing an IRC chatbot that an relay messages it receives as
an SMS.


   We have no idea what IRC module you're using.


As it stands, I can retrieve and parse SMSs from Google Voice perfectly


   The Google Voice code you have probably won't work once you have
enough messages stored that Google Voice returns them on multiple
pages.  You have to read all the pages.  If there's any significant
amount of traffic, the completed messages have to be moved or deleted,
or each polling cycle returns more data than the last one.

   Google Voice isn't a very good SMS gateway.  I used to use it,
but switched to Twilio (which costs, but works) two years ago.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-09 Thread John Nagle

On 7/8/2012 2:52 PM, Christian Heimes wrote:

You are contradicting yourself. Either the OS is providing a fully
atomic rename or it doesn't. All POSIX compatible OS provide an atomic
rename functionality that renames the file atomically or fails without
losing the target side. On POSIX OS it doesn't matter if the target exists.


Rename on some file system types (particularly NFS) may not be atomic.


You don't need locks or any other fancy stuff. You just need to make
sure that you flush the data and metadata correctly to the disk and
force a re-write of the directory inode, too. It's a standard pattern on
POSIX platforms and well documented in e.g. the maildir RFC.

You can use the same pattern on Windows but it doesn't work as good.


  That's because you're using the wrong approach. See how to use
ReplaceFile under Win32:

http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx
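
For reference, the POSIX write-then-rename pattern the quoted text
describes looks roughly like this (a sketch; the helper name is mine, and
Python 3.3's later `os.replace()` extends the same idea to Windows via
ReplaceFile):

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a temporary file in the same directory (same filesystem),
    # flush and fsync it, then rename over the target.  On POSIX the
    # rename is atomic: readers see either the old or the new contents.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, path)   # os.replace() in Python 3.3+ also covers Windows
    except Exception:
        os.unlink(tmp)         # clean up the temp file on any failure
        raise
```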

Renaming files is the wrong way to synchronize a
crawler.  Use a database that has ACID properties, such as
SQLite.  Far fewer I/O operations are required for small updates.
It's not the 1980s any more.

I use a MySQL database to synchronize multiple processes
which crawl web sites.  The tables of past activity are InnoDB
tables, which support transactions.  The table of what's going
on right now is a MEMORY table.  If the database crashes, the
past activity is recovered cleanly, the MEMORY table comes back
empty, and all the crawler processes lose their database
connections, abort, and are restarted.  This allows multiple
servers to coordinate through one database.
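
A minimal sketch of the database approach, using SQLite instead of MySQL
(the schema and key names are illustrative, not from the post): each
status update is one ACID transaction, so no rename tricks are needed and
readers never see a torn write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # use a file path in real use
conn.execute("CREATE TABLE status (key TEXT PRIMARY KEY, value TEXT)")

def set_status(conn, key, value):
    # The 'with' block commits the transaction atomically, or rolls it
    # back if the statement fails.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO status (key, value) VALUES (?, ?)",
            (key, value))

set_status(conn, "crawler-1", "running")
```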

John Nagle




--
http://mail.python.org/mailman/listinfo/python-list


Re: Implicit conversion to boolean in if and while statements

2012-07-17 Thread John Nagle

On 7/15/2012 1:34 AM, Andrew Berg wrote:

This has probably been discussed before, but why is there an implicit
conversion to a boolean in if and while statements?

if not None:
print('hi')
prints 'hi' since bool(None) is False.

If this was discussed in a PEP, I would like a link to it. There are so
many PEPs, and I wouldn't know which ones to look through.

Converting 0 and 1 to False and True seems reasonable, but I don't see
the point in converting other arbitrary values.


   Because Boolean types were an afterthought in Python.  See PEP 285.
If a language starts out with a Boolean type, it tends towards
Pascal/Ada/Java semantics in this area.  If a language backs
into needing a Boolean type, as Python and C did, it tends to have
the somewhat weird semantics of a language which can't quite decide
what's a Boolean.  C and C++ have the same problem, for exactly the
same reason - boolean types were an afterthought there, too.
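
The implicit conversion the thread asks about can be seen directly:
`if x` behaves like `if bool(x)`, and `bool()` treats empty, zero, and
None values as false.

```python
# Truth testing as applied implicitly by if/while statements.
assert bool(None) is False
assert bool(0) is False and bool("") is False and bool([]) is False
assert bool(1) is True and bool("0") is True and bool([0]) is True

if not None:
    result = 'hi'   # reached, since bool(None) is False
```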

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: On-topic: alternate Python implementations

2012-08-06 Thread John Nagle
On 8/4/2012 7:19 PM, Steven D'Aprano wrote:
> On Sat, 04 Aug 2012 18:38:33 -0700, Paul Rubin wrote:
> 
>> Steven D'Aprano  writes:
>>> Runtime optimizations that target the common case, but fall back to
>>> unoptimized code in the rare cases that the optimization doesn't apply,
>>> offer the opportunity of big speedups for most code at the cost of
>>> trivial slowdowns when you do something unusual.
>>
>> The problem is you can't always tell if the unusual case is being
>> exercised without an expensive dynamic check, which in some cases must
>> be repeated in every iteration of a critical inner loop, even though it
>> turns out that the program never actually uses the unusual case.

   There are other approaches. PyPy uses two interpreters and a JIT
compiler to handle the hard cases.  When code does something unexpected
to other code, the backup interpreter is used to get control out of
the trouble spot so that the JIT compiler can then recompile the
code.  (I think; I've read the paper but haven't looked at the
internals.)

   This is hard to implement and hard to get right.

John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python 6 compilation failure on RHEL

2012-08-20 Thread John Nagle
On 8/20/2012 2:50 PM, Emile van Sebille wrote:
> On 8/20/2012 1:55 PM Walter Hurry said...
>> On Mon, 20 Aug 2012 12:19:23 -0700, Emile van Sebille wrote:
>>
>>> Package dependencies.  If the OP intends to install a package that
>>> doesn't support other than 2.6, you install 2.6.
>>
>> It would be a pretty poor third party package which specified Python 2.6
>> exactly, rather than (say) "Python 2.6 or later, but not Python 3"

After a thread of clueless replies, it's clear that nobody
responding actually read the build log.  Here's the problem:

  Failed to find the necessary bits to build these modules:
bsddb185
dl
imageop
sunaudiodev

What's wrong is that the Python 2.6 build script is looking for
some antiquated packages that aren't in a current RHEL.  Those
need to be turned off.

This is a known problem (see
http://pythonstarter.blogspot.com/2010/08/bsddb185-sunaudiodev-python-26-ubuntu.html)
but, unfortunately, the site with the patch for it
(http://www.lysium.de/sw/python2.6-disable-old-modules.patch)
is no longer in existence.  

But someone archived it on Google Code, at

http://code.google.com/p/google-earth-enterprise-compliance/source/browse/trunk/googleclient/geo/earth_enterprise/src/third_party/python/python2.6-disable-old-modules.patch

so if you apply that patch to the setup.py file for Python 2.6, that
ought to help.

You might be better off building Python 2.7, but you asked about 2.6.

John Nagle



-- 
http://mail.python.org/mailman/listinfo/python-list


Parsing ISO date/time strings - where did the parser go?

2012-09-06 Thread John Nagle
In Python 2.7:

   I want to parse standard ISO date/time strings such as

2012-09-09T18:00:00-07:00

into Python "datetime" objects.  The "datetime" object offers
an output method, datetimeobj.isoformat(), but not an input
parser.  There ought to be

classmethod datetime.fromisoformat(s)

but there isn't.  I'd like to avoid adding a dependency on
a third party module like "dateutil".

The "Working with time" section of the Python wiki is so
ancient it predates "datetime", and says so.

There's an iso8601 module on PyPi, but it's abandoned; it hasn't been
updated since 2007 and has many outstanding issues.

There are mentions of "xml.utils.iso8601.parse" in
various places, but the "xml" module that comes
with Python 2.7 doesn't have xml.utils.

http://www.seehuhn.de/pages/pdate
says:

"Unfortunately there is no easy way to parse full ISO 8601 dates using
the Python standard library."

It looks like this was taken out of "xml" at some point,
but not moved into "datetime".

John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing ISO date/time strings - where did the parser go?

2012-09-06 Thread John Nagle
On 9/6/2012 12:51 PM, Paul Rubin wrote:
> John Nagle  writes:
>> There's an iso8601 module on PyPi, but it's abandoned; it hasn't been
>> updated since 2007 and has many outstanding issues.
> 
> Hmm, I have some code that uses ISO date/time strings and just checked
> to see how I did it, and it looks like it uses iso8601-0.1.4-py2.6.egg .
> I don't remember downloading that module (I must have done it and
> forgotten).  I'm not sure what its outstanding issues are, as it works
> ok in the limited way I use it.
> 
> I agree that this functionality ought to be in the stdlib.

   Yes, it should.  There's no shortage of implementations.
PyPi has four.  Each has some defect.

   PyPi offers:

iso8601 0.1.4   Simple module to parse ISO 8601 dates
iso8601.py 0.1dev   Parse utilities for iso8601 encoding.
iso8601plus 0.1.6   Simple module to parse ISO 8601 dates
zc.iso8601 0.2.0ISO 8601 utility functions

Unlike CPAN, PyPi has no quality control.

Looking at the first one, it's in Google Code.

http://code.google.com/p/pyiso8601/source/browse/trunk/iso8601/iso8601.py

The first bug is at line 67.  For a timestamp with a "Z"
at the end, the offset should always be zero, regardless of the default
timezone.  See "http://en.wikipedia.org/wiki/ISO_8601".
The code uses the default time zone in that case, which is wrong.
So don't call that code with your local time zone as the default;
it will return bad times.

Looking at the second one, it's on github:

https://github.com/accellion/iso8601.py/blob/master/iso8601.py

Giant regular expressions!  The code to handle the offset
is present, but it doesn't make the datetime object a
timezone-aware object.  It returns a naive object in UTC.

The third one is at

https://github.com/jimklo/pyiso8601plus

This is a fork of the first one, because the first one is abandonware.
The bug in the first one, mentioned above, isn't fixed.  However, if
a time zone is present, it does return an "aware" datetime object.

The fourth one is the Zope version.  This brings in the pytz
module, which brings in the Olson database of named time zones and
their historical conversion data.  None of that information is
used, or necessary, to parse ISO dates and times.  Somebody
just wanted the pytz.FixedOffset() function, which does something
datetime already does.

(For all the people who keep saying "use strptime", that doesn't
handle time zone offsets at all.)
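
Absent a stdlib parser, one can handle the offset by hand.  This is a
minimal sketch (the function and class names are mine, not any library's
API): match the timestamp with a regular expression, convert the offset
to minutes, and attach a fixed-offset tzinfo subclass so the result is a
timezone-aware datetime.

```python
import re
from datetime import datetime, timedelta, tzinfo

class FixedOffset(tzinfo):
    """Fixed-offset zone; Python 2.x has no built-in equivalent."""
    def __init__(self, minutes):
        self._offset = timedelta(minutes=minutes)
    def utcoffset(self, dt):
        return self._offset
    def dst(self, dt):
        return timedelta(0)
    def tzname(self, dt):
        return None

_ISO_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(Z|[+-]\d{2}:\d{2})$")

def parse_iso8601(s):
    m = _ISO_RE.match(s)
    if m is None:
        raise ValueError("not an ISO 8601 timestamp: %r" % s)
    y, mo, d, h, mi, sec = map(int, m.groups()[:6])
    tz = m.group(7)
    if tz == "Z":                    # "Z" always means UTC, never local time
        offset = 0
    else:
        sign = 1 if tz[0] == "+" else -1
        offset = sign * (int(tz[1:3]) * 60 + int(tz[4:6]))
    return datetime(y, mo, d, h, mi, sec, tzinfo=FixedOffset(offset))
```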

John Nagle


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing ISO date/time strings - where did the parser go?

2012-09-08 Thread John Nagle
On 9/8/2012 5:20 PM, John Gleeson wrote:
> 
> On 2012-09-06, at 2:34 PM, John Nagle wrote:
>>  Yes, it should.  There's no shortage of implementations.
>> PyPi has four.  Each has some defect.
>>
>>   PyPi offers:
>>
>> iso8601 0.1.4 Simple module to parse ISO 8601 dates
>> iso8601.py 0.1dev Parse utilities for iso8601 encoding.
>> iso8601plus 0.1.6 Simple module to parse ISO 8601 dates
>> zc.iso8601 0.2.0 ISO 8601 utility functions
> 
> 
> Here are three more on PyPI you can try:
> 
> iso-8601 0.2.3   Flexible ISO 8601 parser...
> PySO8601 0.1.7   PySO8601 aims to parse any ISO 8601 date...
> isodate 0.4.8An ISO 8601 date/time/duration parser and formater
> 
> All three have been updated this year.

   There's another one inside feedparser, and there used to be
one in the xml module.

   Filed issue 15873: "datetime" cannot parse ISO 8601 dates and times
http://bugs.python.org/issue15873

   This really should be handled in the standard library, instead of
everybody rolling their own, badly.  Especially since in Python 3.x,
there's finally a useful "tzinfo" subclass for fixed time zone
offsets.  That provides a way to directly represent ISO 8601 date/time
strings with offsets as "time zone aware" date time objects.

John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: search google with python

2012-01-25 Thread John Nagle

On 1/25/2012 8:38 AM, Jerry Hill wrote:

On Wed, Jan 25, 2012 at 5:36 AM, Tracubik  wrote:

thanks a lot but it say it's deprecated, is there a replacement? Anyway
it'll useful for me to study json, thanks :)


I don't believe Google is particularly supportive of allowing
third-parties (like us) to use their search infrastructure.  All of
the search-related APIs they used to provide are slowly going away and
not being replaced, as far as I can tell.


   True.  The Google SOAP API disappeared years ago.  The AJAX
search widget was very restrictive, and is now on end of life
(no new users).  "Google Custom Search" only lets you search
specific sites.

   The Bing API comes with limitations on what you can do with
the results.

   The Yahoo search API went away, replaced by the Yahoo BOSS
API. Then that was replaced by a pay-per-search interface.

Blekko has an API, but you have to ask to use it.

        John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Where to put data

2012-01-27 Thread John Nagle

On 1/25/2012 9:26 AM, bvdp wrote:

I'm having a disagreement with a buddy on the packaging of a program
we're doing in Python. It's got a number of modules and large number
of library files. The library stuff is data, not code.


How much data?  Megabytes? Gigabytes?

I have some modules which contain nothing but big
constants, written by a program in Python format.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Killing threads, and os.system()

2012-02-03 Thread John Nagle

On 1/31/2012 8:04 AM, Dennis Lee Bieber wrote:

({muse: who do we have to kill
to persuade OS designers to incorporate something like the Amiga ARexx
"rexxport" system}).


QNX, which is a real-time microkernel that looks like POSIX to
applications, actually got interprocess communication right.  It
has to; everything in QNX is done by interprocess communication,
including all I/O.  File systems and drivers are ordinary programs.
The kernel just handles message passing, CPU dispatching, and timers.
QNX's message passing looks more like a subroutine call than an
I/O operation, and this has important implications for efficient CPU 
dispatching.


   Any QNX system call that can block is really a message pass.  Message
passes can be given a timeout, and they can be canceled from another
thread.  The "system call" then returns with an error status.  This
provides a way to keep threads from getting "stuck" in a system call.

(Unfortunately, QNX, which survived as a separate company for decades,
sold out to Harman (car audio) a few years ago. They had no clue
what to do with an OS.  They sold it to Research In Motion, the
Blackberry company, which is in the process of tanking.)

Python's thread model is unusually dumb.  You can't send signals
to other threads, you can't force an exception in another thread, and
I won't even get into the appalling mess around the Global Interpreter
Lock.  This has forced the use of subprocesses where, in other
languages, you'd use threads.  Of course, you load a new copy of the
interpreter in each subprocess, so this bloats memory usage.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: MySQLdb not allowing hyphen

2012-02-08 Thread John Nagle

On 2/5/2012 2:46 PM, Chris Rebert wrote:

On Sun, Feb 5, 2012 at 2:41 PM, Emeka  wrote:


Hello All,

I noticed that MySQLdb not allowing hyphen may be way to prevent injection
attack.
I have something like below:

"insert into reviews(message, title)values('%s', '%s')" %( "We don't know
where to go","We can't wait till morrow" )

ProgrammingError(1064, "You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right syntax to
use near 't know where to go.

How do I work around this error?


Don't use raw SQL strings in the first place. Use a proper
parameterized query, e.g.:

cursor.execute("insert into reviews(message, title) values (%s, %s)",
 ("We don't know where to go", "We can't wait till morrow"))


  Yes.  You are doing it wrong.  Do NOT use the "%" operator when
putting SQL queries together.  Let "cursor.execute" fill them
in.  It knows how to escape special characters in the input fields,
which will fix your bug and prevent SQL injection.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: changing sys.path

2012-02-08 Thread John Nagle

On 2/1/2012 8:15 AM, Andrea Crotti wrote:

So suppose I want to modify the sys.path on the fly before running some
code
which imports from one of the modules added.

at run time I do
sys.path.extend(paths_to_add)

but it still doesn't work and I get an import error.


   Do

import sys

first.
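
A self-contained sketch of the working order of operations (the temp
directory and module name here are illustrative): the path must be on
sys.path before the import statement executes.

```python
import os
import sys
import tempfile

# Write a throwaway module to a temp directory, then make it importable.
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "mymod.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.append(moddir)   # must happen before the import runs
import mymod
```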

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Common LISP-style closures with Python

2012-02-09 Thread John Nagle

On 2/3/2012 4:27 PM, Antti J Ylikoski wrote:


In Python textbooks that I have read, it is usually not mentioned that
we can very easily program Common LISP-style closures with Python. It
is done as follows:


   Most dynamic languages have closures.  Even Perl and Javascript
have closures.  Javascript really needs them, because the "callback"
orientation of Javascript means you often need to package up state
and pass it into a callback.  It really has very little to do with
functional programming.
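
A closure carrying state into later calls, as described above, looks like
this in Python (the mutable-cell trick works in Python 2; Python 3 could
use `nonlocal` instead):

```python
def make_counter():
    count = [0]            # cell the inner function closes over
    def increment():
        count[0] += 1
        return count[0]
    return increment

counter = make_counter()   # each call creates an independent counter
counter()                  # -> 1
counter()                  # -> 2
```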

   If you want to see a different style of closure, check out Rust,
Mozilla's new language.  Rust doesn't have the "spaghetti stack"
needed to implement closures, so it has more limited closure
semantics.  It's more like some of the C add-ons for closures,
but sounder.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: frozendict

2012-02-10 Thread John Nagle

On 2/10/2012 10:14 AM, Nathan Rice wrote:

Lets also not forget that knowing an object is immutable lets you do a
lot of optimizations; it can be inlined, it is safe to convert to a
contiguous block of memory and stuff in cache, etc.  If you know the
input to a function is guaranteed to be frozen you can just go crazy.
Being able to freeze(anyobject) seems like a pretty clear win.
Whether or not it is pythonic is debatable.  I'd argue if the meaning
of pythonic in some context is limiting, we should consider updating
the term rather than being dogmatic.


A real justification for the ability to make anything immutable is
to make it safely shareable between threads.  If it's immutable, it
doesn't have to be locked for access.  Mozilla's new "Rust"
language takes advantage of this.  Take a look at Rust's concurrency
semantics.  They've made some progress.

    John Nagle


--
http://mail.python.org/mailman/listinfo/python-list


Re: frozendict

2012-02-13 Thread John Nagle

On 2/10/2012 9:52 PM, 8 Dihedral wrote:

On Saturday, February 11, 2012 at 2:57:34 AM UTC+8, John Nagle wrote:

On 2/10/2012 10:14 AM, Nathan Rice wrote:

Lets also not forget that knowing an object is immutable lets you do a
lot of optimizations; it can be inlined, it is safe to convert to a
contiguous block of memory and stuff in cache, etc.  If you know the
input to a function is guaranteed to be frozen you can just go crazy.
Being able to freeze(anyobject) seems like a pretty clear win.
Whether or not it is pythonic is debatable.  I'd argue if the meaning
of pythonic in some context is limiting, we should consider updating
the term rather than being dogmatic.


  A real justification for the ability to make anything immutable is
to make it safely shareable between threads.  If it's immutable, it
doesn't have to be locked for access.  Mozilla's new "Rust"
language takes advantage of this.  Take a look at Rust's concurrency
semantics.  They've made some progress.

John Nagle



Lets model the system as an asynchronous set of objects with multiple threads
performing operatons on objects as in the above.


   I'd argue for a concurrency system where everything is either 
immutable, unshared, synchronized, or owned by a synchronized object.

This eliminates almost all explicit locking.

   Python's use of immutability has potential in that direction, but
Python doesn't do anything with that concept.

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for PyPi 2.0...

2012-02-15 Thread John Nagle

On 2/8/2012 9:47 AM, Chris Rebert wrote:

On Wed, Feb 8, 2012 at 8:54 AM, Nathan Rice
  wrote:

As a user:
* Finding the right module in PyPi is a pain because there is limited,
low quality semantic information, and there is no code indexing.


   CPAN does it right.  They host the code.  (PyPi is just a
collection of links).  They have packaging standards (PyPi
does not.)  CPAN tends not to be full of low-quality modules
that do roughly the same thing.

   If you want to find a Python module, Google is more useful
than PyPi.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Script randomly exits for seemingly no reason with strange traceback

2012-02-15 Thread John Nagle

On 2/4/2012 12:43 PM, Chris Angelico wrote:

On Sun, Feb 5, 2012 at 3:32 AM, Andrew Berg  wrote:

On 2/3/2012 9:15 PM, Chris Angelico wrote:

Do you call on potentially-buggy external modules?

It imports one module that does little more than define a few simple
functions. There's certainly no (intentional) interpreter hackery at work.


   Are you doing a conditional import, one that takes place after load
time?  If you do an import within a function or class, it is executed
when the code around it executes.  If you import a file with a
syntax error during execution, you could get the error message you're
getting.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


"Decoding unicode is not supported" in unusual situation

2012-03-07 Thread John Nagle

I'm getting

line 79, in tounicode
return(unicode(s, errors='replace'))
TypeError: decoding Unicode is not supported

from this, under Python 2.7:

def tounicode(s) :
    if type(s) == unicode :
        return(s)
    return(unicode(s, errors='replace'))

That would seem to be impossible.  But it's not.
"s" is generated from the "suds" SOAP client.  The documentation
for "suds" says:

"Suds leverages python meta programming to provide an intuative API for 
consuming web services. Runtime objectification of types defined in the 
WSDL is provided without class generation."


I think that somewhere in "suds", they subclass the "unicode" type.
That's almost too cute.

The proper test is

isinstance(s,unicode)
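
The pitfall generalizes beyond unicode; restated with Python 3's str (the
subclass name is a stand-in for whatever suds returns), an exact-type
test misses subclasses while isinstance does not:

```python
class TaggedText(str):     # illustrative stand-in for a str subclass
    pass

s = TaggedText("hello")
assert type(s) is not str  # exact-type test fails for a subclass
assert isinstance(s, str)  # isinstance handles subclasses correctly
```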


John Nagle


--
http://mail.python.org/mailman/listinfo/python-list


Re: "Decoding unicode is not supported" in unusual situation

2012-03-07 Thread John Nagle

On 3/7/2012 3:42 AM, Steven D'Aprano wrote:


I *think* he is complaining that some other library -- suds? -- has a
broken test for Unicode, by using:

if type(s) is unicode: ...

instead of

if isinstance(s, unicode): ...

Consequently, when the library passes a unicode *subclass* to the
tounicode function, the "type() is unicode" test fails. That's a bad bug.


   No, that was my bug.

   The library bug, if any, is that you can't apply

unicode(s, errors='replace')

to a Unicode string. TypeError("Decoding unicode is not supported") is 
raised.  However


unicode(s)

will accept Unicode input.

The Python documentation
("http://docs.python.org/library/functions.html#unicode";) does not 
mention this.  It is therefore necessary to check the type before

calling "unicode", or catch the undocumented TypeError exception
afterward.


John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: "Decoding unicode is not supported" in unusual situation

2012-03-08 Thread John Nagle

On 3/7/2012 6:18 PM, Ben Finney wrote:

Steven D'Aprano  writes:


On Thu, 08 Mar 2012 08:48:58 +1100, Ben Finney wrote:

I think that's a Python bug. If the latter succeeds as a no-op, the
former should also succeed as a no-op. Neither should ever get any
errors when ‘s’ is a ‘unicode’ object already.


No. The semantics of the unicode function (technically: a type
constructor) are well-defined, and there are two distinct behaviours:


   Right. The real problem is that Python 2.7 doesn't have distinct
"str" and "bytes" types.  type(bytes() returns 
"str" is assumed to be ASCII 0..127, but that's not enforced.
"bytes" and "str" should have been distinct types, but
that would have broken much old code.  If they were distinct, then
constructors could distinguish between string type conversion
(which requires no encoding information) and byte stream decoding.

   So it's possible to get junk characters in a "str", and they
won't convert to Unicode.  I've had this happen with databases which
were supposed to be ASCII, but occasionally a non-ASCII character
would slip through.

   This is all different in Python 3.x, where "str" is Unicode and
"bytes" really are a distinct type.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


PyUSB available for current versions of Windows?

2012-03-09 Thread John Nagle

   I want to enumerate the available USB devices.  All I really
need is the serial number of the USB devices available to PySerial.
(When you plug in a USB device on Windows, it's assigned the next
available COM port number.  On a reboot, the numbers are reassigned.
So if you have multiple USB serial ports, there's a problem.)

   PyUSB can supposedly do this, but the documentation is misleading.
It makes a big point of being "100% Python", but that's because it's
just glue code to a platform-specific "back end" provided by someone
else.

   There's an old Windows back-end at 
"http://www.craftedge.com/products/libusb.html";, but it was written for 
Windows XP, and can supposedly be run in "compatibility mode" on Windows 
Vista. Current versions of Windows, who knows? It's not open source, and 
it comes from someone who sells paper-cutting machines for crafters.


   There's another Windows back end at

https://sourceforge.net/apps/trac/libusb-win32/wiki

but it involves installing a low-level driver in Windows.
I especially like the instruction "Close all applications which use USB 
devices before installing."  Does this include the keyboard and mouse?
They also warn "The device driver can not be easily removed from the 
system."


John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: "Decoding unicode is not supported" in unusual situation

2012-03-09 Thread John Nagle

On 3/8/2012 2:58 PM, Prasad, Ramit wrote:

 Right. The real problem is that Python 2.7 doesn't have distinct
"str" and "bytes" types.  type(bytes() returns
"str" is assumed to be ASCII 0..127, but that's not enforced.
"bytes" and "str" should have been distinct types, but
that would have broken much old code.  If they were distinct, then
constructors could distinguish between string type conversion
(which requires no encoding information) and byte stream decoding.

 So it's possible to get junk characters in a "str", and they
won't convert to Unicode.  I've had this happen with databases which
were supposed to be ASCII, but occasionally a non-ASCII character
would slip through.


bytes and str are just aliases for each other.


   That's true in Python 2.7, but not in 3.x.  From 2.6 forward,
"bytes" and "str" were slowly being separated.  See PEP 358.
Some of the problems in Python 2.7 come from this ambiguity.
Logically, "unicode" of "str" should be a simple type conversion
from ASCII to Unicode, while "unicode" of "bytes" should
require an encoding.  But because of the bytes/str ambiguity
in Python 2.6/2.7, the behavior couldn't be type-based.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: "Decoding unicode is not supported" in unusual situation

2012-03-10 Thread John Nagle

On 3/9/2012 4:57 PM, Steven D'Aprano wrote:

On Fri, 09 Mar 2012 10:11:58 -0800, John Nagle wrote:
This demonstrates a gross confusion about both Unicode and Python. John,
I honestly don't mean to be rude here, but if you actually believe that
(rather than merely expressing yourself poorly), then it seems to me that
you are desperately misinformed about Unicode and are working on the
basis of some serious misapprehensions about the nature of strings.

In Python 2.6/2.7, there is no ambiguity between str/bytes. The two names
are aliases for each other. The older name, "str", is a misnomer, since
it *actually* refers to bytes (and always has, all the way back to the
earliest days of Python). At best, it could be read as "byte string" or
"8-bit string", but the emphasis should always be on the *bytes*.


   There's an inherent ambiguity in that "bytes" and "str" are really
the same type in Python 2.6/2.7.  That's a hack for backwards
compatibility, and it goes away in 3.x.  The notes for PEP 358
admit this.

   It's implicit in allowing

unicode(s)

with no encoding, on type "str", that there is an implicit
assumption that s is ASCII.  Arguably, "unicode()" should
have required an encoding in all cases.

Or "str" and "bytes" should have been made separate types in
Python 2.7, in which case unicode() of a str would be a safe
ASCII to Unicode translation, and unicode() of a bytes object
would require an encoding.  But that would break too much old code.
So we have an ambiguity and a hack.

"While Python 2 also has a unicode string type, the fundamental 
ambiguity of the core string type, coupled with Python 2's default 
behavior of supporting automatic coercion from 8-bit strings to unicode 
objects when the two are combined, often leads to UnicodeErrors"

- PEP 404

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


html5lib not thread safe. Is the Python SAX library thread-safe?

2012-03-11 Thread John Nagle

   "html5lib" is apparently not thread safe.
(see "http://code.google.com/p/html5lib/issues/detail?id=189";)
Looking at the code, I've only found about three problems.
They're all the usual "cached in a global without locking" bug.
A few locks would fix that.
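The usual fix for the "cached in a global without locking" bug is a module-level lock around the cache. A hypothetical sketch (names here are illustrative, not html5lib's actual internals):

```python
import threading

_cache = {}
_cache_lock = threading.Lock()

def _build_table(name):
    # Stand-in for the expensive construction being cached.
    return {"name": name}

def get_table(name):
    # Guard the module-level cache with a lock so two threads
    # can't race on first use.
    with _cache_lock:
        table = _cache.get(name)
        if table is None:
            table = _build_table(name)
            _cache[name] = table
        return table

print(get_table("entities") is get_table("entities"))  # True: cached
```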

   But html5lib calls the XML SAX parser. Is that thread-safe?
Or is there more trouble down at the bottom?

(I run a multi-threaded web crawler, and currently use BeautifulSoup,
which is thread safe, although dated.  I'm looking at converting to
html5lib.)

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: html5lib not thread safe. Is the Python SAX library thread-safe?

2012-03-11 Thread John Nagle

On 3/11/2012 2:45 PM, Cameron Simpson wrote:

On 11Mar2012 13:30, John Nagle  wrote:
| "html5lib" is apparently not thread safe.
| (see "http://code.google.com/p/html5lib/issues/detail?id=189";)
| Looking at the code, I've only found about three problems.
| They're all the usual "cached in a global without locking" bug.
| A few locks would fix that.
|
| But html5lib calls the XML SAX parser. Is that thread-safe?
| Or is there more trouble down at the bottom?
|
| (I run a multi-threaded web crawler, and currently use BeautifulSoup,
| which is thread safe, although dated.  I'm looking at converting to
| html5lib.)

IIRC, BeautifulSoup4 may do that for you:

   http://www.crummy.com/software/BeautifulSoup/bs4/doc/

   http://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser
 "Beautiful Soup 4 uses html.parser by default, but you can plug in
 lxml or html5lib and use that instead."


   I want to use HTML5 standard parsing of bad HTML.  (HTML5 formally
defines how to parse bad comments, for example.)  I currently have
a modified version of BeautifulSoup that's more robust than the
standard one, but it doesn't handle errors the same way browsers do.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: html5lib not thread safe. Is the Python SAX library thread-safe?

2012-03-12 Thread John Nagle

On 3/12/2012 3:05 AM, Stefan Behnel wrote:

John Nagle, 11.03.2012 21:30:

"html5lib" is apparently not thread safe.
(see "http://code.google.com/p/html5lib/issues/detail?id=189";)
Looking at the code, I've only found about three problems.
They're all the usual "cached in a global without locking" bug.
A few locks would fix that.

But html5lib calls the XML SAX parser. Is that thread-safe?
Or is there more trouble down at the bottom?

(I run a multi-threaded web crawler, and currently use BeautifulSoup,
which is thread safe, although dated.  I'm looking at converting to
html5lib.)


You may also consider moving to lxml. BeautifulSoup supports it as a parser
backend these days, so you wouldn't even have to rewrite your code to use
it. And performance-wise, well ...

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Stefan


   I want to move to html5lib because it handles HTML errors as
specified by the HTML5 spec, which is what all newer browsers do.
The HTML5 spec actually specifies, in great detail, how to parse
common errors in HTML.  It's amusing seeing that formalized.
Malformed comments ( <!- instead of <!-- ) are now handled in
a standard way, for example.  So I'm trying to get html5parser
fixed for thread safety.

   John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: are int, float, long, double, side-effects of computer engineering?

2012-03-13 Thread John Nagle

On 3/7/2012 2:02 PM, Russ P. wrote:

On Mar 6, 7:25 pm, rusi  wrote:

On Mar 6, 6:11 am, Xah Lee  wrote:



I might add that Mathematica is designed mainly for symbolic
computation, whereas IEEE floating point numbers are intended for
numerical computation. Those are two very different endeavors. I
played with Mathematica a bit several years ago, and I know it can do
numerical computation too. I wonder if it resorts to IEEE floating
point numbers when it does.


   Mathematica has, for some computations, algorithms to determine the
precision of results.  This is different than trying to do infinite
precision arithmetic, which doesn't help as soon as you get to trig
functions.  It's about bounding the error.

   It's possible to do bounded arithmetic, where you carry along an
upper and lower bound on each number.  The problem is what to do
about comparisons.  Comparisons between bounded numbers are
ambiguous when the ranges overlap.  Algorithms have to be designed
to deal with that.  Mathematica has such algorithms for some
operations, especially numerical integration.
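Bounded arithmetic with three-valued comparisons might be sketched like this (an editorial illustration, not Mathematica's algorithm):

```python
class Interval:
    """A number carried with explicit lower/upper error bounds."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Bounds propagate through addition.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def less_than(self, other):
        # Three-valued comparison: None means "ambiguous" because the
        # ranges overlap, and the algorithm must be designed for that.
        if self.hi < other.lo:
            return True
        if other.hi < self.lo:
            return False
        return None

a = Interval(1.0, 1.1)
print(a.less_than(Interval(3.0, 3.1)))   # True: ranges are disjoint
print(a.less_than(Interval(1.05, 2.0)))  # None: ranges overlap
```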

   It's a very real issue. I had to deal with this when I was
writing the first "ragdoll physics" system that worked right,
back in the 1990s.  Everybody else's system blew up on the hard
cases; mine just slowed down.  Correct integration over a force
function that's changing over 18 orders of magnitude is difficult,
but quite possible.

(Here it is, from 1997: "http://www.youtube.com/watch?v=5lHqEwk7YHs")
(A test with a heavy object:
"http://www.youtube.com/watch?v=-DaWIHc1VLY";.  Most physics engines
don't do heavy objects well. Everything looks too light. We call
this the "boink problem.")

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Enhancement suggestion for argparse: intuit type from default

2012-03-14 Thread John Nagle

On 3/13/2012 2:08 PM, Roy Smith wrote:

Using argparse, if I write:

 parser.add_argument('--foo', default=100)

it seems like it should be able to intuit that the type of foo should
be int (i.e. type(default)) without my having to write:

 parser.add_argument('--foo', type=int, default=100)

Does this seem like a reasonable enhancement to argparse?


   default=None

presents some problems.
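The problem can be seen directly against the stdlib argparse (a small sketch; argument names are illustrative):

```python
import argparse

p = argparse.ArgumentParser()
p.add_argument("--foo", default=100)    # no type= given
p.add_argument("--bar", default=None)   # the problem case

given = p.parse_args(["--foo", "200"])
print(type(given.foo).__name__)         # str: argv values arrive as strings

absent = p.parse_args([])
print(absent.foo, absent.bar)           # 100 None: defaults pass through as-is

# The proposed rule type(default) would pick int for --foo, but for
# --bar it yields NoneType, which is useless as a converter function.
```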

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Does anyone actually use PyPy in production?

2012-03-16 Thread John Nagle

  Does anyone run PyPy in production?

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Programming D. E. Knuth in Python with the Deterministic Finite Automaton construct

2012-03-17 Thread John Nagle

On 3/17/2012 9:31 AM, Antti J Ylikoski wrote:

On 17.3.2012 17:47, Roy Smith wrote:

In article,
Antti J Ylikoski wrote:


I came across the problem of what would be the clearest way to program
such algorithms with a programming language such as Python, which has
no GOTO statement.

Oh, my, I can't even begin to get my head around all the nested
conditionals. And that for a nearly trivial machine with only 5 states.
Down this path lies madness.


   Right.  Few programs should be written as state machines.
As a means of rewriting Knuth's algorithms, it's inappropriate.

   Some should.  LALR(1) parsers, such as what YACC and Bison
generate, are state machines.  They're huge collections of nested
switch statements.

   Python doesn't have a "switch" or "case" statement.  Which is
surprising, for a language that loves dictionary lookups.
You can create a dict full of function names and lambdas, but
it's clunky looking.
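The dict-of-callables idiom mentioned above, in a minimal form (an editorial sketch):

```python
def on_add(x, y):
    return x + y

def on_sub(x, y):
    return x - y

# A dict of functions standing in for a switch/case statement.
dispatch = {
    "add": on_add,
    "sub": on_sub,
}

def run(op, x, y):
    try:
        handler = dispatch[op]
    except KeyError:
        raise ValueError("unknown operation: %r" % op)
    return handler(x, y)

print(run("add", 2, 3))  # 5
```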

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: urllib.urlretrieve never returns???

2012-03-20 Thread John Nagle

On 3/17/2012 9:34 AM, Chris Angelico wrote:

2012/3/18 Laszlo Nagy:

In the later case, "log.txt" only contains "#1" and nothing else. If I look
at pythonw.exe from task manager, then its shows +1 thread every time I
click the button, and "#1" is appended to the file.


   Does it fail to retrieve on all URLs, or only on some of them?

   Running a web crawler, I've seen some pathological cases.
There are a very few sites that emit data very, very slowly,
but don't time out because they are making progress.  There are
also some sites where attempting to negotiate a SSL connection
results in the SSL protocol reaching a point where the host end
is supposed to finish the handshake, but it doesn't.

   The odds are against this being the problem. I see problems
like that in maybe 1 in 100,000 URLs.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Fetching data from a HTML file

2012-03-24 Thread John Nagle

On 3/23/2012 10:12 PM, Jon Clements wrote:

ROBOT Framework


   Would people please stop using robotic names for
things that aren't robots?  Thank you.

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: "convert" string to bytes without changing data (encoding)

2012-03-28 Thread John Nagle

On 3/28/2012 10:43 AM, Peter Daum wrote:

On 2012-03-28 12:42, Heiko Wundram wrote:

Am 28.03.2012 11:43, schrieb Peter Daum:



The longer story of my question is: I am new to python (obviously), and
since I am not familiar with either one, I thought it would be advisable
to go for python 3.x. The biggest problem that I am facing is, that I
am often dealing with data, that is basically text, but it can contain
8-bit bytes. In this case, I can not safely assume any given encoding,
but I actually also don't need to know - for my purposes, it would be
perfectly good enough to deal with the ascii portions and keep anything
else unchanged.


   So why let the data get into a "str" type at all? Do everything
end to end with "bytes" or "bytearray" types.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Will MySQL ever be supported for Python 3.x?

2012-03-30 Thread John Nagle

The MySQLdb entry on SourceForge
(http://sourceforge.net/projects/mysql-python/)
web site still says the last supported version of Python is 2.6.
PyPi says the last supported version is Python 2.5.  The
last download is from 2007.

I realize there are unsupported fourth-party versions from other
sources. (http://www.lfd.uci.edu/~gohlke/pythonlibs/) But those
are just blind builds; they haven't been debugged.

MySQL Connector (http://forge.mysql.com/projects/project.php?id=302)
is still pre-alpha.

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Will MySQL ever be supported for Python 3.x?

2012-03-30 Thread John Nagle

On 3/30/2012 2:32 PM, Irmen de Jong wrote:

Try Oursql instead  http://packages.python.org/oursql/
"oursql is a new set of MySQL bindings for python 2.4+, including python 3.x"


   Not even close to being compatible with existing code.   Every SQL
statement has to be rewritten, with the parameters expressed
differently.  It's a good approach, but very incompatible.

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit

2012-03-31 Thread John Nagle

   Some versions of CentOS 6 seem to have a potential
getaddrinfo exploit.  See

To test, try this from a command line:

ping example

If it fails, good.  If it returns pings from "example.com", bad.
The getaddrinfo code is adding ".com" to the domain.

If that returns pings, please try

ping noexample.com

There is no "noexample.com" domain in DNS.  This should time out.
But if you get ping replies from a CNET site, let me know.
Some implementations try "noexample.com", get a NXDOMAIN error,
and try again, adding ".com".  This results in a ping of
"noexample.com,com".  "com.com" is a real domain, run by a
unit of CBS, and they have their DNS set up to catch all
subdomains and divert them to, inevitably, an ad-oriented
junk search page.  (You can view the junk page at
"http://slimeball.com.com";.  Replace "slimeball" with anything
else you like; it will still resolve.)

If you find a case where "ping noexample.com" returns a reply,
then try it in Python:


import socket
socket.getaddrinfo("noexample.com", 80)

That should return an error.  If it returns the IP address of
CNET's ad server, there's trouble.

This isn't a problem with the upstream DNS.  Usually, this sort
of thing means you're using some sleazy upstream DNS provider
like Comcast.  That's not the case here.  "host" and "nslookup"
aren't confused.  Only programs that use getaddrinfo, like "ping",
"wget", and Python, have this ".com" appending thing.  Incidentally,
if you try "noexample.net", there's no problem, because the
owner of "net.com" hasn't set up their DNS to exploit this.

And, of course, it has nothing to do with browser toolbars.  This
is at a much lower level.

If you can make this happen, report back the CentOS version and
the library version, please.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit

2012-03-31 Thread John Nagle

On 3/31/2012 9:26 PM, Owen Jacobson wrote:

On 2012-03-31 22:58:45 +, John Nagle said:


Some versions of CentOS 6 seem to have a potential
getaddrinfo exploit. See

To test, try this from a command line:

ping example

If it fails, good. If it returns pings from "example.com", bad.
The getaddrinfo code is adding ".com" to the domain.


There is insufficient information in your diagnosis to make that
conclusion. For example: what network configuration services (DHCP
clients and whatnot, along with various desktop-mode configuration tools
and services) are running? What kernel and libc versions are you
running? What are the contents of /etc/nsswitch.conf? Of
/etc/resolv.conf (particularly, the 'search' entries)? What do
/etc/hosts, LDAP, NIS+, or other hostname services say about the names
you're resolving? Does a freestanding C program that directly calls
getaddrinfo and that runs in a known-good loader environment exhibit the
same surprises? Name resolution is not so simple that you can conclude
"getaddrinfo is misbehaving" from the behaviour of ping, or of your
Python sample, alone.

In any case, this seems more appropriate for a Linux or a CentOS
newsgroup/mailing list than a Python one. Please do not reply to this
post in comp.lang.python.

-o


   I expected that some noob would have a reply like that.

   A more detailed discussion appears here:

http://serverfault.com/questions/341383/possible-nxdomain-hijacking

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: [OT] getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit

2012-04-01 Thread John Nagle

On 4/1/2012 9:26 AM, Michael Torrie wrote:

On 03/31/2012 04:58 PM, John Nagle wrote:

If you can make this happen, report back the CentOS version and
the library version, please.


CentOS release 6.2 (Final)
glibc-2.12-1.47.el6_2.9.x86_64

example does not ping
example.com does not resolve to example.com.com

Removed all "search" and "domain" entries from /etc/resolve.conf


It's a design bug in glibc. I just submitted a bug report.

  http://sourceware.org/bugzilla/show_bug.cgi?id=13935

It only appears if you have a machine with a two-component domain
name ending in ".com" as the actual machine name.  Most hosting
services generate some long arbitrary name as the primary name,
but I happen to have a server set up as "companyname.com".

The default rule for looking up domains in glibc is that the
"domain" is everything after the FIRST ".".  Failed lookups
are retried with that "domain" appended.  The idea, back
in the 1980s, was that if you're on "foo.bigcompany.com",
and look up "bar", it's looked up as "bar.bigcompany.com".
This idea backfires when the actual hostname only
has two components, and the search just appends ".com".
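The legacy search-domain rule described above can be modeled in a few lines (a simplified editorial sketch: it ignores resolv.conf "search" lists and the ndots option):

```python
def retry_candidates(query, local_hostname):
    # The implicit domain is everything after the FIRST "." of the
    # host's own name; failed lookups are retried with it appended.
    dot = local_hostname.find(".")
    if dot < 0:
        return [query]
    implicit_domain = local_hostname[dot + 1:]
    return [query, query + "." + implicit_domain]

# On "foo.bigcompany.com", "bar" is retried as "bar.bigcompany.com":
print(retry_candidates("bar", "foo.bigcompany.com"))
# On a two-component .com host, the retry just appends ".com":
print(retry_candidates("noexample.com", "companyname.com"))
# ['noexample.com', 'noexample.com.com']
```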

There is a "com.com" domain, and this gets them traffic.
They exploit this to send you (where else) to an ad-heavy page.
Try "python.com.com", for example,and you'll get an ad for a
Java database.

The workaround in Python is to add the AI_CANONNAME flag
to getaddrinfo calls, then check that the returned domain
name matches the one put in.

Good case:
>>> s = "python.org"
>>> socket.getaddrinfo(s, 80, 0,0, 0, socket.AI_CANONNAME)
[(2, 1, 6, 'python.org', ('82.94.164.162', 80)), (2, 2, 17, '', 
('82.94.164.162', 80)), (2, 3, 0, '', ('82.94.164.162', 80)), (10, 1, 6, 
'', ('2001:888:2000:d::a2', 80, 0, 0)), (10, 2, 17, '', 
('2001:888:2000:d::a2', 80, 0, 0)), (10, 3, 0, '', 
('2001:888:2000:d::a2', 80, 0, 0))]


Bad case:
>>> s = "noexample.com"
>>> socket.getaddrinfo(s, 80, 0,0, 0, socket.AI_CANONNAME)
[(2, 1, 6, 'phx1-ss-2-lb.cnet.com', ('64.30.224.112', 80)), (2, 2, 17, 
'', ('64.30.224.112', 80)), (2, 3, 0, '', ('64.30.224.112', 80))]


Note that what went in isn't what came back.  getaddrinfo has
been pwned.

Again, you only get this if you're on a machine whose primary host
name is "something.com", with exactly two components ending in ".com".


John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Will MySQL ever be supported for Python 3.x?

2012-04-01 Thread John Nagle

On 3/31/2012 10:54 PM, Tim Roberts wrote:

John Nagle  wrote:


On 3/30/2012 2:32 PM, Irmen de Jong wrote:

Try Oursql instead  http://packages.python.org/oursql/
"oursql is a new set of MySQL bindings for python 2.4+, including python 3.x"


Not even close to being compatible with existing code.   Every SQL
statement has to be rewritten, with the parameters expressed
differently.  It's a good approach, but very incompatible.


Those changes can be automated, given an adequate editor.  "Oursql" is a
far better product than the primitive MySQLdb wrapper.  It is worth the
trouble.


It's an interesting approach.  As it matures, and a few big sites
use it, it will become worth looking at.

The emphasis on server-side buffering seems strange.  Are there
benchmarks indicating this is worth doing?  Does it keep transactions
locked longer?  This bug report

https://answers.launchpad.net/oursql/+question/191256

indicates a performance problem.  I'd expect server side buffering
to slow things down.  Usually, you want to drain results out of
the server as fast as possible, then close out the command,
releasing server resources and locks.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit

2012-04-02 Thread John Nagle

On 4/1/2012 1:41 PM, John Nagle wrote:

On 4/1/2012 9:26 AM, Michael Torrie wrote:

On 03/31/2012 04:58 PM, John Nagle wrote:



Removed all "search" and "domain" entries from /etc/resolve.conf


It's a design bug in glibc. I just submitted a bug report.

http://sourceware.org/bugzilla/show_bug.cgi?id=13935

It only appears if you have a machine with a two-component domain
name ending in ".com" as the actual machine name. Most hosting
services generate some long arbitrary name as the primary name,
but I happen to have a server set up as "companyname.com".

The default rule for looking up domains in glibc is that the
"domain" is everything after the FIRST ".". Failed lookups
are retried with that "domain" appended. The idea, back
in the 1980s, was that if you're on "foo.bigcompany.com",
and look up "bar", it's looked up as "bar.bigcompany.com".
This idea backfires when the actual hostname only
has two components, and the search just appends ".com".

There is a "com.com" domain, and this gets them traffic.
They exploit this to send you (where else) to an ad-heavy page.
Try "python.com.com", for example,and you'll get an ad for a
Java database.

The workaround in Python is to add the AI_CANONNAME flag
to getaddrinfo calls, then check that the returned domain
name matches the one put in.


   That workaround won't work for some domains.  For example,

>>> socket.getaddrinfo(s,"http",0,0,socket.SOL_TCP,socket.AI_CANONNAME)
[(2, 1, 6, 'orig-10005.themarker.cotcdn.net', ('208.93.137.80', 80))]

   Nor will adding options to /etc/resolv.conf work well, because
that file is overwritten by some system administration programs.

   I may have to bring in "dnspython" to get a reliable DNS lookup.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to structure data for efficient searching

2012-04-03 Thread John Nagle

On 3/28/2012 11:39 AM, larry.mart...@gmail.com wrote:

I have the following use case:

I have a set of data that is contains 3 fields, K1, K2 and a
timestamp. There are duplicates in the data set, and they all have to
processed.

Then I have another set of data with 4 fields: K3, K4, K5, and a
timestamp. There are also duplicates in that data set, and they also
all have to be processed.

I need to find all the items in the second data set where K1==K3 and
K2==K4 and the 2 timestamps are within 20 seconds of each other.

I have this working, but the way I did it seems very inefficient - I
simply put the data in 2 arrays (as tuples) and then walked through
the entire second data set once for each item in the first data set,
looking for matches.

Is there a better, more efficient way I could have done this?


   How big are the data sets?  Millions of entries?  Billions?
Trillions?  Will all the data fit in memory, or will this need
files or a database?

   In-memory, it's not hard.  First, decide which data set is smaller.
That one gets a dictionary keyed by K1 or K3, with each entry being
a list of tuples.  Then go through the other data set linearly.

   You can also sort one database by K1, the other by K3, and
match.  Then take the matches, sort by K2 and K4, and match again.
Sort the remaining matches by timestamp and pull the ones within
the threshold.

   Or you can load all the data into a database with a query
optimizer, like MySQL, and let it figure out, based on the
index sizes, how to do the join.

   All of these approaches are roughly O(N log N), which
beats the O(N^2) approach you have now.
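The dictionary-keyed approach sketched above, with the 20-second window from the question (field and sample names are illustrative):

```python
from collections import defaultdict

def match_within(first, second, window=20):
    # first:  iterable of (k1, k2, ts) tuples
    # second: iterable of (k3, k4, k5, ts) tuples
    # Index the smaller set by its key pair, then scan the other set
    # once -- far better than the O(N^2) nested scan.
    index = defaultdict(list)
    for k1, k2, ts in first:
        index[(k1, k2)].append(ts)
    matches = []
    for k3, k4, k5, ts in second:
        if any(abs(ts - t) <= window for t in index.get((k3, k4), ())):
            matches.append((k3, k4, k5, ts))
    return matches

first = [("a", "b", 100), ("x", "y", 500)]
second = [("a", "b", "p1", 110), ("a", "b", "p2", 300), ("q", "r", "p3", 100)]
print(match_within(first, second))  # [('a', 'b', 'p1', 110)]
```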

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: getaddrinfo NXDOMAIN exploit - please test on CentOS 6 64-bit

2012-04-04 Thread John Nagle

On 4/2/2012 6:53 PM, John Nagle wrote:

On 4/1/2012 1:41 PM, John Nagle wrote:

On 4/1/2012 9:26 AM, Michael Torrie wrote:

On 03/31/2012 04:58 PM, John Nagle wrote:



Removed all "search" and "domain" entries from /etc/resolve.conf


It's a design bug in glibc. I just submitted a bug report.

http://sourceware.org/bugzilla/show_bug.cgi?id=13935


  The same bug is in "dnspython". Submitted a bug report there,
too.

   https://github.com/rthalley/dnspython/issues/6

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python Gotcha's?

2012-04-07 Thread John Nagle

On 4/4/2012 3:34 PM, Miki Tebeka wrote:

Greetings,

I'm going to give a "Python Gotcha's" talk at work.
If you have an interesting/common "Gotcha" (warts/dark corners ...) please 
share.

(Note that I went over http://wiki.python.org/moin/PythonWarts already).

Thanks,
--
Miki


A few Python "gotchas":

1.  Nobody is really in charge of third party packages.  In the
Perl world, there's a central repository, CPAN, and quality
control.  Python's "pypi" is just a collection of links.  Many
major packages are maintained by one person, and if they lose
interest, the package dies.

2.  C extensions are closely tied to the exact version of CPython
you're using, and finding a properly built version may be difficult.

3.  "eggs".  The "distutils" system has certain assumptions built into
it about where things go, and tends to fail in obscure ways.  There's
no uniform way to distribute a package.

4.  The syntax for expression-IF is just weird.

5.  "+" as concatenation.  This leads to strange numerical
semantics, such as (1,2) + (3,4) is (1,2,3,4).  But, for
"numarray" arrays, "+" does addition.  What does a mixed
mode expression of a numarray and a tuple do?  Guess.

5.  It's really hard to tell what's messing with the
attributes of a class, since anything can store into
anything.  This creates debugging problems.

6.  Multiple inheritance is a mess.  Especially "super".

7.  Using attributes as dictionaries can backfire.  The
syntax of attributes is limited.  So turning XML or HTML
structures into Python objects creates problems.

8.  Opening a URL can result in an unexpected prompt on
standard input if the URL has authentication.  This can
stall servers.

9.  Some libraries aren't thread-safe.  Guess which ones.

10. Python 3 isn't upward compatible with Python 2.

John Nagle


--
http://mail.python.org/mailman/listinfo/python-list


Re: Python Gotcha's?

2012-04-08 Thread John Nagle

On 4/8/2012 10:55 AM, Miki Tebeka wrote:

8.  Opening a URL can result in an unexpected prompt on
standard input if the URL has authentication.  This can
stall servers.

Can you give an example? I don't think anything in the standard library does 
that.


   It's in "urllib".  See

http://docs.python.org/library/urllib.html

"When performing basic authentication, a FancyURLopener instance calls 
its prompt_user_passwd() method. The default implementation asks the 
users for the required information on the controlling terminal. A 
subclass may override this method to support more appropriate behavior 
if needed."


A related "gotcha" is knowing that "urllib" sucks and you should use
"urllib2".

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Donald E. Knuth in Python, cont'd

2012-04-11 Thread John Nagle

On 4/11/2012 6:03 AM, Antti J Ylikoski wrote:


I wrote about a straightforward way to program D. E. Knuth in Python,
and received an excellent communication about programming Deterministic
Finite Automata (Finite State Machines) in Python.

The following stems from my Knuth in Python programming exercises,
according to that very good communication. (By Roy Smith.)

I'm in the process of delving carefully into Knuth's brilliant and
voluminous work The Art of Computer Programming, Parts 1--3 plus the
Fascicles in Part 4 -- the back cover of Part 1 reads:

"If you think you're a really good programmer -- read [Knuth's] Art of
Computer Programming... You should definitely send me a résumé if you
can read the whole thing." -- Bill Gates.

(Microsoft may in the future receive some e-mail from me.)


You don't need those books as much as you used to.
You don't have to write collections, hash tables, and sorts much
any more.  Those are solved problems and there are good libraries.
Most of the basics are built into Python.

Serious programmers should read those books, much as they should
read von Neumann's "First Draft of a Report on the EDVAC", for
background on how things work down at the bottom.  But they're
no longer essential desk references for most programmers.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: python module development workflow

2012-04-12 Thread John Nagle

On 4/11/2012 1:04 PM, Miki Tebeka wrote:

Could any expert suggest an authoritative and complete guide for
developing python modules? Thanks!

I'd start with http://docs.python.org/distutils/index.html


Make sure that

python setup.py build
python setup.py install

works.

Don't use the "rotten egg" distribution system.
(http://packages.python.org/distribute/easy_install.html)

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Deep merge two dicts?

2012-04-12 Thread John Nagle

On 4/12/2012 10:41 AM, Roy Smith wrote:

Is there a simple way to deep merge two dicts?  I'm looking for Perl's
Hash::Merge (http://search.cpan.org/~dmuey/Hash-Merge-0.12/Merge.pm)
in Python.


def dmerge(a, b):
    # Merge b into a, recursing into nested dicts so the final update
    # doesn't clobber sub-dicts that were just merged.
    for k, v in b.items():
        if k in a and isinstance(a[k], dict) and isinstance(v, dict):
            dmerge(a[k], v)
        else:
            a[k] = v
    return a



--
http://mail.python.org/mailman/listinfo/python-list


Re: why () is () and [] is [] work in other way?

2012-04-22 Thread John Nagle

On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote:

On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote:


I believe it says somewhere in the Python docs that it's undefined and
implementation-dependent whether two identical expressions have the same
identity when the result of each is immutable


   Bad design.  Where "is" is ill-defined, it should raise ValueError.

A worse example, one which is very implementation-dependent:

http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers

>>> a = 256
>>> b = 256
>>> a is b
True   # this is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False

Operator "is" should be be an error between immutables
unless one is a built-in constant.  ("True" and "False"
should be made hard constants, like "None". You can't assign
to None, but you can assign to True, usually with
unwanted results.  It's not clear why True and False
weren't locked down when None was.)

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: global vars across modules

2012-04-22 Thread John Nagle

On 4/22/2012 12:39 PM, mambokn...@gmail.com wrote:



Question:
How can I access the global 'a' in file_2 without resorting to the whole 
name 'file_1.a' ?


Actually, it's better to use the fully qualified name "file_1.a". 
Using "import *" brings in everything in the other module, which often

results in a name clash.

Just do

import file_1

and, if desired

localnamefora = file_1.a



    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: why () is () and [] is [] work in other way?

2012-04-22 Thread John Nagle

On 4/22/2012 3:17 PM, John Roth wrote:

On Sunday, April 22, 2012 1:43:36 PM UTC-6, John Nagle wrote:

On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote:

On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote:


I believe it says somewhere in the Python docs that it's
undefined and implementation-dependent whether two identical
expressions have the same identity when the result of each is
immutable


Bad design.  Where "is" is ill-defined, it should raise
ValueError.

A worse example, one which is very implementation-dependent:

http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers




>>> a = 256
>>> b = 256
>>> a is b
True   # this is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False

Operator "is" should be be an error between immutables unless one
is a built-in constant.  ("True" and "False" should be made hard
constants, like "None". You can't assign to None, but you can
assign to True, usually with unwanted results.  It's not clear why
True and False weren't locked down when None was.)

John Nagle


Three points. First, since there's no obvious way of telling whether
an arbitrary user-created object is immutable, trying to make "is"
fail in that case would be a major change to the language.


   If a program fails because such a comparison becomes invalid, it
was broken anyway.

   The idea was borrowed from LISP, which has both "eq" (pointer
equality) and "equals" (compared equality).  It made somewhat
more sense in the early days of LISP, when the underlying
representation of everything was well defined.


Second: the definition of "is" states that it determines whether two
objects are the same object; this has nothing to do with mutability
or immutability.

The id([]) == id([]) thing is a place where cPython's implementation
is showing through. It won't work that way in any implementation that
uses garbage collection and object compaction. I think Jython does it
that way, I'm not sure about either IronPython or PyPy.


   That represents a flaw in the language design - the unexpected
exposure of an implementation dependency.


Third: True and False are reserved names and cannot be assigned to in
the 3.x series. They weren't locked down in the 2.x series when they
were introduced because of backward compatibility.


That's one of the standard language designer fuckups.  Somebody
starts out thinking that 0 and 1 don't have to be distinguished from
False and True.  When they discover that they do, the backwards
compatibility sucks.  C still suffers from this.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: why () is () and [] is [] work in other way?

2012-04-23 Thread John Nagle

On 4/22/2012 9:34 PM, Steven D'Aprano wrote:

On Sun, 22 Apr 2012 12:43:36 -0700, John Nagle wrote:


On 4/20/2012 9:34 PM, john.tant...@gmail.com wrote:

On Friday, April 20, 2012 12:34:46 PM UTC-7, Rotwang wrote:


I believe it says somewhere in the Python docs that it's undefined and
implementation-dependent whether two identical expressions have the
same identity when the result of each is immutable


 Bad design.  Where "is" is ill-defined, it should raise ValueError.


"is" is never ill-defined. "is" always, without exception, returns True
if the two operands are the same object, and False if they are not. This
is literally the simplest operator in Python.

John, you've been using Python for long enough that you should know this.
I can only guess that you are trolling, although I can't imagine why.


   Because the language definition should not be what CPython does.
As PyPy advances, we need to move beyond that.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: why () is () and [] is [] work in other way?

2012-04-25 Thread John Nagle

On 4/25/2012 5:01 PM, Steven D'Aprano wrote:

On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote:


Though, maybe it's better to use a different keyword than 'is' though,
due to the plain English
connotations of the term; I like 'sameobj' personally, for whatever
little it matters.  Really, I think taking away the 'is' operator
altogether is better, so the only way to test identity is:
 id(x) == id(y)


Four reasons why that's a bad idea:

1) The "is" operator is fast, because it can be implemented directly by
the interpreter as a simple pointer comparison (or equivalent).


   This assumes that everything is, internally, an object.  In CPython,
that's the case, because Python is a naive interpreter and everything,
including numbers, is "boxed".  That's not true of PyPy or Shed Skin.
So does "is" have to force the creation of a temporary boxed object?

   The concept of "object" vs. the implementation of objects is
one reason you don't necessarily want to expose the implementation.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: why () is () and [] is [] work in other way?

2012-04-26 Thread John Nagle

On 4/26/2012 4:45 AM, Adam Skutt wrote:

On Apr 26, 1:48 am, John Nagle  wrote:

On 4/25/2012 5:01 PM, Steven D'Aprano wrote:


On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote:



Though, maybe it's better to use a different keyword than 'is' though,
due to the plain English
connotations of the term; I like 'sameobj' personally, for whatever
little it matters.  Really, I think taking away the 'is' operator
altogether is better, so the only way to test identity is:
  id(x) == id(y)



Four reasons why that's a bad idea:



1) The "is" operator is fast, because it can be implemented directly by
the interpreter as a simple pointer comparison (or equivalent).


 This assumes that everything is, internally, an object.  In CPython,
that's the case, because Python is a naive interpreter and everything,
including numbers, is "boxed".  That's not true of PyPy or Shed Skin.
So does "is" have to force the creation of a temporary boxed object?


That's what C# does AFAIK.  Java defines '==' as value comparison for
primitives and '==' as identity comparison for objects, but I don't
exactly know how one would do that in Python.


   I would suggest that "is" raise ValueError for the ambiguous cases.
If both operands are immutable, "is" should raise ValueError.
That's the case where the internal representation of immutables
shows through.

   If this breaks a program, it was broken anyway.  It will
catch bad comparisons like

if x is 1000 :
...

which is implementation dependent.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


CPython thread starvation

2012-04-27 Thread John Nagle

I have a multi-threaded CPython program, which has up to four
threads.  One thread is simply a wait loop monitoring the other
three and waiting for them to finish, so it can give them more
work to do.  When the work threads, which read web pages and
then parse them, are compute-bound, I've had the monitoring thread
starved of CPU time for as long as 120 seconds.
It's sleeping for 0.5 seconds, then checking on the other threads
and for new work do to, so the work thread isn't using much
compute time.

   I know that the CPython thread dispatcher sucks, but I didn't
realize it sucked that bad.  Is there a preference for running
threads at the head of the list (like UNIX, circa 1979) or
something like that?

   (And yes, I know about "multiprocessing".  These threads are already
in one of several service processes.  I don't want to launch even more
copies of the Python interpreter.  The threads are usually I/O bound,
but when they hit unusually long web pages, they go compute-bound
during parsing.)

   Setting "sys.setcheckinterval" from the default to 1 seems
to have little effect.  This is on Windows 7.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: CPython thread starvation

2012-04-27 Thread John Nagle

On 4/27/2012 6:25 PM, Adam Skutt wrote:

On Apr 27, 2:54 pm, John Nagle  wrote:

  I have a multi-threaded CPython program, which has up to four
threads.  One thread is simply a wait loop monitoring the other
three and waiting for them to finish, so it can give them more
work to do.  When the work threads, which read web pages and
then parse them, are compute-bound, I've had the monitoring thread
starved of CPU time for as long as 120 seconds.


How exactly are you determining that this is the case?


   Found the problem.  The threads, after doing their compute
intensive work of examining pages, stored some URLs they'd found.
The code that stored them looked them up with "getaddrinfo()", and
did this while a lock was set.  On CentOS, "getaddrinfo()" at the
glibc level doesn't always cache locally (ref
https://bugzilla.redhat.com/show_bug.cgi?id=576801).  Python
doesn't cache either.  So huge numbers of DNS requests were being
made.  For some pages being scanned, many of the domains required
accessing a rather slow  DNS server.  The combination of thousands
of instances of the same domain, a slow DNS server, and no caching
slowed the crawler down severely.

   Added a local cache in the program to prevent this.
Performance much improved.
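
A minimal sketch of such a per-process cache, using functools.lru_cache
(Python 3; the original code predates it) and a counting fake resolver in
place of real DNS:

```python
import functools

def make_cached_resolver(resolve, maxsize=10000):
    """Wrap a resolver (e.g. socket.getaddrinfo) with an in-process cache.
    A real crawler should also expire entries after the DNS TTL."""
    @functools.lru_cache(maxsize=maxsize)
    def cached(host, port):
        return resolve(host, port)
    return cached

# Demo with a fake resolver so no network is touched:
calls = []
def fake_resolve(host, port):
    calls.append(host)
    return [("203.0.113.1", port)]

lookup = make_cached_resolver(fake_resolve)
for _ in range(1000):
    lookup("example.com", 80)
print(len(calls))   # 1 -- only the first lookup hit the "network"
```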

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: CPython thread starvation

2012-04-27 Thread John Nagle

On 4/27/2012 9:20 PM, Paul Rubin wrote:

John Nagle  writes:


The code that stored them looked them up with "getaddrinfo()", and
did this while a lock was set.


Don't do that!!


Added a local cache in the program to prevent this.
Performance much improved.


Better to release the lock while the getaddrinfo is running, if you can.


   I may do that to prevent the stall.  But the real problem was all
those DNS requests.  Parallelizing them wouldn't help much when it took
hours to grind through them all.

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: CPython thread starvation

2012-04-27 Thread John Nagle

On 4/27/2012 9:55 PM, Paul Rubin wrote:

John Nagle  writes:

I may do that to prevent the stall.  But the real problem was all
those DNS requests.  Parallelizing them wouldn't help much when it took
hours to grind through them all.


True dat.  But building a DNS cache into the application seems like a
kludge.  Unless the number of requests is insane, running a caching
nameserver on the local box seems cleaner.


   I know.  When I have a bit more time, I'll figure out why
CentOS 5 and Webmin didn't set up a caching DNS resolver by
default.

   Sometimes the number of requests IS insane.  When the
system hits a page with a thousand links, it has to resolve
all of them.  (Beyond a thousand links, we classify it as
link spam and stop.  The record so far is a page with over
10,000 links.)

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: CPython thread starvation

2012-04-29 Thread John Nagle

On 4/28/2012 1:04 PM, Paul Rubin wrote:

Roy Smith  writes:

I agree that application-level name cacheing is "wrong", but sometimes
doing it the wrong way just makes sense.  I could whip up a simple
cacheing wrapper around getaddrinfo() in 5 minutes.  Depending on the
environment (both technology and bureaucracy), getting a cacheing
nameserver installed might take anywhere from 5 minutes to a few days to ...


IMHO this really isn't one of those times.  The in-app wrapper would
only be usable to just that process, and we already know that the OP has
multiple processes running the same app on the same machine.  They would
benefit from being able to share the cache, so now your wrapper gets
more complicated.  If it's not a nameserver then it's something that
fills in for one.  And then, since the application appears to be a large
scale web spider, it probably wants to run on a cluster, and the cache
should be shared across all the machines.  So you really probably want
an industrial strength nameserver with a big persistent cache, and maybe
a smaller local cache because of high locality when crawling specific
sites, etc.


Each process is analyzing one web site, and has its own cache.
Once the site is analyzed, which usually takes about a minute,
the cache disappears.  Multiple threads are reading multiple pages
from the web site during that time.

A local cache is enough to fix the huge overhead problem of
doing a DNS lookup for every link found.  One site with a vast
number of links took over 10 hours to analyze before this fix;
now it takes about four minutes.  That solved the problem.
We can probably get an additional minor performance boost with a real
local DNS daemon, and will probably configure one.

We recently changed servers from Red Hat to CentOS, and management
from CPanel to Webmin.  Before the change, we had a local DNS daemon
with cacheing, so we didn't have this problem.  Webmin's defaults
tend to be on the minimal side.

The DNS information is used mostly to help decide whether two URLs
actually point to the same IP address, as part of deciding whether a
link is on-site or off-site.  Most of those links will never be read.
We're not crawling the entire site, just looking at likely pages to
find the name and address of the business behind the site.  (It's
part of our "Know who you're dealing with" system, SiteTruth.)
    
John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: why () is () and [] is [] work in other way?

2012-04-29 Thread John Nagle

On 4/28/2012 4:47 AM, Kiuhnm wrote:

On 4/27/2012 17:39, Adam Skutt wrote:

On Apr 27, 8:07 am, Kiuhnm wrote:

Useful... maybe, conceptually sound... no.
Conceptually, NaN is the class of all elements which are not numbers,
therefore NaN = NaN.


NaN isn't really the class of all elements which aren't numbers. NaN
is the result of a few specific IEEE 754 operations that cannot be
computed, like 0/0, and for which there's no other reasonable
substitute (e.g., infinity) for practical applications .

In the real world, if we were doing the math with pen and paper, we'd
stop as soon as we hit such an error. Equality is simply not defined
for the operations that can produce NaN, because we don't know to
perform those computations. So no, it doesn't conceptually follow
that NaN = NaN, what conceptually follows is the operation is
undefined because NaN causes a halt.


Mathematics is more than arithmetics with real numbers. We can use FP
too (we actually do that!). We can say that NaN = NaN but that's just an
exception we're willing to make. We shouldn't say that the equivalence
relation rules shouldn't be followed just because *sometimes* we break
them.


This is what programming languages ought to do if NaN is compared to
anything other than a (floating-point) number: disallow the operation
in the first place or toss an exception.


   If you do a signaling floating point comparison on IEEE floating
point numbers, you do get an exception.  On some FPUs, though,
signaling operations are slower.  On superscalar CPUs, exact
floating point exceptions are tough to implement.  They are
done right on x86 machines, mostly for backwards compatibility.
This requires an elaborate "retirement unit" to unwind the
state of the CPU after a floating point exception.  DEC Alphas
didn't have that; SPARC and MIPS machines varied by model.
ARM machines in their better modes do have that.
Most game console FPUs do not have a full IEEE implementation.

   Proper language support for floating point exceptions varies
with the platform.  Microsoft C++ on Windows does support
getting it right.  (I had to deal with this once in a physics
engine, where an overflow or a NaN merely indicated that a
shorter time step was required.)  But even there, it's
an OS exception, like a signal, not a language-level
exception.  Other than Ada, which requires it, few
languages handle such exceptions as language level
exceptions.
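
For concreteness, the IEEE 754 NaN behavior as Python's (non-signaling)
comparisons expose it:

```python
import math

nan = float("nan")
print(nan == nan)        # False: a quiet NaN compares unequal to itself
print(math.isnan(nan))   # True: the correct way to test for NaN
print(nan is nan)        # True: object identity is unrelated to IEEE equality
```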


John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a directory structure and modifying files automatically in Python

2012-04-30 Thread John Nagle

On 4/30/2012 8:19 AM, deltaquat...@gmail.com wrote:

Hi,

I would like to automate the following task under Linux. I need to create a set 
of directories such as

075
095
100
125

The directory names may be read from a text file foobar, which also contains a 
number corresponding to each dir, like this:

075 1.818
095 2.181
100 2.579
125 3.019


In each directory I must copy a text file input.in. This file contains  two 
lines which need to be edited:


   Learn how to use a database.  Creating and managing a
big collection of directories to handle small data items is the
wrong approach to data storage.
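
A sketch of the database approach, using the sample data from the post
and the standard-library sqlite3 module:

```python
import sqlite3

# The directory-per-value layout from the original post, as one small table.
conn = sqlite3.connect(":memory:")   # use a file path for persistence
conn.execute("CREATE TABLE runs (name TEXT PRIMARY KEY, value REAL)")
rows = [("075", 1.818), ("095", 2.181), ("100", 2.579), ("125", 3.019)]
conn.executemany("INSERT INTO runs VALUES (?, ?)", rows)
conn.commit()

value = conn.execute(
    "SELECT value FROM runs WHERE name = ?", ("100",)).fetchone()[0]
print(value)   # 2.579
```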

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python SOAP library

2012-05-02 Thread John Nagle

On 5/2/2012 8:35 AM, Alec Taylor wrote:

What's the best SOAP library for Python?
I am creating an API converter which will be serialising to/from a variety of 
sources, including REST and SOAP.
Relevant parsing is XML [incl. SOAP] and JSON.
Would you recommend: http://code.google.com/p/soapbox/

Or suggest another?
Thanks for all information,


   Are you implementing the client or the server?

   Python "Suds" is a good client-side library. It's strict SOAP;
you must have a WSDL file, and the XML queries and replies must
verify against the WSDL file.

https://fedorahosted.org/suds/

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


"

2012-05-03 Thread John Nagle

  An HTML page for a major site (http://www.chase.com) has
some incorrect HTML.  It contains


Re: key/value store optimized for disk storage

2012-05-06 Thread John Nagle

On 5/4/2012 12:14 AM, Steve Howell wrote:

On May 3, 11:59 pm, Paul Rubin  wrote:

Steve Howell  writes:

 compressor = zlib.compressobj()
 s = compressor.compress("foobar")
 s += compressor.flush(zlib.Z_SYNC_FLUSH)



 s_start = s
 compressor2 = compressor.copy()


   That's awful. There's no point in compressing six characters
with zlib.  Zlib has a minimum overhead of 11 bytes.  You just
made the data bigger.
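
The overhead is easy to verify with the standard library:

```python
import zlib

data = b"foobar"
comp = zlib.compressobj()
out = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
# Header plus sync-flush framing makes the "compressed" form
# larger than the 6-byte input.
print(len(data), len(out))
```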

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a directory structure and modifying files automatically in Python

2012-05-07 Thread John Nagle

On 5/6/2012 9:59 PM, Paul Rubin wrote:

Javier  writes:

Or not... Using directories may be a way to do rapid prototyping, and
check quickly how things are going internally, without needing to resort
to complex database interfaces.


dbm and shelve are extremely simple to use.  Using the file system for a
million item db is ridiculous even for prototyping.


   Right.  Steve Bellovin wrote that back when UNIX didn't have any
database programs, let alone free ones.
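
For reference, the shelve prototype Paul describes really is only a few
lines (the file path here is illustrative):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "proto")

db = shelve.open(path)   # a persistent dict; keys are strings
db["075"] = 1.818        # values can be any picklable object
db["095"] = 2.181
db.close()

db = shelve.open(path)   # reopen: data survived the close
print(db["075"])         # 1.818
db.close()
```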

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a directory structure and modifying files automatically in Python

2012-05-07 Thread John Nagle

On 5/7/2012 9:09 PM, Steve Howell wrote:

On May 7, 8:46 pm, John Nagle  wrote:

On 5/6/2012 9:59 PM, Paul Rubin wrote:


Javierwrites:

Or not... Using directories may be a way to do rapid prototyping, and
check quickly how things are going internally, without needing to resort
to complex database interfaces.



dbm and shelve are extremely simple to use.  Using the file system for a
million item db is ridiculous even for prototyping.


 Right.  Steve Bellovin wrote that back when UNIX didn't have any
database programs, let alone free ones.



It's kind of sad that the Unix file system doesn't serve as an
effective key-value store at any kind of nontrivial scale.  It would
simplify a lot of programming if filenames were keys and file contents
were values.


   You don't want to go there in a file system.  Some people I know
tried that around 1970.  "A bit is a file.  An ordered collection of 
files is a file".  Didn't work out.


   There are file models other than the UNIX one.  Many older systems
had file versioning.  Tandem built their file system on top of their
distributed, redundant database system.  There are backup systems
where the name of the file is its hash, allowing elimination of
duplicates.  Most of the "free online storage" sites do that.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: serial module

2012-05-22 Thread John Nagle

On 5/22/2012 8:42 AM, Grant Edwards wrote:

On 2012-05-22, Albert van der Horst  wrote:



It is anybody's guess what they do in USB.


They do exactly what they're supposed to regardless of what sort of
bus is used to connect the CPU and the UART (ISA, PCI, PCI-express,
USB, Ethernet, etc.).


   If a device is registered as /dev/ttyUSBnn, one would hope that
the Linux USB insertion event handler, which assigns that name,
determined that the device was a serial port emulator.  Unfortunately,
the USB standard device classes
(http://www.usb.org/developers/defined_class) don't have "serial port
emulator" as a standardized device.  So there's more variation in this
area than in keyboards, mice, or storage devices.



The best answers is probably that it depends on the whim of whoever
implements the usb device.


It does not depend on anybody's whim.  The meaning of those parameters
is well-defined.


Certainly this stuff is system dependant,


No, it isn't.


   It is, a little.  There's a problem with the way Linux does
serial ports.   The only speeds allowed are the ones nailed into the
kernel as named constants.  This is a holdover from UNIX, which is a
holdover from DEC PDP-11 serial hardware circa mid 1970s, which had
14 standard baud rates encoded in 4 bits.  Really.

   In the Windows world, the actual baud rate is passed to the
driver.  Serial ports on the original IBM PC were loaded with
a clock rate, so DOS worked that way.

   This only matters if you need non-standard baud rates.  I've
had to deal with that twice, for a SICK LMS LIDAR, (1,000,000 baud)
and 1930s Teletype machines (45.45 baud).

   If you need non-standard speeds, see this:

http://www.aetherltd.com/connectingusb.html

   If 19,200 baud is enough for you, don't worry about it.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: serial module

2012-05-22 Thread John Nagle

On 5/22/2012 2:07 PM, Paul Rubin wrote:

John Nagle  writes:

If a device is registered as /dev/ttyUSBnn, one would hope that
the Linux USB insertion event handler, which assigns that name,
determined that the device was a serial port emulator.  Unfortunately,
the USB standard device classes
(http://www.usb.org/developers/defined_class) don't have "serial port
emulator" as a standardized device.  So there's more variation in this
area than in keyboards, mice, or storage devices.


Hmm, I've been using USB-to-serial adapters and so far they've worked
just fine.  I plug the USB end of adapter into a Ubuntu box, see
/dev/ttyUSB* appear, plug the serial end into the external serial
device, and just use pyserial like with an actual serial port.  I didn't
realize there were issues with this.


   There are.  See "http://wiki.debian.org/usbserial".  Because there's
no standard USB class for such devices, the specific vendor ID/product
ID pair has to be known to the OS.  In Linux, there's a file of these,
but not all USB to serial adapters are in it.  In Windows, there
tends to be a vendor-provided driver for each brand of USB to
serial converter.  This all would have been much simpler if the USB
Consortium had defined a USB class for these devices, as they did
for keyboards, mice, etc.

   However, this is not the original poster's problem.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: escaping/encoding/formatting in python

2012-05-23 Thread John Nagle

On 4/5/2012 10:10 PM, Steve Howell wrote:

On Apr 5, 9:59 pm, rusi  wrote:

On Apr 6, 6:56 am, Steve Howell  wrote:



You've one-upped me with 2-to-the-N backspace escaping.


   Early attempts at UNIX word processing, "nroff" and "troff",
suffered from that problem, due to a badly designed macro system.

   A question in language design is whether to escape or quote.
Do you write

"X = %d" % (n,)

or

"X = " + str(n)

In general, for anything but output formatting, the second scales
better.  Regular expressions have a bad case of the first.
For a quoted alternative to regular expression syntax, see
SNOBOL or Icon.   SNOBOL allows naming patterns, and those patterns
can then be used as components of other patterns.  SNOBOL
is obsolete, but that approach produced much more readable
code.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Email Id Verification

2012-05-24 Thread John Nagle

On 5/24/2012 5:32 AM, niks wrote:

Hello everyone..
I am new to asp.net...
I want to use Regular Expression validator in Email id verification..
Can anyone tell me how to use this and what is the meaning of
this
\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*


   Not a Python question.

   It matches anything that looks like a mail user name followed by
an @ followed by anything that looks more or less like a domain name.
The domain name must contain at least one ".", and cannot end with
a ".", which is not strictly correct but usually works.
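
The pattern can be exercised from Python (re.fullmatch is Python 3.4+;
the sample addresses are illustrative):

```python
import re

pattern = r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"

print(bool(re.fullmatch(pattern, "john.doe@example.com")))    # True
print(bool(re.fullmatch(pattern, "no-at-sign.example.com")))  # False: no @
print(bool(re.fullmatch(pattern, "user@localhost")))          # False: no dot in domain
```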

        John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: sqlite INSERT performance

2012-05-30 Thread John Nagle

On 5/30/2012 6:57 PM, duncan smith wrote:

Hello,
I have been attempting to speed up some code by using an sqlite
database, but I'm not getting the performance gains I expected.


SQLite is a "lite" database.  It's good for data that's read a
lot and not changed much.  It's good for small data files.  It's
so-so for large database loads.  It's terrible for a heavy load of 
simultaneous updates from multiple processes.


However, wrapping the inserts into a transaction with BEGIN
and COMMIT may help.

If you have 67 columns in a table, you may be approaching the
problem incorrectly.
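
A sketch of the BEGIN/COMMIT batching, using the sqlite3 connection as a
context manager (which wraps the block in one transaction):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
rows = [(i, "x") for i in range(10000)]

# One transaction for the whole batch, instead of a commit per INSERT:
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])   # 10000
```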

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Internationalized domain names not working with URLopen

2012-06-12 Thread John Nagle

I'm trying to open

http://пример.испытание

with

urllib2.urlopen(s1)

in Python 2.7 on Windows 7. This produces a Unicode exception:

>>> s1
u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
>>> fd = urllib2.urlopen(s1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
  File "C:\python27\lib\urllib2.py", line 394, in open
response = self._open(req, data)
  File "C:\python27\lib\urllib2.py", line 412, in _open
'_open', req)
  File "C:\python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
  File "C:\python27\lib\urllib2.py", line 1199, in http_open
return self.do_open(httplib.HTTPConnection, req)
  File "C:\python27\lib\urllib2.py", line 1168, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "C:\python27\lib\httplib.py", line 955, in request
self._send_request(method, url, body, headers)
  File "C:\python27\lib\httplib.py", line 988, in _send_request
self.putheader(hdr, value)
  File "C:\python27\lib\httplib.py", line 935, in putheader
hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 
0-5: ordinal not in range(128)

>>>

The HTTP library is trying to put the URL in the header as ASCII.  Why 
isn't "urllib2" handling that?


What does "urllib2" want?  Percent escapes?  Punycode?

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Internationalized domain names not working with URLopen

2012-06-13 Thread John Nagle

On 6/12/2012 11:42 PM, Andrew Berg wrote:

On 6/13/2012 1:17 AM, John Nagle wrote:

What does "urllib2" want?  Percent escapes?  Punycode?

Looks like Punycode is the correct answer:
https://en.wikipedia.org/wiki/Internationalized_domain_name#ToASCII_and_ToUnicode

I haven't tried it, though.


   This is Python bug #9679:

http://bugs.python.org/issue9679

It's been open for years, and the maintainers offer elaborate
excuses for not fixing the problem.

The socket module accepts Unicode domains, as does httplib.
But urllib2, which is a front end to both, is still broken.
It's failing when it constructs the HTTP headers.  Domains
in HTTP headers have to be in punycode.
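
For illustration, Python's built-in "idna" codec performs exactly this
per-label conversion (shown here with the test domain from the original
post; exact output omitted rather than guessed):

```python
# The same Unicode domain that triggered the traceback above:
domain = u"\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435"
encoded = domain.encode("idna")   # ASCII bytes, each label prefixed "xn--"
print(encoded)
print(encoded.decode("idna") == domain)   # True: round-trips back to Unicode
```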

The code in stackoverflow doesn't really work right.  Only
the domain part of a URL should be converted to punycode.
Path, port, and query parameters need to be converted to
percent-encoding.  (Unclear if urllib2 or httplib does this
already.  The documentation doesn't say.)

While HTTP content can be in various character sets, the
headers are currently required to be ASCII only, since the
header has to be processed to determine the character code.
(http://lists.w3.org/Archives/Public/ietf-http-wg/2011OctDec/0155.html)

Here's a workaround, for the domain part only.


#
#   idnaurlworkaround  --  workaround for Python defect 9679
#
import urlparse
import encodings.idna

PYTHONDEFECT9679FIXED = False       # Python defect #9679 - change when fixed

def idnaurlworkaround(url) :
    """
    Convert a URL to a form the currently broken urllib2 will accept.
    Converts the domain to "punycode" if necessary.
    This is a workaround for Python defect #9679.
    """
    if PYTHONDEFECT9679FIXED :      # if defect fixed
        return(url)                 # use unmodified URL
    url = unicode(url)              # force to Unicode
    (scheme, accesshost, path, params,
        query, fragment) = urlparse.urlparse(url)       # parse URL
    if scheme == '' and accesshost == '' and path != '' :   # bare domain
        accesshost = path           # use path as access host
        path = ''                   # no path
    labels = accesshost.split('.')  # split domain into sections ("labels")
    labels = [encodings.idna.ToASCII(w) for w in labels]    # punycode each label if necessary
    accesshost = '.'.join(labels)   # reassemble domain
    url = urlparse.urlunparse((scheme, accesshost, path, params,
        query, fragment))           # reassemble URL
    return(url)                     # return complete URL with punycode domain

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: feedparser hanging after I/O error

2011-06-03 Thread John Nagle

On 6/2/2011 4:40 AM, xDog Walker wrote:

On Wednesday 2011 June 01 10:34, John Nagle wrote:

I have a program which uses "feedparser".  It occasionally hangs when
the network connection has been lost, and remains hung after the network
connection is restored.


My solution is to download the feed file using wget, then hand that file to
feedparser. feedparser will also hang forever on a url if the server doesn't
serve.


   Then you don't get the poll optimization, where feedparser sends the
token to indicate that it's already seen version N.

   This is for a program that's constantly polling RSS feeds and
fetching changes.  Feedparser is good for that, until the network
fails temporarily.

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Function declarations ?

2011-06-13 Thread John Nagle

On 6/12/2011 12:38 PM, Andre Majorel wrote:

On 2011-06-10, Asen Bozhilov  wrote:

Andre Majorel wrote:


Is there a way to keep the definitions of the high-level
functions at the top of the source ? I don't see a way to
declare a function in Python.


Languages with variable and function declarations usually use
hoisted environment.


Hoisted ? With a pulley and a cable ?


   There are languages with definitions and in which
the compiler looks ahead.  FORTRAN, for example.
Python doesn't work that way.  Nor do C and the
languages derived from it, because the syntax
is context-dependent.  (In C++, "A b;" is ambiguous
until after the declaration of A. In
Pascal-derived languages, you write "var b: A;",
which is parseable before you know what A is.
So declarations don't have to be in dependency order.)

   None of this is relevant to Python, but that's
what "hoisted" means in this context.
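
In Python the question mostly answers itself: names in a function body are
looked up at call time, not at definition time, so high-level functions can
go at the top of the file with no declarations needed:

```python
def high_level():
    return helper() + 1   # 'helper' is resolved when called, not when defined

def helper():
    return 41

print(high_level())       # 42 -- top-down definition order works fine
```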

John Nagle


--
http://mail.python.org/mailman/listinfo/python-list


Re: those darn exceptions

2011-06-27 Thread John Nagle

On 6/21/2011 2:51 PM, Chris Torek wrote:

On Tue, 21 Jun 2011 01:43:39 +, Chris Torek wrote:

But how can I know a priori
that os.kill() could raise OverflowError in the first place?


   If you passed an integer that was at some time a valid PID
to "os.kill()", and OverflowError was raised, I'd consider that
a bug in "os.kill()".  Only OSError, or some subclass thereof,
should be raised for a possibly-valid PID.

   If you passed some unreasonably large number, that would be
a legitimate reason for an OverflowError. That's for parameter
errors, though; it shouldn't happen for environment errors.

   That's a strong distinction.  If something can raise an
exception because the environment external to the process
has a problem, the exception should be an EnvironmentError
or a subclass thereof.   This maintains a separation between
bugs (which usually should cause termination or fairly
drastic recovery action) and normal external events (which
have to be routinely handled.)

   It's quite possible to get a OSError on "os.kill()" for
a number of legitimate reasons. The target process may have
exited since the PID was obtained, for example.
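That distinction can be sketched as follows (POSIX only; the helper
name is mine, not from the thread).  Signal 0 performs the
existence/permission check without delivering a signal, and the two
OSError subclasses cover the legitimate environment cases:

```python
import os

def pid_running(pid):
    """Check whether a PID refers to a live process (POSIX only).
    Signal 0 checks existence and permissions without actually
    delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:   # OSError subclass: process has exited
        return False
    except PermissionError:      # OSError subclass: exists, not ours
        return True
    return True
```

Any other exception escaping os.kill() here would, per the argument
above, indicate a bug rather than an environment condition.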

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to import data from MySQL db into excel sheet

2011-06-30 Thread John Nagle

On 6/2/2011 5:11 AM, hisan wrote:

Please let me know how can i import my sql data of multiple rows and
columns into an excel sheet.
here i need to adjust the column width based on the on the data that
sits into the column


   You're asking in the wrong forum.  Try the MySQL forum or an
Excel forum.

   For a one-off job, use the MySQL Workbench, do a SELECT, click on
the floppy disk icon, and export a CSV (comma-separated value) file, 
which Excel will import.


   It's possible to link Excel directly to an SQL database; see
the Excel documentation.

   On a server, you can SELECT ... INTO OUTFILE and get a CSV file
that way, but the file is created on the machine where the database
is running, not the client machine.

   You can write a Python program to SELECT from the database and
use the CSV module to create a CSV file, but as a one-off, it's
not necessary.
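If you do want the Python route, a minimal sketch looks like this
(sqlite3 stands in for MySQL here so the example is self-contained;
with MySQLdb you would go through conn.cursor().execute() instead):

```python
import csv
import sqlite3

def query_to_csv(conn, sql, csv_path):
    """SELECT from a DB-API connection and write the rows, with a
    header line taken from cursor.description, to a CSV file that
    Excel can open directly."""
    cur = conn.execute(sql)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)
```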

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: What Programing Language are the Largest Website Written In?

2011-07-22 Thread John Nagle
On 7/12/2011 4:54 AM, Xah Lee wrote:
> Then, this question piqued me, even i tried to not waste my time. But
> it overpowered me before i resisted, becuase i quickly spend 15 min to
> write this list (with help of Google):
> 
>  1 Google ◇ Java
>  2 Facebook ◇ PHP
>  3 YouTube ◇ Python
>  4 Yahoo! ◇ PHP
>  5 blogger.com ◇ Java
>  6 baidu.com ◇ C/C++. perl/python/ruby
>  7 Wikipedia ◇ PHP

Aargh.  Much misinformation.

First, most of the heavy machinery of Google is written in C++.
Some user-facing stuff is written in Java, and some scripting is done
in Python.  Google is starting to use Go internally, but they're
not saying much about where.

Facebook is PHP on the user-facing side, but there's heavy
inter-server communication and caching, mostly in C++.

The original user interface for YouTube, before Google bought it,
was in Python.  But it's since been rewritten.  All the stuff that
actually handles video is, of course, in C/C++.  The load of handling
the video dwarfs the user interface load.

Wikipedia is indeed written in PHP.

    John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is there a way to customise math.sqrt(x) for some x?

2011-07-23 Thread John Nagle

On 7/16/2011 2:14 AM, Chris Angelico wrote:

On Sat, Jul 16, 2011 at 6:35 PM, Steven D'Aprano
  wrote:

I have a custom object that customises the usual maths functions and
operators, such as addition, multiplication, math.ceil etc.

Is there a way to also customise math.sqrt? I don't think there is, but I
may have missed something.


Only thing I can think of is:

import math
math.sqrt = lambda x: x.__sqrt__(x) if x.whatever else math.sqrt(x)

I don't suppose there's a lambda version of try/catch?

ChrisA


Why use a lambda?  Just use a def.

A lambda with an "if" is un-Pythonic.
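A sketch of the def-based version (note that __sqrt__ is a made-up
method name from this thread, not a real Python protocol, and the
saved reference avoids the infinite recursion the lambda above would
hit):

```python
import math

_real_sqrt = math.sqrt   # save before rebinding, or we'd recurse

def _sqrt(x):
    """Use the object's own __sqrt__ method if its type defines one;
    otherwise fall back to the saved C-level math.sqrt."""
    custom = getattr(type(x), "__sqrt__", None)
    if custom is not None:
        return custom(x)
    return _real_sqrt(x)

math.sqrt = _sqrt

class Interval:
    """Toy custom number type for demonstration only."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __sqrt__(self):
        return Interval(self.lo ** 0.5, self.hi ** 0.5)
```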

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: I am fed up with Python GUI toolkits...

2011-07-24 Thread John Nagle

On 7/19/2011 7:34 PM, Andrew Berg wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160

There's PyGUI, which, at a glance, fits with what you want. Looks like
it uses OpenGL and native GUI facilities.
http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/

It has quite a few external dependencies, though (different dependencies
for each platform, so it requires a lot to be cross-platform).


  It still uses Tcl/Tk stuff, which is un-Pythonic.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Complex sort on big files

2011-08-09 Thread John Nagle

On 8/6/2011 10:53 AM, sturlamolden wrote:

On Aug 1, 5:33 pm, aliman  wrote:


I've read the recipe at [1] and understand that the way to sort a
large file is to break it into chunks, sort each chunk and write
sorted chunks to disk, then use heapq.merge to combine the chunks as
you read them.


Or just memory map the file (mmap.mmap) and do an inline .sort() on
the bytearray (Python 3.2). With Python 2.7, use e.g. numpy.memmap
instead. If the file is large, use 64-bit Python. You don't have to
process the file in chunks as the operating system will take care of
those details.

Sturla


   No, no, no.  If the file is too big to fit in memory, trying to
page it will just cause thrashing as the file pages in and out from
disk.

   The UNIX sort program is probably good enough.  There are better
approaches, if you have many gigabytes to sort, (see Syncsort, which
is a commercial product) but few people need them.
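For reference, the chunked approach from the recipe the original
poster mentioned can be sketched like this (the function and its
defaults are mine; it assumes newline-terminated lines):

```python
import heapq
import itertools
import tempfile

def external_sort(infile, outfile, chunk_lines=100_000):
    """Sort a large text file line by line without holding it all in
    memory: sort fixed-size chunks, spill each sorted run to a temp
    file, then heapq.merge the runs back together lazily."""
    runs = []
    with open(infile) as f:
        while True:
            chunk = list(itertools.islice(f, chunk_lines))
            if not chunk:
                break
            chunk.sort()
            tf = tempfile.TemporaryFile("w+")
            tf.writelines(chunk)
            tf.seek(0)
            runs.append(tf)
    with open(outfile, "w") as out:
        out.writelines(heapq.merge(*runs))
    for tf in runs:
        tf.close()
```

heapq.merge keeps only one line per run in memory at a time, which is
what makes this work on files larger than RAM.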

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: 'Use-Once' Variables and Linear Objects

2011-08-12 Thread John Nagle

On 8/2/2011 7:19 AM, Neal Becker wrote:

I thought this was an interesting article

http://www.pipeline.com/~hbaker1/Use1Var.html


   Single-use was something of a dead end in programming.

   Single assignment, where you can only set a variable when you create
it, is more useful.  Single assignment is comparable to functional
programming, but without the deeply nested syntax.

   Functional programs are trees, while single-assignment programs are
directed acyclic graphs.  The difference is that you can fan-out
results, while in a functional language, you can only fan in.
This fits well with Python, where you can write things like

   def fn(x):
       (a, b, c) = fn1()
       return fn2(a) + fn3(b)*c

"const" is often used in C and C++ to indicate single-assignment usage.
But C/C++ doesn't have multiple return values, so the concept isn't as
useful as it is in Python.

Optimizing compilers usually recognize variable lifetimes, and so they
create single-assignment variables internally when possible.  This
is a win for register and stack allocation, and for fine-grain
parallelism on machines which support it.

Since Python isn't very optimizable, this is mostly a curiosity.

        John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: try... except with unknown error types

2011-08-20 Thread John Nagle

On 8/19/2011 1:24 PM, John Gordon wrote:

In<4e4ec405$0$29994$c3e8da3$54964...@news.astraweb.com>  Steven 
D'Aprano  writes:


You can catch all exceptions by catching the base class Exception:



Except that is nearly always poor advice, because it catches too much: it
hides bugs in code, as well as things which should be caught.



You should always catch the absolute minimum you need to catch.


   Right.  When in doubt, catch EnvironmentError.  That means something
external to the program, at the OS or network level, has a problem.
"Exception" covers errors which are program bugs, like references to
undefined class members.
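That policy can be sketched in a few lines (the helper is mine, not
from the thread; note that in Python 3, EnvironmentError became an
alias of OSError):

```python
import logging

def read_config(path):
    """Catch only environment problems (missing file, bad
    permissions) and return None; let genuine program bugs such as
    NameError or AttributeError propagate and terminate."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:     # EnvironmentError family only
        logging.warning("could not read %s: %s", path, e)
        return None
```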

        John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Error when deleting and reimporting subpackages

2011-08-22 Thread John Nagle

On 8/22/2011 11:51 AM, Matthew Brett wrote:

Hi,

I recently ran into this behavior:


>>> import sys
>>> import apkg.subpkg
>>> del sys.modules['apkg']
>>> import apkg.subpkg as subpkg
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'subpkg'

where 'apkg' and 'subpkg' comprise empty __init__.py files to
simplify the example.

It appears then, that importing a subpackage, then deleting the
containing package from sys.modules, orphans the subpackage in an
unfixable state.

I ran into this because the nose testing framework does exactly this
kind of thing when loading test modules, causing some very confusing
errors and failures.

Is this behavior expected?


   It's undefined behavior.  You're dealing with CPython implementation
semantics, not Python language semantics.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: try... except with unknown error types

2011-08-31 Thread John Nagle

On 8/21/2011 5:30 PM, Steven D'Aprano wrote:

Chris Angelico wrote:



A new and surprising mode of network failure would be indicated by a
new subclass of IOError or EnvironmentError.


/s/would/should/

I don't see why you expect this, when *existing* network-related failures
aren't:


>>> import socket
>>> issubclass(socket.error, EnvironmentError)
False

(Fortunately that specific example is fixed in Python 3.)


   I think I reported that some years ago.

   There were some other errors in the URL and SSL area that
weren't subclasses of EnvironmentError.  It's also possible
to get UnicodeError from URL operations.

        John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


SSL module needs issuer information

2011-09-03 Thread John Nagle

  The SSL module still doesn't return much information from the
certificate.  SSLSocket.getpeercert only returns a few basic items
about the certificate subject.  You can't retrieve issuer information,
and you can't get the extensions needed to check if a cert is an EV cert.

  With the latest flaps about phony cert issuers, it's worth
having issuer info available.  It was available in the old M2Crypto
module, but not in the current Python SSL module.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Did MySQL support ever make it to Python 3.x?

2011-03-01 Thread John Nagle

   Is there Python 3.x support for MySQL yet?  MySQLdb's
page still says "Python versions 2.3-2.6 are supported.":

   https://sourceforge.net/projects/mysql-python/

There's PyMySQL, which is pure Python, but it's at version
0.4.  There's good progress there, but it's not being used
heavily yet, and users are reporting bugs like "broken pipe"
errors.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: how to read the last line of a huge file???

2011-03-05 Thread John Nagle

On 3/5/2011 10:21 AM, tkp...@hotmail.com wrote:

Question: how do I use f.tell() to
identify if an offset is legal or illegal?


   Read backwards in binary mode, byte by byte,
until you reach a byte which is, in binary, either

0xxxxxxx   (an ASCII byte)
11xxxxxx   (a UTF-8 lead byte)

You are then at the beginning of an ASCII or UTF-8
character.  You can copy the bytes forward from there
into an array of bytes, then apply the appropriate
codec.  This is also what you do if skipping ahead
in a UTF-8 file, to get in sync.

   Reading the last line or lines is easier.  Read backwards
in binary until you hit an LF or CR, both of which
are the same in ASCII and UTF-8.  Copy the bytes
forward from that point into an array of bytes, then
apply the appropriate codec.
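A sketch of that backwards scan for the last line (the helper and its
block size are mine, not from the thread; it ignores trailing blank
lines at EOF):

```python
import os

def last_line(path, block=4096):
    """Read the last line of a possibly huge file by seeking to the
    end and scanning backwards in binary for the previous newline,
    then decoding only the final line as UTF-8."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        while pos > 0:
            step = min(block, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            # strip the newline(s) at EOF, then look for the newline
            # that terminates the second-to-last line
            nl = buf.rstrip(b"\n").rfind(b"\n")
            if nl != -1:
                return buf.rstrip(b"\n")[nl + 1:].decode("utf-8")
        return buf.rstrip(b"\n").decode("utf-8")
```

Because LF is the same byte in ASCII and UTF-8, the scan never lands
in the middle of a multi-byte character.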

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Extending dict (dict's) to allow for multidimensional dictionary

2011-03-05 Thread John Nagle

On 3/5/2011 12:05 PM, Paul Rubin wrote:

Ravi  writes:

I can extend dictionary to allow for the my own special look-up
tables. However now I want to be able to define multidimensional
dictionary which supports look-up like this:

d[1]['abc'][40] = 'dummy'


Why do that anyway?  You can use a tuple as a subscript:

d[1,'abc',40] = 'dummy'


   Also, at some point, it's time to use a database.
If you find yourself writing those "dictionaries" to
files, or trying to look up everything with "abc"
in the second subscript, a database is appropriate.
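If the chained d[1]['abc'][40] syntax is really wanted, the usual
trick is an autovivifying defaultdict rather than subclassing dict (a
common recipe, not something from this thread):

```python
from collections import defaultdict

def tree():
    """Autovivifying dictionary: missing intermediate levels are
    created automatically on first access."""
    return defaultdict(tree)

d = tree()
d[1]['abc'][40] = 'dummy'   # intermediate dicts spring into being
```

The tuple-subscript form above remains simpler when you need to
iterate or persist the data.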

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: having both dynamic and static variables

2011-03-05 Thread John Nagle

On 3/2/2011 9:27 PM, Steven D'Aprano wrote:

On Wed, 02 Mar 2011 19:45:16 -0800, Yingjie Lan wrote:


Hi everyone,

Variables in Python are resolved dynamically at runtime, which comes at
a performance cost. However, a lot of times we don't need that feature.
Variables can be determined at compile time, which should boost up
speed.

[...]

This is a very promising approach taken by a number of projects.


   It's worth having some syntax for constants.  I'd suggest
using "let":

let PI = 3.1415926535897932384626433832795028841971693993751

I'd propose the following semantics:

1.  "let" creates an object whose binding is unchangeable.  This
is effectively a constant, provided that the value is immutable.
A compiler may treat such variables as constants for optimization
purposes.

2.  Assignment to a variable created with "let" produces an error
at compile time or run time.

3.  Names bound with "let" have the same scope as any other name
created in the same context.  Function-local "let" variables
are permitted.

4.  It is an error to use "let" on a name explicitly made "global",
because that would allow access to the variable before it was
initialized.

This is close to the semantics of "const" in C/C++, except that
there's no notion of a const parameter.

"let" allows the usual optimizations - constant folding, hoisting
out of loops, compile time arithmetic, unboxing, etc.  Ordinarily,
Python compilers have to assume that any variable can be changed
at any time from another thread, requiring worst-case code for
everything.
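Absent language support, the closest approximation today is a
run-time guard, which gives the error-checking but none of the
optimization (this namespace class is my illustration, not part of
the proposal):

```python
class _Const:
    """Namespace whose attributes can be bound exactly once; any
    rebinding raises, approximating the proposed 'let' at run time
    rather than compile time."""
    def __setattr__(self, name, value):
        if name in self.__dict__:
            raise AttributeError("cannot rebind constant %r" % name)
        object.__setattr__(self, name, value)

const = _Const()
const.PI = 3.141592653589793
```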

John Nagle  
--
http://mail.python.org/mailman/listinfo/python-list


Re: having both dynamic and static variables

2011-03-05 Thread John Nagle

On 3/5/2011 7:46 PM, Corey Richardson wrote:

On 03/05/2011 10:23 PM, MRAB wrote:

Having a fixed binding could be useful elsewhere, for example, with
function definitions:
[..]
  fixed PI = 3.1415926535897932384626433832795028841971693993751

  fixed def squared(x):
  return x * x


This question spawns from my ignorance: When would a functions
definition change? What is the difference between a dynamic function and
a fixed function?


   All functions in Python can be replaced dynamically. While they're
running. From another thread.  Really.

   Implementing this is either inefficient, with a lookup for every
use (CPython) or really, really complicated, involving just-in-time
compilers, invalidation, recompilation, and a backup interpreter
for when things get ugly (PyPy).

    John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing module in async db query

2011-03-08 Thread John Nagle

On 3/8/2011 3:34 PM, Philip Semanchuk wrote:


On Mar 8, 2011, at 3:25 PM, Sheng wrote:


This looks like a tornado problem, but trust me, it is almost all
about the mechanism of multiprocessing module.


[snip]



So the workflow is like this,

get() -->  fork a subprocess to process the query request in
async_func() ->  when async_func() returns, callback_func uses the
return result of async_func as the input argument, and send the query
result to the client.

So the problem is the the query result as the result of sql_command
might be too big to store them all in the memory, which in our case is
stored in the variable "data". Can I send return from the async method
early, say immediately after the query returns with the first result
set, then stream the results to the browser. In other words, can
async_func somehow notify callback_func to prepare receiving the data
before async_func actually returns?


Hi Sheng,
Have you looked at multiprocessing.Queue objects?


Make sure that, having made a request of the database, you
quickly read all the results.  Until you finish the transaction,
the database has locks set, and other transactions may stall.
"Streaming" out to a network connection while still reading from
the database is undesirable.

If you're doing really big SELECTs, consider using LIMIT and
OFFSET in SQL to break them up into smaller bites.  Especially
if the user is paging through the results.
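A sketch of that paging pattern (sqlite3 stands in for MySQL so the
example is self-contained; with MySQL you would order by a primary
key rather than rowid, and the table name must be trusted since it is
interpolated into the SQL):

```python
import sqlite3

def paged_rows(conn, table, page_size=500):
    """Generator that fetches a big result set one page at a time
    with LIMIT/OFFSET, so no single fetch holds the whole result in
    memory and each statement finishes quickly."""
    offset = 0
    while True:
        cur = conn.execute(
            "SELECT * FROM %s ORDER BY rowid LIMIT ? OFFSET ?" % table,
            (page_size, offset))
        rows = cur.fetchall()
        if not rows:
            return
        yield from rows
        offset += page_size
```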

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Passing Functions

2011-03-11 Thread John Nagle

On 3/11/2011 5:49 AM, yoro wrote:


I've found the error, I had to type in:

for node in nodeTable:
   if node != 0 and Node.visited == False:


  That's just your first error.
(Also, you shouldn't have anything but Node items in
nodeTable, so you don't need the "node != 0".)

  The biggest problem is at

#Values to assign to each node
> > class Node:
> >   distFromSource = infinity
> >   previous = invalid_node
> >   visited = False

   Those are variables of the entire class.  Every
instance of Node shares the same variables. You
need

class Node:
    def __init__(self):
        self.distFromSource = infinity
        self.previous = invalid_node
        self.visited = False

John Nagle

--
http://mail.python.org/mailman/listinfo/python-list


Re: Compile time evaluation of dictionaries

2011-03-11 Thread John Nagle

On 3/10/2011 8:23 AM, Gerald Britton wrote:

Today I noticed that an expression like this:

"one:%(one)s two:%(two)s" % {"one": "is the loneliest number", "two":
"can be as bad as one"}

could be evaluated at compile time, but is not:


   CPython barely evaluates anything at compile time.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Creating a very simple revision system for photos in python

2011-03-11 Thread John Nagle

On 3/11/2011 6:56 AM, Thomas W wrote:

I`m thinking about creating a very simple revision system for photos
in python, something like bazaar, mercurial or git, but for photos.
The problem is that handling large binary files compared to plain text
files are quite different. Has anybody done something like this or
have any thoughts about it, I`d be very grateful. If something like
mercurial or git could be used and/or extended/customized that would
be even better.


   Alienbrain (http://www.alienbrain.com/) does this.  That's
what game companies use for revision control, where data includes
images, motion capture files, game levels, and music, as well as
code.  There's also Autodesk Vault, which does a similar job for
engineering data.

   One key to doing this well is the ability to talk about a
group of revisions across multiple files as an entity, without
having to be the owner of those files.  You need to say what
goes into a build of a game, or a revision of a manufactured
product.

   You also need really good tools to show the differences
between revisions.

    John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


  1   2   3   4   5   6   7   8   9   10   >