Re: OO approach to decision sequence?

2005-06-20 Thread Thomas Lotze
Jordan Rastrick wrote:

> Without knowing more about your problem, I think the most obvious OO
> approach would be to write a separate (simple) class for each of
> node_type_1, node_type_2, etc.

While I agree that this is the cleanest and usually simplest approach,
it does have its drawbacks. I'm currently working on a project where I'd
very much like to avoid writing a whole set of classes just for the
purpose of avoiding a decision chain.

For a PDF library, I need basic data types that are used in a PDF
document: integers, floats, strings, lists, dictionaries and a few
others. At some point they have to be written to a file, and at first I was
tempted to create types like pdfint, pdffloat, pdfstr etc. which
implement the respective file encoding either in a write method or
directly in __str__.

However, the whole point of the library is to allow working with the
document's data. Beside manipulating existing (as in read from a PDF
file) mutable objects this includes creating new objects of type pdffoo.
And I realized it is very bothersome to have to say x = pdfint(5)
instead of x = 5 every time I deal with integers that would end up in the
document. Similarly for, e.g., adding two PDF integers: x = pdfint(y+z)
instead of just x = y+z.

The latter could be cured by touching every method that returns pdffoo
instances. No sane person would do that, however, and it would not
eliminate the pdffoo(x) type conversions in the application code anyway.

So I decided that in this case it is best to go without special types
and use those provided by Python, and live with an ugly decision chain
or two at defined places in the library.
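
For concreteness, such a decision chain might look roughly like this
(pdf_repr and the simplified encoding rules here are only my sketch, not
the library's actual code):

```python
def pdf_repr(obj):
    """Serialize a plain Python object in (simplified) PDF syntax."""
    if isinstance(obj, bool):
        # bool must be tested before int, since bool subclasses int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        return repr(obj)
    if isinstance(obj, str):
        # PDF literal strings escape backslashes and parentheses
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(%s)" % escaped
    if isinstance(obj, list):
        return "[%s]" % " ".join(pdf_repr(item) for item in obj)
    if isinstance(obj, dict):
        return "<< %s >>" % " ".join(
            "/%s %s" % (key, pdf_repr(value)) for key, value in obj.items())
    raise TypeError("cannot encode %r" % (obj,))
```

The chain is ugly, but it lives in exactly one place, and application code
gets to keep saying x = 5.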

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Package organization

2005-06-22 Thread Thomas Lotze
Hi,

I've two questions concerning organizing and naming things when writing
a Python package.

- Naming of classes: I'm writing a library that reads PDF files. I have
  a data structure that represents the large-scale structure of a PDF
  file (header, trailer, incremental updates etc), and I'll have others,
  e.g. one that represents the document as a collection of logical
  objects (page descriptions, images etc).

  Assume I have a package called PDF. Should the classes then be called
simply File and Objects, since it is clear what they do once they are
  imported from PDF? Or should they be called PDFFile and PDFObjects, as
  the names would be too undescriptive otherwise?

- Organizing subpackages and interfaces: I'm using the zope.interface
  package in order to define interface classes. In a small package
  called foo, one might define interfaces IReadableFoo and IWritableFoo
  in foo.interfaces.

  However, in a large package foo with subpackages bar and baz,
  interface definitions might either sit in foo.bar.interfaces and
  foo.baz.interfaces, or in foo.interfaces.bar and foo.interfaces.baz.
  Which is preferable?

Thanks for any thoughts on this.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Package organization

2005-06-22 Thread Thomas Lotze
F. Petitjean wrote:

> As you wish :-)

Damn freedom of choice *g

> if in the package, ie in the __init__.py (not the best idea):
> from PDF import File as PDFFile  # always possible

Technically, this is clear - however I don't like the idea of giving the
same thing different names, especially if there's a chance that other
people get to look at and try to understand the code...

As I think about it, the easiest, cleanest and least obtrusive approach
is probably to use short names internally, unique by virtue of the
subpackage hierarchy, and to leave it to the user (which might even be
another subpackage of the library) to import them under more descriptive
names in their own context.

> Have you installed the reportlab package ? It is full of from ... import
> ..  and it generates PDF.

I do know ReportLab. IIRC, last time I looked, it didn't simply expose an
API that models and operates on a PDF document's structures, but was
designed to produce PDF files with a certain kind of content. It didn't
seem to be of much use for anything wildly different from that.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Module Exposure

2005-07-08 Thread Thomas Lotze
Jacob Page wrote:

> better-named,

Just a quick remark, without even having looked at it yet: the name is not
really descriptive and risks misleading people. The example I'm
thinking of is using zope.interface in the same project: it's customary to
name interfaces ISomething.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Should I use "if" or "try" (as a matter of speed)?

2005-07-09 Thread Thomas Lotze
Steve Juranich wrote:

> I was wondering how true this holds for Python, where exceptions are such
> an integral part of the execution model.  It seems to me, that if I'm
> executing a loop over a bunch of items, and I expect some condition to
> hold for a majority of the cases, then a "try" block would be in order,
> since I could eliminate a bunch of potentially costly comparisons for each
> item.

Exactly.

> But in cases where I'm only trying a single getattr (for example),
> using "if" might be a cheaper way to go.

Relying on exceptions is faster. In the Python world, this coding style
is called EAFP (easier to ask forgiveness than permission). You can try
it out, just do something 10**n times and measure the time it takes. Do
this twice, once with prior checking and once relying on exceptions.

And JFTR: the very example you chose gives you yet another choice:
getattr can take a default parameter.

> What do I mean by "cheaper"?  I'm basically talking about the number of
> instructions that are necessary to set up and execute a try block as
> opposed to an if block.

I don't know about the implementation of exceptions, but I suspect most
of what try does doesn't happen at run time at all; things only get
checked and looked up if an exception actually occurs. And I suspect that
it's machine code doing that checking and looking, not byte code.
(Please correct me if I'm wrong, anyone with more insight.)
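
A minimal harness along those lines (a dict lookup stands in for the real
condition; the numbers themselves will vary by machine):

```python
import timeit

setup = "d = {'a': 1}"
n = 100000

# LBYL: test the condition up front, every time
check_first = timeit.timeit("d['a'] if 'a' in d else None",
                            setup=setup, number=n)

# EAFP: just do it; the except branch only costs when it actually triggers
try_except = timeit.timeit("try:\n    d['a']\nexcept KeyError:\n    pass",
                           setup=setup, number=n)

print(check_first, try_except)  # compare on your own machine

# For the attribute example there is a third option that needs
# neither a test nor a try block:
value = getattr(object(), 'missing', None)
```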

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Should I use "if" or "try" (as a matter of speed)?

2005-07-10 Thread Thomas Lotze
Steven D'Aprano wrote:

> On the gripping hand, testing for errors before they happen will be slow
> if errors are rare:

Hm, might have something to do with why those things intended for
handling errors after they happened are called exceptions ;o)

> - If your code has side effects (eg changing existing objects, writing to
> files, etc), then you might want to test for error conditions first.
> Otherwise, you can end up with your data in an inconsistent state.

BTW: Has the context management stuff from PEP 343 been considered for
implementing transactions?
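
For what it's worth, here is a sketch of how PEP 343's with statement
could express a transaction, using contextlib; the connection class is a
made-up stand-in, not any real database API:

```python
from contextlib import contextmanager

class ToyConnection:
    """Made-up stand-in for a database connection."""
    def __init__(self):
        self.committed = self.rolled_back = False
    def commit(self):
        self.committed = True
    def rollback(self):
        self.rolled_back = True

@contextmanager
def transaction(conn):
    # roll back if the block raised, commit on a clean exit
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    conn.commit()

conn = ToyConnection()
with transaction(conn):
    pass  # side effects would go here
assert conn.committed
```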

> - Why are you optimizing your code now anyway? Get it working the simplest
> way FIRST, then _time_ how long it runs. Then, if and only if it needs to
> be faster, should you worry about optimizing. The simplest way will often
> be try...except blocks.

Basically, I agree with the "make it run, make it right, make it fast"
attitude. However, FWIW, I sometimes can't resist optimizing routines that
probably don't strictly need it. Not only does the resulting code run
faster, but it is usually also shorter and more readable and expressive.
Plus, I tend to gain further insight into the problem and tools in the
process. YMMV, of course.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Frankenstring

2005-07-12 Thread Thomas Lotze
Hi,

I think I need an iterator over a string of characters pulling them out
one by one, like a usual iterator over a str does. At the same time the
thing should allow seeking and telling like a file-like object:

>>> f = frankenstring("0123456789")
>>> for c in f:
... print c
... if c == "2":
... break
... 
0
1
2
>>> f.tell()
3L
>>> f.seek(7)
>>> for c in f:
... print c
... 
7
8
9
>>>

It's definitely no help that file-like objects are iterable; I do want
to get a character, not a complete line, at a time.

I can think of more than one clumsy way to implement the desired
behaviour in Python; I'd rather like to know whether there's an
implementation somewhere that does it fast. (Yes, it's me and speed
considerations again; this is for a tokenizer at the core of a library,
and I'd really like it to be fast.) I don't think there's anything like
it in the standard library, at least not anything that would be obvious
to me.

I don't care whether this is more of a string iterator with seeking and
telling, or a file-like object with a single-character iterator; as long
as it does both efficiently, I'm happy.
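
For reference, here is one of those clumsy pure-Python versions, just to
pin down the interface I'm after; a fast implementation would hide
exactly this index bookkeeping:

```python
class frankenstring:
    """String iterator with file-like seek() and tell(); pure-Python sketch."""
    def __init__(self, data):
        self.data = data
        self.pos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        c = self.data[self.pos]
        self.pos += 1
        return c
    next = __next__  # Python 2 spelling of the same method
    def seek(self, pos):
        self.pos = pos
    def tell(self):
        return self.pos

f = frankenstring("0123456789")
for c in f:
    if c == "2":
        break
assert f.tell() == 3
f.seek(7)
assert "".join(f) == "789"
```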

I'd even consider writing such a beast in C, albeit more as a learning
exercise than as a worthwhile measure to speed up some code.

Thanks for any hints.

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fwd: Should I use "if" or "try" (as a matter of speed)?

2005-07-12 Thread Thomas Lotze
Christopher Subich wrote:

> try:
>     f=file('file_here')
> except IOError: #File doesn't exist
>     error_handle
>     error_flag = 1
> if not error_flag:
>     do_setup_code
>     do_stuff_with(f)
> 
> which nests on weird, arbitrary error flags, and doesn't seem like good
> programming to me.

Neither does it to me. What about

try:
    f=file('file_here')
except IOError: #File doesn't exist
    error_handle
else:
    do_setup_code
    do_stuff_with(f)

(Not that I'd want to defend Joel's article, mind you...)

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Slicing every element of a list

2005-07-12 Thread Thomas Lotze
Alex Dempsey wrote:

> for line in lines:
>     line = line[1:-5]
>     line = line.split('\"\t\"')
> 
> This went without returning any errors, but nothing was sliced or split.
> Next I tried:
> 
> for i in range(len(lines)):
>     lines[i] = lines[i][1:-5]
>     lines[i] = lines[i].split('\"\t\"')
> 
> This of course worked, but why didn't the first one work?

Because when assigning to line the second time, you just make the
identifier reference a new object, you don't touch the list. This is how
one might do it without ranging over the length of the list and having
to get the lines out by element access:

for i, line in enumerate(lines):
    line = line[1:-5]
    lines[i] = line.split('\"\t\"')

Probably there are even better ways, this is just off the top of my
head.

> Further why
> didn't the first one return an error?

Because you didn't make any. You just discarded your results; why should
anyone stop you from burning cycles? *g

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-12 Thread Thomas Lotze
jay graves wrote:

> see StringIO or cStringIO in the standard library.

Just as with files, iterating over them returns whole lines, which is
unfortunately not what I want.

-- 
Thomas



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-12 Thread Thomas Lotze
Scott David Daniels wrote:

> Now if you want to do it for a file, you could do:
> 
>  for c in thefile.read():
>  

The whole point of the exercise is that seeking on a file doesn't
influence iteration over its content. In the loop you suggest, I can
seek() on thefile to my heart's content and will always get its content
iterated over exactly from beginning to end. It had been read before any
of this started, after all. Similarly, thefile.tell() will always tell me
thefile's size or the place I last seek()'ed to instead of the position of
the next char I will get.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-13 Thread Thomas Lotze
Roland Heiber wrote:

> if i did understand what you mean, what about using mmap?

AIUI (and as a little experimenting seems to confirm), you can't
reposition an iterator over an mmap'ed file by seeking. True, you have
both iterating by characters and seeking/telling, but the two
functionalities don't play together.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-13 Thread Thomas Lotze
Bengt Richter wrote:

> < lotzefile.py >--

Thanks.

[...]
> byte = self.buf[self.pos]

This is the place where the thing is basically a str whose items are
accessed as sequence elements. It has some iterator behaviour and file
management which make it nice to use, of course, and for most people this
will be enough (and it is a lot indeed). But it loses the efficiency of
for c in "asdf": do_something(c)

Actually, relying on string[index] behind the scenes is one of the ways
of implementing frankenstring I labelled "clumsy" in the original
posting ;o)

> I suspect you could get better performance if you made LotzeFile instances
> able to return interators over buffer chunks and get characters from them,
> which would be string iterators supplying the characters rather than the
> custom .next, but the buffer chunks would have to be of some size to make
> that pay. Testing is the only way to find out what the crossing point is,
> if you really have to.

If I understand this correctly, you'd have to switch to using a new iterator
after seeking, which would make this impossible:

f = LotzeFile('something')
for c in iter(f):
    do_something(c)
    if some_condition:
        f.seek(somewhere)
        # the next iteration reads from the new position

And it would break telling since the class can't know how many
characters have been read from an iterator once it returned one after
seeking or switching to another buffer chunk.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-13 Thread Thomas Lotze
Peter Otten wrote:

> >>> class frankenstring(StringIO):
> ...     def next(self):
> ...         c = self.read(1)
> ...         if not c:
> ...             raise StopIteration
> ...         return c

Repeated read(1) on a file-like object is one of the ways of doing it with
existing tools I labelled "clumsy" in the original posting ;o)

Thanks anyway.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-14 Thread Thomas Lotze
Andreas Lobinger wrote:

>  >>> t2 = f.find('2')+1

This is indeed faster than going through a string char by char. It doesn't
make for a nice character-based state machine, but of course it avoids
making Python objects for every character and uses the C implementation of
str for searching.

However, it's only fine if you are looking for single characters. As soon
as you're looking for classes of characters, you need the (slower) regex
machinery (as you well know, but for the sake of discussion...).

> A string, and a pointer on that string. If you give up the boundary
> condition to tell backwards, you can start to eat up the string via f =
> f[p:]. There was a performance difference with that, in fact it was faster
> ~4% on a python2.2.

When I tried it just now, it was the other way around. Eating up the
string was slower, which makes sense to me since it involves creating new
string objects all the time.

> I dont't expect any iterator solution to be faster than that.

It's not so much an issue of iterators as of handling Python objects for
every char. Iterators would actually be quite helpful for searching: I
wonder why there doesn't seem to be an str.iterfind or str.itersplit
thing. And I wonder whether there shouldn't be str.findany and
str.iterfindany, which take a sequence as an argument and return the
next match on any element of it.
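
An str.iterfind is easy enough to emulate in pure Python; the method
itself doesn't exist, this sketch only illustrates the idea:

```python
def iterfind(s, sub, start=0):
    """Hypothetical str.iterfind: yield each index where sub occurs in s."""
    while True:
        i = s.find(sub, start)
        if i < 0:
            return
        yield i
        # advance by one so overlapping matches are found too
        start = i + 1

assert list(iterfind("abcabcabc", "bc")) == [1, 4, 7]
```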

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-14 Thread Thomas Lotze
Peter Otten wrote:

> Not clumsy, just slow.

As you wish ;o) I didn't mean clumsy as in "clumsy looking Python code"
anyway, rather as in "clumsy to use the Python machinery for operations
that are straightforward and efficient in C, in which language str and
cStringIO are implemented already".

> I hope you'll let us know how much faster your
> final approach turns out to be.

I'm pretty convinced that implementing an algorithmically nice state
machine that goes through a string char by char won't get any faster than
using s[index] all the time unless I do a frankenstring in C. Failing
that, a more pragmatic approach is what Andreas suggests; see the other
subthread.

> By the way, I'll consider anything that
> doesn't implement seek() and tell() cheating :-)

An implementation of frankenstring would have to have seek and tell;
that's the point of doing it. But for half-way simple state machines,
hiding the index handling in a Python class that slows things down is
just not worth it. Doing index += 1 here and there is fine if it happens
only half a dozen times. I know it's not beautiful, that's why I started
this thread ;o)

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-14 Thread Thomas Lotze
Thomas Lotze wrote:

> And I wonder whether there shouldn't be str.findany and
> str.iterfindany, which takes a sequence as an argument and returns the
> next match on any element of it.

On second thought, that wouldn't gain much over a loop that finds each
sequence in turn, and it would add more complexity than it is worth. What
would be more useful, especially thinking of a C implementation, is
str.findanyof and str.findnoneof. They take a string as an argument and
find the first occurrence of any char in that string, or of any char not
in that string, respectively. Especially finding any char not among a
given few requires jumping through a hoop right now, unless I missed
something.
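
In pure Python (and hence slowly), the proposed pair would behave like
this sketch; the hoop one has to jump through today is exactly this
explicit loop:

```python
def findanyof(s, chars, start=0):
    """Hypothetical str.findanyof: index of the first char of s that is
    in chars, searching from start; -1 if there is none."""
    for i in range(start, len(s)):
        if s[i] in chars:
            return i
    return -1

def findnoneof(s, chars, start=0):
    """Hypothetical str.findnoneof: index of the first char of s that is
    NOT in chars, searching from start; -1 if there is none."""
    for i in range(start, len(s)):
        if s[i] not in chars:
            return i
    return -1

assert findanyof("spam and eggs", " ") == 4
assert findnoneof("   spam", " ") == 3
```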

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: using hotshot for timing and coverage analysis

2005-07-15 Thread Thomas Lotze
Andreas Lobinger wrote:

> hotshot.Profile has flags for recording timing per line and line events.
> Even if I had both set to 1 I still get only the standard data (time per
> call).

Could it be that pstats.Stats doesn't know about hotshot? Haven't checked...

What's much more annoying about hotshot is that loading the stats takes
ages if one profiles stuff that runs about half a minute or so. At least
it does that on Python 2.4.1a0 as shipped with Debian testing a while ago.

> Is there any document available that has examples of how to use hotshot
> for coverage analysis and to display timing per line?

Haven't looked thoroughly yet; all I know is what's in the Python docs.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Programming Contest

2005-07-15 Thread Thomas Lotze
Brian Quinlan wrote:

> I've decided that it would be be fun to host a weekly Python programming
> contest.

I like the idea, and doing the first problem was fun indeed
:o)

> I'm always looking for feedback, so let me know what you think or if you
> have any ideas for future problems.

It would be nice if you could put up a suite of test data with oracle
solutions for download. For those sitting behind a modem line (like me),
it would be a great help and would speed up the testing cycle.

Thanks for your effort, in any case!

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Frankenstring

2005-07-18 Thread Thomas Lotze
Peter Otten wrote:

> I hope you'll let us know how much faster your
> final approach turns out to be

OK, here's a short report on the current state. Such code as there is can
be found at ,
with a Python mock-up in the same directory.

Thinking about it (Andreas, thank you for the reminder :o)), doing
character-by-character scanning in Python is stupid, both in terms of
speed and, given some more search capabilities than str currently has,
elegance.

So what I have done so far (besides working my way into writing
extensions in C) is give the evolving FrankenString some search methods
that allow searching for the first occurrence in the string of any
character out of a set of characters given as a string, or of any
character not in such a set. This has nothing to do yet with iterators
and seeking/telling.

Just letting C do the "while data[index] not in whitespace: index += 1"
part speeds up my PDF tokenizer by a factor between 3 and 4. I have
never compared that directly to using regular expressions, though... As
a bonus, even with this minor addition the Python code looks a little
cleaner already:

c = data[cursor]

while c in whitespace:
    # Whitespace tokens.
    cursor += 1

    if c == '%':
        # We're just inside a comment, read beyond EOL.
        while data[cursor] not in "\r\n":
            cursor += 1
        cursor += 1

    c = data[cursor]

becomes

cursor = data.skipany(whitespace, start)
c = data[cursor]

while c == '%':
    # Whitespace tokens: comments till EOL and whitespace.
    cursor = data.skipother("\r\n", cursor)
    cursor = data.skipany(whitespace, cursor)
    c = data[cursor]

(removing '%' from the whitespace string, in case you wonder).
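
For reference, pure-Python equivalents of the two search methods look
roughly like this; it is a mock-up, and the exact whitespace set and
argument names are illustrative only:

```python
WHITESPACE = "\0\t\n\f\r "  # PDF whitespace characters, '%' removed

def skipany(data, chars, start=0):
    """Index of the first character at or after start that is NOT in chars."""
    pos = start
    while pos < len(data) and data[pos] in chars:
        pos += 1
    return pos

def skipother(data, chars, start=0):
    """Index of the first character at or after start that IS in chars."""
    pos = start
    while pos < len(data) and data[pos] not in chars:
        pos += 1
    return pos

data = "  % a comment\r\n42"
pos = skipany(data, WHITESPACE)       # lands on the '%'
pos = skipother(data, "\r\n", pos)    # skips the comment, lands on '\r'
pos = skipany(data, WHITESPACE, pos)  # skips the EOL, lands on '4'
assert data[pos:] == "42"
```

The C versions just run these loops natively, which is where the factor
of 3 to 4 comes from.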

The next thing to do is make FrankenString behave. Right now there's too
much copying of string content going on every time a FrankenString is
initialized; I'd like it to share string content with other
FrankenStrings or strs much like cStringIO does. I hope it's just a
matter of learning from cStringIO. To justify the "franken" part of the
name some more, I consider mixing in yet another ingredient and making
the thing behave like a buffer in that a FrankenString should be
possible to make from only part of a string without copying data.

After that, the thing about seeking and telling iterators over
characters or search results comes in. I don't think it will make much
difference in performance now that the stupid character searching has
been done in C, but it'll hopefully make for more elegant Python code.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: is this pythonic?

2005-07-21 Thread Thomas Lotze
Mage wrote:

> Or is there better way?
> 
> for (i, url) in [(i,links[i]) for i in range(len(links))]:
>   ...
> 
> "links" is a list.

for i, url in enumerate(links):

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using gnu readline in my own python program?

2005-08-01 Thread Thomas Lotze
sboyle55 wrote:

> Hi...I'm a newbie to python, and very confused.  I'm writing a simple
> program and want the user to be able to edit a line that I display using
> the full gnu readline capabilitites.  (For example, control+a to go to the
> beginning of the line.)
> 
> Then I want to be able to read the line after it's been edited...

Probably the built-in function raw_input already does what you want. It
uses readline if available.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using gnu readline in my own python program?

2005-08-01 Thread Thomas Lotze
sboyle55 wrote:

> raw_input is an excellent suggestion, and almost exactly what I want.
> 
> But, I want to give the user a string to edit, not have them start from
> scratch inputting a string.



Take a look at the fancy_input function.

-- 
Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


StringIO objects sharing a buffer

2005-02-15 Thread Thomas Lotze
Hi,

I want to implement a tokenizer for some syntax. So I thought I'd subclass
StringIO and make my new class return tokens on next().

However, if I want to read tokens from two places in the string in turns,
I'd either need to do some housekeeping of file pointers outside the
tokenizer class (which is ugly) or use two tokenizers on the same data
buffer (which seems impossible to me using my preferred approach as a
file-like object has exactly one file pointer).

Is there a way for multiple StringIO objects to share a buffer of data, or
do I have to give up on subclassing StringIO for this purpose? (An
alternative would be a tokenizer class that has a StringIO instead of
being one and do the file pointer housekeeping in there.)
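
The has-a alternative from the last sentence might look like this sketch;
the token rules are a made-up placeholder, and the point is only that
several tokenizers can share one string, each with a private pointer:

```python
class Tokenizer:
    """Has-a buffer instead of is-a StringIO; several tokenizers can
    share the same (immutable) string, each with its own position."""
    def __init__(self, data, pos=0):
        self.data = data  # shared buffer
        self.pos = pos    # private file pointer
    def next_token(self):
        # placeholder syntax: whitespace-separated words
        while self.pos < len(self.data) and self.data[self.pos].isspace():
            self.pos += 1
        start = self.pos
        while self.pos < len(self.data) and not self.data[self.pos].isspace():
            self.pos += 1
        return self.data[start:self.pos] or None

data = "obj 1 0 endobj"
a = Tokenizer(data)
b = Tokenizer(data, pos=6)
assert a.next_token() == "obj"
assert b.next_token() == "0"
assert a.next_token() == "1"  # a's pointer is unaffected by b
```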

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Copying data between file-like objects

2005-02-15 Thread Thomas Lotze
Hi,

another question: What's the most efficient way of copying data between
two file-like objects?

f1.write(f2.read()) doesn't seem to me as efficient as it might be, as a
string containing all the contents of f2 will be created and thrown away.
In the case of two StringIO objects, this means there's a point when the
contents is held in memory three times.

Reading and writing a series of short blocks to avoid a large copy buffer
seems ugly to me, and string objects will be created and thrown away all
the time. Do I have to live with that?

(In C, I would do the same thing, only without having to create and throw
away anything while overwriting a copy buffer, and being used to doing
everything the pedestrian way, anyway.)

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Copying data between file-like objects

2005-02-15 Thread Thomas Lotze
Fredrik Lundh wrote:

> if f2 isn't too large, reading lots of data in one operation is often the most
> efficient way (trust me, the memory system is a lot faster than your disk)

Sure.

> if you don't know how large f2 can be, use shutil.copyfileobj:
> 
> >>> help(shutil.copyfileobj)
> Help on function copyfileobj in module shutil:
> 
> copyfileobj(fsrc, fdst, length=16384)
> copy data from file-like object fsrc to file-like object fdst

This sounds like what I was looking for. Thanks for the pointer.
However, the following doesn't seem like anything is being copied:

>>> from StringIO import StringIO
>>> from shutil import copyfileobj
>>> s = StringIO()
>>> s2 = StringIO()
>>> s.write('asdf')
>>> copyfileobj(s, s2)
>>> s2.getvalue()
''

> to copy stringio objects, you can use f1 = StringIO(f2.getvalue()).

But this should have the same problem as using read(): a string will be
created on the way which contains all the content.

> why you
> would want/need to do this is more than I can figure out, though...

Because I want to manipulate a copy of the data and be able to compare it
to the original afterwards.

Another thing I'd like to do is copy parts of a StringIO object's content
to another object. This doesn't seem possible with any shutil method. Any
idea on that?

What one can really wonder, I admit, is why the difference between holding
data two or three times in memory matters that much, especially if the
latter is only for a short time. But as I'm going to use the code that
handles the long string as a core component to some application, I'd like
to make it behave as well as possible.

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Copying data between file-like objects

2005-02-16 Thread Thomas Lotze
Fredrik Lundh wrote:

> copyfileobj copies from the current location, and write leaves the file
> pointer at the end of the file.  a s.seek(0) before the copy fixes that.

Damn, this can't be gleaned from the documentation, and since there's no
length parameter for copying only a portion either, I assumed copying
would mean copying everything.
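
Putting the fix together (written here with Python 3's io.StringIO in
place of the old StringIO module):

```python
import io
import shutil

src = io.StringIO()
src.write('asdf')
src.seek(0)  # rewind: copyfileobj copies from the current position
dst = io.StringIO()
shutil.copyfileobj(src, dst)
assert dst.getvalue() == 'asdf'
```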

> getvalue() returns the contents of the f2 file as a string, and f1 will
> use that string as the buffer.  there's no extra copying.

Oh, good to know. Then StringIO(f2.getvalue()) or StringIO(f2.read())
would be the way to go.

>> Because I want to manipulate a copy of the data and be able to compare
>> it to the original afterwards.
> 
> why not just use a plain string (or a list of strings)?  your focus on
> StringIO sounds like a leftover from some C library you've been using in
> an earlier life ;-)

Because there can be a lot of data, and modifying long strings means a
lot of slicing and copying of partial strings, if I understand right.
Modifying a StringIO buffer is possible in-place. Plus, it's easier to
teach an algorithm that works on a StringIO to use a file instead, so I
may be able to avoid reading stuff into memory altogether in certain
places without worrying about special cases.

> use a plain string and slicing.  (if you insist on using StringIO, use
> seek and read)

OK.

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: accessor/mutator functions

2005-02-28 Thread Thomas Lotze
Dan Sommers wrote:

> I think I'd add a change_temperature_to method that accepts the target
> temperature and some sort of timing information, depending on how the rest
> of the program and/or thread is structured.

But then you put application logic into a library function. Doing this
consistently leads to a monster of a library that tries to account for all
possible applications. Where does this leave the KISS principle?

> In the case of simply reading the current temperature, and not knowing
> what's inside that device driver, I'd still lean away from exposing a
> current temperature attribute directly.  I think part of my thinking comes
> from my old Pascal days, when it made me cringe to think that "x:=b;"
> might actually execute a subroutine rather than just copy some memory
> around.

Then you also avoid lists, dicts and, ironically, methods. Accessing
methods means to access a callable attribute, after all, with all the
stuff going on behind the scenes on attribute access.
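
For the record, Python officially embraces that Pascal nightmare: a
property makes plain attribute access execute a subroutine. A toy sketch
(names made up):

```python
class Thermometer(object):
    """Toy example: reading .temperature runs a method behind the scenes."""
    def __init__(self, raw):
        self._raw = raw

    @property
    def temperature(self):
        # looks like a plain attribute read to the caller,
        # but is a subroutine call
        return self._raw / 10.0

t = Thermometer(215)
assert t.temperature == 21.5
```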

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Semantics of propagated exceptions

2006-07-21 Thread Thomas Lotze
Hi,

I wonder how to solve the following problem the most pythonic way:

Suppose you have a function f which, as part of its protocol, raises some
standard exception E under certain, well-defined circumstances. Suppose
further that f calls other functions which may also raise E. How to best
distinguish whether an exception E raised by f has the meaning defined by
the protocol or just comes from details of the implementation?

As an example, let's inherit from dict and replace __getitem__. It is
supposed to raise a KeyError if an item is not found in the mapping. But
what if it does some magic to use default values:

def __getitem__(self, key):
    if key in self:
        return dict.__getitem__(self, key)
    defaults = foobar["default"]
    return defaults[key]

If "default" is not in foobar, a KeyError is raised by that lookup and
propagates to the calling code. However, the problem is not "key can't be
found" but "I'm too stupid to find out whether key can be found". In a web
context where key identifies the resource requested, this might make the
difference between a 404 "Not found" and a 500 "Internal server error"
response.

Several solutions come to mind, neither of which I'm satisfied with:

- f might catch E exceptions from the implementation and raise some other
error in their stead, maybe with an appropriate message or treating the
traceback in some helpful way. This destroys the original exception.

- f might catch and re-raise E exceptions, setting some flag on them that
identifies them as protocol exceptions or not. This requires calling code
to know about the flag.

- Calling code might guess whether the exception comes from some inner
working of f by how deep in the call stack the exception originated.
Obviously, this will not be easy, and it may not work at all if f calls
related functions which might also raise E with the protocol semantics.
This requires calling code to do some magic but keeps f from having to
catch and re-raise exceptions all over the place.

Some gut feeling tells me the first option is preferable, but I'd like
to read your opinions and maybe other alternatives.
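
To make the first option concrete, here is a sketch based on the dict
example; ConfigurationError and foobar are made up, and the exception
chaining syntax is modern Python, which also preserves the original
traceback:

```python
foobar = {}  # configuration mapping; "default" may or may not be present

class ConfigurationError(Exception):
    """Raised when the defaults themselves cannot be found."""

class DefaultingDict(dict):
    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        try:
            defaults = foobar["default"]
        except KeyError as e:
            # not "key can't be found" but "I'm too stupid to find out"
            raise ConfigurationError("no defaults configured") from e
        return defaults[key]

d = DefaultingDict(a=1)
assert d["a"] == 1
try:
    d["missing"]
except ConfigurationError:
    pass  # a 500, not a 404
```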

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Semantics of propagated exceptions

2006-08-01 Thread Thomas Lotze
Sorry for not answering for such a long time. My question originated
from a discussion within our company which moved out of focus shortly
after I posted, and while waiting for some response from them before
replying here, I forgot about it.


Steve Holden wrote:

>> - f might catch E exceptions from the implementation and raise some
>> other error in their stead, maybe with an appropriate message or
>> treating the traceback in some helpful way. This destroys the original
>> exception.
>> 
> My "solution", of course, takes this approach.

Good to see that my "gut feeling" as to the most pythonic approach seems
to coincide with the answers I've received ;o)

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Graphing Utilities.

2005-05-10 Thread Thomas Lotze
Kenneth Miller wrote:

> I am new to Python and I was wondering what graphing utilities would be
> available to me. I have already tried BLT and after weeks of unsuccessful
> installs i'd like to find something else. Anything someone would
> recommend?

You might also want to check out PyX.

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Hi,

I'm trying to figure out what is the most pythonic way to interact with
a generator.

The task I'm trying to accomplish is writing a PDF tokenizer, and I want
to implement it as a Python generator. Suppose all the ugly details of
tokenizing PDF can be handled (such as embedded streams of arbitrary
binary content). There remains one problem, though: In order to get
random file access, the tokenizer should not simply spit out a series of
tokens read from the file sequentially; it should rather be possible to
point it at places in the file at random.

I can see two possibilities to do this: either the current file position
has to be read from somewhere (say, a mutable object passed to the
generator) after each yield, or a new generator needs to be instantiated
every time the tokenizer is pointed to a new file position.

The first approach has two disadvantages: the pointer value is exposed,
and due to the complex rules for breaking a PDF into tokens, there will
be a lot of yield statements in the generator code, which would make for
a lot of pointer assignments. This seems ugly to me.

The second approach is cleaner in that respect, but pointing the
tokenizer at some place now has the added semantics of creating a whole
new generator instance. The programmer using the tokenizer now needs to
remember to throw away any references to the generator each time the
pointer is reset, which is also ugly.
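To make the second possibility concrete, here's a toy sketch (whitespace tokenization standing in for the real PDF rules; all names invented): re-pointing simply means instantiating a fresh generator and dropping the old one:

```python
def tokens(data, pos=0):
    # Toy stand-in for the PDF rules: yield whitespace-separated
    # tokens, starting at an arbitrary file position.
    while pos < len(data):
        while pos < len(data) and data[pos].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos].isspace():
            pos += 1
        if start < pos:
            yield data[start:pos]

data = '1 0 obj'
tok = tokens(data)           # tokenize from the start of the data
print(next(tok))             # '1'
tok = tokens(data, pos=2)    # re-point: the old generator must be dropped
print(next(tok))             # '0'
```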

Does anybody here have a third way of dealing with this? Otherwise,
which ugliness is the more pythonic one?

Thanks a lot for any ideas.

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Peter Hansen wrote:

> Thomas Lotze wrote:
>> I can see two possibilities to do this: either the current file position
>> has to be read from somewhere (say, a mutable object passed to the
>> generator) after each yield, [...]
> 
> The third approach, which is certain to be cleanest for this situation, is
> to have a custom class which stores the state information you need, and
> have the generator simply be a method in that class.

Which is, as far as the generator code is concerned, basically the same as
passing a mutable object to a (possibly standalone) generator. The object
will likely be called self, and the value is stored in an attribute of it.

Probably this is indeed the best way as it doesn't require the programmer
to remember any side-effects.

It does, however, require a lot of attribute access, which does cost some
cycles.
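For illustration, the approach under discussion might look like the following sketch (toy whitespace tokenization instead of the real PDF rules, invented names): the pointer lives on the object, and can be reset between next() calls:

```python
class Tokenizer:
    """Tokenizer state lives on the object; the generator is a method."""

    def __init__(self, data):
        self.data = data
        self.pos = 0   # exposed pointer; may be reset between next() calls

    def tokens(self):
        while self.pos < len(self.data):
            # skip whitespace
            while self.pos < len(self.data) and self.data[self.pos].isspace():
                self.pos += 1
            start = self.pos
            while self.pos < len(self.data) and not self.data[self.pos].isspace():
                self.pos += 1
            if start < self.pos:
                yield self.data[start:self.pos]

t = Tokenizer('1 0 obj')
gen = t.tokens()
print(next(gen))   # '1'
t.pos = 2          # random access: re-point the tokenizer via the attribute
print(next(gen))   # '0'
```

Note how every token requires several self.pos attribute accesses, which is the cost referred to above.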

A related problem is skipping whitespace. Sometimes you don't care about
whitespace tokens, sometimes you do. Using generators, you can either set
a state variable, say on the object the generator is an attribute of,
before each call that requires a deviation from the default, or you can
have a second generator for filtering the output of the first. Again, both
solutions are ugly (the second more so than the first). One uses
side-effects instead of passing parameters, which is what one really
wants, while the other is dumb and slow (filtering can be done without
taking a second look at things).
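The second, filtering variant can be sketched like this (toy tokenizer, invented names): one generator yields everything, a second one sits on top and drops the whitespace tokens:

```python
import re

def tokens(data):
    # First generator: a toy tokenizer that yields whitespace runs
    # and non-whitespace runs alike.
    for match in re.finditer(r'\s+|\S+', data):
        yield match.group()

def skip_whitespace(token_stream):
    # Second generator: filters the output of the first.
    for token in token_stream:
        if not token.isspace():
            yield token

print(list(tokens('1 0 obj')))                   # ['1', ' ', '0', ' ', 'obj']
print(list(skip_whitespace(tokens('1 0 obj'))))  # ['1', '0', 'obj']
```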

All of this makes me wonder whether more elaborate generator semantics
(maybe even allowing for passing arguments in the next() call) would not
be useful. And yes, I have read the recent postings on PEP 343 - sigh.
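For the record, exactly such semantics did later arrive with PEP 342 in Python 2.5: gen.send(value) resumes the generator with value as the result of the suspended yield expression. A sketch of a tokenizer whose pointer can be reset through the generator protocol itself (toy whitespace tokenization, invented names):

```python
def tokens(data):
    # The value passed via send() becomes the result of "yield" and is
    # interpreted here as a new file position for the next token.
    pos = 0
    while pos < len(data):
        while pos < len(data) and data[pos].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos].isspace():
            pos += 1
        sent = yield data[start:pos]
        if sent is not None:   # the caller re-pointed the tokenizer
            pos = sent

tok = tokens('1 0 obj')
print(next(tok))      # '1'
print(tok.send(2))    # '0'  -- pointer reset and next token in one call
```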

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Mike Meyer wrote:

> Yes, such a switch gets the desired behavior as a side effect. Then again,
> a generator that returns tokens has a desired behavior (advancing to the
> next token) as a side effect(*).

That's certainly true.

> If you think about these things as the
> state of the object, rather than "side effects", it won't seem nearly as
> ugly. In fact, part of the point of using a class is to encapsulate the
> state required for some activity in one place.
> 
> Wanting to do everything via parameters to methods is a very top-down way
> of looking at the problem. It's not necessarily correct in an OO
> environment.

What worries me about the approach of changing state before making a
next() call instead of doing it at the same time by passing a parameter is
that the state change is meant to affect only a single call. The picture
might fit better (IMO) if it didn't look so much like working around the
fact that the next() call can't take parameters for some technical reason.

I agree that decoupling state changes and next() calls would be perfectly
beautiful if they were decoupled in the problem one wants to model. They
aren't.

> *) It's noticable that some OO languages/libraries avoid this side
> effect: the read method updates an attribute, so you do the read then
> get the object read from the attribute. That's very OO, but not very
> pythonic.

Just out of curiosity: What makes you state that that behaviour isn't
pythonic? Is it because Python happens to do it differently, because of a
gut feeling, or because of some design principle behind Python I fail to
see right now?

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Peter Hansen wrote:

> Fair enough, but who cares what the generator code thinks?  It's what the
> programmer has to deal with that matters, and an object is going to have a
> cleaner interface than a generator-plus-mutable-object.

That's right, and among the choices discussed, the object is the one I do
prefer. I just don't feel really satisfied...

>> It does, however, require a lot of attribute access, which does cost
>> some cycles.
> 
> Hmm... "premature optimization" is all I have to say about that.

But when is the right time to optimize? There's a point when the thing
runs, does the right thing and - by the token of "make it run, make it
right, make it fast" - might get optimized. And if there are places in a
PDF library that might justly be optimized, the tokenizer is certainly one
of them as it gets called really often.

Still, I'm going to focus on cleaner code and, first and foremost, a clean
API if it comes to a decision between these goals and optimization - at
least as long as I'm talking about pure Python code.

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-12 Thread Thomas Lotze
Thomas Lotze wrote:

> Does anybody here have a third way of dealing with this?

Sleeping on it for a night sometimes is an insightful exercise *g*

I realized that there is a reason why fiddling with the pointer from
outside the generator defeats much of the purpose of using one. The
implementation using a simple method call instead of a generator needs
to store some internal state variables on an object to save them for the
next call, among them the pointer and a tokenization mode.

I could make the thing a generator by turning the single return
statement into a yield statement and adding a loop, leaving all the
importing and exporting of the pointer intact - after all, someone might
reset the pointer between next() calls.

This is, however, hardly using all the possibilities a generator allows.
I'd rather like to get rid of the mode switches by doing special things
where I detect the need for them, yielding the result, and proceeding as
before. But as soon as I move information from explicit (state variables
that can be reset along with the pointer) to implicit (the point where
the generator is suspended after yielding a token), resetting the
pointer will lead to inconsistencies.

So, it seems to me that if I do want to use generators for any practical
reason instead of just because generators are way cool, they need to be
instantiated anew each time the pointer is reset, for simple consistency
reasons.

Now a very simple idea struck me: If one is worried about throwing away
a generator as a side-effect of resetting the tokenization pointer, why
not define the whole tokenizer as not being resettable? Then the thing
needs to be re-instantiated very explicitly every time it is pointed
somewhere. While still feeling slightly awkward, it has lost the threat
of doing unexpected things.

Does this sound reasonable?
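As an illustrative sketch (invented names, toy whitespace tokenization): seek() returns a brand-new tokenizer rather than mutating the existing one, so the re-instantiation is explicit and no suspended generator state can get out of sync with the pointer:

```python
class Tokenizer:
    """Deliberately not resettable: re-pointing creates a new instance."""

    def __init__(self, data, pos=0):
        self._data = data
        self._pos = pos

    def seek(self, pos):
        # Explicit re-instantiation instead of an in-place pointer reset.
        return Tokenizer(self._data, pos)

    def __iter__(self):
        data, pos = self._data, self._pos
        while pos < len(data):
            while pos < len(data) and data[pos].isspace():
                pos += 1
            start = pos
            while pos < len(data) and not data[pos].isspace():
                pos += 1
            if start < pos:
                yield data[start:pos]

tok = Tokenizer('1 0 obj')
print(next(iter(tok)))    # '1'
tok = tok.seek(4)         # throwing away the old tokenizer is explicit
print(next(iter(tok)))    # 'obj'
```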

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-12 Thread Thomas Lotze
Thomas Lotze wrote:

> A related problem is skipping whitespace. Sometimes you don't care about
> whitespace tokens, sometimes you do. Using generators, you can either set
> a state variable, say on the object the generator is an attribute of,
> before each call that requires a deviation from the default, or you can
> have a second generator for filtering the output of the first.

Last night's sleep was really productive - I've also found another way
to tackle this problem, and it's really simple IMO. One could pass the
parameter at generator instantiation time and simply create two
generators behaving differently. They work on the same data and use the
same source code, only with a different parametrization.

All one has to care about is that they never get out of sync. If the
data pointer is an object attribute, it's clear how to do it. Otherwise,
both could acquire their data from a common generator that yields the
PDF content (or a buffer representing part of it) character by
character. This is even faster than keeping a pointer and using it as an
index on the data.
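One could sketch this roughly as follows (toy whitespace tokenization again; the real code would implement the PDF rules): the same generator source code, parametrized at instantiation time, fed from a common character-yielding generator:

```python
def chars(data):
    # Common source: yields the content character by character.
    for c in data:
        yield c

def tokens(source, skip_ws):
    # One source code, two behaviours, chosen at instantiation time.
    token = ''
    for c in source:
        if c.isspace():
            if token:
                yield token
                token = ''
            if not skip_ws:
                yield c
        else:
            token += c
    if token:
        yield token

print(list(tokens(chars('1 0 obj'), skip_ws=True)))   # ['1', '0', 'obj']
print(list(tokens(chars('1 0 obj'), skip_ws=False)))  # ['1', ' ', '0', ' ', 'obj']
```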

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: why python on debian without the module profile?

2005-06-13 Thread Thomas Lotze
kyo guan wrote:

> ImportError: No module named profile

They moved it to non-free because the module's license isn't DFSG
compliant.

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-13 Thread Thomas Lotze
Thomas Lotze wrote:

> I'm trying to figure out what is the most pythonic way to interact with a
> generator.

JFTR, so you don't think I'd suddenly lost interest: I won't be able to
respond for a couple of days because I've just incurred a nice little
hospital session... will be back next week.

-- 
Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Minimally intrusive XML editing using Python

2009-11-18 Thread Thomas Lotze
I wonder what Python XML library is best for writing a program that makes
small modifications to an XML file in a minimally intrusive way. By that I
mean that information the program doesn't recognize is kept, as are
comments and whitespace, the order of attributes and even whitespace
around attributes. In short, I want to be able to change an XML file while
producing minimal textual diffs.

Most libraries don't allow controlling the order of and the whitespace
around attributes, so what's generally left to do is store snippets of
original text along with the model objects and re-use that for writing the
edited XML if the model wasn't modified by the program. Does a library
exist that helps with this? Does any XML library at all allow structured
access to the text representation of a tag with its attributes?
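To make the goal concrete, here's a deliberately naive sketch of the kind of edit I mean (not a real XML parser; it will break on attribute values containing quotes or '>'): one attribute value is changed by plain text substitution, so everything else, including attribute order and whitespace, survives byte for byte:

```python
import re

def set_attr(xml_text, tag, attr, old, new):
    # Substitute attr="old" with attr="new" inside the first matching
    # start tag of `tag`, leaving all other text untouched.
    pattern = r'(<%s\b[^>]*?\b%s=")%s(")' % (tag, attr, re.escape(old))
    return re.sub(pattern, r'\g<1>%s\g<2>' % new, xml_text, count=1)

doc = '<trkpt lat="50.000" lon="11.000">\n  <ele>508.30</ele>\n</trkpt>'
print(set_attr(doc, 'trkpt', 'lat', '50.000', '50.001'))
```

A library doing this robustly, with structured access to the underlying text, is what I'm looking for.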

Thank you very much.

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Thomas Lotze
Stefan Behnel wrote:

> Take a look at canonical XML (C14N). In short, that's the only way to get a
> predictable XML serialisation that can be used for textual diffs. It's
> supported by lxml.

Thank you for the pointer. IIUC, c14n is about serialising an XML
document so that its textual representation is reproducible. While this
representation
would certainly solve my problem if I were to deal with input that's
already in c14n form, it doesn't help me handling arbitrarily formatted
XML in a minimally intrusive way.

IOW, I don't want the XML document to obey the rules of a process, but
instead I want a process that respects the textual form my input happens
to have.
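(Purely to illustrate what c14n buys you: two textually different but equivalent documents serialise to the same bytes. The sketch below uses the stdlib's xml.etree.ElementTree.canonicalize(), which only appeared much later, in Python 3.8.)

```python
import xml.etree.ElementTree as ET

# Two equivalent documents with different attribute order,
# empty-element syntax and a comment:
a = '<r b="2" a="1"><!-- note --><x/></r>'
b = '<r a="1" b="2"><x></x></r>'

# C14N sorts attributes, expands empty elements and (by default)
# drops comments, so both serialise identically:
print(ET.canonicalize(a))
print(ET.canonicalize(a) == ET.canonicalize(b))   # True
```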

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Thomas Lotze
Chris Rebert wrote:

> Have you considered using an XML-specific diff tool such as:

I'm afraid I'll have to fall back to using such a thing if I don't find a
solution to what I actually want to do.

I do realize that XML isn't primarily about its textual representation, so
I guess I shouldn't be surprised if what I'm looking for doesn't exist.
Still, it would be nice if it did...

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-23 Thread Thomas Lotze
Please consider this a reply to any unanswered messages I received in
response to my original post.

Dave Angel wrote:

> What's your real problem, or use case?  Are you just concerned with 
> diffing, or are others likely to read the xml, and want it formatted the 
> way it already is?

I'd like to put the XML under revision control along with other stuff.
Other people should be able to make sense of the diffs and I'd rather not
require them to configure their tools to use some XML differ.

> And how general do you need this tool to be?  For 
> example, if the only thing you're doing is modifying existing attributes 
> or existing tags, the "minimal change" would be pretty unambiguous.  But 
> if you're adding tags, or adding content on what was an empty element, 
> then the requirement gets fuzzy.  And finding an existing library for 
> something "fuzzy" is unlikely.

Sure. I guess it's something like an 80/20 problem: changing attributes
in a way that keeps the rest of the XML intact will go a long way. And
as we're talking about XML that is supposed to be looked at by humans, I
would base any further requirements on the assumption that it's
pretty-printed in some way, so that removing an element, for example,
can be defined as touching as few lines as possible, and adding one can
be restricted to adding a line in the appropriate place. If more complex
stuff isn't as well-defined, that would be entirely OK with me.

> Sample input, change list, and desired output would be very  useful.

I'd like to be able to reliably produce a diff like this using a program
that lets me change the value in some useful way, which might be dragging
a point across a map with the mouse in this example:

--- foo.gpx 2009-05-30 19:45:45.0 +0200
+++ bar.gpx 2009-11-23 17:41:36.0 +0100
@@ -11,7 +11,7 @@
   0.792244
   2d
 
-
+
   508.30
 2009-05-30T16:37:10Z
   15.15


-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list