Re: [Python-Dev] Possible bug in codecs readline? It breaks lines apart.

2005-01-09 Thread Irmen de Jong
Simon Percivall wrote:
It looks like the readline method broke at revision 1.36 of codecs.py,
when it was modified, yes.
Okay. I've created a bug report 1098990: codec readline() splits lines apart
--Irmen


Re: [Python-Dev] an idea for improving struct.unpack api

2005-01-09 Thread Ilya Sandler
> (a) A higher-level API can and should be constructed which acts like a
> (binary) stream but has additional methods for reading and writing
> values using struct format codes (or, preferably, somewhat
> higher-level type names, as suggested). Instances of this API should
> be constructable from a stream or from a "buffer" (e.g. a string).


Ok, I think it's getting much bigger than what I was initially aiming for
;-)...

One more comment though regarding unpack_at

> Then the definition would be:
>
>     def unpack_at(fmt, buf, pos):
>         size = calcsize(fmt)
>         end = pos + size
>         data = buf[pos:end]
>         if len(data) < size:
>             raise struct.error("not enough data for format")
>         ret = unpack(fmt, data)
>         ret = ret + (end,)
>         return ret

While I see the usefulness of this, I think it's too limited, e.g.
    result = unpack_at(fmt, buf, offset)
    offset = result[-1]    # the new offset is the last element of the tuple
feels quite unnatural...
So my feeling is that adding this new API is not worth the trouble,
especially if there are plans for anything higher level...

Instead, I would suggest that even a very limited initial
implementation of a StructReader()-like object, as suggested by Raymond,
would be more useful...

import struct

class StructReader: # or maybe call it Unpacker?
    def __init__(self, buf):
        self._buf = buf
        self._offset = 0
    def unpack(self, format):
        """unpack at current offset, advance internal offset
        accordingly"""
        size = struct.calcsize(format)
        ret = struct.unpack(format, self._buf[self._offset:self._offset + size])
        self._offset += size
        return ret
    # or maybe just make _offset public??
    def tell(self):
        "return current offset"
        return self._offset
    def seek(self, offset, whence=0):
        "set current offset"
        self._offset = offset

This solves the original offset-tracking problem completely (at least as
far as the inconvenience is concerned; improving unpack() performance
would require the struct reader to be written in C), while allowing the
rest to be added later.

E.g. the original "hdr + variable number of data items" code would
look like:

    buf = StructReader(rec)
    hdr = buf.unpack("")
    for i in range(hdr[0]):
        item = buf.unpack("")
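For illustration, here is the same pattern as a self-contained snippet;
the format codes ("<I" for the item count, "<Id" for each item) and the
sample record are made up for the example:

    import struct

    rec = (struct.pack("<I", 2)
           + struct.pack("<Id", 1, 0.5)
           + struct.pack("<Id", 2, 1.5))

    buf = StructReader(rec)
    (count,) = buf.unpack("<I")                        # fixed-size header
    items = [buf.unpack("<Id") for i in range(count)]  # variable part
    # items == [(1, 0.5), (2, 1.5)] and buf.tell() == len(rec)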


Ilya


PS: with unpack_at() this code would look like:

    offset = 0
    hdr = unpack_at("", rec, offset)
    offset = hdr[-1]
    for i in range(hdr[0]):
        item = unpack_at("", rec, offset)
        offset = item[-1]




On Sat, 8 Jan 2005, Guido van Rossum wrote:

> First, let me say two things:
>
> (a) A higher-level API can and should be constructed which acts like a
> (binary) stream but has additional methods for reading and writing
> values using struct format codes (or, preferably, somewhat
> higher-level type names, as suggested). Instances of this API should
> be constructable from a stream or from a "buffer" (e.g. a string).
>
> (b) -1 on Ilya's idea of having a special object that acts as an
> input-output integer; it is too unpythonic (no matter your objection).
>
> [Paul Moore]
> > OTOH, Nick's idea of returning a tuple with the new offset might make
> > your example shorter without sacrificing readability:
> >
> > result, newpos = struct.unpack('>l', self.__buf, self.__pos)
> > self.__pos = newpos # retained "newpos" for readability...
> > return result
>
> This is okay, except I don't want to overload this on unpack() --
> let's pick a different function name like unpack_at().
>
> > A third possibility - rather than "magically" adding an additional
> > return value because you supply a position, you could have a "where am
> > I?" format symbol (say & by analogy with the C "address of" operator).
> > Then you'd say
> >
> > result, newpos = struct.unpack('>l&', self.__buf, self.__pos)
> >
> > Please be aware, I don't have a need myself for this feature - my
> > interest is as a potential reader of others' code...
>
> I think that adding more magical format characters is probably not
> doing the readers of this code a service.
>
> I do like the idea of not introducing an extra level of tuple to
> accommodate the position return value but instead make it the last
> item in the tuple when using unpack_at().
>
> Then the definition would be:
>
>     def unpack_at(fmt, buf, pos):
>         size = calcsize(fmt)
>         end = pos + size
>         data = buf[pos:end]
>         if len(data) < size:
>             raise struct.error("not enough data for format")
>         # if data is too long that would be a bug in buf[pos:size] and
>         # cause an error below
>         ret = unpack(fmt, data)
>         ret = ret + (end,)
>         return ret
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


Re: [Python-Dev] Possible bug in codecs readline? It breaks lines apart.

2005-01-09 Thread Irmen de Jong
Okay. I've created a bug report 1098990: codec readline() splits lines 
apart
Btw, I've set it to group Python 2.5, is that correct?
Or should bugs that relate to the current CVS trunk have no group?
Thx
Irmen.


Re: [Python-Dev] Re: csv module TODO list

2005-01-09 Thread Andrew McNamara
>I'd love to see a 'split' and a 'join' function in the csv module to
>just convert between string and list without having to bother about
>files. 
>
>Something like
>
>csv.split(aStr [, dialect='excel'[, fmtparam]])  -> list object
>
>and
>
>csv.join(aList [, dialect='excel'[, fmtparam]]) -> str object
>
>Feasible?

Yes, it's feasible, although newlines can be embedded within fields of a
CSV record, hence the use of an iterator rather than working with
strings. In your example above, if the parser gets to the end of the
string and finds it's still within a field, I'd propose just raising
an exception.
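
For what it's worth, a rough sketch of what such helpers could look like
on top of the existing reader/writer (the names and the newline stripping
in join() are just my assumptions, not a settled interface):

    import csv
    from StringIO import StringIO

    def split(aStr, dialect='excel', **fmtparams):
        # Parse a single CSV record held in a string; an embedded newline
        # inside a field would have to raise an exception here.
        return csv.reader([aStr], dialect=dialect, **fmtparams).next()

    def join(aList, dialect='excel', **fmtparams):
        # Format one row as a CSV record, dropping the writer's trailing
        # line terminator.
        out = StringIO()
        csv.writer(out, dialect=dialect, **fmtparams).writerow(aList)
        return out.getvalue().rstrip('\r\n')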

No promises, however - I only have a finite amount of time to work on
this at the moment.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


RE: [Python-Dev] an idea for improving struct.unpack api

2005-01-09 Thread Raymond Hettinger
> Instead, I would suggest that even a very limited initial
> implementation of a StructReader()-like object, as suggested by
> Raymond, would be more useful...

I have a draft patch also.
Let's work out improvements off-list (perhaps on ASPN).
Feel free to email me directly.


Raymond



Re: [Python-Dev] Re: [Csv] Minor change to behaviour of csv module

2005-01-09 Thread Andrew McNamara
>> Andrew explains that in the CSV module, escape characters are not
>> properly removed.
>>
>> Magnus writes:
>>> IMO this is the *only* reasonable behaviour. I don't understand why
>>> the escape character should be left in; this is one of the reason why
>>> UNIX-style colon-separated values don't work with the current module.
>>
>> Andrew writes back later:
>>> Thinking about this further, I suspect we have to retain the current
>>> behaviour, as broken as it is, as the default: it's conceivable that
>>> someone somewhere is post-processing the result to remove the 
>>> backslashes,
>>> and if we fix the csv module, we'll break their code.
>>
>> I'm with Magnus on this. No one has 4 year old code using the CSV 
>> module.
>> The existing behavior is just simply WRONG. Sure, of course we should
>> try to maintain backward compatibility, but surely SOME cases don't
>> require it, right? Can't we treat this misbehavior as an outright bug?
>
>+1 -- the nonremoval of escape characters smells like a bug to me, too.

Okay, I'm glad the community agrees (less work, less crustification).

For what it's worth, it wasn't a bug so much as a misfeature. I was
explicitly adding the escape character back in. The intention was to
make the feature more forgiving of users who accidentally set the escape
character - in other words, only special (quoting, escaping, field
delimiter) characters received special treatment. With the benefit of
hindsight, that was an inadequately considered choice.
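
As a hypothetical illustration of the colon-separated, backslash-escaped
style mentioned above (the sample data is made up), the corrected
behaviour is for the parser to consume the escape character rather than
hand it back:

    import csv

    row = csv.reader(['root\\:x:0:0'], delimiter=':', escapechar='\\',
                     quoting=csv.QUOTE_NONE).next()
    # Expected result once the escape is removed: ['root:x', '0', '0']
    # (the old behaviour would give ['root\\:x', '0', '0']).
    print row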

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


[Python-Dev] csv module and universal newlines

2005-01-09 Thread Andrew McNamara
This item, from the TODO list, has been bugging me for a while:

>* Reader and universal newlines don't interact well, reader doesn't
>  honour Dialect's lineterminator setting. All outstanding bug id's
>  (789519, 944890, 967934 and 1072404) are related to this - it's 
>  a difficult problem and further discussion is needed.

The csv parser consumes lines from an iterator, but it also has its own
idea of end-of-line conventions, which is currently honoured only by the
writer, not the reader - a source of much confusion. The writer, by
default, also attempts to emit a \r\n sequence, which causes further
confusion unless the file is opened in binary mode.
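
For reference, the usual workaround on the writer side is to open the
file in binary mode, so the \r\n terminator isn't mangled by the
platform's newline translation (a minimal sketch):

    import csv

    f = open('out.csv', 'wb')     # binary mode keeps '\r\n' intact
    try:
        writer = csv.writer(f)
        writer.writerow(['spam', 'eggs', 'field with\nembedded newline'])
    finally:
        f.close()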

I'm looking for suggestions for how we can mitigate these problems
(without breaking things for existing users).

The standard file iterator includes the end-of-line characters in the
returned string. One potential solution, then, is to ignore the line
chunking done by the file iterator and logically concatenate the source
lines until the csv parser's idea of lineterminator is seen - but this
negates the benefit of using an iterator.
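
Purely as a sketch of that re-chunking idea (the function name and the
default terminator are my own), it would amount to something like:

    def rechunk(lines, terminator='\r\n'):
        # Ignore the original line chunking: join whatever the underlying
        # iterator yields and re-split on the dialect's lineterminator.
        buf = ''
        for chunk in lines:
            buf += chunk
            while True:
                i = buf.find(terminator)
                if i < 0:
                    break
                yield buf[:i + len(terminator)]
                buf = buf[i + len(terminator):]
        if buf:
            yield buf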

Another option might be to provide a new interface that relies on a
file-like object being supplied. The lineterminator character would only
be used with this interface, with the current interface falling back to
using only \n. Rather a drastic solution.

Any other ideas?

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/