[ python-Feature Requests-1706256 ] Give Partial the ability to skip positionals

2007-04-26 Thread SourceForge.net
Feature Requests item #1706256, was opened at 2007-04-24 02:18
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1706256&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Private: No
Submitted By: Calvin Spealman (ironfroggy)
Assigned to: Nobody/Anonymous (nobody)
Summary: Give Partial the ability to skip positionals

Initial Comment:
There are some situations where you want to skip positional arguments in a use 
of a partial function. In other words, you want to create a partial that 
applies positional arguments out of order, or without applying values to one or 
more earlier positional arguments. In some cases keyword arguments can be used 
instead, but this has two obvious drawbacks. First, it makes the caller rely 
on the name of a positional parameter in the callee, which breaks 
encapsulation. Second, in the case where the function being applied is a 
builtin, it fails completely, as builtins will not take positional arguments 
by name at all. I propose a class attribute on the partial type, 'skip', a 
singleton to pass to a partial object to signify this skipping of positionals. 
The following example demonstrates.

from functools import partial

def add(a, b):
    return a + b

append_abc = partial(add, partial.skip, "abc")
assert append_abc("xyz") == "xyzabc"

Obviously the equivalent partial(add, b="abc") would break if the maintainer 
of add renamed the positionals to 'first' and 'second' or 'pre' and 'post', 
which would be a perfectly reasonable change; we should not expect the names 
of our positional arguments to be depended upon. It would also break when 
someone gets smart and replaces the add function with operator.add, of course.
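
A pure-Python sketch of the proposed semantics (hypothetical: the actual
patch is in C, and neither skip nor skip_partial exists in functools; the
sentinel and helper below are illustration only):

class _Skip(object):
    def __repr__(self):
        return 'functools.partial.skip'

skip = _Skip()  # placeholder marking a positional slot to fill at call time

def skip_partial(func, *pargs, **pkwargs):
    # Like functools.partial, but occurrences of `skip` in pargs are
    # filled from the call's positional arguments, left to right.
    def inner(*args, **kwargs):
        n_skips = sum(1 for a in pargs if a is skip)
        if len(args) < n_skips:
            raise TypeError('expected at least %d positional arguments'
                            % n_skips)
        args = list(args)
        merged = [args.pop(0) if a is skip else a for a in pargs]
        merged.extend(args)            # leftover positionals go at the end
        kw = dict(pkwargs)
        kw.update(kwargs)
        return func(*merged, **kw)
    return inner

def add(a, b):
    return a + b

append_abc = skip_partial(add, skip, "abc")
assert append_abc("xyz") == "xyzabc"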

--

>Comment By: Martin v. Löwis (loewis)
Date: 2007-04-26 09:50

Message:
Logged In: YES 
user_id=21627
Originator: NO

Based on this feedback, I'm rejecting the patch. Thanks for proposing it,
anyway.

--

Comment By: Alexander Belopolsky (belopolsky)
Date: 2007-04-26 08:32

Message:
Logged In: YES 
user_id=835142
Originator: NO

I am actually -1 on the concept as well.  If you go to the trouble of
having a skip singleton in the language, then partial application syntax
should just be call syntax, as in add(skip, 2). However, that would not be
Python anymore.  Other languages that have partial application support use
special syntax such as add(,2) or add(:,2).  Since adding syntax is out of
the question, it is hard to argue that partial(add, partial.skip, 2) is so
much better than lambda x: add(x, 2).

--

Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-26 08:15

Message:
Logged In: YES 
user_id=80475
Originator: NO

-1 on the concept for this patch.  We're working too hard to avoid simple
uses of lambda or def.  The additional complexity and impact on readability
isn't worth it.

--

Comment By: Alexander Belopolsky (belopolsky)
Date: 2007-04-26 06:14

Message:
Logged In: YES 
user_id=835142
Originator: NO

If you remove partial_type.tp_dict = PyDict_New(); at line 309, the patch
will pass test_functools.

A few comments:

Design:
1. partial.skip should be an instance of a singleton class (look at
NoneType implementation in object.c)
2. repr(partial.skip) should be 'functools.partial.skip' (easy to
implement once #1 is done)

Implementation:
1.  In the loop over pto->args you know that i < npargs, so you can use
PyTuple_GET_ITEM and there is no need to check for arg==NULL.
2. You should check PyTuple_GetItem(args, pull_index) for NULL and return
with an error if too few arguments are supplied.  Better yet, find the
number of supplied args outside the loop and raise your own error if
pull_index grows to that number.
3. It looks like you are leaking references. I don't see where you decref
ptoargscopy and arg after the concatenation.
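
At the Python level, design points 1 and 2 amount to something like the
following sketch (illustration only; the real change would live in C,
modeled on the NoneType implementation):

class _SkipType(object):
    # NoneType-style singleton: every instantiation returns the same
    # object, and repr() names it unambiguously.
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = object.__new__(cls)
        return cls._instance

    def __repr__(self):
        return 'functools.partial.skip'

skip = _SkipType()
assert _SkipType() is skip                       # point 1: singleton
assert repr(skip) == 'functools.partial.skip'    # point 2: repr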


--

Comment By: Calvin Spealman (ironfroggy)
Date: 2007-04-26 06:10

Message:
Logged In: YES 
user_id=112166
Originator: YES

Hmm, I didn't get such an error here on VC with the extra argument. I'll
change that here, too.

I figured the breaking of the subclassing of partial was related to what I
do with tp_dict, but I don't understand how. How did I blow it up? And,
yes, I will write tests for the new functionality, of course.

--

Comment By: Alexander Belopolsky (belopolsky)
Date: 2007-04-26 05:34

Message:
Logged In: YES 
user_id=835142

[ python-Feature Requests-1692592 ] Stripping debugging symbols from compiled C extensions

2007-04-26 Thread SourceForge.net
Feature Requests item #1692592, was opened at 2007-04-02 01:00
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1692592&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Distutils
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Viktor Ferenczi (complex)
Assigned to: Nobody/Anonymous (nobody)
Summary: Stripping debugging symbols from compiled C extensions

Initial Comment:
It would be nice to automatically strip debugging symbols from compiled library 
files (such as .so files) when a C extension module is not built in debug mode. 
This could reduce the memory footprint and storage requirements of extension 
modules.

Distutils already does this for cygwin and emx compilers with the following 
code in cygwinccompiler.py and emxccompiler.py:

# who wants symbols and a many times larger output file
# should explicitly switch the debug mode on
# otherwise we let dllwrap/ld strip the output file
# (On my machine: 10KB < stripped_file < ??100KB
#   unstripped_file = stripped_file + XXX KB
#  ( XXX=254 for a typical python extension))
if not debug:
    extra_preargs.append("-s")

This code should somehow be integrated into the base compiler classes, such as 
UnixCCompiler. I've added the following at the beginning of the 
UnixCCompiler.link function:

if not debug:
    if extra_preargs is None:
        extra_preargs = []
    extra_preargs.append("-s")

This works for me with gcc under Linux (Debian Sarge). I am not providing a 
patch, since I am not sure this is the best solution.
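
In the meantime, the same effect can be had from a setup script without
patching distutils; a minimal sketch, assuming a GNU toolchain whose linker
accepts -s (the class name StrippedBuildExt is made up for illustration):

from distutils.core import setup, Extension
from distutils.command.build_ext import build_ext

class StrippedBuildExt(build_ext):
    # build_ext variant that asks the linker to strip symbols from
    # non-debug builds.
    def build_extension(self, ext):
        if not self.debug:
            # extra_link_args is passed through to the link step.
            ext.extra_link_args = (ext.extra_link_args or []) + ['-s']
        build_ext.build_extension(self, ext)

setup(name='example',
      ext_modules=[Extension('example', ['example.c'])],
      cmdclass={'build_ext': StrippedBuildExt})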

--

>Comment By: Martin v. Löwis (loewis)
Date: 2007-04-26 09:57

Message:
Logged In: YES 
user_id=21627
Originator: NO

Just for the record: stripping symbols does *not* reduce memory footprint.
It does (only) reduce storage requirements.

One problem with implementing the feature is making it work portably on
all systems. Adding -s to the linker command line is most likely not
portable, i.e. some systems may not support it. Python currently ignores
the concept of stripping entirely (not just for extension modules, but also
for the interpreter itself), partially because supporting it would further
complicate portability.

On Linux, this problem is commonly solved by distributors performing the
stripping as part of the packaging utilities. E.g. for Debian, just add
dh_strip into the debian/rules file, and the packaging
will strip all binaries according to the Debian policy.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1692592&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1682241 ] Problems with urllib2 read()

2007-04-26 Thread SourceForge.net
Bugs item #1682241, was opened at 2007-03-16 12:00
Message generated for change (Comment added) made by ironfroggy
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1682241&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lucas Malor (lucas_malor)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with urllib2 read()

Initial Comment:
urllib2 objects opened with urlopen() do not have a seek() method the way file 
objects do. So reading only some bytes from an opened URL is practically 
forbidden.

An example: I tried to open a URL and check whether it is a gzip file; if 
IOError is raised, I read it as a plain file (to do this I applied the #1675951 
patch: 
https://sourceforge.net/tracker/index.php?func=detail&aid=1675951&group_id=5470&atid=305470
 )

But after trying to open the file as gzip, if it is not a gzip file, the 
current position in the urllib object is at the second byte, so read() returns 
the data from the third byte to the last. You can't check the header of the 
file before storing it on disk. Well, so what is urlopen() for? If I must store 
the file from the URL on disk and reload it, I can use urlretrieve() ...
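
A sketch of the workaround the discussion below converges on: copy the
response into a StringIO, peek at the two-byte gzip magic number, and
rewind (Python 2 modules; the URL is a placeholder):

import gzip
import urllib2
from StringIO import StringIO

response = urllib2.urlopen('http://example.com/maybe-compressed')
buf = StringIO(response.read())    # seekable copy of the stream

if buf.read(2) == '\x1f\x8b':      # gzip magic number
    buf.seek(0)
    data = gzip.GzipFile(fileobj=buf).read()
else:
    buf.seek(0)                    # rewind; nothing has been lost
    data = buf.read()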

--

Comment By: Calvin Spealman (ironfroggy)
Date: 2007-04-26 09:55

Message:
Logged In: YES 
user_id=112166
Originator: NO

I have to agree that this is not a bug. HTTP responses are streams, not
random access files. Adding a seek would have disastrous performance
penalties. If you think the workaround is too complicated, I can't
understand why.

--

Comment By: Zacherates (maenpaa)
Date: 2007-03-20 21:39

Message:
Logged In: YES 
user_id=1421845
Originator: NO

> I use the method you wrote, but this must be done manually,
> and I don't know why.
read() is a stream processing method, whereas seek() is a random access
processing method.  HTTP resources are in essence streams so they implement
read() but not seek().  Trying to shoehorn a stream to act like a random
access file has some rather important technical implications.  For example:
what happens when an HTTP resource is larger than available memory and we
try to maintain a full featured seek() implementation?

> so what is urlopen() for?
Fetching a webpage or RSS feed and feeding it to a parser, for example.

StringIO is a class that was designed to implement feature-complete,
random-access, file-like object behavior that can be wrapped around a
stream.  StringIO can and should be used as an adapter when you have a
stream that you need random access to.  This allows designers the freedom
to simply provide a good read() implementation and let clients wrap the
output in a StringIO if needed.

If in your application you always want random access and you don't have to
deal with large files:
def my_urlopen(*args, **kwargs):
    return StringIO.StringIO(urllib2.urlopen(*args, **kwargs).read())

Python makes delegation trivially easy.

In essence, urlfiles (the result of urllib2.urlopen()) and regular files
(the result of open()) behave differently because they implement different
interfaces.  If you use the common interface (read), then you can treat
them equally.  If you use the specialized interface (seek, tell, etc.)
you'll have trouble.  The solution is to wrap the general objects in a
specialized object that implements the desired interface, StringIO.

--

Comment By: Lucas Malor (lucas_malor)
Date: 2007-03-20 04:59

Message:
Logged In: YES 
user_id=1403274
Originator: YES

> If you need to seek, you can wrap the file-like object in a
> StringIO (which is what urllib would have to do internally
> [...] )

I think it's really a bug, or at least a non-Pythonic method.
I use the method you wrote, but this must be done manually,
and I don't know why. Without this "trick" you can't
handle URL and file objects together, as they don't work in
the same manner. I don't think it would be too complicated for
urllib to use an internal StringIO object when I must seek()
or use other file-like methods.

> You can check the type of the response content before you try
> to uncompress it via the Content-Encoding header of the
> response

It's not a generic solution

(thanks anyway for suggested solutions :) )

--

Comment By: Zacherates (maenpaa)
Date: 2007-03-19 22:43

Message:
Logged In: YES 
user_id=1421845
Originator: NO

I'd contend that this is not a bug:
 * If you need to seek, you can wrap the file-like object in a StringIO
(which is what urllib would have to do internally, thus incurring the
StringIO overhead for 

[ python-Bugs-1708316 ] doctest work with Windows PyReadline

2007-04-26 Thread SourceForge.net
Bugs item #1708316, was opened at 2007-04-26 12:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708316&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: manuelg (manuelg_)
Assigned to: Nobody/Anonymous (nobody)
Summary: doctest work with Windows PyReadline

Initial Comment:
doctest crashes when working with Windows PyReadline (PyReadline is a component 
of Windows IPython)

PyReadline expects "_SpoofOut" to have an "encoding" attribute

E
======================================================================
ERROR: testDocTest (__main__.TestDocTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_freecell_solver.py", line 26, in testDocTest
    r = doctest.testmod(freecell_solver)
  File "c:\Python25\Lib\doctest.py", line 1799, in testmod
    runner.run(test)
  File "c:\Python25\Lib\doctest.py", line 1335, in run
    self.debugger = _OutputRedirectingPdb(save_stdout)
  File "c:\Python25\Lib\doctest.py", line 320, in __init__
    pdb.Pdb.__init__(self, stdout=out)
  File "c:\Python25\Lib\pdb.py", line 66, in __init__
    import readline
  File "C:\Python25\Lib\site-packages\readline.py", line 5, in <module>
    from pyreadline import *
  File "C:\Python25\Lib\site-packages\pyreadline\__init__.py", line 10, in <module>
    from rlmain import *
  File "C:\Python25\Lib\site-packages\pyreadline\rlmain.py", line 13, in <module>
    import clipboard,logger,console
  File "C:\Python25\Lib\site-packages\pyreadline\console\__init__.py", line 14, in <module>
    from console import *
  File "C:\Python25\Lib\site-packages\pyreadline\console\console.py", line 118, in <module>
    consolecodepage=sys.stdout.encoding
AttributeError: _SpoofOut instance has no attribute 'encoding'

This is an easy fix with two lines of code in doctest.py.

Right after doctest.py imports sys, store sys.stdout.encoding:

_sys_stdout_encoding = sys.stdout.encoding

Then add it as an "encoding" attribute of the _SpoofOut class:

# Override some StringIO methods.
class _SpoofOut(StringIO):
    ...
    encoding = _sys_stdout_encoding
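
A self-contained sketch of the proposed fix (an assumption on my part:
using getattr with a default, so streams that lack the attribute entirely
get None instead of raising):

import sys
from StringIO import StringIO

# Captured when doctest is imported, before sys.stdout is replaced.
_sys_stdout_encoding = getattr(sys.stdout, 'encoding', None)

class _SpoofOut(StringIO):
    # PyReadline reads sys.stdout.encoding; expose the real one.
    encoding = _sys_stdout_encoding

out = _SpoofOut()
print out.encoding    # no AttributeError anymore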



--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708316&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1708326 ] find_module doc ambiguity

2007-04-26 Thread SourceForge.net
Bugs item #1708326, was opened at 2007-04-26 13:18
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708326&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Documentation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Andrew McNabb (amcnabb)
Assigned to: Nobody/Anonymous (nobody)
Summary: find_module doc ambiguity

Initial Comment:
The doc string for find_module doesn't make it clear that you can do:

stats_path = imp.find_module('scipy/stats')

It makes it sound like you would have to do:

scipy_path = imp.find_module('scipy')[1]
stats_path = imp.find_module('stats', [scipy_path])[1]

However, the shorter snippet seems to work just fine.
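
For contrast, a short sketch of both forms (assumes scipy is installed;
imp.find_module returns a (file, pathname, description) tuple, hence the
[1]):

import imp

# Documented two-step form: search for the subpackage inside the
# parent package's directory.
scipy_path = imp.find_module('scipy')[1]
stats_path = imp.find_module('stats', [scipy_path])[1]

# Undocumented shortcut reported here: a slash-separated path
# appears to work as well.
assert imp.find_module('scipy/stats')[1] == stats_path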

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708326&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1708362 ] Why not sequential?

2007-04-26 Thread SourceForge.net
Bugs item #1708362, was opened at 2007-04-26 22:38
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708362&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lucas Malor (lucas_malor)
Assigned to: Nobody/Anonymous (nobody)
Summary: Why not sequential?

Initial Comment:
In my opinion it's not complicated, it's convoluted. I must use two objects to 
handle one data stream.

Furthermore, it's a waste of resources: I must copy the data to another object. 
Luckily, in my script I download and handle only small files. But what if a 
Python program must handle big files?

If seek() can't be used (an exception is raised), urllib could use a sequential 
access method.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708362&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1708362 ] Why not sequential?

2007-04-26 Thread SourceForge.net
Bugs item #1708362, was opened at 2007-04-26 22:38
Message generated for change (Settings changed) made by lucas_malor
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708362&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Lucas Malor (lucas_malor)
Assigned to: Nobody/Anonymous (nobody)
Summary: Why not sequential?

Initial Comment:
In my opinion it's not complicated, it's convoluted. I must use two objects to 
handle one data stream.

Furthermore, it's a waste of resources: I must copy the data to another object. 
Luckily, in my script I download and handle only small files. But what if a 
Python program must handle big files?

If seek() can't be used (an exception is raised), urllib could use a sequential 
access method.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708362&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1682241 ] Problems with urllib2 read()

2007-04-26 Thread SourceForge.net
Bugs item #1682241, was opened at 2007-03-16 17:00
Message generated for change (Comment added) made by lucas_malor
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1682241&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lucas Malor (lucas_malor)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with urllib2 read()

Initial Comment:
urllib2 objects opened with urlopen() do not have a seek() method the way file 
objects do. So reading only some bytes from an opened URL is practically 
forbidden.

An example: I tried to open a URL and check whether it is a gzip file; if 
IOError is raised, I read it as a plain file (to do this I applied the #1675951 
patch: 
https://sourceforge.net/tracker/index.php?func=detail&aid=1675951&group_id=5470&atid=305470
 )

But after trying to open the file as gzip, if it is not a gzip file, the 
current position in the urllib object is at the second byte, so read() returns 
the data from the third byte to the last. You can't check the header of the 
file before storing it on disk. Well, so what is urlopen() for? If I must store 
the file from the URL on disk and reload it, I can use urlretrieve() ...

--

>Comment By: Lucas Malor (lucas_malor)
Date: 2007-04-26 22:41

Message:
Logged In: YES 
user_id=1403274
Originator: YES

In my opinion it's not complicated, it's convoluted. I must use two objects
to handle one data stream.

Furthermore, it's a waste of resources: I must copy the data to another
object. Luckily, in my script I download and handle only small files. But
what if a Python program must handle big files?

If seek() can't be used (an exception is raised), urllib could use a
sequential access method.

--

Comment By: Calvin Spealman (ironfroggy)
Date: 2007-04-26 15:55

Message:
Logged In: YES 
user_id=112166
Originator: NO

I have to agree that this is not a bug. HTTP responses are streams, not
random access files. Adding a seek would have disastrous performance
penalties. If you think the workaround is too complicated, I can't
understand why.

--

Comment By: Zacherates (maenpaa)
Date: 2007-03-21 02:39

Message:
Logged In: YES 
user_id=1421845
Originator: NO

> I use the method you wrote, but this must be done manually,
> and I don't know why.
read() is a stream processing method, whereas seek() is a random access
processing method.  HTTP resources are in essence streams so they implement
read() but not seek().  Trying to shoehorn a stream to act like a random
access file has some rather important technical implications.  For example:
what happens when an HTTP resource is larger than available memory and we
try to maintain a full featured seek() implementation?

> so what is urlopen() for?
Fetching a webpage or RSS feed and feeding it to a parser, for example.

StringIO is a class that was designed to implement feature-complete,
random-access, file-like object behavior that can be wrapped around a
stream.  StringIO can and should be used as an adapter when you have a
stream that you need random access to.  This allows designers the freedom
to simply provide a good read() implementation and let clients wrap the
output in a StringIO if needed.

If in your application you always want random access and you don't have to
deal with large files:
def my_urlopen(*args, **kwargs):
    return StringIO.StringIO(urllib2.urlopen(*args, **kwargs).read())

Python makes delegation trivially easy.

In essence, urlfiles (the result of urllib2.urlopen()) and regular files
(the result of open()) behave differently because they implement different
interfaces.  If you use the common interface (read), then you can treat
them equally.  If you use the specialized interface (seek, tell, etc.)
you'll have trouble.  The solution is to wrap the general objects in a
specialized object that implements the desired interface, StringIO.

--

Comment By: Lucas Malor (lucas_malor)
Date: 2007-03-20 09:59

Message:
Logged In: YES 
user_id=1403274
Originator: YES

> If you need to seek, you can wrap the file-like object in a
> StringIO (which is what urllib would have to do internally
> [...] )

I think it's really a bug, or at least a non-Pythonic method.
I use the method you wrote, but this must be done manually,
and I don't know why. Without this "trick" you can't
handle URL and file objects together, as they don't work in
the same manner. I don't think it would be too complicated for
urllib to use an internal StringIO object when I must seek()
or use other file-like methods.

> You can check the type of the response content before you try
> to uncompress it via the Content-Encoding header of the
> response

It's not a generic solution

(thanks anyway for suggested solutions :) )

[ python-Bugs-1708362 ] Why not sequential?

2007-04-26 Thread SourceForge.net
Bugs item #1708362, was opened at 2007-04-26 22:38
Message generated for change (Settings changed) made by lucas_malor
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708362&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
>Group: Trash
>Status: Deleted
>Resolution: Duplicate
Priority: 5
Private: No
Submitted By: Lucas Malor (lucas_malor)
Assigned to: Nobody/Anonymous (nobody)
Summary: Why not sequential?

Initial Comment:
In my opinion it's not complicated, it's convoluted. I must use two objects to 
handle one data stream.

Furthermore, it's a waste of resources: I must copy the data to another object. 
Luckily, in my script I download and handle only small files. But what if a 
Python program must handle big files?

If seek() can't be used (an exception is raised), urllib could use a sequential 
access method.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1708362&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1682241 ] Problems with urllib2 read()

2007-04-26 Thread SourceForge.net
Bugs item #1682241, was opened at 2007-03-16 12:00
Message generated for change (Comment added) made by maenpaa
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1682241&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lucas Malor (lucas_malor)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with urllib2 read()

Initial Comment:
urllib2 objects opened with urlopen() do not have a seek() method the way file 
objects do. So reading only some bytes from an opened URL is practically 
forbidden.

An example: I tried to open a URL and check whether it is a gzip file; if 
IOError is raised, I read it as a plain file (to do this I applied the #1675951 
patch: 
https://sourceforge.net/tracker/index.php?func=detail&aid=1675951&group_id=5470&atid=305470
 )

But after trying to open the file as gzip, if it is not a gzip file, the 
current position in the urllib object is at the second byte, so read() returns 
the data from the third byte to the last. You can't check the header of the 
file before storing it on disk. Well, so what is urlopen() for? If I must store 
the file from the URL on disk and reload it, I can use urlretrieve() ...

--

Comment By: Zacherates (maenpaa)
Date: 2007-04-26 23:36

Message:
Logged In: YES 
user_id=1421845
Originator: NO

> In my opinion it's not complicated, it's convoluted. I must use two
> objects to handle one data stream.

seek() is not a stream operation. It is a random access operation
(file-like != stream). If you were only trying to use stream operations
then you wouldn't have these problems.   

Each class provides a separate functionality, urllib gets the file while
StringIO stores it.  The fact that these responsibilities are given to
different classes should not be surprising, since they represent separately
useful concepts that abstract different things.  It's not convoluted, it's
good design.  If every class tried to do everything, pretty soon you're
adding solve_my_business_problem_using_SOA() to __builtins__ and nobody
wants that.


> Furthermore it's a waste of resources. I must copy data to another
> object. Luckily in my script I download and handle only small files.
> But what if a Python program must handle big files?

This is exactly why urllib *doesn't* provide seek. Deep down in the
networking library there's a socket with an 8KiB buffer talking to the HTTP
server. No matter how big the file you're getting with urllib, once that
buffer is full the socket starts dropping packets. 

To provide seek(), urllib would need to keep an entire copy of the file
that was retrieved (or provide mark()/seek(), but those have wildly
different semantics from the seek() we're used to in Python, and besides,
they're too Java).  This works fine if you're only working with small
files, but you raise a good point: "But what if a Python program must
handle big files?"  What about really big files (say, a Knoppix DVD ISO)? 
Sure you could use urlretrieve, but what if urlretrieve is implemented in
terms of urlopen?

Sure urllib could implement seek (with the same semantics as file.seek()),
but that would mean breaking urllib for any resource big enough that you
don't want the whole thing held in memory.


>> You can check the type of the response content before you try
>> to uncompress it via the Content-Encoding header of the
>> response

> It's not a generic solution

The point of this suggestion is not that it is the be-all and end-all
solution, but that code that *needs* seek can probably be rewritten so that
it does not.  Either that, or you could implement a BufferedReader with the
methods mark() and seek() and wrap the result of urlopen.
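
A sketch of that wrapper idea (ReplayableStream is a hypothetical name; it
buffers only what has been read so far, so rewinding works without holding
the whole resource in memory, though the buffer still grows as you read):

class ReplayableStream(object):
    # Wrap a read()-only stream and remember consumed bytes so the
    # reader can seek back to any already-seen offset.

    def __init__(self, stream):
        self._stream = stream
        self._buffer = ''   # everything consumed from the stream so far
        self._pos = 0       # current logical position

    def read(self, n=-1):
        if n < 0:
            data = self._buffer[self._pos:] + self._stream.read()
        else:
            data = self._buffer[self._pos:self._pos + n]
            missing = n - len(data)
            if missing:
                data += self._stream.read(missing)
        # Remember any bytes newly pulled off the wire.
        self._buffer += data[len(self._buffer) - self._pos:]
        self._pos += len(data)
        return data

    def seek(self, pos):
        if pos > len(self._buffer):
            raise IOError('cannot seek past what has already been read')
        self._pos = pos

# Usage (hypothetical): peek at a header, then rewind.
#   f = ReplayableStream(urllib2.urlopen(url))
#   magic = f.read(2)
#   f.seek(0)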


--

Comment By: Lucas Malor (lucas_malor)
Date: 2007-04-26 16:41

Message:
Logged In: YES 
user_id=1403274
Originator: YES

In my opinion it's not complicated, it's convoluted. I must use two objects
to handle one data stream.

Furthermore, it's a waste of resources: I must copy the data to another
object. Luckily, in my script I download and handle only small files. But
what if a Python program must handle big files?

If seek() can't be used (an exception is raised), urllib could use a
sequential access method.

--

Comment By: Calvin Spealman (ironfroggy)
Date: 2007-04-26 09:55

Message:
Logged In: YES 
user_id=112166
Originator: NO

I have to agree that this is not a bug. HTTP responses are streams, not
random access files. Adding a seek would have disastrous performance
penalties. If you think the workaround is too complicated, I can't
understand why.