Re: How much sanity checking is required for function inputs?
On 04/23/2016 06:29 PM, Ian Kelly wrote:

> Python enums are great. Sadly, they're still not quite as awesome as
> Java enums.

What fun things can Java enums do?

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list
Re: How much sanity checking is required for function inputs?
On 04/23/2016 06:21 PM, Michael Selik wrote:

> On Sat, Apr 23, 2016 at 9:01 PM Christopher Reimer wrote:
>
>> Hmm... What do we use Enum for? :)
>
> You can use Enum in certain circumstances to replace int or str
> constants. It can help avoid mistyping mistakes and might help your IDE
> give auto-complete suggestions. I haven't found a good use for them
> myself, but I'd been mostly stuck in Python 2 until recently.

enum34 is the backport, aenum is the turbo charged version.

https://pypi.python.org/pypi/enum34
https://pypi.python.org/pypi/aenum

--
~Ethan~
Re: A pickle problem!
On 04/21/2016 11:43 PM, Paulo da Silva wrote:

> class C(pd.DataFrame):

Note also that subclassing pandas is not always encouraged:

http://pandas.pydata.org/pandas-docs/stable/internals.html#subclassing-pandas-data-structures

Cheers,
Fabien
RE: Remove directory tree without following symlinks
> From: eryk...@gmail.com
> Date: Sat, 23 Apr 2016 15:22:35 -0500
> Subject: Re: Remove directory tree without following symlinks
> To: python-list@python.org
>
> On Sat, Apr 23, 2016 at 4:34 AM, Albert-Jan Roskam wrote:
>>
>>> From: eryk...@gmail.com
>>> Date: Fri, 22 Apr 2016 13:28:01 -0500
>>> On Fri, Apr 22, 2016 at 12:39 PM, Albert-Jan Roskam wrote:
>>>> FYI, Just today I found out that shutil.rmtree raises a
>>>> WindowsError if the dir is read-only (or its contents). Using
>>>> 'ignore_errors' won't help. Sure, no error is raised, but the dir
>>>> is not deleted either! A 'force' option would be a nice improvement.
>>>
>>> Use the onerror handler to call os.chmod(path, stat.S_IWRITE). For
>>> example, see pip's rmtree_errorhandler:
>>>
>>> https://github.com/pypa/pip/blob/8.1.1/pip/utils/__init__.py#L105
>>
>> Thanks, that looks useful indeed. I thought about os.chmod, but with
>> os.walk. That seemed expensive. So I used
>> subprocess.call('rmdir "%s" /s /q' % dirname). That's Windows only, of
>> course, but aside from that, is using subprocess less preferable?
>
> I assume you used shell=True in the above call, and not an external
> rmdir.exe. There are security concerns with using the shell if you're
> not in complete control of the command line.
>
> As to performance, cmd's rmdir wins without question, not only because
> it's implemented in C, but also because it uses the stat data from the
> WIN32_FIND_DATA returned by FindFirstFile/FindNextFile to check for
> FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_READONLY.
>
> On the other hand, Python wins when it comes to working with deeply
> nested directories. Paths in cmd are limited to MAX_PATH characters.
> rmdir uses DOS 8.3 short names (i.e. cAlternateFileName in
> WIN32_FIND_DATA), but that could still exceed MAX_PATH for a deeply
> nested tree, or the volume may not even have 8.3 DOS filenames.
> shutil.rmtree allows you to work around the DOS limit by prefixing the
> path with "\\?\". For example:
>
>     >>> subprocess.call(r'rmdir /q/s Z:\Temp\long', shell=True)
>     The path Z:\Temp\long\aa
>
>
>
>     a is too long.
>     0
>
>     >>> shutil.rmtree(r'\\?\Z:\Temp\long')
>     >>> os.path.exists(r'Z:\Temp\long')
>     False
>
> Using "\\?\" requires a path that's fully qualified, normalized
> (backslash only), and unicode (i.e. decode a Python 2 str).

Aww, I kinda forgot about that already, but I came across this last
year [1]. Apparently, shutil.rmtree(very_long_path) failed under Win 7,
even with the "silly prefix". I believe very_long_path was a Python 2
str.

It would be useful if shutil or os.path automatically prefixed paths
with "\\?\". It is rarely really needed, though. (In my case it was
needed to copy a bunch of MS Outlook .msg files, which automatically get
the subject line as the filename, and perhaps the first sentence of the
mail if the mail has no subject.)

[1] https://mail.python.org/pipermail/python-list/2015-June/693156.html
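The onerror approach suggested above can be sketched roughly as follows. This is a minimal illustration, not pip's actual code: the names force_rmtree and _make_writable_and_retry are made up for this example, and the handler assumes the failure was caused by a read-only entry (the Windows case discussed in the thread).

```python
import os
import shutil
import stat
import sys


def _make_writable_and_retry(func, path, _exc):
    """Clear the read-only bit on `path`, then retry the call that failed."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def force_rmtree(path):
    """Remove a directory tree, retrying entries that are read-only."""
    # Python 3.12 deprecated `onerror` in favour of `onexc`; both call the
    # handler with (function, path, exception information).
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_make_writable_and_retry)
    else:
        shutil.rmtree(path, onerror=_make_writable_and_retry)
```

Unlike the subprocess/rmdir route, this stays inside Python, so it also accepts a "\\?\"-prefixed path. If the retry fails for some other reason, the exception simply propagates.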
Re: How much sanity checking is required for function inputs?
On 04/23/2016 06:00 PM, Christopher Reimer wrote:

> Hmm... What do we use Enum for? :)

from enum import Enum

class Piece(Enum):
    king = 'one space, any direction'
    queen = 'many spaces, any direction'
    bishop = 'many spaces, diagonal'
    knight = 'two spaces cardinal, one space sideways, cannot be blocked'
    rook = 'many spaces, cardinal'
    pawn = 'first move: one or two spaces forward; subsequent moves: one space forward; attack: one space diagonal'

--> list(Piece)
[
 <Piece.king: 'one space, any direction'>,
 <Piece.queen: 'many spaces, any direction'>,
 <Piece.bishop: 'many spaces, diagonal'>,
 <Piece.knight: 'two spaces cardinal, one space sideways, cannot be blocked'>,
 <Piece.rook: 'many spaces, cardinal'>,
 <Piece.pawn: 'first move: one or two spaces forward; subsequent moves: one space forward; attack: one space diagonal'>,
]

--> p = Piece.bishop

--> p in Piece
True

--> p is Piece.rook
False

--> p is Piece.bishop
True

--
~Ethan~
Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On Sun, Apr 24, 2016 at 1:20 AM, Ethan Furman wrote:
> On 04/23/2016 06:29 PM, Ian Kelly wrote:
>
>> Python enums are great. Sadly, they're still not quite as awesome as Java
>> enums.
>
>
> What fun things can Java enums do?

Everything that Python enums can do, plus:

* You can override methods of individual values, not just the class as
a whole. Good for implementing the strategy pattern, or for defining a
default method implementation that one or two values do differently.
In Python you can emulate the same thing by adding the method directly
to the instance dict of the enum value, so this isn't really all that
much of a difference.

* Java doesn't have the hokey notion of enum instances being distinct
from their "value". The individual enum members *are* the values.
Whereas in Python an enum member is an awkward class instance that
contains a value of some other type. Python tries to get away from the
C-like notion that enums are ints by making the enum members
non-comparable, but then gives us IntEnum as a way to work around it
if we really want to. Since Java enums don't depend on any other type
for their values, there's nothing inviting the user to treat enums as
ints in the first place.

* As a consequence of the above, Java doesn't conflate enum values
with their parameters. The Python enum docs give us this interesting
example of an enum that takes arguments from its declaration:

>>> class Planet(Enum):
...     MERCURY = (3.303e+23, 2.4397e6)
...     VENUS   = (4.869e+24, 6.0518e6)
...     EARTH   = (5.976e+24, 6.37814e6)
...     MARS    = (6.421e+23, 3.3972e6)
...     JUPITER = (1.9e+27,   7.1492e7)
...     SATURN  = (5.688e+26, 6.0268e7)
...     URANUS  = (8.686e+25, 2.5559e7)
...     NEPTUNE = (1.024e+26, 2.4746e7)
...     def __init__(self, mass, radius):
...         self.mass = mass       # in kilograms
...         self.radius = radius   # in meters
...     @property
...     def surface_gravity(self):
...         # universal gravitational constant  (m3 kg-1 s-2)
...         G = 6.67300E-11
...         return G * self.mass / (self.radius * self.radius)
...
>>> Planet.EARTH.value
(5.976e+24, 6378140.0)
>>> Planet.EARTH.surface_gravity
9.802652743337129

This is incredibly useful, but it has a flaw: the value of each member
of the enum is just the tuple of its arguments. Suppose we added a
value for COUNTER_EARTH describing a hypothetical planet with the same
mass and radius existing on the other side of the sun. [1] Then:

>>> Planet.EARTH is Planet.COUNTER_EARTH
True

Because they have the same "value", instead of creating a separate
member, COUNTER_EARTH gets defined as an alias for EARTH. To work
around this, one would have to add a third argument to the above to
pass in an additional value for the sole purpose of distinguishing (or
else adapt the AutoNumber recipe to work with this example). This
example is a bit contrived since it's generally not likely to come up
with floats, but it can easily arise (and in my experience frequently
does) when the arguments are of more discrete types. It's notable that
the Java enum docs feature this very same example but without this
weakness. [2]

* Speaking of AutoNumber, since Java enums don't have the
instance/value distinction, they effectively do this implicitly, only
without generating a bunch of ints that are entirely irrelevant to
your enum type. With Python enums you have to follow a somewhat arcane
recipe to avoid specifying values, which just generates some values
and then hides them away. And it also breaks the Enum alias feature:

>>> class Color(AutoNumber):
...     red = default = ()  # not an alias!
...     blue = ()
...
>>> Color.red is Color.default
False

Anyroad, I think that covers all my beefs with the way enums are
implemented in Python. Despite the above, they're a great feature, and
I use them and appreciate that we have them.

[1] https://en.wikipedia.org/wiki/Counter-Earth
[2] https://docs.oracle.com/javase/tutorial/java/javaOO/enum.html
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On 04/24/2016 08:20 AM, Ian Kelly wrote:
> On Sun, Apr 24, 2016 at 1:20 AM, Ethan Furman wrote:
>> On 04/23/2016 06:29 PM, Ian Kelly wrote:
>>> Python enums are great. Sadly, they're still not quite as awesome
>>> as Java enums.
>>
>> What fun things can Java enums do?
>
> Everything that Python enums can do, plus:
>
> * You can override methods of individual values, not just the class as
> a whole. Good for implementing the strategy pattern, or for defining a
> default method implementation that one or two values do differently.
> In Python you can emulate the same thing by adding the method directly
> to the instance dict of the enum value, so this isn't really all that
> much of a difference.

All non-dunder methods, at least.

> * Java doesn't have the hokey notion of enum instances being distinct
> from their "value". The individual enum members *are* the values.
> Whereas in Python an enum member is an awkward class instance that
> contains a value of some other type. Python tries to get away from the
> C-like notion that enums are ints by making the enum members
> non-comparable, but then gives us IntEnum as a way to work around it
> if we really want to. Since Java enums don't depend on any other type
> for their values, there's nothing inviting the user to treat enums as
> ints in the first place.

How does Java share enums with other programs, computers, and/or
languages?

As far as value-separate-from-instance: if you want/need them to be the
same thing, mix-in the type:

class Planet(float, Enum):
    ...

[see below for "no-alias" ideas/questions]

NB: The enum and the value are still different ('is' fails) but equal.

> * As a consequence of the above, Java doesn't conflate enum values
> with their parameters. The Python enum docs give us this interesting
> example of an enum that takes arguments from its declaration:
>
> >>> class Planet(Enum):
> ...     MERCURY = (3.303e+23, 2.4397e6)
> ...     VENUS   = (4.869e+24, 6.0518e6)
> ...     EARTH   = (5.976e+24, 6.37814e6)
> ...     MARS    = (6.421e+23, 3.3972e6)
> ...     JUPITER = (1.9e+27,   7.1492e7)
> ...     SATURN  = (5.688e+26, 6.0268e7)
> ...     URANUS  = (8.686e+25, 2.5559e7)
> ...     NEPTUNE = (1.024e+26, 2.4746e7)
> ...     def __init__(self, mass, radius):
> ...         self.mass = mass       # in kilograms
> ...         self.radius = radius   # in meters
> ...     @property
> ...     def surface_gravity(self):
> ...         # universal gravitational constant  (m3 kg-1 s-2)
> ...         G = 6.67300E-11
> ...         return G * self.mass / (self.radius * self.radius)
> ...
> >>> Planet.EARTH.value
> (5.976e+24, 6378140.0)
> >>> Planet.EARTH.surface_gravity
> 9.802652743337129
>
> This is incredibly useful, but it has a flaw: the value of each member
> of the enum is just the tuple of its arguments. Suppose we added a
> value for COUNTER_EARTH describing a hypothetical planet with the same
> mass and radius existing on the other side of the sun. [1] Then:
>
> >>> Planet.EARTH is Planet.COUNTER_EARTH
> True
>
> Because they have the same "value", instead of creating a separate
> member, COUNTER_EARTH gets defined as an alias for EARTH. To work
> around this, one would have to add a third argument to the above to
> pass in an additional value for the sole purpose of distinguishing (or
> else adapt the AutoNumber recipe to work with this example). This
> example is a bit contrived since it's generally not likely to come up
> with floats, but it can easily arise (and in my experience frequently
> does) when the arguments are of more discrete types. It's notable that
> the Java enum docs feature this very same example but without this
> weakness. [2]

One reason for this is that Python enums are lookup-able via the value:

>>> Planet((5.976e+24, 6378140.0))
<Planet.EARTH: (5.976e+24, 6378140.0)>

Do Java enums not have such a feature, or is this "feature" totally
unnecessary in Java?

I could certainly add a "no-alias" feature to aenum. What would be the
appropriate value-lookup behaviour in such cases?

- return the first match
- return a list of matches
- raise an error
- disable value-lookups for that Enum

> * Speaking of AutoNumber, since Java enums don't have the
> instance/value distinction, they effectively do this implicitly, only
> without generating a bunch of ints that are entirely irrelevant to
> your enum type. With Python enums you have to follow a somewhat arcane
> recipe to avoid specifying values, which just generates some values
> and then hides them away. And it also breaks the Enum alias feature:
>
> >>> class Color(AutoNumber):
> ...     red = default = ()  # not an alias!
> ...     blue = ()
> ...
> >>> Color.red is Color.default
> False

Unfortunately, the empty tuple tends to be a singleton, so there is no
way to tell that red and default are (supposed to be) the same and blue
is (supposed to be) different:

--> a = b = ()
--> c = ()
--> a is b
True
--> a is c
True

If you have an idea on how to make that work I am interested.

> Anyroad, I think that covers all my beefs with the way enums are
> implemented in Python. Despite the above, they're a great feature, and
> I use them and appreciate tha
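The aliasing behaviour discussed in this exchange can be reproduced with a cut-down version of the Planet example. The sketch below is only an illustration: TaggedPlanet and its leading tag argument are hypothetical names showing the "third argument" workaround Ian describes, not something from the enum documentation or from aenum.

```python
from enum import Enum


class Planet(Enum):
    EARTH = (5.976e+24, 6.37814e6)
    COUNTER_EARTH = (5.976e+24, 6.37814e6)  # same value, so it becomes an alias

    def __init__(self, mass, radius):
        self.mass = mass      # in kilograms
        self.radius = radius  # in meters


class TaggedPlanet(Enum):
    # Hypothetical workaround: a leading tag makes every value unique.
    EARTH = (1, 5.976e+24, 6.37814e6)
    COUNTER_EARTH = (2, 5.976e+24, 6.37814e6)

    def __init__(self, tag, mass, radius):
        self.mass = mass
        self.radius = radius


print(Planet.COUNTER_EARTH is Planet.EARTH)              # True
print(len(list(Planet)))                                 # 1: aliases are skipped
print(TaggedPlanet.COUNTER_EARTH is TaggedPlanet.EARTH)  # False
```

The price of the workaround is that the tag becomes part of each member's value, so a value lookup has to supply it as well.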
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On Mon, Apr 25, 2016 at 2:04 AM, Ethan Furman wrote:
> Unfortunately, the empty tuple tends to be a singleton, so there is no way
> to tell that red and default are (supposed to be) the same and blue is
> (supposed to be) different:
>
> --> a = b = ()
> --> c = ()
> --> a is b
> True
> --> a is c
> True
>
> If you have an idea on how to make that work I am interested.

Easy: allow an empty list to have the same meaning as an empty tuple.
Every time you have [] in your source code, you're guaranteed to get a
new (unique) empty list, and then multiple assignment will work.

ChrisA
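The identity difference this suggestion relies on is easy to check. A small sketch follows; the function names are invented for illustration, and the tuple result is a CPython implementation detail rather than a language guarantee.

```python
def list_identities():
    a = b = []   # one list object bound to two names
    c = []       # a second, distinct list object
    return a is b, a is c


def tuple_identities():
    a = b = ()
    c = ()
    # CPython reuses a single empty-tuple object, so both are typically True.
    return a is b, a is c


print(list_identities())   # (True, False)
print(tuple_identities())
```

With lists, "red = default = []" and "blue = []" would therefore be distinguishable by identity, which is what the alias detection needs.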
Re: Optimizing Memory Allocation in a Simple, but Long Function
On Sun, Apr 24, 2016 at 1:05 PM, Derek Klinge wrote:
> I have been writing a python script to explore Euler's Method of
> approximating Euler's Number. I was hoping there might be a way to make
> this process work faster, as for sufficiently large eulerSteps, the process
> below becomes quite slow and sometimes memory intensive. I'm hoping someone
> can give me some insight as to how to optimize these algorithms, or ways I
> might decrease memory usage. I have been thinking about finding a way
> around importing the math module, as it seems a bit unneeded except as an
> easy reference.

Are you sure memory is the real problem here?

(The first problem you have, incidentally, is a formatting one. All
your indentation has been lost. Try posting your code again, in a way
that doesn't lose leading spaces/tabs, and then we'll be better able
to figure out what's going on.)

If I'm reading your code correctly, you have two parts:

1) class EulersNumber, which iterates up to some specific count
2) Module-level functions, which progressively increase the count of
   constructed EulersNumbers.

Between them, you appear to have an O(n*n) algorithm for finding a
"sufficiently-accurate" representation. You're starting over from
nothing every time. If, instead, you were to start from the previous
approximation and add another iteration, that ought to be immensely
faster.

ChrisA
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On 04/24/2016 09:10 AM, Chris Angelico wrote:
> On Mon, Apr 25, 2016 at 2:04 AM, Ethan Furman wrote:
>> Unfortunately, the empty tuple tends to be a singleton, so there is no
>> way to tell that red and default are (supposed to be) the same and blue
>> is (supposed to be) different:
>>
>> --> a = b = ()
>> --> c = ()
>> --> a is b
>> True
>> --> a is c
>> True
>>
>> If you have an idea on how to make that work I am interested.
>
> Easy: allow an empty list to have the same meaning as an empty tuple.
> Every time you have [] in your source code, you're guaranteed to get a
> new (unique) empty list, and then multiple assignment will work.

*sigh*

Where were you three years ago? ;)

Actually, thinking about it a bit more, if we did that then one could
not use an empty list as an enum value. Why would one want to? No idea,
but to make it nearly impossible I'd want a much better reason than a
minor inconvenience:

class Numbers:
    def __init__(self, value=0):
        self.value = value
    def __call__(self, value=None):
        if value is None:
            value = self.value
        self.value = value + 1
        return value

a = Numbers(1)  # start at 1 so the member names match their values

class SomeNumbers(Enum):
    one = a()
    two = a()
    five = a(5)
    six = seis = a()

One extra character, and done.

--
~Ethan~
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On Mon, Apr 25, 2016 at 2:42 AM, Ethan Furman wrote:
>> Easy: allow an empty list to have the same meaning as an empty tuple.
>> Every time you have [] in your source code, you're guaranteed to get a
>> new (unique) empty list, and then multiple assignment will work.
>
>
> *sigh*
>
> Where were you three years ago? ;)
>
> Actually, thinking about it a bit more, if we did that then one could not
> use an empty list as an enum value. Why would one want to? No idea, but to
> make it nearly impossible I'd want a much better reason than a minor
> inconvenience:

I would normally expect enumerated values to be immutable and
hashable, but that isn't actually required by the code AIUI. Under
what circumstances is it useful to have mutable enum values?

ChrisA
Re: Optimizing Memory Allocation in a Simple, but Long Function
Sorry about the code indentation, I was using Pythonista (iOS), and it
did not have any problem with that indentation...

Here is a new set of the code:

## Write a method to approximate Euler's Number using Euler's Method
import math

class EulersNumber():
def __init__(self,n):
self.eulerSteps = n
self.e = self.EulersMethod(self.eulerSteps)
def linearApproximation(self,x,h,d): # f(x+h)=f(x)+h*f'(x)
return x + h * d
def EulersMethod(self, numberOfSteps): # Repeat linear approximation over an even range
e = 1 # e**0 = 1
for step in range(numberOfSteps):
e = self.linearApproximation(e,1.0/numberOfSteps,e) # if f(x)= e**x, f'(x)=f(x)
return e

def EulerStepWithGuess(accuracy,guessForN):
n = guessForN
e = EulersNumber(n)
while abs(e.e - math.e) > abs(accuracy):
n +=1
e = EulersNumber(n)
print('n={} \te= {} \tdelta(e)={}'.format(n,e.e,abs(e.e - math.e)))
return e

def EulersNumberToAccuracy(PowerOfTen):
x = 1
theGuess = 1
thisE = EulersNumber(1)
while x <= abs(PowerOfTen):
thisE = EulerStepWithGuess(10**(-1*x),theGuess)
theGuess = thisE.eulerSteps * 10
x += 1
return thisE

My problem is this: my attempt at Euler's Method involves creating a
list of numbers that is n long. Is there a way I can iterate over the
linear approximation method without creating a list of steps (maybe
recursion, I am a bit new at this)? Ideally I'd like to perform the
linearApproximation method an arbitrary number of times (hopefully
>10**10) and keep feeding the answers back into itself to get the new
answer. I know this will be computationally time intensive, but how do
I minimize memory usage (limit the size of my list)? I also may be
misunderstanding the problem, in which case I am open to looking at it
from a different perspective.

Thanks,
Derek
Re: Optimizing Memory Allocation in a Simple, but Long Function
I think my e-mail client may be stripping the indentation, here it is
with 4-space indentation

## Write a method to approximate Euler's Number using Euler's Method
import math

class EulersNumber():
def __init__(self,n):
self.eulerSteps = n
self.e = self.EulersMethod(self.eulerSteps)
def linearApproximation(self,x,h,d): # f(x+h)=f(x)+h*f'(x)
return x + h * d
def EulersMethod(self, numberOfSteps): # Repeat linear approximation over an even range
e = 1 # e**0 = 1
for step in range(numberOfSteps):
e = self.linearApproximation(e,1.0/numberOfSteps,e) # if f(x)= e**x, f'(x)=f(x)
return e

def EulerStepWithGuess(accuracy,guessForN):
n = guessForN
e = EulersNumber(n)
while abs(e.e - math.e) > abs(accuracy):
n +=1
e = EulersNumber(n)
print('n={} \te= {} \tdelta(e)={}'.format(n,e.e,abs(e.e - math.e)))
return e

def EulersNumberToAccuracy(PowerOfTen):
x = 1
theGuess = 1
thisE = EulersNumber(1)
while x <= abs(PowerOfTen):
thisE = EulerStepWithGuess(10**(-1*x),theGuess)
theGuess = thisE.eulerSteps * 10
x += 1
return thisE
Challenge: Shadow lots of built-ins
This is mostly just for the fun of it, but every now and then I have a
discussion with people about why it's legal to shadow Python's
built-in names, and it'd be handy to have a go-to piece of demo code.
So here's the challenge: Write a short, readable block of code that
shadows as many built-ins as possible.

The rules:

1) The code has to be readable on its own. Doesn't have to be fully
functional (it's okay to presume the existence of a back-end database,
for instance), but a human should be able to parse it easily.
2) PEP 8, please, for consistency.
3) Code should be Python 3.x compatible.
4) Every shadowed name MUST make sense. You would have to plausibly
use this exact same name in some other language.
5) Have fun! Enjoy writing suboptimal code! :)

Here's a starter.

def zip_all(root):
    """Compress a directory, skipping dotfiles

    Returns the created zip file and a list of stuff
    that got dropped into the bin.
    """
    bin = []
    with zipfile.ZipFile("temp.zip", "w") as zip:
        for root, dirs, files in os.walk("."):
            for dir in dirs:
                if dir.startswith("."):
                    dirs.remove(dir)
                    bin.append(os.path.join(root, dir))
            for file in files:
                if not file.startswith("."):
                    zip.write(os.path.join(root, file))
    return zip, bin

That's only four, and I know you folks can do way better than that!

ChrisA
Re: Optimizing Memory Allocation in a Simple, but Long Function
On Mon, Apr 25, 2016 at 3:06 AM, Derek Klinge wrote:
> I think my e-mail client may be stripping the indentation, here it is with
> 4-space indentation

I think it is. Both your reposted versions have indentation lost. You
may need to use a different client. My posts come from the Gmail web
client and indentation usually comes through just fine (tabs are
sometimes lost, but spaces never are). FWIW, I have "Rich Text"
disabled - not sure if that makes a difference.

ChrisA
Re: Optimizing Memory Allocation in a Simple, but Long Function
On Mon, Apr 25, 2016 at 3:02 AM, Derek Klinge wrote:
> My problem is this: my attempt at Euler's Method involves creating a list of
> numbers that is n long. Is there a way I can iterate over the linear
> approximation method without creating a list of steps (maybe recursion, I am
> a bit new at this). Ideally I'd like to perform the linearApproximation
> method a arbitrary number of times (hopefully >10**10) and keep feeding the
> answers back into itself to get the new answer. I know this will be
> computationally time intensive, but how do I minimize memory usage (limit
> the size of my list)? I also may be misunderstanding the problem, in which
> case I am open to looking at it from a different perspective.

def EulersMethod(self, numberOfSteps):  # Repeat linear approximation over an even range
    e = 1  # e**0 = 1
    for step in range(numberOfSteps):
        e = self.linearApproximation(e, 1.0/numberOfSteps, e)  # if f(x)= e**x, f'(x)=f(x)
    return e

This is your code, right?

I'm not seeing anywhere in here that creates a list of numbers. It
does exactly what you're hoping for: it feeds the answer back to
itself for the next step.

ChrisA
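To make the "feeds the answer back to itself" point concrete, here is a minimal standalone sketch of the same loop. It is a reconstruction under assumptions, since the indentation of the posted code was lost: euler_e and closed_form are names invented for this example. Because each step computes e + h*e = e*(1 + h) with h = 1/n, the whole loop should agree with (1 + 1/n)**n up to floating-point rounding.

```python
import math


def euler_e(steps):
    """Approximate e with Euler's method for f' = f, f(0) = 1, over [0, 1]."""
    h = 1.0 / steps
    e = 1.0                  # f(0) = e**0 = 1
    for _ in range(steps):   # only one running value is kept, no list of results
        e = e + h * e        # f(x + h) ~= f(x) + h * f'(x), with f'(x) = f(x)
    return e


def closed_form(steps):
    """Each step multiplies by (1 + h), so n steps give (1 + 1/n)**n."""
    return (1.0 + 1.0 / steps) ** steps


print(euler_e(1000), closed_form(1000), math.e)
```

The memory used is constant in the number of steps; only the running time grows, linearly with steps for a single approximation.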
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On 24/04/2016 17:47, Chris Angelico wrote:
> On Mon, Apr 25, 2016 at 2:42 AM, Ethan Furman wrote:
>>> Easy: allow an empty list to have the same meaning as an empty tuple.
>>> Every time you have [] in your source code, you're guaranteed to get a
>>> new (unique) empty list, and then multiple assignment will work.
>>
>> *sigh*
>>
>> Where were you three years ago? ;)
>>
>> Actually, thinking about it a bit more, if we did that then one could
>> not use an empty list as an enum value. Why would one want to? No idea,
>> but to make it nearly impossible I'd want a much better reason than a
>> minor inconvenience:
>
> I would normally expect enumerated values to be immutable and
> hashable,

And, perhaps, to be actual enumerations. (So that in the set (a,b,c,d),
you don't know nor care about the underlying values, except that they
are distinct.)

--
Bartc
Re: Optimizing Memory Allocation in a Simple, but Long Function
Doesn't range(n) create a list n long?
Re: Optimizing Memory Allocation in a Simple, but Long Function
On Mon, Apr 25, 2016 at 3:56 AM, Derek Klinge wrote: > Doesn't range(n) create a list n long? Not in Python 3. If your code is running on Python 2, use xrange instead of range. I rather doubt that's your problem, though. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
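The difference ChrisA describes can be checked directly. The following is a minimal sketch, assuming Python 3; the variable names are illustrative only. It shows that a range object stays small however many steps it covers, and that iterating over it yields one integer at a time.

```python
import sys

# In Python 3, range() returns a lazy sequence object, not a list.
steps = range(10**10)

# The object stores only start, stop and step, so its size does not
# grow with the number of steps it represents.
print(sys.getsizeof(steps))

# Indexing is computed arithmetically; nothing is materialised.
print(steps[-1])

# Iterating hands out one integer at a time.
total = 0
for i in range(1000):
    total += i
print(total)
```

On Python 2 the same experiment with range(10**10) would try to build the whole list, which is why xrange is suggested there.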
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On Mon, Apr 25, 2016 at 3:54 AM, BartC wrote: > On 24/04/2016 17:47, Chris Angelico wrote: >> >> On Mon, Apr 25, 2016 at 2:42 AM, Ethan Furman wrote: Easy: allow an empty list to have the same meaning as an empty tuple. Every time you have [] in your source code, you're guaranteed to get a new (unique) empty list, and then multiple assignment will work. >>> >>> >>> >>> *sigh* >>> >>> Where were you three years ago? ;) >>> >>> Actually, thinking about it a bit more, if we did that then one could not >>> use an empty list as an enum value. Why would one want to? No idea, but >>> to >>> make it nearly impossible I'd want a much better reason than a minor >>> inconvenience: >> >> >> I would normally expect enumerated values to be immutable and >> hashable, > > > And, perhaps, to be actual enumerations. (So that in the set (a,b,c,d), you > don't know nor care about the underlying values, except that they are > distinct.) Not necessarily; often, the Python enumeration has to sync up with someone else's, possibly in C. It might not matter that BUTTON_OK is 1 and BUTTON_CANCEL is 2, but you have to make sure that everyone agrees on those meanings. So when you build the Python module, it's mandatory that those values be exactly what they are documented as. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
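A minimal sketch of the situation described above, assuming a hypothetical C library that documents BUTTON_OK as 1 and BUTTON_CANCEL as 2. The class name Button and the variable code_from_c are made up for illustration. IntEnum is used here because the members then compare equal to the plain integers the other side sends.

```python
from enum import IntEnum

class Button(IntEnum):
    # Hypothetical values: in practice they are copied from the C header
    # or the documentation, never chosen freely.
    OK = 1
    CANCEL = 2

# An integer arriving from the C side maps back to the named member.
code_from_c = 2
button = Button(code_from_c)
print(button.name)

# Going the other way, the member can be passed wherever an int is expected.
print(int(Button.OK))
```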
Re: How much sanity checking is required for function inputs?
On Sun, 24 Apr 2016 04:40 pm, Michael Selik wrote:

> I think we're giving mixed messages because we're conflating "constants"
> and globals that are expected to change.

When you talk about "state", that usually means "the current state of the program", not constants. math.pi is not "state".

> In our case here, I think two clients in the same process sharing state
> might be a feature rather than a bug. Or at least it has the same behavior
> as the current implementation.

I don't think so. Two clients sharing state is exactly what makes thread programming with shared state so exciting.

Suppose you import the decimal module, and set the global context:

    py> import decimal
    py> decimal.setcontext(decimal.ExtendedContext)
    py> decimal.getcontext().prec = 18
    py> decimal.Decimal(1)/3
    Decimal('0.333333333333333333')

Great. Now a millisecond later you do the same calculation:

    py> decimal.Decimal(1)/3
    Decimal('0.33333')

WTF just happened here??? The answer is, another client of the module, one you may not even know about, has set the global context:

    decimal.getcontext().prec = 5

and screwed you over but good.

-- Steven
-- https://mail.python.org/mailman/listinfo/python-list
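One way to avoid depending on the shared context is decimal.localcontext(), which applies settings only inside a with block. This is a minimal sketch and not something proposed in the thread; the helper name third is made up for illustration.

```python
import decimal

def third(precision):
    """Compute 1/3 at the given precision without touching the global context."""
    with decimal.localcontext() as ctx:
        ctx.prec = precision          # affects only this block
        return decimal.Decimal(1) / 3

before = decimal.getcontext().prec
print(third(18))
print(third(5))

# The context seen by other code is unchanged after the calls.
print(decimal.getcontext().prec == before)
```

Each caller states the precision it needs, so a setting made elsewhere in the program cannot change the result.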
from __future__ import print_function
Hi All, I want a detailed explanation (why this statement is used, when it can be used, etc.) of the following statement in Python code: "from __future__ import print_function" Thanks in advance. -- https://mail.python.org/mailman/listinfo/python-list
Re: from __future__ import print_function
On Sun, Apr 24, 2016 at 2:05 PM, San wrote: > Hi All, > I want details explanation(why this statement used,when it can be used,etc) > of following statement in python code > > "from __future__ import print_function" > > Thanks in advance. > -- > https://mail.python.org/mailman/listinfo/python-list It lets python 2.7 use python 3.x print function instead of the 2.7 print statement. You might like some of the options, and your code will be easier to upgrade to 3.x if you decide to do that -- Joel Goldstick http://joelgoldstick.com/blog http://cc-baseballstats.info/stats/birthdays -- https://mail.python.org/mailman/listinfo/python-list
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On 04/24/2016 09:47 AM, Chris Angelico wrote: I would normally expect enumerated values to be immutable and hashable, but that isn't actually required by the code AIUI. Under what circumstances is it useful to have mutable enum values? Values can be anything. The names are immutable and hashable. -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
Re: Optimizing Memory Allocation in a Simple, but Long Function
On Mon, Apr 25, 2016 at 4:03 AM, Derek Klinge wrote:
> Ok, from the gmail web client:

Bouncing this back to the list, and removing quote markers for other people's copy/paste convenience.

    ## Write a method to approximate Euler's Number using Euler's Method
    import math

    class EulersNumber():
        def __init__(self,n):
            self.eulerSteps = n
            self.e= self.EulersMethod(self.eulerSteps)
        def linearApproximation(self,x,h,d): # f(x+h)=f(x)+h*f'(x)
            return x + h * d
        def EulersMethod(self, numberOfSteps): # Repeat linear approximation over an even range
            e = 1 # e**0 = 1
            for step in range(numberOfSteps):
                e = self.linearApproximation(e,1.0/numberOfSteps,e) # if f(x)= e**x, f'(x)=f(x)
            return e

    def EulerStepWithGuess(accuracy,guessForN):
        n = guessForN
        e = EulersNumber(n)
        while abs(e.e - math.e) > abs(accuracy):
            n +=1
            e = EulersNumber(n)
        print('n={} \te= {} \tdelta(e)={}'.format(n,e.e,abs(e.e - math.e)))
        return e

    def EulersNumberToAccuracy(PowerOfTen):
        x = 1
        theGuess = 1
        thisE = EulersNumber(1)
        while x <= abs(PowerOfTen):
            thisE = EulerStepWithGuess(10**(-1*x),theGuess)
            theGuess = thisE.eulerSteps * 10
            x += 1
        return thisE

> To see an example of my problem try something like EulersNumberToAccuracy(-10)

Yep, I see it. I invoked your script as "python3 -i euler.py" and then made that call interactively. It quickly ran through the first few iterations, and then had one CPU core saturated; but at no time did memory usage look too bad. You may be correct in Python 2, though - it started using about 4GB of RAM (not a problem to me - I had about 9GB available when I started it), and then I halted it.

The Python 3 version has been running for a few minutes now.

    n=135914023 e= 2.718281818459972 delta(e)=9.999073125044333e-09

'top' says:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     7467 rosuav    20   0   32432   9072   4844 R 100.0  0.1   3:58.44 python3

In other words, it's saturating one CPU core ("%CPU 100.0"), but its memory usage (VIRT/RES/SHR) is very low. At best, this process can be blamed for 0.1% of memory.

Adding these lines to the top makes it behave differently in Python 2:

    try: range = xrange
    except NameError: pass

The Py3 behaviour won't change, but Py2 should now have the same kind of benefit (the xrange object is an iterable that doesn't need a concrete list of integers).

ChrisA
-- https://mail.python.org/mailman/listinfo/python-list
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On Mon, Apr 25, 2016 at 4:12 AM, Ethan Furman wrote:
> On 04/24/2016 09:47 AM, Chris Angelico wrote:
>
>> I would normally expect enumerated values to be immutable and
>> hashable, but that isn't actually required by the code AIUI. Under
>> what circumstances is it useful to have mutable enum values?
>
> Values can be anything. The names are immutable and hashable.

I know they *can* be, because I looked in the docs; but does it make sense to a human? Sure, we can legally do this:

    >>> class Color(Enum):
    ...     red = 1
    ...     green = 2
    ...     blue = 3
    ...     break_me = [0xA0, 0xF0, 0xC0]
    ...
    >>> Color([0xA0, 0xF0, 0xC0])
    <Color.break_me: [160, 240, 192]>
    >>> Color([0xA0, 0xF0, 0xC0]).value.append(1)
    >>> Color([0xA0, 0xF0, 0xC0]).value.append(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/enum.py", line 241, in __call__
        return cls.__new__(cls, value)
      File "/usr/local/lib/python3.6/enum.py", line 476, in __new__
        raise ValueError("%r is not a valid %s" % (value, cls.__name__))
    ValueError: [160, 240, 192] is not a valid Color

but I don't think it's a good thing to ever intentionally do. It's fine for the Enum class to not enforce it (it means you can use arbitrary objects as values, and that's fine), but if you actually do this, then . At some point, we're moving beyond the concept of "enumeration" and settling on "types.SimpleNamespace".

ChrisA
-- https://mail.python.org/mailman/listinfo/python-list
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On 04/24/2016 11:27 AM, Chris Angelico wrote: On Mon, Apr 25, 2016 at 4:12 AM, Ethan Furman wrote: Values can be anything. The names are immutable and hashable. I know they *can* be, because I looked in the docs; but does it make sense to a human? Sure, we can legally do this: Well, not me. ;) --> class Color(Enum): ... red = 1 ... green = 2 ... blue = 3 ... break_me = [0xA0, 0xF0, 0xC0] ... --> Color([0xA0, 0xF0, 0xC0]) --> Color([0xA0, 0xF0, 0xC0]).value.append(1) --> Color([0xA0, 0xF0, 0xC0]).value.append(1) If you are looking up by value, you have to use the current value. Looks like pebkac error to me. ;) At some point, we're moving beyond the concept of "enumeration" and settling on "types.SimpleNamespace". Sure. But like most things in Python I'm not going to enforce it. And if somebody somewhere has a really cool use-case for it, more power to 'em. -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
Re: from __future__ import print_function
On 04/24/2016 11:14 AM, Joel Goldstick wrote: On Sun, Apr 24, 2016 at 2:05 PM, San wrote: I want details explanation(why this statement used,when it can be used,etc) of following statement in python code "from __future__ import print_function" It lets python 2.7 use python 3.x print function instead of the 2.7 print statement. You might like some of the options, and your code will be easier to upgrade to 3.x if you decide to do that When it can be used: at the top of a python module; it must be the first executable line (only comments and doc-strings can be before it). -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
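A minimal sketch of what the import buys you. The file below runs unchanged on Python 2.7 and on Python 3, where the import is accepted and has no effect. The keyword arguments sep, end and file are the "options" mentioned above; the name write is made up for illustration.

```python
# Comments (and a module docstring) may come first; no other code may.
from __future__ import print_function

import sys

# With the function form, separators and line endings are keyword arguments.
print("a", "b", "c", sep="-")

# Suppress the trailing newline, then emit one explicitly.
print("no newline yet", end="")
print()

# Redirecting output no longer needs the "print >>sys.stderr, ..." syntax.
print("to stderr", file=sys.stderr)

# print is now an ordinary object that can be assigned or passed around.
write = print
write("called through another name")
```

Without the import, Python 2.7 would treat print as a statement and reject the keyword arguments as a syntax error.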
Re: Comparing Python enums to Java, was: How much sanity checking is required for function inputs?
On Sun, Apr 24, 2016 at 10:04 AM, Ethan Furman wrote: > On 04/24/2016 08:20 AM, Ian Kelly wrote: >> * Java doesn't have the hokey notion of enum instances being distinct >> from their "value". The individual enum members *are* the values. >> Whereas in Python an enum member is an awkward class instance that >> contains a value of some other type. Python tries to get away from the >> C-like notion that enums are ints by making the enum members >> non-comparable, but then gives us IntEnum as a way to work around it >> if we really want to. Since Java enums don't depend on any other type >> for their values, there's nothing inviting the user to treat enums as >> ints in the first place. > > > How does Java share enums with other programs, computers, and/or languages? Java enums are serializable using the name. If you need it to be interoperable with other languages where they're int-based, then you could attach that value as a field. But that would just be data; you wouldn't be making that value an integral part of the Java enum just because some other language does it that way. >> Because they have the same "value", instead of creating a separate >> member, COUNTER_EARTH gets defined as an alias for EARTH. To work >> around this, one would have to add a third argument to the above to >> pass in an additional value for the sole purpose of distinguishing (or >> else adapt the AutoNumber recipe to work with this example). This >> example is a bit contrived since it's generally not likely to come up >> with floats, but it can easily arise (and in my experience frequently >> does) when the arguments are of more discrete types. It's notable that >> the Java enum docs feature this very same example but without this >> weakness. [2] > > > One reason for this is that Python enums are lookup-able via the value: > Planet(9.80265274333129) > Planet.EARTH > > Do Java enums not have such a feature, or this "feature" totally unnecessary > in Java? It's unnecessary. 
If you want to look up an enum constant by something other than name, you'd provide a static method or mapping. I'd argue that it's unnecessary in Python too for the same reason. But as long as Python enums make a special distinction of their value, there might as well be a built-in way to do it. > I could certainly add a "no-alias" feature to aenum. What would be the > appropriate value-lookup behaviour in such cases? > > - return the first match > - return a list of matches > - raise an error > - disable value-lookups for that Enum Probably the third or fourth, as I think that value lookup would generally not be useful in such cases, and it can be overridden if desired. > Cool. The stdlib Enum (and therefore the enum34 backport) is unlikely to > change much. However, aenum has a few fun things going on, and I'm happy to > add more: > > - NamedTuple (metaclass-based) > - NamedConstant (no aliases, no by-value lookups) > - Enum > - magic auto-numbering > class Number(Enum, auto=True): > one, two, three > def by_seven(self): > return self.value * 7 > - auto-setting of attributes >class Planet(Enum, init='mass radius'): > MERCURY = 3.303e23, 2.4397e6 > EARTH = 5.976e24, 6.37814e6 > NEPTUNE = 1.024e26, 2.4746e7 >--> Planet.EARTH.mass >5.976e24 Neat! -- https://mail.python.org/mailman/listinfo/python-list
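The aliasing behaviour Ian describes can be reproduced with the stdlib Enum alone. This is a minimal sketch, assuming Python 3.4 or later; only the EARTH figures come from the thread, and COUNTER_EARTH is given identical data on purpose.

```python
from enum import Enum

class Planet(Enum):
    EARTH = (5.976e24, 6.37814e6)
    COUNTER_EARTH = (5.976e24, 6.37814e6)   # same value as EARTH

    def __init__(self, mass, radius):
        self.mass = mass
        self.radius = radius

# The second name is an alias, not a separate member.
print(Planet.COUNTER_EARTH is Planet.EARTH)

# Iteration skips aliases, so only one member is listed.
print(list(Planet))

# Lookup by value returns the first member defined with that value.
print(Planet((5.976e24, 6.37814e6)))
```

The alias is still reachable by name through Planet.__members__, but it cannot be told apart from EARTH, which is the limitation a "no-alias" option would remove.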
Scraping email to make invoice
I would like to write a Python script to automate a tedious process and could use some advice.

The source content will be an email that has 5-10 PO (purchase order) numbers and information for freelance work done. The target content will be an invoice. (There will be an email like this every week.)

Right now, the "recommended" way to go (from the company) from source to target is manually copying and pasting all the tedious details of the work done into the invoice. But this is laborious, error-prone...and just begging for automation. There is no human judgment necessary whatsoever in this.

I'm comfortable with "scraping" a text file and have written scripts for this, but could use some pointers on other parts of this operation.

1. INPUT: What's the best way to scrape an email like this? The email is to a Gmail account, and the content shows up in the email as a series of basically 6x7 tables (HTML?), one table per PO number/task. I know if the freelancer were to copy and paste the whole set of tables into a text file and save it as plain text, Python could easily scrape that file, but I'd much prefer to save the user those steps. Is there a relatively easy way to go from the Gmail email to generating the invoice directly? (I know there is, but wasn't sure what is state of the art these days.)

2. OUTPUT: The invoice will have boilerplate content on top and then an Excel table at bottom that is mostly the same information from the source content. Ideally, so that the invoice looks good, the invoice should be a Word document. For the first pass at this, it looked best by laying out the entire invoice in Excel and then copying and pasting it into a Word doc as an image (since otherwise the columns ran over for some reason). In any case, the goal is to create a single-page invoice that looks clean and professional.

3. UI: I am comfortable with making GUI apps, so could use this as the interface for the (somewhat computer-uncomfortable) user. But the fewer user actions necessary, the better. The emails always come from the same sender, and always have the same boilerplate language ("Below please find your Purchase Order (PO)"), so I'm envisioning a small GUI window with a single button that says "MAKE NEWEST INVOICE" and the user presses it and it automatically searches the user's email for PO # emails and creates the newest invoice. I'm guessing I could keep a sqlite database or flat file on the computer to just track what is meant by "newest", and then the output would have the date created in the file, so the user can be sure what has been invoiced.

I'm hoping I can write this in a couple of days. Any suggestions welcome! Thanks.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Remove directory tree without following symlinks
On Sun, Apr 24, 2016 at 5:42 AM, Albert-Jan Roskam wrote: > Aww, I kinda forgot about that already, but I came across this last > year [1]. Apparently, shutil.rmtree(very_long_path) failed under Win 7, > even with the "silly prefix". I believe very_long_path was a > Python2-str. > [1] > https://mail.python.org/pipermail/python-list/2015-June/693156.html Python 2's str branch of the os functions gets implemented on Windows using the [A]NSI API, such as FindFirstFileA and FindNextFileA to implement listdir(). Generally the ANSI API is a light wrapper around the [W]ide-character API. It simply decodes byte strings to UTF-16 and calls the wide-character function (or a common internal function). IIRC, in Windows 7, byte strings are decoded using a per-thread buffer with size MAX_PATH (260), so prefixing the path with "\\?\" won't help. You have to use the wide-character API. Windows 10, on the other hand, decodes using a dynamically allocated buffer, so you can usually get away with using a long byte string. But not with Python 2 os.listdir(), which uses a stack-allocated MAX_PATH+5 buffer in the str branch. For example: Python 2 os.mkdir works: >>> path = os.path.normpath('//?/C:/Temp/long/' + 'a' * 255) >>> os.makedirs(path) but os.listdir requires unicode: >>> os.listdir(path) Traceback (most recent call last): File "", line 1, in TypeError: must be (buffer overflow), not str >>> os.listdir(path.decode('mbcs')) [] Also, the str branch of listdir appends "/*.*", with a forward slash, so it's incompatible with the "\\?\" prefix, even for short paths: >>> os.listdir(r'\\?\C:\Temp') Traceback (most recent call last): File "", line 1, in WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: '?\\C:\\Temp/*.*' > It seems useful if shutil or os.path would automatically prefix paths > with "\\?\". It is rarely really needed, though. 
(in my case it was > needed to copy a bunch of MS Outlook .msg files, which automatically > get the subject line as the filename, and perhaps the first sentence > of the mail of the mail has no subject). I doubt a change like that would get backported to 2.7. Recently there was a lengthy discussion about adding an __fspath__ protocol to Python 3. Possibly this can be automatically handled in the __fspath__ implementation of pathlib.WindowsPath and the DirEntry type returned by os.scandir. -- https://mail.python.org/mailman/listinfo/python-list
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 54:
I'm using a Python web application (adagios, a Nagios configuration tool). When attempting a certain operation in the client-side browser I get the above error. The client side is Ubuntu 14.04; the server side is Debian 8. The browser is Firefox or Chrome. Both show:

    echo $LANG
    en_US.UTF-8

Before I dive into the code, are there any OS-level things to try? Here's the full error traceback:

    Traceback (most recent call last):
      File "/opt/adagios/adagios/views.py", line 43, in wrapper
        result = view_func(request, *args, **kwargs)
      File "/opt/adagios/adagios/objectbrowser/views.py", line 191, in edit_object
        c['form'] = PynagForm(pynag_object=my_object, initial=my_object._original_attributes)
      File "/opt/adagios/adagios/objectbrowser/forms.py", line 312, in __init__
        self.fields[field_name] = self.get_pynagField(field_name, css_tag="inherited")
      File "/opt/adagios/adagios/objectbrowser/forms.py", line 418, in get_pynagField
        _('%(inherited_value)s (inherited from template)') % {'inherited_value': smart_str(inherited_value)}
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 54: ordinal not in range(128)

Thanks in advance for any assistance,
ams avraham
-- https://mail.python.org/mailman/listinfo/python-list
Re: Scraping email to make invoice
On 04/24/2016 08:58 PM, CM wrote: I would like to write a Pythons script to automate a tedious process and could use some advice. The source content will be an email that has 5-10 PO (purchase order) numbers and information for freelance work done. The target content will be an invoice. (There will be an email like this every week). Right now, the "recommended" way to go (from the company) from source to target is manually copying and pasting all the tedious details of the work done into the invoice. But this is laborious, error-prone...and just begging for automation. There is no human judgment necessary whatsoever in this. I'm comfortable with "scraping" a text file and have written scripts for this, but could use some pointers on other parts of this operation. 1. INPUT: What's the best way to scrape an email like this? The email is to a Gmail account, and the content shows up in the email as a series of basically 6x7 tables (HTML?), one table per PO number/task. I know if the freelancer were to copy and paste the whole set of tables into a text file and save it as plain text, Python could easily scrape that file, but I'd much prefer to save the user those steps. Is there a relatively easy way to go from the Gmail email to generating the invoice directly? (I know there is, but wasn't sure what is state of the art these days). 2. OUPUT: The invoice will have boilerplate content on top and then an Excel table at bottom that is mostly the same information from the source content. Ideally, so that the invoice looks good, the invoice should be a Word document. For the first pass at this, it looked best by laying out the entire invoice in Excel and then copy and pasting it into a Word doc as an image (since otherwise the columns ran over for some reason). In any case, the goal is to create a single page invoice that looks like a clean, professional looking invoice. 3. 
UI: I am comfortable with making GUI apps, so could use this as the interface for the (somewhat computer-uncomfortable) user. But the less user actions necessary, the better. The emails always come from the same sender, and always have the same boilerplate language ("Below please find your Purchase Order (PO)"), so I'm envisioning a small GUI window with a single button that says "MAKE NEWEST INVOICE" and the user presses it and it automatically searches the user's email for PO # emails and creates the newest invoice. I'm guessing I could keep a sqlite database or flat file on the computer to just track what is meant by "newest", and then the output would have the date created in the file, so the user can be sure what has been invoiced. I'm hoping I can write this in a couple of days. Any suggestions welcome! Thanks. INPUT: What's the best way to scrape an email like this? -- Like what? You need to explain what exactly your input is or show an example. Frederic -- https://mail.python.org/mailman/listinfo/python-list
Re: Optimizing Memory Allocation in a Simple, but Long Function
On 24 April 2016 at 19:21, Chris Angelico wrote:
> On Mon, Apr 25, 2016 at 4:03 AM, Derek Klinge wrote:
>> Ok, from the gmail web client:
>
> Bouncing this back to the list, and removing quote markers for other
> people's copy/paste convenience.
>
> ## Write a method to approximate Euler's Number using Euler's Method
> import math
>
> class EulersNumber():
>     def __init__(self,n):
>         self.eulerSteps = n
>         self.e= self.EulersMethod(self.eulerSteps)
>     def linearApproximation(self,x,h,d): # f(x+h)=f(x)+h*f'(x)
>         return x + h * d
>     def EulersMethod(self, numberOfSteps): # Repeat linear approximation over an even range
>         e = 1 # e**0 = 1
>         for step in range(numberOfSteps):
>             e = self.linearApproximation(e,1.0/numberOfSteps,e) # if f(x)= e**x, f'(x)=f(x)
>         return e
>
> def EulerStepWithGuess(accuracy,guessForN):
>     n = guessForN
>     e = EulersNumber(n)
>     while abs(e.e - math.e) > abs(accuracy):
>         n +=1
>         e = EulersNumber(n)
>     print('n={} \te= {} \tdelta(e)={}'.format(n,e.e,abs(e.e - math.e)))
>     return e
>
> def EulersNumberToAccuracy(PowerOfTen):
>     x = 1
>     theGuess = 1
>     thisE = EulersNumber(1)
>     while x <= abs(PowerOfTen):
>         thisE = EulerStepWithGuess(10**(-1*x),theGuess)
>         theGuess = thisE.eulerSteps * 10
>         x += 1
>     return thisE
>
>> To see an example of my problem try something like
>> EulersNumberToAccuracy(-10)

Now that I can finally see your code I can see what the problem is. So essentially you want to calculate Euler's number in the following way:

    e = exp(1)

and exp(t) is the solution of the initial value problem with ordinary differential equation dx/dt = x and initial condition x(0)=1. So you're using Euler's method to numerically solve the ODE from t=0 to t=1, which gives you an estimate for x(1) = exp(1) = e.

Euler's method solves this by going in steps from t=0 to t=1 with some step size e.g. dt = 0.1. You get a sequence of values x[n] where

    x[0] = x(0) = 1    # initial condition
    x[1] = x[0] + dt*f(x[0]) = x[0] + dt*x[0]
    x[2] = x[1] + dt*x[1]
    # etc.

In order to get to t=1 in N steps you set dt = 1/N. So simplifying your code (all the classes and functions are just confusing the situation here):

    N = 1000
    dt = 1.0 / N
    x = 1
    for n in range(N):
        x = x + dt*x
    print(x)

When I run that I get:

    2.71692393224

Okay that's great but actually you want to be able to set the accuracy required and then steadily increase N until it's big enough to achieve the expected accuracy so you do this:

    import math

    error = 1
    accuracy = 1e-2
    N = 1
    while error > accuracy:
        dt = 1.0 / N
        x = 1
        for n in range(N):
            x = x + dt*x
        error = abs(math.e - x)
        N += 1
    print(x)

But what happens here? You have a loop in a loop. The inner loop takes n over N values. The outer loop takes N from 1 up to Nmin where Nmin is the smallest value of N such that we achieve the desired accuracy.

This is a classic case of a quadratic performance algorithm. As you make the accuracy smaller you're implicitly increasing Nmin. However the algorithmic performance is quadratic in Nmin i.e. O(Nmin**2). The problem is the nested loops. If you have an outer loop that increases the length of an inner loop by 1 at each step then you have a quadratic algorithm akin to:

    # This loop is O(M**2)
    for N in range(M):
        for n in range(N):
            # do stuff

To see that it is quadratic see: https://en.wikipedia.org/wiki/Triangular_number

The simplest fix here is to replace N+=1 with N*=2. If the error is still too large, double the number of steps instead of increasing it by one. That will give you an O(Nmin) algorithm. https://en.wikipedia.org/wiki/1/2_%2B_1/4_%2B_1/8_%2B_1/16_%2B_%E2%8B%AF

A better method is to do a bit of algebra before putting down the code (with h = dt = 1/N and x[0] = 1):

    x[1] = x[0] + h*x[0] = x[0]*(1+h) = x[0]*(1+1/N) = (1+1/N)
    x[2] = x[1]*(1+1/N) = (1+1/N)**2
    ...
    x[n] = (1 + 1/N)**n

So doing the loop for Euler's method, which ends at x[N], is equivalent to just writing:

    x = (1 + 1.0/N)**N

This considered as a sequence in N is well known as a sequence that converges to e.
In fact this is how the number e was first discovered: https://en.wikipedia.org/wiki/E_%28mathematical_constant%29#Compound_interest Python can compute this much quicker than your previous version: N = 1 for _ in range(40): N *= 2 print((1 + 1.0/N) ** N) Which runs instantly and gives: 2.25 2.44140625 2.56578451395 2.63792849737 2.67699012938 2.69734495257 2.70773901969 2.71299162425 2.71563200017 2.71695572947 2.71761848234 2.71795008119 2.71811593627 2.71819887772 2.71824035193 2.7182610899 2.71827145911 2.71827664377 2.71827923611 2.71828053228 2.71828118037 2.71828150441 2.71828166644 2.71828174745 2.71828178795 2.71828180821 2.71828181833 2.7182818234 2.71828182593 2.71828182
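The doubling strategy and the closed form from the message above can be put side by side. This is a minimal sketch; the function names euler_estimate and steps_for_accuracy are made up for illustration and are not from the thread. Doubling finds a sufficient N rather than the smallest one, so it can overshoot by up to a factor of two.

```python
import math

def euler_estimate(n):
    """Euler's method for dx/dt = x, x(0) = 1, taking n steps from t=0 to t=1."""
    dt = 1.0 / n
    x = 1.0
    for _ in range(n):
        x = x + dt * x
    return x

def steps_for_accuracy(accuracy):
    """Double n until the estimate is within `accuracy` of math.e."""
    n = 1
    while abs(math.e - euler_estimate(n)) > accuracy:
        n *= 2          # total work stays proportional to the final n
    return n

n = steps_for_accuracy(1e-3)
looped = euler_estimate(n)
closed = (1 + 1.0 / n) ** n

print(n, looped)
# The two results agree up to floating-point rounding.
print(abs(looped - closed))
```

Because the error shrinks roughly in proportion to 1/N, each extra decimal digit costs about ten times as many steps with either stopping rule.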
asyncio and subprocesses
Is this a bug in the asyncio libraries? This code:

'''
proc = yield from asyncio.create_subprocess_exec(*cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)

# read all data from subprocess pipe, copy to nextCoro
ln = yield from proc.stdout.read(1024)
while ln:
    yield from nextCoro.process(ln)
    ln = yield from proc.stdout.read(1024)
'''

will throw this exception:

    Traceback (most recent call last):
      File "/usr/project/bulk_aio.py", line 52, in db_source
        ln = yield from proc.stdout.read(1024)
      File "/usr/lib/python3.4/asyncio/streams.py", line 462, in read
        self._maybe_resume_transport()
      File "/usr/lib/python3.4/asyncio/streams.py", line 349, in _maybe_resume_transport
        self._transport.resume_reading()
      File "/usr/lib/python3.4/asyncio/unix_events.py", line 364, in resume_reading
        self._loop.add_reader(self._fileno, self._read_ready)
    AttributeError: 'NoneType' object has no attribute 'add_reader'

The exception always happens at the end of the subprocess's run, in what would be the last read. Whether it happens correlates with the time needed for nextCoro.process. If each iteration takes more than about a millisecond, the exception will be thrown.

It *seems* that when the transport loses the pipe connection, it schedules the event loop for removal immediately, and the _loop gets set to None before the data can all be read. This is the sequence in unix_events.py _UnixReadPipeTransport._read_ready.

Is there a workaround to avoid this exception? Is this a fixed bug, already? I am using Python 3.4.2 as distributed in Ubuntu Lucid, with built-in asyncio.

Thank you.
David
-- https://mail.python.org/mailman/listinfo/python-list
Re: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 54:
arthur sherman wrote: > m using a python web applic (adagios, a nagios configuration tool). > when attempting a certain operation on the client side browser i get the > above error. the client side is ubunti 14.04. servers side is debian 8. > browser is ff or chrome. both show: > echo $LANG > en_US.UTF-8 > > before i dive into the code, r there any OS level things to try? > here's the full error traceback: > > Traceback (most recent call last): > File "/opt/adagios/adagios/views.py", line 43, in wrapper > result = view_func(request, *args, **kwargs) > File "/opt/adagios/adagios/objectbrowser/views.py", line 191, in > edit_object c['form'] = PynagForm(pynag_object=my_object, > initial=my_object._original_attributes) File > "/opt/adagios/adagios/objectbrowser/forms.py", line 312, in __init__ > self.fields[field_name] = self.get_pynagField(field_name, > css_tag="inherited") File "/opt/adagios/adagios/objectbrowser/forms.py", > line 418, in get_pynagField _('%(inherited_value)s (inherited from > template)') % {'inherited_value': smart_str(inherited_value)} > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 54: > ordinal not in range(128) Probably one of _(...) or smart_str(...) returns unicode and the other returns a non-ascii bytestring: >>> u"%s" % "\xe2" Traceback (most recent call last): File "", line 1, in UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128) >>> "\xe2 %s" % u"foo" Traceback (most recent call last): File "", line 1, in UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128) > before i dive into the code, r there any OS level things to try? Probably not. If you entered non-ascii text into a form, and can limit yourself to ascii-only then that might be a workaround... -- https://mail.python.org/mailman/listinfo/python-list
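The failure mode can be shown in isolation. The sketch below uses Python 3 syntax, whereas the traceback in this thread comes from Python 2, where combining str and unicode triggers an implicit ASCII decode. The sample bytes are an assumption: 0xe2 is the first byte of the UTF-8 encoding of many punctuation characters such as curly quotes, but what the adagios data actually contains is not known from the thread.

```python
# UTF-8 bytes for "it’s", with a right single quotation mark (U+2019).
raw = b"it\xe2\x80\x99s"

# Decoding as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode("utf-8")
print(text)
```

In Python 2 the equivalent fix is to decode byte strings explicitly before combining them with unicode objects, rather than relying on the implicit ASCII conversion.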
Re: Python TUI that will work on DOS/Windows and Unix/Linux
I noticed this trail on Google... if you're still interested, you could try out https://github.com/peterbrittain/asciimatics I ported it to Windows from Linux so exactly the same API works on both. -- https://mail.python.org/mailman/listinfo/python-list
Re: Scraping email to make invoice
On 04/24/2016 12:58 PM, CM wrote:
> 1. INPUT: What's the best way to scrape an email like this? The
> email is to a Gmail account, and the content shows up in the email as
> a series of basically 6x7 tables (HTML?), one table per PO
> number/task. I know if the freelancer were to copy and paste the
> whole set of tables into a text file and save it as plain text,
> Python could easily scrape that file, but I'd much prefer to save the
> user those steps. Is there a relatively easy way to go from the Gmail
> email to generating the invoice directly? (I know there is, but
> wasn't sure what is state of the art these days).

I would configure Gmail to allow IMAP access (you'll most likely have to set up a special password for this), and then use an IMAP library from Python to find the relevant messages directly and access the message body. If the body is HTML-formatted (it sounds like it is), I would use either BeautifulSoup or lxml to parse it and get out the relevant information.

> 2. OUTPUT: The invoice will have boilerplate content on top and then
> an Excel table at bottom that is mostly the same information from
> the source content. Ideally, so that the invoice looks good, the
> invoice should be a Word document. For the first pass at this, it
> looked best by laying out the entire invoice in Excel and then copy
> and pasting it into a Word doc as an image (since otherwise the
> columns ran over for some reason). In any case, the goal is to create
> a single page invoice that looks like a clean, professional looking
> invoice.

There are several libraries for creating Excel and Word files, especially the XML-based formats, though I have little experience with them. There are also nice libraries for emitting PDF if that would work better.

> 3. UI: I am comfortable with making GUI apps, so could use this as
> the interface for the (somewhat computer-uncomfortable) user. But
> the less user actions necessary, the better. The emails always come
> from the same sender, and always have the same boilerplate language
> ("Below please find your Purchase Order (PO)"), so I'm envisioning a
> small GUI window with a single button that says "MAKE NEWEST
> INVOICE" and the user presses it and it automatically searches the
> user's email for PO # emails and creates the newest invoice. I'm
> guessing I could keep a sqlite database or flat file on the computer
> to just track what is meant by "newest", and then the output would
> have the date created in the file, so the user can be sure what has
> been invoiced.

Once you have a working script, the GUI would be pretty easy to add, though it seems to me that it would be unnecessary. This process sounds like it should just run automatically from a cron job or something similar.

> I'm hoping I can write this in a couple of days.

The automated part should be possible, but personally I'd give myself a week.
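A minimal sketch of the parsing half of the approach described in the reply, under stated assumptions. It uses only the standard library: `html.parser` stands in for the BeautifulSoup or lxml suggested above so the example has no third-party dependencies. The names `TableRows` and `rows_from_message` and the demo message are hypothetical. Fetching the raw message bytes over IMAP (for example with `imaplib.IMAP4_SSL`) is left out because it needs a real account.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_message(raw_bytes):
    """Return table rows found in the HTML body of a raw email message."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return []
    parser = TableRows()
    parser.feed(part.get_content())
    parser.close()
    return parser.rows


# Hypothetical message standing in for one fetched from the mailbox.
demo = EmailMessage()
demo["Subject"] = "Purchase Order"
demo.set_content("Below please find your Purchase Order (PO)")
demo.add_alternative(
    "<table><tr><th>PO</th><th>Task</th></tr>"
    "<tr><td>1001</td><td>Editing</td></tr></table>",
    subtype="html",
)
rows = rows_from_message(demo.as_bytes())
print(rows)
```

The rows come back as plain lists of strings, so the invoice step can consume them regardless of how the document itself ends up being generated.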
Re: How much sanity checking is required for function inputs?
On Sun, Apr 24, 2016 at 2:08 PM Steven D'Aprano wrote:
> On Sun, 24 Apr 2016 04:40 pm, Michael Selik wrote:
> > I think we're giving mixed messages because we're conflating
> > "constants" and globals that are expected to change.
>
> When you talk about "state", that usually means "the current state of the
> program", not constants. math.pi is not "state".

Perhaps I was unclear. You provided an example of what I was trying to point out.
Re: Optimizing Memory Allocation in a Simple, but Long Function
Actually, I'm not trying to speed it up, just to be able to handle large values of n.
(Thank you Chris for the suggestion to use xrange, I am on a Mac using the stock Python 2.7)

I am looking at the number of iterations of linear approximation that are required to get a more accurate representation. My initial data suggest that to get 1 more digit of e (the difference between the calculated and expected value falls under 10**-n), I need a little more than 10 times the number of iterations of linear approximation.

I actually intend to compare these to other methods, including the limit definition that you provided, as well as the geometric series definition.

I am trying to provide some real world data for my students to prove the point that although there are many ways to calculate a value, some are much more efficient than others.

I tried your recommendation (Oscar) of trying a (1+1/n)**n approach, which gave me very similar values, but when I took the difference between your method and mine I consistently got differences of ~10**-15. Perhaps this is due to the binary representation of the decimals?

Also, it seems to me that if the goal is to use the smallest value of n to get a particular level of accuracy, changing your guess of N by doubling has a high chance of overshooting. I found that I was able to predict relatively accurately a value of N for achieving a desired accuracy. By this I mean that if I wanted my value to be accurate to one additional decimal place, I had to multiply my value of N by approximately 10 (I found that the new N required was always < 10N + 10).

Derek

On Sun, Apr 24, 2016 at 4:45 PM, Derek Klinge wrote:

> Actually, I'm not trying to speed it up, just be able to handle a large
> number of n.
> (Thank you Chris for the suggestion to use xrange, I am on a Mac using the
> stock Python 2.7)
>
> I am looking at the number of iterations of linear approximation that are
> required to get a more accurate representation.
> My initial data suggest that to get 1 more digit of e (the difference
> between the calculated and expected value falls under 10**-n), I need a
> little more than 10 times the number of iterations of linear approximation.
>
> I actually intend to compare these to other methods, including limit
> definition that you provided, as well as the geometric series definition.
>
> I am trying to provide some real world data for my students to prove the
> point that although there are many ways to calculate a value, some are much
> more efficient than others.
>
> Derek
>
> On Sun, Apr 24, 2016 at 2:55 PM, Oscar Benjamin <
> oscar.j.benja...@gmail.com> wrote:
>
>> On 24 April 2016 at 19:21, Chris Angelico wrote:
>> > On Mon, Apr 25, 2016 at 4:03 AM, Derek Klinge wrote:
>> >> Ok, from the gmail web client:
>> >
>> > Bouncing this back to the list, and removing quote markers for other
>> > people's copy/paste convenience.
>> >
>> > ## Write a method to approximate Euler's Number using Euler's Method
>> > import math
>> >
>> > class EulersNumber():
>> >     def __init__(self,n):
>> >         self.eulerSteps = n
>> >         self.e= self.EulersMethod(self.eulerSteps)
>> >     def linearApproximation(self,x,h,d): # f(x+h)=f(x)+h*f'(x)
>> >         return x + h * d
>> >     def EulersMethod(self, numberOfSteps): # Repeat linear approximation over an even range
>> >         e = 1 # e**0 = 1
>> >         for step in range(numberOfSteps):
>> >             e = self.linearApproximation(e,1.0/numberOfSteps,e) # if f(x)= e**x, f'(x)=f(x)
>> >         return e
>> >
>> >
>> > def EulerStepWithGuess(accuracy,guessForN):
>> >     n = guessForN
>> >     e = EulersNumber(n)
>> >     while abs(e.e - math.e) > abs(accuracy):
>> >         n +=1
>> >         e = EulersNumber(n)
>> >         print('n={} \te= {} \tdelta(e)={}'.format(n,e.e,abs(e.e - math.e)))
>> >     return e
>> >
>> >
>> > def EulersNumberToAccuracy(PowerOfTen):
>> >     x = 1
>> >     theGuess = 1
>> >     thisE = EulersNumber(1)
>> >     while x <= abs(PowerOfTen):
>> >         thisE = EulerStepWithGuess(10**(-1*x),theGuess)
>> >         theGuess = thisE.eulerSteps * 10
>> >         x += 1
>> >     return thisE
>> >
>> >
>> >> To see an example of my problem try something like
>> >> EulersNumberToAccuracy(-10)
>>
>> Now that I can finally see your code I can see what the problem is. So
>> essentially you want to calculate Euler's number in the following way:
>>
>> e = exp(1) and exp(t) is the solution of the initial value problem
>> with ordinary differential equation dx/dt = x and initial condition
>> x(0)=1.
>>
>> So you're using Euler's method to numerically solve the ODE from t=0
>> to t=1. Which gives you an estimate for x(1) = exp(1) = e.
>>
>> Euler's method solves this by going in steps from t=0 to t=1 with some
>> step size e.g. dt = 0.1. You get a sequence of values x[n] where
>>
>>    x[0] = x(0) = 1 # initial condition
>>    x[1] = x[0] +
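To make the "about 10 times the steps per extra digit" observation concrete, here is a small sketch, assuming Python 3 and a hypothetical helper name `euler_e`. Each Euler step for dx/dt = x multiplies x by (1 + h), so N steps with h = 1/N should agree with (1 + 1/N)**N up to rounding, and the error behaves roughly like e/(2N). That leading-order estimate is a standard result, not something stated in the thread.

```python
import math


def euler_e(steps):
    """Estimate e by Euler's method on dx/dt = x, x(0) = 1, from t=0 to t=1."""
    x = 1.0
    h = 1.0 / steps
    for _ in range(steps):
        x = x + h * x  # same update as linearApproximation(x, h, x)
    return x


for steps in (10, 100, 1000, 10000):
    approx = euler_e(steps)
    closed_form = (1 + 1.0 / steps) ** steps
    # Columns: steps, gap between the two methods, actual error, e/(2N) estimate
    print(steps, abs(approx - closed_form), math.e - approx, math.e / (2 * steps))
```

If the e/(2N) estimate holds, one more decimal digit needs roughly ten times as many steps, which matches the pattern reported above. The tiny gap between the loop and the closed form is floating-point rounding, since the two evaluate the same quantity in a different order.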
Re: Optimizing Memory Allocation in a Simple, but Long Function
So I tried the recommended limit approach and got some interesting results.

## Write a method to approximate Euler's Number using Euler's Method
import math

class EulersNumber():
    def __init__(self,n):
        self.n = n
        self.e = 2
    def linearApproximation(self,x,h,d): # f(x+h)=f(x)+h*f'(x)
        return x + h * d
    def EulersMethod(self): # Repeat linear approximation over an even range
        e = 1 # e**0 = 1
        for step in xrange(self.n):
            e = self.linearApproximation(e,1.0/self.n,e) # if f(x)= e**x, f'(x)=f(x)
        self.e = e
        return e
    def LimitMethod(self):
        self.e = (1 + 1.0/self.n) ** self.n
        return self.e
    def SeriesMethod(self):
        self.e = sum([1.0/math.factorial(i) for i in range(self.n+1)])
        return self.e

I found that the pattern of an additional digit of accuracy corresponding to 10*n did not hold as strongly for that value (I can post data if desired). I also got some results that seem to contradict the mathematical definition. For example, try EulersNumber(10**15).LimitMethod(): the definition places this limit at e, and yet the (Python) answer is >3.035. Please let me know if I've fouled up the implementation somehow.

Also, my reasoning for writing this up as a class was to be able to get the value of n used to generate that value of e. If there is some other way to do that, I'd be happy to try it out.

Thanks,
Derek

On Sun, Apr 24, 2016 at 8:12 PM, Derek Klinge wrote:

> Actually, I'm not trying to speed it up, just be able to handle a large
> number of n.
> (Thank you Chris for the suggestion to use xrange, I am on a Mac using the
> stock Python 2.7)
>
> I am looking at the number of iterations of linear approximation that are
> required to get a more accurate representation.
> My initial data suggest that to get 1 more digit of e (the difference
> between the calculated and expected value falls under 10**-n), I need a
> little more than 10 times the number of iterations of linear approximation.
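One plausible reading of the LimitMethod result for n = 10**15, offered as a sketch rather than a diagnosis of the posted code: doubles near 1 are spaced about 2.2e-16 apart, so 1 + 1.0/n cannot be stored exactly and is rounded to a neighbouring value before being raised to a very large power, which magnifies the rounding. The function names below are hypothetical. `math.log1p` is a standard-library function that evaluates log(1 + x) accurately for small x, so it can be used to sidestep the rounded sum.

```python
import math


def limit_naive(n):
    """(1 + 1/n)**n computed directly; 1 + 1.0/n is rounded for huge n."""
    return (1 + 1.0 / n) ** n


def limit_log1p(n):
    """Same limit via exp(n * log1p(1/n)), avoiding the rounded sum."""
    return math.exp(n * math.log1p(1.0 / n))


n = 10 ** 15
print("stored 1 + 1/n  :", repr(1 + 1.0 / n))
print("naive (1+1/n)**n:", limit_naive(n))
print("log1p variant   :", limit_log1p(n))
print("math.e          :", math.e)
```

For moderate n the two functions should agree closely. The difference only becomes visible once 1.0/n approaches the spacing between adjacent doubles, which is why the limit definition looks fine in smaller experiments.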
Re: asyncio and subprocesses
On 4/24/2016 6:07 PM, David wrote:
> Is this a bug in the asyncio libraries?
> Is this a fixed bug, already? I am using Python 3.4.2 as distributed in
> Ubuntu Lucid, with built-in asyncio.

The people who patch asyncio do not read this list. Either install a current release or try the tulip release on PyPI (its purpose is to make asyncio available on versions before 3.4, but it should also run on 3.4). I don't know if it conflicts with the included asyncio.

-- 
Terry Jan Reedy
Re: delete from pattern to pattern if it contains match
On Friday, April 22, 2016 at 3:20:53 PM UTC+5:30, Peter Otten wrote:
> harirammano...@gmail.com wrote:
>
> >> @peter yes here it is not xml, but real data is an xml..believe me..
> >
> > @peter this is the similar xml i am having, you can correlate.
> >
> > https://tomcat.apache.org/tomcat-5.5-doc/appdev/web.xml.txt
>
> This is still too vague.
>
> If you post the code you actually tried in a small standalone script
> together with a small sample xml file that produces the same failure as your
> actual data I or someone might help you fix it.

Yeah Peter, you are correct. I would have done that, at least by changing the strings, but I didn't, because this is restricted data, given its purpose. So I took sample XML data instead; of course, the tags are missing.
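Since the real data is said to be XML, here is a hedged sketch of the tree-based route using the standard library's `xml.etree.ElementTree`. The element names (`basket`, `item`, `name`, `kind`) and the helper `drop_matching` are invented for illustration, because the actual tags were never posted; the real file would need its own tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real tag names are not known from the thread.
SAMPLE = """<basket>
  <item><name>guava</name><kind>fruit</kind></item>
  <item><name>mango</name><kind>fruit</kind></item>
  <item><name>orange</name><kind>fruit</kind></item>
</basket>"""


def drop_matching(root, tag, needle):
    """Remove direct children named `tag` whose text contains `needle`."""
    # Copy the list first so removal does not disturb the iteration.
    for child in list(root.findall(tag)):
        if any(needle in (el.text or "") for el in child.iter()):
            root.remove(child)
    return root


root = drop_matching(ET.fromstring(SAMPLE), "item", "mango")
remaining = [el.text for el in root.iter("name")]
print(remaining)
print(ET.tostring(root, encoding="unicode"))
```

Note that `findall(tag)` only looks at direct children of `root`. If the blocks to delete are nested deeper, each one would have to be removed from its own parent element.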
Re: delete from pattern to pattern if it contains match
On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
>
> > harirammano...@gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammano...@gmail.com writes:
> >>>
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram...@gmail.com wrote:
> >>> >> HI All,
> >>> >>
> >>> >> can you help me out in doing below.
> >>> >>
> >>> >> file:
> >>> >>
> >>> >> guava
> >>> >> fruit
> >>> >>
> >>> >>
> >>> >> mango
> >>> >> fruit
> >>> >>
> >>> >>
> >>> >> orange
> >>> >> fruit
> >>> >>
> >>> >>
> >>> >> need to delete from start to end if it contains mango in a file...
> >>> >>
> >>> >> output should be:
> >>> >>
> >>> >>
> >>> >> guava
> >>> >> fruit
> >>> >>
> >>> >>
> >>> >> orange
> >>> >> fruit
> >>> >>
> >>> >>
> >>> >> Thank you
> >>> >
> >>> > any one can guide me ? why xml tree parsing is not working if i have
> >>> > root.tag and root.attrib as mentioned in earlier post...
> >>>
> >>> Assuming the real consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>>
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>>
> >>> from io import StringIO
> >>>
> >>> text = '''\
> >>>
> >>> guava
> >>> fruit
> >>>
> >>>
> >>> mango
> >>> fruit
> >>>
> >>>
> >>> orange
> >>> fruit
> >>>
> >>> '''
> >>>
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith(''):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>>
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>>
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >>
> >> Hi,
> >>
> >> not working... this is the output i am getting...
> >>
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
>
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
>
> >>
> >>guava
> >> fruit
> >>
> >>
> >>orange
> >> fruit
> >
> > Jussi forgot to add the "..." line to the group.
>
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
>
> It was already known that this is not the actual format of the data.
>
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith(''):
> >             yield current
> >             current = []
>
> Oops, I notice that I forgot to start a new record only on encountering
> a '' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "\n- -\n\n".
>
> >>> hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i. e. at the beginning of all
> > lines except the first) and adds a trailing newline.
>
> Yes, I forgot about the space. Sorry about that.
>
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
>
>    print(*record, sep = "", end = "\n")
>
> Or so:
>
>    print(*record, sep = "")
>    print("")
>
> Or so:
>
>    for line in record:
>        print(line.rstrip("\n"))
>    else:
>        print("")
>
> Or:
>
>    for line in record:
>        print(line.rstrip("\n"))
>    else:
>        if record and not record[-1].strip() == "":
>            print("")
>
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
>
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
>
> > You can avoid this by specifying the delimiters explicitly:
> >
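The "collect a whole record first, then decide" plan discussed above can be sketched as follows. The marker strings in the archived messages were lost, so `<start>` and `<end>` here are placeholders chosen for illustration, and `filter_records` is a hypothetical helper name. A record begins only on a start line and is emitted, markers included, when its end line arrives.

```python
from io import StringIO

START = "<start>"  # placeholder; the real start marker is not known
END = "<end>"      # placeholder; the real end marker is not known

TEXT = """\
<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
"""


def records(source):
    """Yield each record as a list of lines, including both marker lines."""
    current = None
    for line in source:
        stripped = line.strip()
        if stripped == START:
            current = [line]          # begin a new record only on a start line
        elif current is not None:
            current.append(line)
            if stripped == END:
                yield current
                current = None


def filter_records(source, word):
    """Yield the text of every record that does not mention `word`."""
    for record in records(source):
        if not any(word in line for line in record):
            yield "".join(record)     # lines keep their own newlines


kept = list(filter_records(StringIO(TEXT), "mango"))
print("".join(kept), end="")
```

Joining the lines with `"".join` instead of `print(*record)` avoids the extra spaces mentioned in the thread, because each line already carries its own newline.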