[Python-ideas] Expose condition._waiters
Hi guys, I'm writing some code that uses `threading.Condition` and I found that I want to access condition._waiters. I want to do it in two different parts of my code for two different reasons: 1. When shutting down the thread that manages the condition, I want to be sure that there are no waiters on the condition, so I check whether `condition._waiters` is empty before exiting, otherwise I'll let them finish and only then exit. 2. When I do notify_all, I actually want to do as many notify actions as needed until there's a full round of notify_all in which none of the conditions for any of the waiters have been met. Only then do I want my code to continue. (It's because these waiters are waiting for resources that I'm giving them, each wanting a different number of resources, and I want to be sure that all of them are starved before I get more resources for them.) Do you think it'll be a good idea to add non-private functionality like that to threading.Condition? Thanks, Ram. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
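For what it's worth, the first use case can be met today without reaching into the private list, by wrapping the condition and maintaining a waiter count under the same lock. The class below is a sketch of my own invention (the name CountingCondition and its methods are hypothetical, not an existing API):

```python
import threading

class CountingCondition:
    """Hypothetical wrapper exposing what the post asks of
    Condition._waiters without touching the private attribute:
    a waiter count maintained under the condition's own lock."""

    def __init__(self):
        self._cond = threading.Condition()
        self._n_waiters = 0  # threads currently blocked in wait_for()

    def wait_for(self, predicate):
        with self._cond:
            self._n_waiters += 1
            try:
                return self._cond.wait_for(predicate)
            finally:
                self._n_waiters -= 1

    def notify_all(self):
        with self._cond:
            self._cond.notify_all()

    def has_waiters(self):
        # Safe to call from the managing thread: the count is only
        # ever read or written while holding the condition's lock.
        with self._cond:
            return self._n_waiters > 0
```

The shutdown path would then poll has_waiters() instead of inspecting condition._waiters directly.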
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 12:25:16AM +, Elliot Gorokhovsky wrote: > Regarding generalization: the general technique for special-casing is you > just substitute all type checks with 1 or 0 by applying the type assumption > you're making. That's the only way to guarantee it's safe and compliant. I'm confused -- I don't understand how *removing* type checks can possibly guarantee the code is safe and compliant. It's all very well and good when you are running tests that meet your type assumption, but what happens if they don't? If I sort a list made up of (say) mixed int and float (possibly including subclasses), does your "all type checks are 1 or 0" sort segfault? If not, why not? Where's the safety coming from? By the way, your emails in this thread have reminded me of a quote from the late Sir Terry Pratchett's novel "Maskerade" (the odd spelling is intentional): "What sort of person," said Salzella patiently, "sits down and *writes* a maniacal laugh? And all those exclamation marks, you notice? Five? A sure sign of someone who wears his underpants on his head." :-) -- Steve
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On 12 October 2016 at 11:16, Steven D'Aprano wrote: > On Wed, Oct 12, 2016 at 12:25:16AM +, Elliot Gorokhovsky wrote: > >> Regarding generalization: the general technique for special-casing is you >> just substitute all type checks with 1 or 0 by applying the type assumption >> you're making. That's the only way to guarantee it's safe and compliant. > > I'm confused -- I don't understand how *removing* type checks can > possibly guarantee the code is safe and compliant. > > It's all very well and good when you are running tests that meet your > type assumption, but what happens if they don't? If I sort a list made > up of (say) mixed int and float (possibly including subclasses), does > your "all type checks are 1 or 0" sort segfault? If not, why not? > Where's the safety coming from? My understanding is that the code does a pre-check that all the elements of the list are the same type (float, for example). This is a relatively quick test (O(n) pointer comparisons). If everything *is* a float, then an optimised comparison routine is used that skips all the type checks and goes straight to a float comparison (a single machine op). Because there are more than O(n) comparisons done in a typical sort, this is a win. And because the type checks needed in rich comparison are much more expensive than a pointer check, it's a *big* win. What I'm *not* quite clear on is why Python 3's change to reject comparisons between unrelated types makes this optimisation possible. Surely you have to check either way? It's not that it's a particularly important question - if the optimisation works, it's not that big a deal what triggered the insight. It's just that I'm not sure if there's some other point that I've not properly understood. Paul
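A toy pure-Python illustration of the pre-check Paul describes -- one O(n) scan to establish type homogeneity, after which a specialised path can be chosen. This only mirrors the concept; CPython's actual optimisation lives in C:

```python
def homogeneous_type(lst):
    """Return the common exact type of all elements, or None if the
    list is empty or mixed.  The all(...) scan is the pure-Python
    analogue of the O(n) pointer comparisons done in C."""
    if not lst:
        return None
    first = type(lst[0])
    return first if all(type(x) is first for x in lst) else None
```

Note that the check uses `type(x) is first`, so even subclasses (e.g. bool among ints) would disqualify the fast path, which is what keeps the shortcut safe.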
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
Hi Martti,
On 11.10.2016 14:42, Martti Kühne wrote:
Hello list
I love the "new" unpacking generalisations as of pep448. And I found
myself using them rather regularly, both with lists and dict.
Today I somehow expected that [*foo for foo in bar] was equivalent to
itertools.chain(*[foo for foo in bar]), but it turned out to be a
SyntaxError.
The dict equivalent of the above might then be something along the
lines of {**v for v in dict_of_dicts.values()}. In case the values
(which are all dicts) are records with the same keys, one might go and
prefix the sub-keys with their former top-level keys using
{
**dict(
("{}_{}".format(k, k_sub), v_sub)
for k_sub, v_sub in v.items()
) for k, v in dict_of_dicts.items()
}
Was anyone able to follow me through this?
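For reference, both shapes can be spelled today without new syntax; a sketch of what (I believe) the examples above intend:

```python
from itertools import chain

# The flattening that [*foo for foo in bar] was expected to perform:
bar = [[1, 2], [3, 4]]
flat = list(chain.from_iterable(bar))  # [1, 2, 3, 4]

# The dict-of-dicts merge with prefixed keys, as a single
# two-level dict comprehension:
dict_of_dicts = {"a": {"x": 1}, "b": {"x": 2}}
merged = {
    "{}_{}".format(k, k_sub): v_sub
    for k, v in dict_of_dicts.items()
    for k_sub, v_sub in v.items()
}  # {'a_x': 1, 'b_x': 2}
```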
Reading PEP448 it seems to me that it's already been considered:
https://www.python.org/dev/peps/pep-0448/#variations
The reasons for non-inclusion were concerns about acceptance,
because of "strong concerns about readability", though the idea also
received "mild support". I think your post strengthens the support,
given that you "expected it to just work". This shows, at least to me,
that the concerns about readability/understandability are not well
justified.
Personally, I find inclusion of */** expansion for comprehensions very
natural. It would again strengthen the meaning of */** for unpacking
which I am also in favor of.
Cheers,
Sven
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On 12 October 2016 at 21:35, Paul Moore wrote: > What I'm *not* quite clear on is why Python 3's change to reject > comparisons between unrelated types makes this optimisation possible. > Surely you have to check either way? It's not that it's a particularly > important question - if the optimisation works, it's not that big a > deal what triggered the insight. It's just that I'm not sure if > there's some other point that I've not properly understood. It's probably more relevant that cmp() went away, since that simplified the comparison logic to just PyObject_RichCompareBool, without the custom comparison function path. It *might* have still been possible to do something like this in the Py2 code (since the main requirement is to do the pre-check for consistent types if the first object is of a known type with an optimised fast path), but I don't know anyone who actually *likes* adding new special cases to already complex code and trying to figure out how to test whether or not they've broken anything :) Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
[Paul Moore] > My understanding is that the code does a pre-check that all the > elements of the list are the same type (float, for example). This is a > relatively quick test (O(n) pointer comparisons). If everything *is* a > float, then an optimised comparison routine that skips all the type > checks and goes straight to a float comparison (single machine op). That matches my understanding. > Because there are more than O(n) comparisons done in a typical sort, > this is a win. If the types are in fact all the same, it should be a win even for n==2 (at n < 2 no comparisons are done; at n==2 exactly 1 comparison is done): one pointer compare + go-straight-to-C-float-"x < y". > And because the type checks needed in rich comparison And layers of function calls. > are much more expensive than a pointer check, it's a *big* win. Bingo :-) > What I'm *not* quite clear on is why Python 3's change to reject > comparisons between unrelated types makes this optimisation possible. It doesn't. It would also apply in Python 2. I simply expect the optimization will pay off more frequently in Python 3 code. For example, in Python 2 I used to create lists with objects of wildly mixed types, and sort them merely to bring objects of the same type next to each other. Things "like that" don't work at all in Python 3. > Surely you have to check either way? It's not that it's a particularly > important question - if the optimisation works, it's not that big a > deal what triggered the insight. It's just that I'm not sure if > there's some other point that I've not properly understood. Well, either your understanding is fine, or we're both confused :-)
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On 12 October 2016 at 23:58, Sven R. Kunze wrote: > Reading PEP448 it seems to me that it's already been considered: > https://www.python.org/dev/peps/pep-0448/#variations > > The reason for not-inclusion were about concerns about acceptance because of > "strong concerns about readability" but also received "mild support". I > think your post strengthens the support given that you "expected it to just > work". This shows at least to me that the concerns about > readability/understandability are not justified much. Readability isn't about "Do some people guess the same semantics for what it would mean?", as when there are only a few plausible interpretations, all the possibilities are going to get a respectable number of folks picking them as reasonable behaviour. Instead, readability is about: - Do people consistently guess the *same* interpretation? - Is that interpretation consistent with other existing uses of the syntax? - Is it more readily comprehensible than existing alternatives, or is it brevity for brevity's sake? This particular proposal fails on the first question (as too many people would expect it to mean the same thing as either "[*expr, for expr in iterable]" or "[*(expr for expr in iterable)]"), but it fails on the other two grounds as well. In most uses of *-unpacking it's adding entries to a comma-delimited sequence, or consuming entries in a comma delimited sequence (the commas are optional in some cases, but they're still part of the relevant contexts). The expansions removed the special casing of functions, and made these capabilities generally available to all sequence definition operations. 
Comprehensions and generator expressions, by contrast, dispense with the comma delimited format entirely, and instead use a format inspired by mathematical set builder notation (just modified to use keywords and Python expressions rather than symbols and mathematical expressions): https://en.wikipedia.org/wiki/Set-builder_notation#Sets_defined_by_a_predicate However, set builder notation doesn't inherently include the notion of flattening lists-of-lists. Instead, that's a *consumption* operation that happens externally after the initial list-of-lists has been built, and that's exactly how it's currently spelled in Python: "itertools.chain.from_iterable(subiter for subiter in iterable)". Regards, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
[Nick Coghlan] > It's probably more relevant that cmp() went away, since that > simplified the comparison logic to just PyObject_RichCompareBool, > without the custom comparison function path. Well, the current sort is old by now, and was written for Python 2. But it did anticipate that rich comparisons were the future, and deliberately restricted itself to using only "<" (Py_LT) comparisons. So, same as now, only the "<" path needed to be examined. > It *might* have still been possible to do something like this in the > Py2 code (since the main requirement is to do the pre-check for > consistent types if the first object is of a known type with an > optimised fast path), It shouldn't really matter whether it's a known type. For any type, if it's known that all the objects are of that type, that type's tp_richcompare slot can be read up once, and if non-NULL used throughout. That would save several levels of function call per comparison during the sort; although that's not factor-of-3-speedup potential, it should still be a significant win. > but I don't know anyone that actually *likes* adding new special cases > to already complex code and trying to figure out how to test whether > or not they've broken anything :) A nice thing about this one is that special cases are a one-time thing at the start, and don't change anything in the vast bulk of the current sorting code. So when it breaks, it should be pretty easy to figure out why ;-)
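Tim's point that the sort restricts itself to "<" is easy to confirm from Python: a class that defines only __lt__ is enough for list.sort() to work.

```python
class OnlyLT:
    """Defines only __lt__ -- no __gt__, __le__, or __eq__ -- yet
    list.sort() works, since the sort uses only Py_LT comparisons."""
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        return self.v < other.v

xs = [OnlyLT(3), OnlyLT(1), OnlyLT(2)]
xs.sort()
assert [o.v for o in xs] == [1, 2, 3]
```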
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti Kühne wrote: > Hello list > > I love the "new" unpacking generalisations as of pep448. And I found > myself using them rather regularly, both with lists and dict. > Today I somehow expected that [*foo for foo in bar] was equivalent to > itertools.chain(*[foo for foo in bar]), which it turned out to be a > SyntaxError. To me, that's a very strange thing to expect. Why would you expect that unpacking items in a list comprehension would magically lead to extra items in the resulting list? I don't think that makes any sense. Obviously we could program list comprehensions to act that way if we wanted to, but that would not be consistent with the ordinary use of list comprehensions. It would introduce a special case of magical behaviour that people will have to memorise, because it doesn't follow logically from the standard list comprehension design. The fundamental design principle of list comps is that they are equivalent to a for-loop with a single append per loop: [expr for t in iterable] is equivalent to: result = [] for t in iterable: result.append(expr) If I had seen a list comprehension with an unpacked loop variable: [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] I never in a million years would expect that running a list comprehension over a three-item sequence would magically expand to six items: [1, 'a', 2, 'b', 3, 'c'] I would expect that using the unpacking operator would give some sort of error, or *at best*, be a no-op and the result would be: [(1, 'a'), (2, 'b'), (3, 'c')] append() doesn't take multiple arguments, hence an error should be the most obvious result. But if not an error, imagine the tuple unpacked to two arguments 1 and 'a' (on the first iteration), then automatically packed back into a tuple (1, 'a') just as you started with.
I think it is a clear, obvious and, most importantly, desirable property of list comprehensions with a single loop that they cannot be longer than the initial iterable that feeds them. They might be shorter, if you use the form [expr for t in iterable if condition] but they cannot be longer. So I'm afraid I cannot understand what reasoning led you to expect that unpacking would apply this way. Wishful thinking perhaps? -- Steve
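The two claims above are easy to verify directly: a comprehension equals its append-per-loop expansion, and a single-loop comprehension never yields more items than its input.

```python
iterable = [(1, 'a'), (2, 'b'), (3, 'c')]

# The append-per-loop expansion of [t[0] for t in iterable]:
comp = [t[0] for t in iterable]
result = []
for t in iterable:
    result.append(t[0])
assert comp == result

# With a single loop, the output can be shorter (a filter) but
# never longer than the input:
assert len([t for t in iterable if t[0] > 1]) <= len(iterable)
```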
Re: [Python-ideas] PEP8 dictionary indenting addition
On Tue, Oct 11, 2016 at 10:24:47PM -0400, Terry Reedy wrote: > >>Heh--not to bikeshed, but my personal preference is to leave the > >>trailing space on the first line. This is because by the time I've > >>started a new line (and possibly have spent time fussing with > >>indentation for the odd cases that my editor doesn't get quite right) > >>I'll have forgotten that I need to start the line with a space :) > > I agree that the first version of the example, with space after 'very', > before the quote, is better. I used to think the same, until I got sick and tired of having my code output strings like: a very very verylong value that continues on the next line I learned the hard way that if I don't put the breaking space at the beginning of the next fragment, I probably wouldn't put it at the end of the previous fragment either. YMMV, I'm just reporting what works for me. -- Steve
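Concretely, the habit being described is implicit string concatenation with the breaking space at the start of each continuation line:

```python
# Space at the *beginning* of each fragment; a forgotten one is easy
# to spot when skimming down the left margin of the continuations:
msg = ("a very very very"
       " long value that continues"
       " on the next line")
assert msg == "a very very very long value that continues on the next line"
```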
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
Steve, you only need to allow multiple arguments to append(), then it makes perfect sense. On Wed, Oct 12, 2016 at 18:43, Steven D'Aprano < [email protected]> wrote: > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti Kühne wrote: > > Hello list > > > > I love the "new" unpacking generalisations as of pep448. And I found > > myself using them rather regularly, both with lists and dict. > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > SyntaxError. > > To me, that's a very strange thing to expect. Why would you expect that > unpacking items in a list comprehension would magically lead to extra > items in the resulting list? I don't think that makes any sense. > > Obviously we could program list comprehensions to act that way if we > wanted to, but that would not be consistent with the ordinary use of > list comprehensions. It would introduce a special case of magical > behaviour that people will have to memorise, because it doesn't follow > logically from the standard list comprehension design. > > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > I never in a million years would expect that running a list > comprehension over a three-item sequence would magically expand to six > items: > > [1, 'a', 2, 'b', 3, 'c'] > > > I would expect that using the unpacking operator would give some sort > of error, or *at best*, be a no-op and the result would be: > > [(1, 'a'), (2, 'b'), (3, 'c')] > > > append() doesn't take multiple arguments, hence an error should be the > most obvious result. 
But if not an error, imagine the tuple unpacked to > two arguments 1 and 'a' (on the first iteration), then automatically > packed back into a tuple (1, 'a') just as you started with. > > I think it is a clear, obvious and, most importantly, desirable property > of list comprehensions with a single loop that they cannot be longer > than the initial iterable that feeds them. They might be shorter, if you > use the form > > [expr for t in iterable if condition] > > but they cannot be longer. > > So I'm afraid I cannot understand what reasoning led you to > expect that unpacking would apply this way. Wishful thinking > perhaps? > > -- > Steve
[Python-ideas] Add a method to get the subset of a dictionary.
Hi all,
It always bothered me to write something like this when I want to extract
a subset of keys from a dictionary in Python:
a = {"foo": 1, "bar": 2, "baz": 3, "foobar": 42}
interesting_keys = ["foo", "bar", "baz"]
b = {k: v for k, v in a.items() if k in interesting_keys}
Wouldn't it be nice to have a syntactic sugar such as:
b = a.subset(interesting_keys)
I find this version more elegant/explicit. But maybe this feature is not
"worth it".
Cheers !
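In the meantime the idea can be written as a small helper; `subset` here is the proposed name from this post, not an existing dict method:

```python
def subset(d, keys):
    """Return a new dict containing only the given keys,
    silently skipping keys absent from d."""
    return {k: d[k] for k in keys if k in d}

a = {"foo": 1, "bar": 2, "baz": 3, "foobar": 42}
b = subset(a, ["foo", "bar", "baz"])
assert b == {"foo": 1, "bar": 2, "baz": 3}
```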
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On 12.10.2016 17:41, Nick Coghlan wrote: This particular proposal fails on the first question (as too many people would expect it to mean the same thing as either "[*expr, for expr in iterable]" or "[*(expr for expr in iterable)]") So, my reasoning would tell me: where have I seen * so far? *args and **kwargs! [...] is just the list constructor. So, putting those two pieces together is quite simple. I expect that Martti's reasoning was similar. Furthermore, your two "interpretations" would yield the very same result as [expr for expr in iterable], which doesn't match my experience with Python so far; especially when it comes to special characters. They must mean something. So, a simple "no-op" would not match my expectations. but it fails on the other two grounds as well. Here I disagree with you. We use *args all the time, so we know what * does. I don't understand why this should not work in between brackets [...]. Well, it works in between [...] sometimes but not always, to be precise. And that's the problem, I guess. In most uses of *-unpacking it's adding entries to a comma-delimited sequence, or consuming entries in a comma delimited sequence (the commas are optional in some cases, but they're still part of the relevant contexts). The expansions removed the special casing of functions, and made these capabilities generally available to all sequence definition operations. I don't know what you mean by comma-delimited sequence. There are no commas. It's just a list of entries. * adds entries to this list. (At least from my point of view.) Comprehensions ... [are] inspired by mathematical set builder notation. Exactly. Inspired. I don't see any reason not to extend on this idea to make it more useful. "itertools.chain.from_iterable(subiter for subiter in iterable)". I have to admit I need to read that twice to get what it does. But that might just be me.
Cheers, Sven
Re: [Python-ideas] Add a method to get the subset of a dictionnary.
Looks like it was discussed before: https://mail.python.org/pipermail/python-ideas/2012-January/013252.html
Re: [Python-ideas] async objects
> On 7 Oct 2016, at 16:18, Nick Coghlan wrote: > > However, if you're running in a context that embeds CPython inside a > larger application (e.g. mod_wsgi inside Apache), then gevent's > assumptions about how the C thread states are managed may be wrong, > and hence you may be in for some "interesting" debugging sessions. The > same goes for any library that implements callbacks that end up > executing a greenlet switch when they weren't expecting it (e.g. while > holding a threading lock - that will protect you from other OS > threads, but not from other greenlets in the same thread) I can speak to this. It's been my professional experience with gevent that choosing to obtain concurrency by using gevent as opposed to explicit async was a trade-off: we replaced a large amount of drudge work in writing a codebase with async/await pervasively throughout it with a smaller amount of dramatically (10x to 100x) more intellectually challenging debugging work when unstated assumptions regarding thread-safety and concurrent access were violated. For many developers these trade-offs are sensible and reasonable, but we should all remember that there are costs and advantages to most kinds of runtime model. I'm happier to have a language that lets me do all of these things than one that chooses one for me and says "that ought to be good enough for everyone". Cory
Re: [Python-ideas] PEP8 dictionary indenting addition
Steven D'Aprano writes: > I learned the hard way that if I don't put the breaking space at > the beginning of the next fragment, I probably wouldn't put it at > the end of the previous fragment either. The converse applies in my case, so that actually doesn't matter to me. When I don't put it in, I don't put it in anywhere. What does matter to me is that I rarely make spelling errors (including typos) or omit internal spaces. That means I can get away with not reading strings carefully most of the time, and I don't. But omitted space at the joins of a continued string is frequent, and frequently caught when I'm skimming down a suite to the next syntactic construct. But spaces at the end never will be. I.e., space-at-beginning makes for more effective review for me. YMMV.
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
Elazar writes: > Steve, you only need to allow multiple arguments to append(), then it makes > perfect sense. No, because that would be explicit. Here it's implicit and ambiguous. Specifically, it requires guessing "operator associativity". That is something people have different intuitions about. > > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti Kühne wrote: > > > Hello list > > > > > > I love the "new" unpacking generalisations as of pep448. And I found > > > myself using them rather regularly, both with lists and dict. > > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > > SyntaxError. Which is what I myself would expect, same as *(1, 2) + 3 is a SyntaxError. I could see Nick's interpretation that *foo in such a context would actually mean (*foo,) (i.e., it casts iterables to tuples). I would certainly want [i, j for i, j in [[1, 2], [3, 4]]] to evaluate to [(1, 2), (3, 4)] (if it weren't a SyntaxError). > > To me, that's a very strange thing to expect. Why would you expect that > > unpacking items in a list comprehension would magically lead to extra > > items in the resulting list? I don't think that makes any sense. Well, that's what it does in display syntax for sequences. If you think of a comprehension as a "macro" that expands to display syntax, makes some sense. But as you and Nick point out, comprehensions are real operations, not macros which implicitly construct displays, then evaluate them to get the actual sequence. > > Wishful thinking perhaps? That was unnecessary. I know sometimes I fall into the trap of thinking there really ought to be concise syntax for a "simple" idea, and then making one up rather than looking it up.
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
I've followed this discussion some, and every example given so far completely mystifies me and I have no intuition about what they should mean. On Oct 12, 2016 8:43 AM, "Steven D'Aprano" wrote: > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti Kühne wrote: > > Hello list > > > > I love the "new" unpacking generalisations as of pep448. And I found > > myself using them rather regularly, both with lists and dict. > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > SyntaxError. > > To me, that's a very strange thing to expect. Why would you expect that > unpacking items in a list comprehension would magically lead to extra > items in the resulting list? I don't think that makes any sense. > > Obviously we could program list comprehensions to act that way if we > wanted to, but that would not be consistent with the ordinary use of > list comprehensions. It would introduce a special case of magical > behaviour that people will have to memorise, because it doesn't follow > logically from the standard list comprehension design. > > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > I never in a million years would expect that running a list > comprehension over a three-item sequence would magically expand to six > items: > > [1, 'a', 2, 'b', 3, 'c'] > > > I would expect that using the unpacking operator would give some sort > of error, or *at best*, be a no-op and the result would be: > > [(1, 'a'), (2, 'b'), (3, 'c')] > > > append() doesn't take multiple arguments, hence a error should be the > most obvious result. 
But if not an error, imagine the tuple unpacked to > two arguments 1 and 'a' (on the first iteration), then automatically > packed back into a tuple (1, 'a') just as you started with. > > I think it is a clear, obvious and, most importantly, desirable property > of list comprehensions with a single loop that they cannot be longer > than the initial iterable that feeds them. They might be shorter, if you > use the form > > [expr for t in iterable if condition] > > but they cannot be longer. > > So I'm afraid I cannot understand what reasoning lead you to > expect that unpacking would apply this way. Wishful thinking > perhaps? > > -- > Steve
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
What is the intuition behind [1, *x, 5]? The starred expression is replaced with a comma-separated sequence of its elements. The trailing comma Nick referred to is there, with the rule that [1,, 5] is the same as [1, 5]. All the examples follow this intuition, IIUC. Elazar On Wed, Oct 12, 2016, 22:22, David Mertz wrote: > I've followed this discussion some, and every example given so far > completely mystifies me and I have no intuition about what they should mean. > > On Oct 12, 2016 8:43 AM, "Steven D'Aprano" wrote: > > On Tue, Oct 11, 2016 at 02:42:54PM +0200, Martti Kühne wrote: > > Hello list > > > > I love the "new" unpacking generalisations as of pep448. And I found > > myself using them rather regularly, both with lists and dict. > > Today I somehow expected that [*foo for foo in bar] was equivalent to > > itertools.chain(*[foo for foo in bar]), which it turned out to be a > > SyntaxError. > > To me, that's a very strange thing to expect. Why would you expect that > unpacking items in a list comprehension would magically lead to extra > items in the resulting list? I don't think that makes any sense. > > Obviously we could program list comprehensions to act that way if we > wanted to, but that would not be consistent with the ordinary use of > list comprehensions. It would introduce a special case of magical > behaviour that people will have to memorise, because it doesn't follow > logically from the standard list comprehension design. 
> > The fundamental design principle of list comps is that they are > equivalent to a for-loop with a single append per loop: > > [expr for t in iterable] > > is equivalent to: > > result = [] > for t in iterable: > result.append(expr) > > > If I had seen a list comprehension with an unpacked loop variable: > > [*t for t in [(1, 'a'), (2, 'b'), (3, 'c')]] > > > I never in a million years would expect that running a list > comprehension over a three-item sequence would magically expand to six > items: > > [1, 'a', 2, 'b', 3, 'c'] > > > I would expect that using the unpacking operator would give some sort > of error, or *at best*, be a no-op and the result would be: > > [(1, 'a'), (2, 'b'), (3, 'c')] > > > append() doesn't take multiple arguments, hence an error should be the > most obvious result. But if not an error, imagine the tuple unpacked to > two arguments 1 and 'a' (on the first iteration), then automatically > packed back into a tuple (1, 'a') just as you started with. > > I think it is a clear, obvious and, most importantly, desirable property > of list comprehensions with a single loop that they cannot be longer > than the initial iterable that feeds them. They might be shorter, if you > use the form > > [expr for t in iterable if condition] > > but they cannot be longer. > > So I'm afraid I cannot understand what reasoning led you to > expect that unpacking would apply this way. Wishful thinking > perhaps? > > > > > -- > Steve > ___ > Python-ideas mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > ___ > Python-ideas mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
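A quick check of the equivalence described above (a sketch; compile() is used only to show that the parser rejects the starred form):

```python
pairs = [(1, 'a'), (2, 'b'), (3, 'c')]

# [t for t in pairs] desugars to a loop with exactly one append per item:
result = []
for t in pairs:
    result.append(t)
assert result == [t for t in pairs]

# The starred form from the thread is rejected outright by the parser:
try:
    compile("[*t for t in pairs]", "<example>", "eval")
    starred_ok = True
except SyntaxError:
    starred_ok = False
assert starred_ok is False
```

So the three-item input can only ever produce a three-item (or shorter, with a filter) result, exactly as argued.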
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On Wed, Oct 12, 2016 at 12:38 PM, אלעזר wrote: > What is the intuition behind [1, *x, 5]? The starred expression is > replaced with a comma-separated sequence of its elements. > I've never actually used the `[1, *x, 5]` form. And therefore, of course, I've never taught it either (I teach Python for a living nowadays). I think that syntax already perhaps goes too far, actually; but I can understand it relatively easily by analogy with: a, *b, c = range(10) But the way I think about or explain either of those is "gather the extra items from the sequence." That works in both those contexts. In contrast: >>> *b = range(10) SyntaxError: starred assignment target must be in a list or tuple Since nothing was assigned to a non-unpacked variable, nothing is "extra items" in the same sense. So failure feels right to me. I understand that "convert an iterable to a list" is conceptually available for that line, but we already have `list(it)` around, so it would be redundant and slightly confusing. What seems to be wanted with `[*foo for foo in bar]` is basically just `flatten(bar)`. The latter feels like a better spelling, and the recipes in itertools docs give an implementation already (a one-liner). We do have a possibility of writing this: >>> [(*stuff,) for stuff in [range(-5,-1), range(5)]] [(-5, -4, -3, -2), (0, 1, 2, 3, 4)] That's not flattened, as it should not be. But it is very confusing to have `[(*stuff) for stuff in ...]` behave differently than that. It's much more natural—and much more explicit—to write: >>> [item for seq in [range(-5,-1), range(5)] for item in seq] [-5, -4, -3, -2, 0, 1, 2, 3, 4] ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
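For reference, the flatten spelling mentioned above, next to the explicit double-loop comprehension (a sketch; `flatten` here is the one-liner recipe from the itertools docs, not a builtin):

```python
from itertools import chain

def flatten(list_of_lists):
    # The one-liner recipe from the itertools documentation.
    return chain.from_iterable(list_of_lists)

data = [range(-5, -1), range(5)]
flat = list(flatten(data))
assert flat == [-5, -4, -3, -2, 0, 1, 2, 3, 4]

# Same result as the explicit double-loop comprehension from the email:
assert [item for seq in data for item in seq] == flat
```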
Re: [Python-ideas] Add a method to get the subset of a dictionnary.
That discussion seemed to mostly just conclude that dicts shouldn't have all set operations, and then it kind of just dropped off. No one really argued the subset part. -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ On Oct 12, 2016 11:33 AM, "Riley Banks" wrote: > Looks like it was discussed before: > https://mail.python.org/pipermail/python-ideas/2012-January/013252.html > ___ > Python-ideas mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On 12 October 2016 at 20:22, David Mertz wrote:
> I've followed this discussion some, and every example given so far
> completely mystifies me and I have no intuition about what they should mean.
Same here.
On 12 October 2016 at 20:38, אלעזר wrote:
> What is the intuition behind [1, *x, 5]? The starred expression is replaced
> with a comma-separated sequence of its elements.
>
> The trailing comma Nick referred to is there, with the rule that [1,, 5] is
> the same as [1, 5].
>
> All the examples follow this intuition, IIUC.
But intuition is precisely that - it's not based on rules, but on
people's instinctive understanding. When evaluating whether something
is intuitive, the *only* thing that matters is what people tell you
they do or don't understand by a given construct. And in this case,
people have been expressing differing interpretations, and confusion.
That says "not intuitive" loud and clear to me.
And yes, I find [1, *x, 5] intuitive. And I can't tell you why I find
it OK, but I find {**x for x in d.items()} non-intuitive. But just
because I can't explain it doesn't mean it's not true, or you can
"change my mind" about how I feel.
Paul
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On Wed, Oct 12, 2016 at 11:26 PM David Mertz wrote: > On Wed, Oct 12, 2016 at 12:38 PM, אלעזר wrote: > > What is the intuition behind [1, *x, 5]? The starred expression is > replaced with a comma-separated sequence of its elements. > > I've never actually used the `[1, *x, 5]` form. And therefore, of course, > I've never taught it either (I teach Python for a living nowadays). I > think that syntax already perhaps goes too far, actually; but I can > understand it relatively easily by analogy with: > > a, *b, c = range(10) > > It's not exactly "analogy" as such - it is the dual notion. Here you are using the "destructor" (functional terminology) but we are talking about "constructors". But nevermind. > But the way I think about or explain either of those is "gather the extra > items from the sequence." That works in both those contexts. In contrast: > > >>> *b = range(10) > SyntaxError: starred assignment target must be in a list or tuple > > Since nothing was assigned to a non-unpacked variable, nothing is "extra > items" in the same sense. So failure feels right to me. I understand that > "convert an iterable to a list" is conceptually available for that line, > but we already have `list(it)` around, so it would be redundant and > slightly confusing. > > But that's not a uniform treatment. It might have good reasons from readability point of view, but it is an explicit exception for the rule. The desired behavior would be equivalent to b = tuple(range(10)) and yes, there are Two Ways To Do It. I would think it should have been prohibited by PEP-8 and not by the compiler. Oh well. What seems to be wanted with `[*foo for foo in bar]` is basically just > `flatten(bar)`. The latter feels like a better spelling, and the recipes > in itertools docs give an implementation already (a one-liner). 
> > We do have a possibility of writing this: > > >>> [(*stuff,) for stuff in [range(-5,-1), range(5)]] > [(-5, -4, -3, -2), (0, 1, 2, 3, 4)] > > That's not flattened, as it should not be. But it is very confusing to > have `[(*stuff) for stuff in ...]` behave differently than that. It's much > more natural—and much more explicit—to write: > > >>> [item for seq in [range(-5,-1), range(5)] for item in seq] > [-5, -4, -3, -2, 0, 1, 2, 3, 4] > > The distinction between (x) and (x,) is already deep in the language. It has nothing to do with this thread >>> [1, *([2],), 3] [1, [2], 3] >>> [1, *([2]), 3] [1, 2, 3] So there. Just like in this proposal. Elazar. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On 12.10.2016 21:38, אלעזר wrote: What is the intuition behind [1, *x, 5]? The starred expression is replaced with a comma-separated sequence of its elements. The trailing comma Nick referred to is there, with the rule that [1,, 5] is the same as [1, 5]. I have to admit that I have my problems with this "comma-separated sequence" idea. For me, lists are just collections of items. There are no commas involved. I also think that thinking about commas here complicates the matter. What * does, it basically plugs in the items from the starred expression into its surroundings: [*[1,2,3]] = [1,2,3] Let's plug in two lists into its surrounding list: [*[1,2,3], *[1,2,3]] = [1,2,3,1,2,3] So, as the thing goes, it looks like as if * could just work anywhere inside those brackets: [*[1,2,3] for _ in range(3)] = [*[1,2,3], *[1,2,3], *[1,2,3]] = [1,2,3,1,2,3,1,2,3] I have difficulties to understand the problem of understanding the syntax. The * and ** variants just flow naturally whereas the "chain" equivalent is bit "meh". Cheers, Sven ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
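The plugging-in picture above, restricted to what parses today, with the chain spelling standing in for the proposed comprehension form:

```python
from itertools import chain

# Legal today (Python 3.5+, PEP 448): stars inside a list display.
assert [*[1, 2, 3]] == [1, 2, 3]
assert [*[1, 2, 3], *[1, 2, 3]] == [1, 2, 3, 1, 2, 3]

# The proposed [*[1,2,3] for _ in range(3)] is currently a SyntaxError;
# the working equivalent is the chain spelling the email calls "meh":
expanded = list(chain.from_iterable([1, 2, 3] for _ in range(3)))
assert expanded == [1, 2, 3, 1, 2, 3, 1, 2, 3]
```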
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
To be honest, I don't have a clear picture of what {**x for x in d.items()}
should be. But I do have such picture for
dict(**x for x in many_dictionaries)
Elazar
On Wed, Oct 12, 2016 at 11:37 PM אלעזר wrote:
> On Wed, Oct 12, 2016 at 11:26 PM David Mertz wrote:
>
> On Wed, Oct 12, 2016 at 12:38 PM, אלעזר wrote:
>
> What is the intuition behind [1, *x, 5]? The starred expression is
> replaced with a comma-separated sequence of its elements.
>
> I've never actually used the `[1, *x, 5]` form. And therefore, of course,
> I've never taught it either (I teach Python for a living nowadays). I
> think that syntax already perhaps goes too far, actually; but I can
> understand it relatively easily by analogy with:
>
> a, *b, c = range(10)
>
>
> It's not exactly "analogy" as such - it is the dual notion. Here you are
> using the "destructor" (functional terminology) but we are talking about
> "constructors". But nevermind.
>
>
> But the way I think about or explain either of those is "gather the extra
> items from the sequence." That works in both those contexts. In contrast:
>
> >>> *b = range(10)
> SyntaxError: starred assignment target must be in a list or tuple
>
> Since nothing was assigned to a non-unpacked variable, nothing is "extra
> items" in the same sense. So failure feels right to me. I understand that
> "convert an iterable to a list" is conceptually available for that line,
> but we already have `list(it)` around, so it would be redundant and
> slightly confusing.
>
>
> But that's not a uniform treatment. It might have good reasons from
> readability point of view, but it is an explicit exception for the rule.
> The desired behavior would be equivalent to
>
> b = tuple(range(10))
>
> and yes, there are Two Ways To Do It. I would think it should have been
> prohibited by PEP-8 and not by the compiler. Oh well.
>
> What seems to be wanted with `[*foo for foo in bar]` is basically just
> `flatten(bar)`. The latter feels like a better spelling, and the recipes
> in itertools docs give an implementation already (a one-liner).
>
> We do have a possibility of writing this:
>
> >>> [(*stuff,) for stuff in [range(-5,-1), range(5)]]
> [(-5, -4, -3, -2), (0, 1, 2, 3, 4)]
>
> That's not flattened, as it should not be. But it is very confusing to
> have `[(*stuff) for stuff in ...]` behave differently than that. It's much
> more natural—and much more explicit—to write:
>
> >>> [item for seq in [range(-5,-1), range(5)] for item in seq]
> [-5, -4, -3, -2, 0, 1, 2, 3, 4]
>
>
> The distinction between (x) and (x,) is already deep in the language. It
> has nothing to do with this thread
>
> >>> [1, *([2],), 3]
> [1, [2], 3]
> >>> [1, *([2]), 3]
> [1, 2, 3]
>
> So there. Just like in this proposal.
>
> Elazar.
>
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 9:20 AM Tim Peters wrote: > > What I'm *not* quite clear on is why Python 3's change to reject > > comparisons between unrelated types makes this optimisation possible. > > It doesn't. It would also apply in Python 2. I simply expect the > optimization will pay off more frequently in Python 3 code. For > example, in Python 2 I used to create lists with objects of wildly > mixed types, and sort them merely to bring objects of the same type > next to each other. Things "like that" don't work at all in Python 3. > > > > Surely you have to check either way? It's not that it's a particularly > > important question - if the optimisation works, it's not that big a > > deal what triggered the insight. It's just that I'm not sure if > > there's some other point that I've not properly understood. > Yup. Actually, the initial version of this work was with Python 2. What happened was this: I had posted earlier something along the lines of "hey everybody let's radix sort strings instead of merge sort because that will be more fun ok". And everyone wrote me back "no please don't are you kidding". Tim Peters wrote back "try it but just fyi it's not gonna work". So I set off to try it. I had never used the C API before, but luckily I found some Python 2 documentation that gives an example of subclassing list, so I was able to mostly just copy-paste to get a working list extension module. I then copied over the implementation of listsort. My first question was how expensive python compares are vs C compares. And since python 2 has PyString_AS_STRING, which just gives you a char* pointer to a C string, I went in and replaced PyObject_RichCompareBool with strcmp and did a simple benchmark. And I was just totally blown away; it turns out you get something like a 40-50% improvement (at least on my simple benchmark). So that was the motivation for all this. 
Actually, if I wrote this for python 2, I might be able to get even better numbers (at least for strings), since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings are strcmp-able, so maybe if we go through and verify all the strings are UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff works to do this safely). My string special case currently just bypasses the typechecks and goes to unicode_compare(), which is still wayyy overkill for the common case of ASCII or Latin-1 strings, since it uses a for loop to go through and check characters, and strcmp uses compiler magic to do it in like, negative time or something. I even PyUnicode_READY the strings before comparing; I'm not sure if that's really necessary, but that's how PyUnicode_Compare does it. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Tue, Oct 11, 2016 at 9:56 PM Nick Coghlan wrote: > Once you get to the point of being able to do performance mentions on > a CPython build with a modified list.sort() implementation, you'll > want to take a look at the modern benchmark suite in > https://github.com/python/performance > Yup, that's the plan. I'm going to implement optimized compares for tuples, then implement this as a CPython build, and then run benchmark suites and write some rigorous benchmarks using perf/timeit. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
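A minimal sketch of the kind of timeit micro-benchmark mentioned above (list size and repeat count are illustrative assumptions, not figures from the thread):

```python
import timeit

# Time repeated sorts of a homogeneous 10k-float list -- the case the
# proposed type-specialized compare targets.
setup = "import random; data = [random.random() for _ in range(10000)]"
elapsed = timeit.timeit("sorted(data)", setup=setup, number=100)
print("100 sorts of 10k floats: %.3fs" % elapsed)
```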
[Python-ideas] Proposal for default character representation
Hello all, I want to share my thoughts about syntax improvements regarding character representation in Python. I am new to the list so if such a discussion or a PEP exists already, please let me know. So in short: Currently Python uses hexadecimal notation for characters for input and output. For example let's take a unicode string "абв.txt" (a file named with first three Cyrillic letters). Now printing it we get: u'\u0430\u0431\u0432.txt' So one sees that we have hex numbers here. Same is for typing in the strings which obviously also uses hex. Same is for some parts of the Python documentation, especially those about unicode strings. PROPOSAL: 1. Remove all hex notation from printing functions, typing, documentation. So for printing functions leave the hex as an "option", for example for those who feel the need for hex representation, which is strange IMO. 2. Replace it with decimal notation, in this case e.g: u'\u0430\u0431\u0432.txt' becomes u'\u1072\u1073\u1074.txt' and similarly for other cases where raw bytes must be printed/inputted So to summarize: make the decimal notation standard for all cases. I am not going to go deeper, such as what digit amount (leading zeros) to use, since it's quite a secondary decision. MOTIVATION: 1. Hex notation is hardly readable. It was not designed with readability in mind, so for reading it is not an appropriate system, at least with the current character set, which is a mix of digits and letters (curious who was that wise person who invented such a set?). 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, I hope no need to explain why. So that's it, in short. Feel free to discuss and comment. Regards, Mikhail ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 5:36 AM Paul Moore wrote:
> On 12 October 2016 at 11:16, Steven D'Aprano wrote:
> > On Wed, Oct 12, 2016 at 12:25:16AM +, Elliot Gorokhovsky wrote:
> >
> >> Regarding generalization: the general technique for special-casing is
> you
> >> just substitute all type checks with 1 or 0 by applying the type
> assumption
> >> you're making. That's the only way to guarantee it's safe and compliant.
> >
> > I'm confused -- I don't understand how *removing* type checks can
> > possible guarantee the code is safe and compliant.
> >
> > It's all very well and good when you are running tests that meet your
> > type assumption, but what happens if they don't? If I sort a list made
> > up of (say) mixed int and float (possibly including subclasses), does
> > your "all type checks are 1 or 0" sort segfault? If not, why not?
> > Where's the safety coming from?
>
> My understanding is that the code does a pre-check that all the
> elements of the list are the same type (float, for example). This is a
> relatively quick test (O(n) pointer comparisons).
Yes, that's correct. I'd like to emphasize that I'm not "*removing* type
checks" -- I'm checking them in advance, and then substituting in the
values I already know are correct. To put it rigorously: there are
expressions of the form PyWhatever_Check. I can be eager or lazy about how
I calculate these. The current implementation is lazy: it waits until the
values are actually called for before calculating them. This is expensive,
because they are called for many, many times. My implementation is eager: I
calculate all the values in advance, and then if they all happen to be the
same, I plug in that value (1 or 0 as the case may be) wherever it appears
in the code. If they don't happen to all be the same, like for "mixed int
and float", then I just don't change anything and use the default
implementation.
The code for this is really very simple:
int keys_are_all_same_type = 1;
PyTypeObject* key_type = lo.keys[0]->ob_type;
for (i = 0; i < saved_ob_size; i++) {
    if (lo.keys[i]->ob_type != key_type) {
        keys_are_all_same_type = 0;
        break;
    }
}
if (keys_are_all_same_type) {
    if (key_type == &PyUnicode_Type)
        compare_function = unsafe_unicode_compare;
    else if (key_type == &PyLong_Type)
        compare_function = unsafe_long_compare;
    else if (key_type == &PyFloat_Type)
        compare_function = unsafe_float_compare;
    else
        compare_function = key_type->tp_richcompare;
} else {
    compare_function = PyObject_RichCompare;
}
Those unsafe_whatever* functions are derived by substituting in, like I
said, the known values for the typechecks (known since
keys_are_all_same_type=1 and key_type = whatever) in the existing
implementations of the compare functions.
Hope everything is clear now!
Elliot
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
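A pure-Python sketch of the eager pre-pass Elliot describes (the real code is C inside list.sort; the function and the returned names here are illustrative stand-ins):

```python
def choose_compare(keys):
    """Return a marker for which compare path a sort could take."""
    if not keys:
        return "empty"
    key_type = type(keys[0])
    # Eager pre-pass: one type comparison per element, O(n).
    if all(type(k) is key_type for k in keys):
        # Homogeneous list: a type-specialized compare is safe.
        return "unsafe_%s_compare" % key_type.__name__
    # Mixed types (e.g. int and float): fall back to the generic path.
    return "PyObject_RichCompare"

assert choose_compare([1.0, 2.0, 3.0]) == "unsafe_float_compare"
assert choose_compare([1, 2.0]) == "PyObject_RichCompare"
```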
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 2:19 PM, Elliot Gorokhovsky wrote: [...] > So that was the motivation for all this. Actually, if I wrote this for > python 2, I might be able to get even better numbers (at least for strings), > since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings > are strcmp-able, so maybe if we go through and verify all the strings are > UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff > works to do this safely). My string special case currently just bypasses the > typechecks and goes to unicode_compare(), which is still wayyy overkill for > the common case of ASCII or Latin-1 strings, since it uses a for loop to go > through and check characters, and strcmp uses compiler magic to do it in > like, negative time or something. I even PyUnicode_READY the strings before > comparing; I'm not sure if that's really necessary, but that's how > PyUnicode_Compare does it. It looks like PyUnicode_Compare already has a special case to use memcmp when both of the strings fit into latin1: https://github.com/python/cpython/blob/cfc517e6eba37f1bd61d57bf0dbece9843bff9c8/Objects/unicodeobject.c#L10855-L10860 I suppose the for loops that are used for multibyte strings could potentially be sped up with SIMD or something, but that gets complicated fast, and modern compilers might even be doing it already. -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
The semantics seem fairly obvious if you treat it as changing the method calls. For lists, * uses .extend() instead of .append(). Sets use * for .update() instead of .add(). Dicts use ** for .update() instead of __setitem__. In that case x should be a mapping (or iterable of pairs maybe), and all pairs in that should be added to the dict. In generator expressions * means yield from instead of just yield. The ** in dicts is needed to distinguish between set and dict comprehensions, since it doesn't use a colon. Spencer On 13 Oct. 2016, at 6:41 am, אלעזר <[email protected]> wrote: To be honest, I don't have a clear picture of what {**x for x in d.items()} should be. But I do have such picture for dict(**x for x in many_dictionaries) Elazar ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
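Spencer's method-substitution reading, sketched as the desugaring it implies (hypothetical semantics for syntax that does not exist; nothing below is actual Python behavior for `[*t for t in it]`):

```python
def starred_list_comp(iterable):
    # Hypothetical desugaring: * switches .append() to .extend().
    result = []
    for t in iterable:
        result.extend(t)  # a plain comprehension would result.append(t)
    return result

assert starred_list_comp([(1, 'a'), (2, 'b'), (3, 'c')]) == [1, 'a', 2, 'b', 3, 'c']

def starred_set_comp(iterable):
    # Sets: * switches .add() to .update().
    result = set()
    for t in iterable:
        result.update(t)
    return result

assert starred_set_comp([{1, 2}, {2, 3}]) == {1, 2, 3}
```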
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Thu, Oct 13, 2016 at 8:19 AM, Elliot Gorokhovsky wrote: > > My first question was how expensive python compares are vs C compares. And > since python 2 has PyString_AS_STRING, which just gives you a char* pointer > to a C string, I went in and replaced PyObject_RichCompareBool with strcmp > and did a simple benchmark. And I was just totally blown away; it turns out > you get something like a 40-50% improvement (at least on my simple > benchmark). > > So that was the motivation for all this. Actually, if I wrote this for > python 2, I might be able to get even better numbers (at least for strings), > since we can't use strcmp in python 3. (Actually, I've heard UTF-8 strings > are strcmp-able, so maybe if we go through and verify all the strings are > UTF-8 we can strcmp them? I don't know enough about how PyUnicode stuff > works to do this safely). I'm not sure what you mean by "strcmp-able"; do you mean that the lexical ordering of two Unicode strings is guaranteed to be the same as the byte-wise ordering of their UTF-8 encodings? I don't think that's true, but then, I'm not entirely sure how Python currently sorts strings. Without knowing which language the text represents, it's not possible to sort perfectly. https://en.wikipedia.org/wiki/Collation#Automated_collation """ Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in German dictionaries the word ökonomisch comes between offenbar and olfaktorisch, while Turkish dictionaries treat o and ö as different letters, placing oyun before öbür. """ Which means these lists would already be considered sorted, in their respective languages: rosuav@sikorsky:~$ python3 Python 3.7.0a0 (default:a78446a65b1d+, Sep 29 2016, 02:01:55) [GCC 6.1.1 20160802] on linux Type "help", "copyright", "credits" or "license" for more information. 
>>> sorted(["offenbar", "ökonomisch", "olfaktorisch"]) ['offenbar', 'olfaktorisch', 'ökonomisch'] >>> sorted(["oyun", "öbür", "parıldıyor"]) ['oyun', 'parıldıyor', 'öbür'] So what's Python doing? Is it a codepoint ordering? ChrisA ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 3:39 PM Nathaniel Smith wrote: > It looks like PyUnicode_Compare already has a special case to use > memcmp when both of the strings fit into latin1: > Wow! That's great! I didn't even try reading through unicode_compare, because I felt I might miss some subtle detail that would break everything. But ya, that's great! Since surely latin1 is the most common use case. So I'll just add a latin1 check in the check-loop, and then I'll have two unsafe_unicode_compare functions. I felt bad about not being able to get the same kind of string performance I had gotten with python2, so this is nice. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
> So what's Python doing? Is it a codepoint ordering? > ...ya...how is the python interpreter supposed to know what language strings are in? There is a unique ordering of unicode strings defined by the unicode standard, AFAIK. If you want to sort by natural language ordering, see here: https://pypi.python.org/pypi/natsort ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On 12.10.2016 23:33, Mikhail V wrote: > Hello all, > > I want to share my thoughts about syntax improvements regarding > character representation in Python. > I am new to the list so if such a discussion or a PEP exists already, > please let me know. > > So in short: > > Currently Python uses hexadecimal notation > for characters for input and output. > For example let's take a unicode string "абв.txt" > (a file named with first three Cyrillic letters). > > Now printing it we get: > > u'\u0430\u0431\u0432.txt' Hmm, in Python3, I get: >>> s = "абв.txt" >>> s 'абв.txt' > So one sees that we have hex numbers here. > Same is for typing in the strings which obviously also uses hex. > Same is for some parts of the Python documentation, > especially those about unicode strings. > > PROPOSAL: > 1. Remove all hex notation from printing functions, typing, > documention. > So for printing functions leave the hex as an "option", > for example for those who feel the need for hex representation, > which is strange IMO. > 2. Replace it with decimal notation, in this case e.g: > > u'\u0430\u0431\u0432.txt' becomes > u'\u1072\u1073\u1074.txt' > > and similarly for other cases where raw bytes must be printed/inputed > So to summarize: make the decimal notation standard for all cases. > I am not going to go deeper, such as what digit amount (leading zeros) > to use, since it's quite secondary decision. > > MOTIVATION: > 1. Hex notation is hardly readable. It was not designed with readability > in mind, so for reading it is not appropriate system, at least with the > current character set, which is a mix of digits and letters (curious who > was that wize person who invented such a set?). > 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, > I hope no need to explain why. > > So that's it, in short. > Feel free to discuss and comment. 
The hex notation for \u is a standard also used in many other programming languages, it's also easier to parse, so I don't think we should change this default. Take e.g. >>> s = "\u123456" >>> s 'ሴ56' With decimal notation, it's not clear where to end parsing the digit notation. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 12 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
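Marc-Andre's parsing point can be checked directly: the \u escape always consumes exactly four hex digits, so there is no ambiguity about where the digits end:

```python
s = "\u123456"
# The escape stops after four hex digits; "56" stays literal text.
assert s == "\u1234" + "56"
assert len(s) == 3
assert ord(s[0]) == 0x1234
```

A fixed-width decimal escape would need a different convention (and more digits) to stay unambiguous.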
Re: [Python-ideas] Proposal for default character representation
On 10/12/2016 05:33 PM, Mikhail V wrote: Hello all, Hello! New to this list so not sure if I can reply here... :) Now printing it we get: u'\u0430\u0431\u0432.txt' By "printing it", do you mean "this is the string representation"? I would presume printing it would show characters nicely rendered. Does it not for you? and similarly for other cases where raw bytes must be printed/inputed So to summarize: make the decimal notation standard for all cases. I am not going to go deeper, such as what digit amount (leading zeros) to use, since it's quite secondary decision. Since when was decimal notation "standard"? It seems to be quite the opposite. For unicode representations, byte notation seems standard. MOTIVATION: 1. Hex notation is hardly readable. It was not designed with readability in mind, so for reading it is not appropriate system, at least with the current character set, which is a mix of digits and letters (curious who was that wize person who invented such a set?). This is an opinion. I should clarify that for many cases I personally find byte notation much simpler. In this case, I view it as a toss up, though for something like utf8-encoded text I would hate it if I saw decimal numbers and not bytes. 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, I hope no need to explain why. Still not sure which "mixing" you refer to. So that's it, in short. Feel free to discuss and comment. Regards, Mikhail Cheers, Thomas ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
The comparison methods on Python's str are codepoint-by-codepoint. A neat fact about UTF-8 is that bytewise comparisons on UTF-8 are equivalent to doing codepoint comparisons. But this isn't relevant to Python's str, because Python's str never uses UTF-8. -n On Wed, Oct 12, 2016 at 2:45 PM, Elliot Gorokhovsky wrote: > >> So what's Python doing? Is it a codepoint ordering? > > > ...ya...how is the python interpreter supposed to know what language strings > are in? There is a unique ordering of unicode strings defined by the unicode > standard, AFAIK. > If you want to sort by natural language ordering, see here: > https://pypi.python.org/pypi/natsort > > ___ > Python-ideas mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
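Nathaniel's UTF-8 fact is easy to demonstrate from Python: sorting strings by their UTF-8 bytes gives the same order as Python's codepoint-by-codepoint `str` comparison (a sketch; the sample strings are arbitrary):

```python
# UTF-8 is order-preserving: bytewise comparison of the encodings
# agrees with codepoint-by-codepoint comparison of the strings.
words = ["abc", "\u00e5\u00f8", "\u03c8", "\u4e2d", "a\u00e9", "z"]
by_codepoint = sorted(words)  # str comparison: code point order
by_utf8 = sorted(words, key=lambda w: w.encode("utf-8"))
assert by_codepoint == by_utf8
```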
Re: [Python-ideas] Add a method to get the subset of a dictionnary.
On 10/12/2016 12:06 PM, Enguerrand Pelletier wrote:
Hi all,
It always bothered me to write something like this when i want to strip
keys from a dictionnary in Python:
a = {"foo": 1, "bar": 2, "baz": 3, "foobar": 42}
interesting_keys = ["foo", "bar", "baz"]
If the keys are hashable, this should be a set.
b = {k, v for k,v in a.items() if k in interesting_keys}
Test code before posting. The above is a set comprehension creating a
set of tuples. For a dict, 'k, v' must be 'k:v'.
Wouldn't it be nice to have a syntactic sugar such as:
b = a.subset(interesting_keys)
It is pretty rare for the filter condition to be exactly 'key in
explicit_keys'. If it is, one can directly construct the dict from a
and explicit_keys.
b = {k:a[k] for k in interesting_keys}
The syntactic sugar wrapping this would save 6 keypresses.
Interesting_keys can be any iterable. To guarantee no KeyErrors, add
'if k in a'.
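A minimal sketch putting the points above together: keys as a set, the dict constructed directly from the keys, and an 'if k in a' guard so missing keys don't raise KeyError.

```python
a = {"foo": 1, "bar": 2, "baz": 3, "foobar": 42}
interesting_keys = {"foo", "bar", "baz", "missing"}  # a set; may hold absent keys
b = {k: a[k] for k in interesting_keys if k in a}    # guard avoids KeyError
assert b == {"foo": 1, "bar": 2, "baz": 3}
```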
--
Terry Jan Reedy
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > But this isn't relevant to Python's str, because Python's str never uses > UTF-8. > Really? I thought in python 3, strings are all unicode... so what encoding do they use, then? ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
I'm -1 on this. Just type "0431 unicode" on your favorite search engine. U+0431 is the codepoint, not whatever digits 0x431 has in decimal. That's a tradition and something external to Python. As a related concern, I think using decimal/octal on raw data is a terrible idea (e.g. On Linux, I always have to re-format the "cmp -l" to really grasp what's going on, changing it to hexadecimal). Decimal notation is hardly readable when we're dealing with stuff designed in base 2 (e.g. due to the visual separation of distinct bytes). How many people use "hexdump" (or any binary file viewer) with decimal output instead of hexadecimal? I agree that mixing representations for the same abstraction (using decimal in some places, hexadecimal in other ones) can be a bad idea. Actually, that makes me believe "decimal unicode codepoint" shouldn't ever appear in string representations. -- Danilo J. S. Bellini --- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap) ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Add a method to get the subset of a dictionnary.
On 10/12/2016 5:52 PM, Terry Reedy wrote:
On 10/12/2016 12:06 PM, Enguerrand Pelletier wrote:
b = {k, v for k,v in a.items() if k in interesting_keys}
Test code before posting. The above is a set comprehension creating a
set of tuples.
I should have followed my own advice. The above is a SyntaxError until
'k,v' is wrapped in parens, '(k,v)'.
--
Terry Jan Reedy
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On 12 October 2016 at 22:57, Elliot Gorokhovsky wrote: > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: >> >> But this isn't relevant to Python's str, because Python's str never uses >> UTF-8. > > > Really? I thought in python 3, strings are all unicode... so what encoding > do they use, then? They are stored internally as arrays of code points, 1-byte (0-255) if all code points fit in that range, otherwise 2-byte or if needed 4 byte. See PEP 393 (https://www.python.org/dev/peps/pep-0393/) for details. Paul ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
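Paul's description of the PEP 393 layout can be observed from Python via `sys.getsizeof` (a sketch; the exact sizes are CPython-specific, but the relative growth in per-character width holds):

```python
import sys

# PEP 393: the per-character storage width grows with the largest
# code point in the string.
ascii_s = "a" * 100            # all code points < 256   -> 1 byte each
bmp_s = "\u0430" * 100         # some >= 256             -> 2 bytes each
astral_s = "\U0001F600" * 100  # some >= 65536           -> 4 bytes each
assert sys.getsizeof(bmp_s) > sys.getsizeof(ascii_s)
assert sys.getsizeof(astral_s) > sys.getsizeof(bmp_s)
```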
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky < [email protected]> wrote: > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > >> But this isn't relevant to Python's str, because Python's str never uses >> UTF-8. >> > > Really? I thought in python 3, strings are all unicode... so what encoding > do they use, then? > No encoding is used. The actual code points are stored as integers of the same size. If all code points are less than 256, they are stored as 8-bit integers (bytes). If some code points are greater or equal to 256 but less than 65536, they are stored as 16-bit integers and so on. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Thu, Oct 13, 2016 at 8:51 AM, Nathaniel Smith wrote: > The comparison methods on Python's str are codepoint-by-codepoint. Thanks, that's what I wasn't sure of. ChrisA ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
Ah. That makes a lot of sense, actually. Anyway, so then Latin1 strings are memcmp-able, and others are not. That's fine; I'll just add a check for that (I think there are already helper functions for this) and then have two special-case string functions. Thanks! On Wed, Oct 12, 2016 at 4:08 PM Alexander Belopolsky < [email protected]> wrote: > > On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky < > [email protected]> wrote: > > On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: > > But this isn't relevant to Python's str, because Python's str never uses > UTF-8. > > > Really? I thought in python 3, strings are all unicode... so what encoding > do they use, then? > > > No encoding is used. The actual code points are stored as integers of the > same size. If all code points are less than 256, they are stored as 8-bit > integers (bytes). If some code points are greater or equal to 256 but less > than 65536, they are stored as 16-bit integers and so on. > ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On 10/12/2016 5:57 PM, Elliot Gorokhovsky wrote: On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith wrote: But this isn't relevant to Python's str, because Python's str never uses UTF-8. Really? I thought in python 3, strings are all unicode... They are ... so what encoding do they use, then? Since 3.3, essentially ascii, latin1, utf-16 without surrogates (ucs2), or utf-32, depending on the highest codepoint. This is the 'kind' field. If we go this route, I suspect that optimizing string sorting will take some experimentation. If the initial item is str, it might be worthwhile to record the highest 'kind' during the type scan, so that strncmp can be used if all are ascii or latin-1. -- Terry Jan Reedy ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
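A pure-Python sketch of the scan Terry suggests (the function name and structure are illustrative, not CPython's actual code): one pass over the items records the highest code point, so the sort could fall back to a cheap bytewise comparison when everything is ASCII/latin-1.

```python
def widest_char(strings):
    """One pass over the strings, recording the highest code point seen."""
    worst = 0
    for s in strings:
        if s:
            worst = max(worst, max(map(ord, s)))
    return worst

assert widest_char(["spam", "eggs"]) < 256     # latin-1: memcmp-friendly
assert widest_char(["spam", "\u0430"]) >= 256  # needs a wider comparison
```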
Re: [Python-ideas] PEP8 dictionary indenting addition
On 10/12/2016 1:40 PM, Stephen J. Turnbull wrote: Steven D'Aprano writes: > I learned the hard way that if I don't put the breaking space at > the beginning of the next fragment, I probably wouldn't put it at > the end of the previous fragment either. The converse applies in my case, so that actually doesn't matter to me. When I don't put it in, I don't put it in anywhere. What does matter to me is that I rarely make spelling errors (including typos) or omit internal spaces. That means I can get away with not reading strings carefully most of the time, and I don't. But omitted space at the joins of a continued string is frequent, and frequently caught when I'm following skimming down a suite to the next syntactic construct. But spaces at end never will be. Ie, space-at-beginning makes for more effective review for me. YMMV. I think that PEP 8 should not recommend either way. -- Terry Jan Reedy ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 6:14 PM, Elliot Gorokhovsky < [email protected]> wrote: > so then Latin1 strings are memcmp-able, and others are not. No. Strings of the same kind are "memcmp-able" regardless of their kind. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On 2016-10-12 23:34, Alexander Belopolsky wrote: On Wed, Oct 12, 2016 at 6:14 PM, Elliot Gorokhovsky wrote: so then Latin1 strings are memcmp-able, and others are not. No. Strings of the same kind are "memcmp-able" regardless of their kind. Surely that's true only if they're big-endian. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 3:34 PM, Alexander Belopolsky wrote: > > On Wed, Oct 12, 2016 at 6:14 PM, Elliot Gorokhovsky > wrote: >> >> so then Latin1 strings are memcmp-able, and others are not. > > > No. Strings of the same kind are "memcmp-able" regardless of their kind. I don't think this is true on little-endian systems. -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
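The endianness objection can be shown concretely: on the 2-byte kind, comparing the raw little-endian bytes disagrees with codepoint order. A sketch using `utf-16-le` as a stand-in for the in-memory layout on a little-endian machine:

```python
a, b = "\u0101", "\u0200"      # 0x0101 < 0x0200 as code points
raw_a = a.encode("utf-16-le")  # b'\x01\x01'
raw_b = b.encode("utf-16-le")  # b'\x00\x02'
assert a < b                   # codepoint comparison
assert raw_a > raw_b           # bytewise (memcmp-style) order flips
```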
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
Paul Moore wrote: What I'm *not* quite clear on is why Python 3's change to reject comparisons between unrelated types makes this optimisation possible. I think the idea was that it's likely to be *useful* a higher proportion of the time, because Python 3 programmers have to be careful that the types they're sorting are compatible. I'm not sure how true that is -- just because you *could* sort lists containing a random selection of types in Python 2 doesn't necessarily mean it was done often. -- Greg ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
Forgot to reply to all, duping my message... On 12 October 2016 at 23:48, M.-A. Lemburg wrote: > Hmm, in Python3, I get: > >>> s = "абв.txt" > >>> s > 'абв.txt' I posted output with Python 2 and Windows 7. BTW, in Windows 10 'print' won't work in the cmd console at all by default with unicode, but that's another story, let us not go into that. I think you get my idea right, it is not only about printing. > The hex notation for \u is a standard also used in many other > programming languages, it's also easier to parse, so I don't > think we should change this default. In programming literature it is used often, but let me point out that decimal is THE standard and is a much better standard in the sense of readability. And there is no solid reason to use 2 standards at the same time. > Take e.g. > >>> s = "\u123456" > >>> s > 'ሴ56' > > With decimal notation, it's not clear where to end parsing > the digit notation. How is it not clear if the digit amount is fixed? Not very clear what you meant. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On 12 October 2016 at 23:58, Danilo J. S. Bellini wrote: > Decimal notation is hardly > readable when we're dealing with stuff designed in base 2 (e.g. due to the > visual separation of distinct bytes). Hmm, what keeps you from separating the logical units to be represented each by a decimal number? Like 001 023 255 ... Do you really think this is less readable than its hex equivalent? Then you are probably working with hex numbers only, but I doubt that. > I agree that mixing representations for the same abstraction (using decimal > in some places, hexadecimal in other ones) can be a bad idea. "Can be"? It is indeed a horrible idea. Also not only for the same abstraction but at all. > makes me believe "decimal unicode codepoint" shouldn't ever appear in string > representations. I use this site to look the chars up: http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html PS: it is rather peculiar, three negative replies already but with no strong arguments why it would be bad to stick to decimal only, only some "others do it so" and "tradition" arguments. The "base 2" argument could work to some degree, but if we stick to this criterion why not speak about octal/quaternary/binary then? Please note, I am talking only about readability _of the character set_ actually. It does not include your habit issues, but rather is an objective criterion for using this or that character set. And decimal is objectively way more readable than the hex standard character set, regardless of how strong your habits are. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On 12 October 2016 at 23:50, Thomas Nyberg wrote: > Since when was decimal notation "standard"? Depends on what planet you live on. I live on planet Earth. And you? > opposite. For unicode representations, byte notation seems standard. How does this make it a good idea? Consider the unicode table as an array of glyphs. Now the index of the array is suddenly represented in some obscure character set. How is this index different from the index of any array, or from a natural number? Think about it... >> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, >> I hope no need to explain why. > > Still not sure which "mixing" you refer to. Still not sure? These two words in brackets. Mixing those two systems. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Wed, Oct 12, 2016 at 4:26 PM Terry Reedy wrote: > I suspect that optimizing string sorting > will take some experimentation. If the initial item is str, it might be > worthwhile to record the highest 'kind' during the type scan, so that > strncmp can be used if all are ascii or latin-1. > My thoughts exactly. One other optimization along these lines: the reason ints don't give quite as shocking results as floats is that comparisons are a bit more expensive: one first has to check that the int would fit in a c long before comparing; if not, then a custom procedure has to be used. However, in practice ints being sorted are almost always smaller in absolute value than 2**32 or whatever. So I think, just as it might pay off to check for latin-1 and use strcmp, it may also pay off to check for fits-in-a-C-long and use a custom function for that case as well, since the performance would be precisely as awesome as the float performance that started this thread: comparisons would just be the cost of pointer dereference plus the cost of C long comparison, i.e. the minimum possible cost. ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
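A rough Python-level analogue of the check Elliot describes (assuming a 64-bit C long; the constants and helper are illustrative, not CPython's actual code):

```python
C_LONG_MIN, C_LONG_MAX = -2**63, 2**63 - 1  # assumes a 64-bit C long

def fits_in_c_long(n):
    """True if the int could be compared with one machine comparison."""
    return C_LONG_MIN <= n <= C_LONG_MAX

assert fits_in_c_long(2**32)      # typical sort keys pass the check
assert not fits_in_c_long(2**80)  # big ints need the general path
```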
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On Wed, Oct 12, 2016 at 06:32:12PM +0200, Sven R. Kunze wrote: > On 12.10.2016 17:41, Nick Coghlan wrote: > >This particular proposal fails on the first question (as too many > >people would expect it to mean the same thing as either "[*expr, for > >expr in iterable]" or "[*(expr for expr in iterable)]") > > So, my reasoning would tell me: where have I seen * so far? *args and > **kwargs! And multiplication. And sequence unpacking. > [...] is just the list constructor. Also indexing: dict[key] or sequence[item or slice]. The list constructor would be either list(...) or possibly list.__new__. [...] is either a list display: [1, 2, 3, 4] or a list comprehension. They are not the same thing, and they don't work the same way. The only similarity is that they use [ ] as delimiters, just like dict and sequence indexing. That doesn't mean that you can write: mydict[x for x in seq if condition] Not everything with [ ] is the same. > So, putting those two pieces together is quite simple. I don't see that it is simple at all. I don't see any connection between function *args and list comprehension loop variables. > Furthermore, your two "interpretations" would yield the very same result > as [expr for expr in iterable] which doesn't match with my experience > with Python so far; especially when it comes to special characters. They > must mean something. So, a simple "no-op" would not match my expectations. Just because something would otherwise be a no-op doesn't mean that it therefore has to have some magical meaning. Python has a few no-ops which are allowed, or required, by syntax but don't do anything. pass (x) # same as just x +1 # no difference between literals +1 and 1 -0 func((expr for x in iterable)) # redundant parens for generator expr There may be more. > >but it fails on the other two grounds as well. > > Here I disagree with you. We use *args all the time, so we know what * > does. I don't understand why this should not work in between brackets [...]. 
By this logic, *t should work... everywhere? while *args: try: raise *args except *args: del *args That's not how Python works. Just because syntax is common, doesn't mean it has to work everywhere. We cannot write: for x in import math: ... even though importing is common. *t doesn't work as the expression inside a list comprehension because that's not how list comps work. To make it work requires making this a special case and mapping [expr for t in iterable] to a list append, while [*expr for t in iterable] gets mapped to a list extend. It's okay to want that as a special feature, but understand what you are asking for: you're not asking for some restriction to be lifted, which will then automatically give you the functionality you expect. You're asking for new functionality to be added. Sequence unpacking inside list comprehensions as a way of flattening a sequence is completely new functionality which does not logically follow from the current semantics of comprehensions. > >In most uses of *-unpacking it's adding entries to a comma-delimited > >sequence, or consuming entries in a comma delimited sequence (the > >commas are optional in some cases, but they're still part of the > >relevant contexts). The expansions removed the special casing of > >functions, and made these capabilities generally available to all > >sequence definition operations. > > I don't know what you mean by comma-delimited sequence. There are no > commas. It's just a list of entries. * adds entries to this list. (At > least from my point of view.) Not all points of view are equally valid. -- Steve ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
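For completeness, the flattening behaviour being proposed already has two spellings today: nested comprehension loops, or `itertools.chain.from_iterable`.

```python
from itertools import chain

nested = [(1, 2), (3,), (4, 5)]
flat = [x for t in nested for x in t]             # nested-loop spelling
assert flat == [1, 2, 3, 4, 5]
assert list(chain.from_iterable(nested)) == flat  # itertools spelling
```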
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On Wed, Oct 12, 2016 at 04:11:55PM +, אלעזר wrote: > Steve, you only need to allow multiple arguments to append(), then it makes > perfect sense. I think you're missing a step. What will multiple arguments given to append do? There are two obvious possibilities: - collect all the arguments into a tuple, and append the tuple; - duplicate the functionality of list.extend neither of which appeals to me. -- Steve ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Fwd: unpacking generalisations for list comprehension
On Thu, Oct 13, 2016 at 2:35 AM Steven D'Aprano wrote: > On Wed, Oct 12, 2016 at 04:11:55PM +, אלעזר wrote: > > > Steve, you only need to allow multiple arguments to append(), then it > makes > > perfect sense. > > I think you're missing a step. What will multiple arguments given to > append do? There are two obvious possibilities: > > - collect all the arguments into a tuple, and append the tuple; > > - duplicate the functionality of list.extend > > > neither of which appeals to me. > The latter, of course. Similar to max(). Not unheard of. Elazar ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
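The two readings Steven distinguishes map onto today's list methods:

```python
a = [1, 2]
a.append((3, 4))            # tuple-collecting reading
assert a == [1, 2, (3, 4)]

b = [1, 2]
b.extend((3, 4))            # extend reading -- the one Elazar intends
assert b == [1, 2, 3, 4]
```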
Re: [Python-ideas] Proposal for default character representation
On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote: > On 12 October 2016 at 23:58, Danilo J. S. Bellini > wrote: > >> Decimal notation is hardly >> readable when we're dealing with stuff designed in base 2 (e.g. due to the >> visual separation of distinct bytes). > > Hmm what keeps you from separateting the logical units to be represented each > by a decimal number? like 001 023 255 ... > Do you really think this is less readable than its hex equivalent? > Then you are probably working with hex numbers only, but I doubt that. Way WAY less readable, and I'm comfortable working in both hex and decimal. >> I agree that mixing representations for the same abstraction (using decimal >> in some places, hexadecimal in other ones) can be a bad idea. > "Can be"? It is indeed a horrible idea. Also not only for same abstraction > but at all. > >> makes me believe "decimal unicode codepoint" shouldn't ever appear in string >> representations. > I use this site to look the chars up: > http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html You're the one who's non-standard here. Most of the world uses hex for Unicode codepoints. http://unicode.org/charts/ HTML entities permit either decimal or hex, but other than that, I can't think of any common system that uses decimal for Unicode codepoints in strings. > PS: > that is rather peculiar, three negative replies already but with no strong > arguments why it would be bad to stick to decimal only, only some > "others do it so" and "tradition" arguments. "Others do it so" is actually a very strong argument. If all the rest of the world uses + to mean addition, and Python used + to mean subtraction, it doesn't matter how logical that is, it is *wrong*. Most of the world uses U+201C or "\u201C" to represent a curly double quote; if you use 0x93, you are annoyingly wrong, and if you use 8220, everyone has to do the conversion from that to 201C.
Yes, these are all differently-valid standards, but that doesn't make it any less annoying. > Please note, I am talking only about readability _of the character > set_ actually. > And it is not including your habit issues, but rather is an objective > criteria for using this or that character set. > And decimal is objectively way more readable than hex standard character set, > regardless of how strong your habits are. How many decimal digits would you use to denote a single character? Do you have to pad everything to seven digits (\u034 for an ASCII quote)? And if not, how do you mark the end? This is not "objectively more readable" if the only gain is "no A-F" and the loss is "unpredictable length". ChrisA ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
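The convention Chris describes is easy to check from Python: the hex spelling of a code point matches what the Unicode charts print, while the decimal form needs converting.

```python
ch = "\u201C"                          # LEFT DOUBLE QUOTATION MARK
assert ord(ch) == 0x201C               # hex, as Unicode charts write it
assert ord(ch) == 8220                 # the same code point in decimal
assert "U+%04X" % ord(ch) == "U+201C"  # the standard U+ notation
```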
Re: [Python-ideas] Proposal for default character representation
On 13 October 2016 at 01:50, Chris Angelico wrote: > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote: >> On 12 October 2016 at 23:58, Danilo J. S. Bellini >> wrote: >> >>> Decimal notation is hardly >>> readable when we're dealing with stuff designed in base 2 (e.g. due to the >>> visual separation of distinct bytes). >> >> Hmm what keeps you from separateting the logical units to be represented each >> by a decimal number? like 001 023 255 ... >> Do you really think this is less readable than its hex equivalent? >> Then you are probably working with hex numbers only, but I doubt that. > > Way WAY less readable, and I'm comfortable working in both hex and decimal. Please don't mix up readability and personal habit, which previous repliers seem to do as well. Those two things have nothing to do with each other. If you are comfortable with the old Roman numbering system, this does not make it readable. And I am NOT comfortable with hex, as well as most people would be glad to use a single notation. But some of them think that they are cool because they know several numbering notations ;) But I bet few can actually understand which is more readable. > You're the one who's non-standard here. Most of the world uses hex for > Unicode codepoints. No I am not the one, many people find it silly to use different notations for the same thing - the index of the element, and they are very right about that. I am not silly, I refuse to use it and luckily I can. Also I know that decimal is more readable than hex, so my choice is supported by understanding and not simply refusing. > >> PS: >> that is rather peculiar, three negative replies already but with no strong >> arguments why it would be bad to stick to decimal only, only some >> "others do it so" and "tradition" arguments. > > "Others do it so" is actually a very strong argument. If all the rest > of the world uses + to mean addition, and Python used + to mean > subtraction, it doesn't matter how logical that is, it is *wrong*.
This actually supports my proposal perfectly: if everyone uses decimal, why suddenly use hex for the same thing - the index of an array. I don't see how your analogy contradicts my proposal, it's rather supporting it. > quote; if you us 0x93, you are annoyingly wrong, Please don't make personal assessments here, I can use whatever I want. Moreover, I find this notation as silly as using different measurement systems without any reason and within one activity, and in my eyes this is annoyingly wrong and stupid, but I don't call anybody here stupid. But I do want that you could abstract yourself from your habit for a while and talk about what would be better for future usage. > everyone has to do the conversion from that to 201C. Nobody needs to do ANY conversions if we use decimal, and as said everything is decimal: numbers, array indexes, the ord() function returns decimal, you can imagine more examples, so it is not only more readable but also more traditional. > How many decimal digits would you use to denote a single character? For text, three decimal digits would be enough for me personally, and in the long run when the world's alphabetical garbage will disappear, two digits would be ok. > you have to pad everything to seven digits (\u034 for an ASCII > quote)? Depends on the case; for input, some separator or padding is also ok, I don't have problems with either. For printing, obviously don't show leading zeros, but rather spaces. But as said I find this Unicode only some temporary happening, it will go to history in some future and be used only to study extinct glyphs. Mikhail ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On 2016-10-12 18:56, Mikhail V wrote: Please don't mix the readability and personal habit, which previuos repliers seems to do as well. Those two things has nothing to do with each other. You keep saying this, but it's quite incorrect. The usage of decimal notation is itself just a convention, and the only reason it's easy for you (and for many other people) is because you're used to it. If you had grown up using only hexadecimal or binary, you would find decimal awkward. There is nothing objectively better about base 10 than any other place-value numbering system. Decimal is just a habit. Now, it's true that base-10 is at this point effectively universal across human societies, and that gives it a certain claim to primacy. But base-16 (along with base 2) is also quite common in computing contexts. Saying we should dump hex notation because everyone understands decimal is like saying that all signs in Prague should only be printed in English because there are more English speakers in the entire world than Czech speakers. But that ignores the fact that there are more Czech speakers *in Prague*. Likewise, decimal may be more common as an overall numerical notation, but when it comes to referring to Unicode code points, hexadecimal is far and away more common. Just look at the Wikipedia page for Unicode, which says: "Normally a Unicode code point is referred to by writing "U+" followed by its hexadecimal number." That's it. You'll find the same thing on unicode.org. The unicode code point is hardly even a number in the usual sense; it's just a label that identifies the character. If you have an issue with using hex to represent unicode code points, your issue goes way beyond Python, and you need to take it up with the Unicode consortium. (Good luck with that.) -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On Thu, Oct 13, 2016 at 12:56 PM, Mikhail V wrote: > But as said I find this Unicode only some temporary happening, > it will go to history in some future and be > used only to study extinct glyphs. And what will we be using instead? Morbid curiosity trumping a plonking, for the moment. ChrisA ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On Oct 12, 2016 4:33 PM, "Mikhail V" wrote: > > Hello all, > > *snip* > > PROPOSAL: > 1. Remove all hex notation from printing functions, typing, > documention. > So for printing functions leave the hex as an "option", > for example for those who feel the need for hex representation, > which is strange IMO. > 2. Replace it with decimal notation, in this case e.g: > > u'\u0430\u0431\u0432.txt' becomes > u'\u1072\u1073\u1074.txt' > > and similarly for other cases where raw bytes must be printed/inputed > So to summarize: make the decimal notation standard for all cases. > I am not going to go deeper, such as what digit amount (leading zeros) > to use, since it's quite secondary decision. > If decimal notation isn't used for parsing, only for printing, it would be confusing as heck, but using it for both would break a lot of code in subtle ways (the worst kind of code breakage). > MOTIVATION: > 1. Hex notation is hardly readable. It was not designed with readability > in mind, so for reading it is not appropriate system, at least with the > current character set, which is a mix of digits and letters (curious who > was that wize person who invented such a set?). The Unicode standard. I agree that hex is hard to read, but the standard uses it to refer to the code points. It's great to be able to google code points and find the characters easily, and switching to decimal would screw it up. And I've never seen someone *need* to figure out the decimal version from the hex before. It's far more likely to google the hex #. TL;DR: I think this change would induce a LOT of short-term issues, despite it being up in the air if there's any long-term gain. So -1 from me. > 2. Mixing of two notations (hex and decimal) is a _very_ bad idea, > I hope no need to explain why. > Indeed, you don't. :) > So that's it, in short. > Feel free to discuss and comment. 
> > Regards, > Mikhail > ___ > Python-ideas mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan (ライアン) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
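The hex/decimal duality being debated here is easy to see in Python itself: the `\u` escape syntax is hex-only, but the value it denotes is an ordinary integer that can be written or displayed in any base. A minimal illustration (standard Python, no assumptions):

```python
# Python's \u escape is defined to take exactly four hex digits.
s = '\u0430'             # CYRILLIC SMALL LETTER A, code point U+0430

print(ord(s))            # 1072 -- ord() returns a plain int (decimal view)
print(hex(ord(s)))       # 0x430 -- the same int, shown in hex

# The same character is reachable from decimal today, via chr():
assert chr(1072) == s
assert chr(0x0430) == s  # int literals may themselves be written in hex
```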
Re: [Python-ideas] Proposal for default character representation
On Oct 12, 2016 9:25 PM, "Chris Angelico" wrote: > > On Thu, Oct 13, 2016 at 12:56 PM, Mikhail V wrote: > > But as said I find this Unicode only some temporary happening, > > it will go to history in some future and be > > used only to study extinct glyphs. > > And what will we be using instead? > Emoji, of course! What else? > Morbid curiosity trumping a plonking, for the moment. > > ChrisA > ___ > Python-ideas mailing list > [email protected] > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Ryan (ライアン) [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
On 2016-10-13 00:50, Chris Angelico wrote:
On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote:
On 12 October 2016 at 23:58, Danilo J. S. Bellini
wrote:
Decimal notation is hardly
readable when we're dealing with stuff designed in base 2 (e.g. due to the
visual separation of distinct bytes).
Hmm, what keeps you from separating the logical units, representing each
by a decimal number? Like 001 023 255 ...
Do you really think this is less readable than its hex equivalent?
Then you are probably working with hex numbers only, but I doubt that.
Way WAY less readable, and I'm comfortable working in both hex and decimal.
I agree that mixing representations for the same abstraction (using decimal
in some places, hexadecimal in other ones) can be a bad idea.
"Can be"? It is indeed a horrible idea, and not only for the same
abstraction but in general.
makes me believe "decimal unicode codepoint" shouldn't ever appear in string
representations.
I use this site to look the chars up:
http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html
You're the one who's non-standard here. Most of the world uses hex for
Unicode codepoints.
http://unicode.org/charts/
HTML entities permit either decimal or hex, but other than that, I
can't think of any common system that uses decimal for Unicode
codepoints in strings.
PS:
that is rather peculiar, three negative replies already but with no strong
arguments why it would be bad to stick to decimal only, only some
"others do it so" and "tradition" arguments.
"Others do it so" is actually a very strong argument. If all the rest
of the world uses + to mean addition, and Python used + to mean
subtraction, it doesn't matter how logical that is, it is *wrong*.
Most of the world uses U+201C or "\u201C" to represent a curly double
quote; if you use 0x93, you are annoyingly wrong, and if you use 8220,
everyone has to do the conversion from that to 201C. Yes, these are
all differently-valid standards, but that doesn't make it any less
annoying.
Please note, I am talking only about readability _of the character
set_ actually.
And it does not include your habit issues, but rather is an objective
criterion for using this or that character set.
And decimal is objectively far more readable than the standard hex character set,
regardless of how strong your habits are.
How many decimal digits would you use to denote a single character? Do
you have to pad everything to seven digits (\u0000034 for an ASCII
quote)? And if not, how do you mark the end? This is not "objectively
more readable" if the only gain is "no A-F" and the loss is
"unpredictable length".
Well, Perl doesn't have \u or \U; instead it has extended \x, so you can
write, say, \x{201C}.
Still in hex, though, as nature intended! :-)
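Chris's "unpredictable length" objection is concrete in Python's existing escape family: each form has a fixed digit count, which is exactly what tells the parser where the escape ends. A short sketch:

```python
# Each escape form has a fixed width, so there is no end-marker problem:
assert '\x22' == '"'        # \x: exactly 2 hex digits (0x00..0xFF)
assert '\u0022' == '"'      # \u: exactly 4 hex digits (Basic Multilingual Plane)
assert '\U00000022' == '"'  # \U: exactly 8 hex digits (all planes)

# A fixed-width decimal escape would need 7 digits to cover the full
# range up to 1114111 (U+10FFFF) -- e.g. a hypothetical '\u0000034'
# just to spell the ASCII double quote.
assert 0x10FFFF == 1114111
```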
Re: [Python-ideas] Proposal for default character representation
> From: Mikhail V > Sent: Wednesday, October 12, 2016 9:57 PM > Subject: Re: [Python-ideas] Proposal for default character representation Hello, and welcome to Python-ideas, where only a small portion of ideas go further, and where most newcomers that wish to improve the language get hit by the reality bat! I hope you enjoy your stay :) > On 13 October 2016 at 01:50, Chris Angelico wrote: > > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V > wrote: > > > > Way WAY less readable, and I'm comfortable working in both hex and > decimal. > > Please don't mix the readability and personal habit, which previuos > repliers seems to do as well. Those two things has nothing > to do with each other. If you are comfortable with old roman numbering > system this does not make it readable. > And I am NOT comfortable with hex, as well as most people would > be glad to use single notation. > But some of them think that they are cool because they know several > numbering notations ;) But I bet few can actually understand which is more > readable. I'll turn your argument around: Not being comfortable with hex does not make it unreadable; it's a matter of habit (as Brendan pointed out in his separate reply). > > You're the one who's non-standard here. Most of the world uses hex for > > Unicode codepoints. > No I am not the one, many people find it silly to use different notations > for same thing - index of the element, and they are very right about that. > I am not silly, I refuse to use it and luckily I can. Also I know that decimal > is more readable than hex so my choice is supportend by the > understanding and not simply refusing. Unicode code points are represented using hex notation virtually everywhere I ever saw it. Your Unicode-code-points-as-decimal website was a new discovery for me (and, I presume, many others on this list). Since it's widely used in the world, going against that effectively makes you non-standard. 
That doesn't mean it's necessarily a bad thing, but it does mean that your chances (or anyone's chances) of actually changing that are equal to zero (and this isn't some gross exaggeration), > > > >> PS: > >> that is rather peculiar, three negative replies already but with no strong > >> arguments why it would be bad to stick to decimal only, only some > >> "others do it so" and "tradition" arguments. > > > > "Others do it so" is actually a very strong argument. If all the rest > > of the world uses + to mean addition, and Python used + to mean > > subtraction, it doesn't matter how logical that is, it is *wrong*. > > This actually supports my proposal perfectly, if everyone uses decimal > why suddenly use hex for same thing - index of array. I don't see how > your analogy contradicts with my proposal, it's rather supporting it. I fail to see your point here. Where is that "everyone uses decimal"? Unless you stopped talking about representation in strings (which seems likely, as you're talking about indexing?), everything is represented as hex. > But I do want that you could abstract yourself from your habit for a while > and talk about what would be better for the future usage. I'll be that guy and tell you that you need to step back from your own idea for a while and consider your proposal and the current state of things. I'll also take the opportunity to reiterate that there is virtually no chance to change this behaviour. This doesn't, however, prevent you or anyone from talking about the topic, either for fun, or for finding other (related or otherwise) areas of interest that you think might be worth investigating further. A lot of threads actually branch off in different topics that came up when discussing, and that are interesting enough to pursue on their own. > > everyone has to do the conversion from that to 201C. 
> > Nobody need to do ANY conversions if use decimal, > and as said everything is decimal: numbers, array indexes, > ord() function returns decimal, you can imagine more examples > so it is not only more readable but also more traditional. You're mixing up more than just one concept here: - Integer literals; I assume this is what you meant, and you seem to forget (or maybe you didn't know, in which case here's to learning something new!) that 0xff is perfectly valid syntax, and store the integer with the value of 255 in base 10. - Indexing, and that's completely irrelevant to the topic at hand (also see above bullet point). - ord() which returns an integer (which can be interpreted in any base!), and that's both an argument for and against this proposal; the "against" side is actually that decimal notation has no defined boundary for when to stop (and before you argue that it does, I'll point out that the separations, e.g. grouping by the thousands, are culture-driven and not an international standard). There's actually a precedent for this in Python 2 with the \x escape (need I remind anyone why Python 3 was created again? :), but that's exactly a stone in the "don'
Re: [Python-ideas] Proposal for default character representation
On 13 October 2016 at 04:18, Brendan Barnwell wrote:
> On 2016-10-12 18:56, Mikhail V wrote:
>> Please don't mix the readability and personal habit, which previuos
>> repliers seems to do as well. Those two things has nothing
>> to do with each other.
>
> You keep saying this, but it's quite incorrect. The usage of
> decimal notation is itself just a convention, and the only reason it's easy
> for you (and for many other people) is because you're used to it. If you
> had grown up using only hexadecimal or binary, you would find decimal
> awkward.

Exactly, but this is not called "readability" but rather "acquired ability to read", or simply habit, which does not reflect the readability of the character set itself.

> There is nothing objectively better about base 10 than any other
> place-value numbering system.

Sorry to say, but here you are totally wrong. Not to single you out for this fallacy, which is quite common among those who are not familiar with the topic, but you should consider some important points:
---
1. Every character set has a certain grade of readability, which depends solely on the form of its units (its glyphs).
2. Linear string representation is superior to anything else (spiral, arc, etc.).
3. There exist glyphs which provide maximal readability: particular glyphs with a particular constant form, and this form is absolutely independent of what is being encoded.
4. According to my personal studies (which does not mean they must be accepted or blindly believed, but I have solid experience in this area and have been quite successful in it), the number of such glyphs is fewer than 10; I am at 8 glyphs now.
5. The main measured parameter reflecting readability (somewhat indirectly, however) is the pair-wise optical collision of each character pair in the set. This relates to legibility, the ability to differentiate glyphs.
---
Less technically, you can understand it better if you think of your own words: "There is nothing objectively better about base 10 than any other place-value numbering system." If that were ever true, then you could read characters that are very similar to each other, or something messy, as well as you read characters which are easily identifiable, collision-resistant and optically consistent. But that is absurd, sorry.

For numbers you obviously don't need as many characters as for speech encoding, so only those glyphs, or even a subset of them, should be used. This means anything more than 8 characters is quite worthless for reading numbers. Note that I can't provide the works here currently, so don't ask me for them. Some of them will probably be available in the near future.

Your analogy with speech and signs is not correct, because speech is different but numbers are numbers. Also, for different speech the same character set must be used, namely the one with superior optical qualities, i.e. readability.

> Saying we should dump hex notation because everyone understands decimal is
> like saying that all signs in Prague should only be printed in English

We should dump hex notation because decimal is currently simply superior to hex, just like a Mercedes is superior to a Lada, and secondly because it is more common for ALL people, so it is 2:0 against such notation. With that said, I am not against base-16 itself in the first place, but rather against the character set, which is simply visually inconsistent and not readable. Someone just took the Arabic digits and appended the first Latin letters. It could be forgiven as a schoolboy's drawing exercise, but I fail to understand how it can be accepted as a working notation for a medium that is supposed to be human-readable. Practically all this notation does is shorten the time before you, as a programmer, develop visual and brain impairments.
> Just look at the Wikipedia page for Unicode, which says: "Normally a
> Unicode code point is referred to by writing "U+" followed by its
> hexadecimal number." That's it.

Yeah, that's it. And it sucks, and it has migrated into coding standards, which sucks twice. If a new syntax/standard were decided on, there would be only positive sides to using decimal instead of hex. So nobody would be hurt; this is only a question of remaking the current implementation, and it is proposed only as a long-term theoretical improvement.

> it's just
> a label that identifies the character.

OK, but if I write string filtering in Python, for example, then obviously I use decimal everywhere to compare index ranges, etc., so what use is that label to me? Just redundant conversions back and forth. Makes me sick, actually.
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Proposal for default character representation
Mikhail V wrote: And decimal is objectively way more readable than hex standard character set, regardless of how strong your habits are. That depends on what you're trying to read from it. I can look at a hex number and instantly get a mental picture of the bit pattern it represents. I can't do that with decimal numbers. This is the reason hex exists. It's used when the bit pattern represented by a number is more important to know than its numerical value. This is the case with Unicode code points. Their numerical value is irrelevant, but the bit pattern conveys useful information, such as which page and plane it belongs to, whether it fits in 1 or 2 bytes, etc. -- Greg ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
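Greg's claim can be checked with a few shifts: the plane and row of a code point fall directly out of its hex digits, while the decimal form hides them. The field split below follows the conventional plane/row/cell reading of a code point and is meant only as an illustration:

```python
cp = 0x1F600  # GRINNING FACE (decimal 128512)

plane = cp >> 16          # high bits: the Unicode plane (1 = SMP)
row   = (cp >> 8) & 0xFF  # middle byte: the "row" within the plane
cell  = cp & 0xFF         # low byte: position within the row

assert (plane, row, cell) == (1, 0xF6, 0x00)

# These fields are visible by eye in hex -- 1F600 reads as 1 | F6 | 00 --
# but not in the decimal form 128512.
assert cp > 0xFFFF        # i.e. does not fit in a single UTF-16 code unit
```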
Re: [Python-ideas] Proposal for default character representation
Mikhail V wrote: Consider unicode table as an array with glyphs. You mean like this one? http://unicode-table.com/en/ Unless I've miscounted, that one has the characters arranged in rows of 16, so it would be *harder* to look up a decimal index in it. -- Greg ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
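The chart-layout point can be made concrete: in a 16-wide table, the column is simply the last hex digit of the code point, so hex indices align with the chart while decimal indices don't. A small check:

```python
cp = 0x0430  # CYRILLIC SMALL LETTER A

row, col = divmod(cp, 16)
assert col == (cp & 0xF)                        # column = last hex digit
assert format(cp, 'x')[-1] == format(col, 'x')  # '430' ends in '0'

# The decimal form 1072 offers no such shortcut: 1072 % 16 is 0,
# but nothing in the digits "1072" tells you that at a glance.
assert 1072 % 16 == 0
```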
Re: [Python-ideas] Proposal for default character representation
On 13 October 2016 at 04:49, Emanuel Barry wrote:
>> From: Mikhail V
>> Sent: Wednesday, October 12, 2016 9:57 PM
>> Subject: Re: [Python-ideas] Proposal for default character representation
>
> Hello, and welcome to Python-ideas, where only a small portion of ideas go
> further, and where most newcomers that wish to improve the language get hit
> by the reality bat! I hope you enjoy your stay :)

Hi, thanks! I am enjoying the conversation indeed; I've never had so much interest in a discussion, actually!

>> On 13 October 2016 at 01:50, Chris Angelico wrote:
>> > On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V wrote:
>> >
>> > Way WAY less readable, and I'm comfortable working in both hex and decimal.
>>
>> Please don't mix the readability and personal habit, which previuos
>> repliers seems to do as well. Those two things has nothing
>> to do with each other. If you are comfortable with old roman numbering
>> system this does not make it readable.
>> And I am NOT comfortable with hex, as well as most people would
>> be glad to use single notation.
>> But some of them think that they are cool because they know several
>> numbering notations ;) But I bet few can actually understand which is more
>> readable.
>
> I'll turn your argument around: Not being comfortable with hex does not make
> it unreadable; it's a matter of habit (as Brendan pointed out in his
> separate reply).

A matter of habit does not reflect readability; see my last reply to Brendan. It is quite precise engineering. And readability is serious stuff, especially if you decide on a programming career. Young people underestimate it, and for the oldies it is too late by the time they realize it :) And Python is all about readability, and I like it.

As for your other points, I'll need to read them with a fresh head tomorrow. Of course I don't believe this would all suddenly happen to Python or any other programming language; it is just an idea anyway. And I do want to learn more, actually.
I especially want to see some example where it would be really beneficial to use hex, either technically (some low-level binary-related stuff?) or regarding comprehension, which to my knowledge is hardly possible.

> - Indexing, and that's completely irrelevant to the topic at hand (also see
> above bullet point).

Eh, how would I find whether a character lies in a certain range? By "index" here I meant its numeric value; I just called it an index for some reason, I don't know why. So it's a table: a value and a corresponding glyph. Consider an analogy: I make a 3D array, the first index is my value, and the 2nd and 3rd are image pixels, so it's simply an image stack. Why on earth would I use any literals other than decimal for the first index? Have you seen much code written with hex literals? Some low-level things, probably...

> - ord() which returns an integer (which can be interpreted in any base!),

Yes, so my idea is to stick to notations other than hex. For low-level bit manipulation, obviously a two-character notation should be used, so again I fail to see the point...

Mikhail
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
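Mikhail's range question has a direct answer in today's Python: hex integer literals let you compare against Unicode block boundaries with no conversion step at all. A sketch using the Cyrillic block bounds from the Unicode charts:

```python
def is_cyrillic(ch: str) -> bool:
    """True if ch lies in the Unicode Cyrillic block (U+0400..U+04FF)."""
    return 0x0400 <= ord(ch) <= 0x04FF

assert is_cyrillic('\u0430')   # CYRILLIC SMALL LETTER A
assert not is_cyrillic('a')    # LATIN SMALL LETTER A, U+0061
```

The hex bounds 0x0400 and 0x04FF match the chart headings directly; their decimal equivalents (1024 and 1279) would have to be computed first.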
Re: [Python-ideas] Proposal for default character representation
On 13 October 2016 at 08:02, Greg Ewing wrote:
> Mikhail V wrote:
>>
>> Consider unicode table as an array with glyphs.
>
> You mean like this one?
>
> http://unicode-table.com/en/
>
> Unless I've miscounted, that one has the characters
> arranged in rows of 16, so it would be *harder* to
> look up a decimal index in it.
>
> --
> Greg

A nice point, finally, I admit, although quite minor. Where the data implies such paging or alignment, the notation should (probably) be more binary-oriented.

But you claim to see bit patterns in hex numbers? Then I bet you would see them much better in binary notation (2 symbols) or quaternary notation (4 symbols), I guarantee. And if you also took a consistent glyph set for them, you'd see them twice as well; that I also guarantee 100%. So it's not that decimal is cool, but that hex sucks (too big an alphabet), and _the character set_ used for hex optically sucks. That is the point.

On the other hand, why would the Unicode glyph table, which is for the most part a museum of glyphs, necessarily be paged in a binary-friendly manner rather than a decimal-friendly one? But I am not saying it should or shouldn't be; it's quite irrelevant for this particular case, I think.

Mikhail
___
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] INSANE FLOAT PERFORMANCE!!!
On Thu, Oct 13, 2016 at 5:17 PM, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > I'm not sure what you mean by "strcmp-able"; do you mean that the > > lexical ordering of two Unicode strings is guaranteed to be the same > > as the byte-wise ordering of their UTF-8 encodings? > > This is definitely not true for the Han characters. In Japanese, the > most commonly used lexical ordering is based on the pronunciation, > meaning that there are few characters (perhaps none) in common use > that has a unique place in lexical ordering (most individual > characters have multiple pronunciations, and even many whole personal > names do). Yeah, and even just with Latin-1 characters, you have (a) non-ASCII characters that sort between ASCII characters, and (b) characters that have different meanings in different languages, and should be sorted differently. So lexicographical ordering is impossible in a generic string sort. ChrisA ___ Python-ideas mailing list [email protected] https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
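The ordering property under discussion can be verified directly: UTF-8 was designed so that byte-wise comparison matches code-point comparison, yet neither matches what a human dictionary would do. A small sketch (the word list is illustrative):

```python
words = ['zebra', 'étude', 'apple']

by_codepoint = sorted(words)                              # str comparison
by_utf8 = sorted(words, key=lambda s: s.encode('utf-8'))  # byte comparison

# UTF-8 preserves code-point order under byte-wise comparison:
assert by_codepoint == by_utf8

# But 'étude' sorts after 'zebra' (U+00E9 > U+007A), which no French or
# English dictionary would do; true collation needs locale.strxfrm()
# or a library such as PyICU.
assert by_codepoint == ['apple', 'zebra', 'étude']
```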
