Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-06 Thread Paul Rubin
Nick Mellor writes: > I came across itertools.dropwhile only today, then shortly afterwards > found Raymond Hettinger wondering, in 2007, whether to drop [sic] > dropwhile and takewhile from the itertools module > Almost nobody else of the 18 respondents seemed to be using them. What? I'm am

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-06 Thread Vlastimil Brom
2012/12/6 Neil Cerutti : > On 2012-12-05, Vlastimil Brom wrote: >> ... PARSNIP, certified organic > > I'm not sure on this one. > >> ('PARSNIP', ', certified organic') > > -- > Neil Cerutti > -- Well, I wasn't either, when I noticed this item, but given the specification: "2. Retain punctuation a

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-06 Thread Alexander Blinne
On 05.12.2012 18:04, Nick Mellor wrote:
> Sample data

Well let's see what

def split_product(p):
    p = p.strip()
    w = p.split(" ")
    try:
        j = next(i for i,v in enumerate(w) if v.upper() != v)
    except StopIteration:
        return p, ''
    return " ".join(w[:j]), " ".join(w[j:]

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-06 Thread Neil Cerutti
On 2012-12-05, Vlastimil Brom wrote: > ... PARSNIP, certified organic I'm not sure on this one. > ('PARSNIP', ', certified organic') -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-06 Thread Vlastimil Brom
2012/12/5 Nick Mellor : > Neil, > > Further down the data, found another edge case: > > "Spring ONION from QLD" > > Following the spec, the whole line should be description (description starts > at first word that is not all caps.) This case breaks the latest groupby. > > N > -- > http://mail.pyth

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Neil Cerutti
On 2012-12-05, Nick Mellor wrote: > Neil, > > Further down the data, found another edge case: > > "Spring ONION from QLD" > > Following the spec, the whole line should be description > (description starts at first word that is not all caps.) This > case breaks the latest groupby. A-ha! I did chec

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Nick Mellor
Neil, Further down the data, found another edge case: "Spring ONION from QLD" Following the spec, the whole line should be description (description starts at first word that is not all caps.) This case breaks the latest groupby. N

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Neil Cerutti
On 2012-12-05, Nick Mellor wrote:
> Hi Neil,
>
> Here's some sample data. The live data is about 300 minor
> variations on the sample data, about 20,000 lines.

Thanks, Nick. This slight variation on my first groupby try seems to work for the test data.

def prod_desc(s):
    prod = []
    desc =

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread MRAB
On 2012-12-05 13:45, Chris Angelico wrote: On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote: takewhile mines for gold at the start of a sequence, dropwhile drops the dross at the start of a sequence. When you're using both over the same sequence and with the same condition, it seems odd t

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread MRAB
On 2012-12-05 17:04, Nick Mellor wrote: Hi Neil, Here's some sample data. The live data is about 300 minor variations on the sample data, about 20,000 lines. [snip] You have a duplicate: CELERY Mornington Peninsula IPM grower CELERY Mornington Peninsula IPM grower

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Nick Mellor
Hi Neil, Here's some sample data. The live data is about 300 minor variations on the sample data, about 20,000 lines. Nick Notes: 1. Whitespace is only used for word boundaries. Surplus whitespace is not significant and can be stripped 2. Retain punctuation and parentheses 3. Product is zer
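Note 1 of the spec above can be sketched as a small normalisation step applied before any splitting. The helper name `normalize_whitespace` is an assumption made for illustration; it does not appear in the thread.

```python
def normalize_whitespace(line):
    """Collapse runs of whitespace to single spaces and trim both ends."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading and trailing whitespace, so rejoining with a single
    # space removes all surplus whitespace while keeping punctuation.
    return " ".join(line.split())
```

Running this first means the later product/description split only ever sees single spaces between words.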

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Ian Kelly
On Wed, Dec 5, 2012 at 6:45 AM, Chris Angelico wrote: > On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote: >> >> takewhile mines for gold at the start of a sequence, dropwhile drops the >> dross at the start of a sequence. > > When you're using both over the same sequence and with the same > co

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Neil Cerutti
On 2012-12-05, Ian Kelly wrote:
> On Wed, Dec 5, 2012 at 7:34 AM, Neil Cerutti wrote:
>> Well, shoot! Then this is a job for groupby, not takewhile.
>
> The problem with groupby is that you can't just limit it to two groups.
>
> >>> prod_desc("CAPSICUM RED fresh from QLD")
> ['QLD', 'fresh from']

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Mark Lawrence
On 05/12/2012 13:45, Chris Angelico wrote: I tested it on Python 3.2 (yeah, time I upgraded, I know). Bad move, fancy wanting to go to the completely useless version of Python that simply can't handle unicode properly :) -- Cheers. Mark Lawrence.

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Ian Kelly
On Wed, Dec 5, 2012 at 7:34 AM, Neil Cerutti wrote:
> Well, shoot! Then this is a job for groupby, not takewhile.

The problem with groupby is that you can't just limit it to two groups.

>>> prod_desc("CAPSICUM RED fresh from QLD")
['QLD', 'fresh from']

Once you've got a false key from the group
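The point about groupby not stopping at two groups can be seen by listing the runs it produces. This is only an illustration, not the `prod_desc` function under discussion (its body is not shown here); `caps_runs` is a hypothetical name, and `allcaps` follows the definition used elsewhere in the thread.

```python
from itertools import groupby


def allcaps(word):
    # A word counts as "all caps" if upper-casing it changes nothing.
    return word == word.upper()


def caps_runs(s):
    """Return (key, words) pairs for each consecutive run of words."""
    # groupby starts a new group every time the key changes, so a trailing
    # all-caps word such as "QLD" lands in a third group of its own.
    return [(key, list(group)) for key, group in groupby(s.split(), key=allcaps)]


runs = caps_runs("CAPSICUM RED fresh from QLD")
```

Here `runs` holds three groups (all caps, mixed case, all caps again), so code that assumes exactly one product group followed by one description group will mishandle the trailing "QLD".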

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Neil Cerutti
On 2012-12-05, Chris Angelico wrote: > On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote: >> >> takewhile mines for gold at the start of a sequence, dropwhile >> drops the dross at the start of a sequence. > > When you're using both over the same sequence and with the same > condition, it seems

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Chris Angelico
On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote: > > takewhile mines for gold at the start of a sequence, dropwhile drops the > dross at the start of a sequence. When you're using both over the same sequence and with the same condition, it seems odd that you need to iterate over it twice. Per
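One way to avoid walking the sequence twice is a single-pass split that stops at the first item failing the condition. This is a minimal sketch of that idea, not code from the thread; the name `split_at_first_false` is hypothetical.

```python
def split_at_first_false(pred, iterable):
    """Split into (leading items satisfying pred, everything after them)."""
    head = []
    it = iter(iterable)
    for item in it:
        if not pred(item):
            # The first failing item belongs to the tail. The rest of the
            # iterator is drained exactly once, without re-testing pred.
            return head, [item] + list(it)
        head.append(item)
    # Every item satisfied pred, so the tail is empty.
    return head, []
```

Unlike calling takewhile and then dropwhile on the same list, the predicate is evaluated at most once per leading item and never on the tail.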

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-05 Thread Neil Cerutti
On 2012-12-05, Nick Mellor wrote: > Hi Terry, > > For my money, and especially in your versions, despite several > expert solutions using other features, itertools has it. It > seems to me to need less nutting out than the other approaches. > It's short, robust, has a minimum of symbols, uses simp

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Nick Mellor
Hi Terry, For my money, and especially in your versions, despite several expert solutions using other features, itertools has it. It seems to me to need less nutting out than the other approaches. It's short, robust, has a minimum of symbols, uses simple expressions and is not overly clever. If

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Terry Reedy
On 12/4/2012 3:44 PM, Terry Reedy wrote:

If the original string has no excess whitespace, description is what remains of s after product prefix is omitted. (Py 3 code)

from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_product_itertools(s):
    product =
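The snippet above is cut off, so the following is only a sketch of the idea it describes (take the all-caps prefix with takewhile, then treat the remainder of the string as the description). It is not a reconstruction of the posted function; `split_product_sketch` is a hypothetical name, and the slicing assumes product words are separated by single spaces, as the post itself stipulates.

```python
from itertools import takewhile


def allcaps(word):
    return word == word.upper()


def split_product_sketch(s):
    """Split s into (all-caps product prefix, remaining description)."""
    s = s.strip()
    # takewhile stops at the first word that is not all caps.
    product = " ".join(takewhile(allcaps, s.split()))
    # Assumption: no excess whitespace inside the product prefix, so the
    # prefix occupies exactly len(product) characters of s.
    description = s[len(product):].lstrip()
    return product, description
```

With no all-caps prefix the product is the empty string and the whole line becomes the description, which matches the "Spring ONION from QLD" edge case raised in the thread.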

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Steven D'Aprano
Ian, For the sanity of those of us reading this via Usenet using the Pan newsreader, could you please turn off HTML emailing? It's very distracting. Thanks, Steven On Tue, 04 Dec 2012 12:37:38 -0700, Ian Kelly wrote: [...] > On Tue, > Dec 4, 2012 at 11:48 AM, Alexander Blinne < href="mailto

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Vlastimil Brom
2012/12/4 Nick Mellor : > I love the way you guys can write a line of code that does the same as 20 of > mine :) > I can turn up the heat on your regex by feeding it a null description or > multiple white space (both in the original file.) I'm sure you'd adjust, but > at the cost of a more compl

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Terry Reedy
On 12/4/2012 8:57 AM, Nick Mellor wrote: I have a file full of things like this: "CAPSICUM RED fresh from Queensland" Product names (all caps, at start of string) and descriptions (mixed case, to end of string) all muddled up in the same field. And I need to split them into two fields. Note th

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Alexander Blinne
On 04.12.2012 20:37, Ian Kelly wrote:
> >>> def split_product(p):
> ...     w = p.split(" ")
> ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
> ...     return " ".join(w[:j]), " ".join(w[j:])
>
> It still fails if the product description is empty.

That's true..
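The empty-description failure happens because `next()` raises StopIteration when no mixed-case word exists. One possible fix, sketched here under a hypothetical name (`split_product_default`), is to pass `next()` a default value instead of catching the exception.

```python
def split_product_default(p):
    """Split p at the first word that is not all caps."""
    w = p.strip().split(" ")
    # The second argument to next() is returned when the generator is
    # exhausted; len(w) makes the description slice empty in that case.
    j = next((i for i, v in enumerate(w) if v.upper() != v), len(w))
    return " ".join(w[:j]), " ".join(w[j:])
```

This still splits on single spaces only, so runs of spaces inside the line are carried through unchanged.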

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread MRAB
On 2012-12-04 19:37, Ian Kelly wrote:
On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne wrote:
On 04.12.2012 19:28, DJC wrote:
(i for i,v in enumerate(w) if v.upper() != v).next()
> Traceback (most recent call last):
>   File "", line 1, in

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Ian Kelly
On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne wrote:
> On 04.12.2012 19:28, DJC wrote:
> (i for i,v in enumerate(w) if v.upper() != v).next()
> > Traceback (most recent call last):
> >   File "", line 1, in
> > AttributeError: 'generator' object has no attribute 'next'
>
> Yeah, i saw

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Alexander Blinne
On 04.12.2012 19:28, DJC wrote:
(i for i,v in enumerate(w) if v.upper() != v).next()
> Traceback (most recent call last):
>   File "", line 1, in
> AttributeError: 'generator' object has no attribute 'next'

Yeah, i saw this problem right after i sent the posting. It now is supposed to read
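The AttributeError arises because Python 3 generators expose `__next__` rather than a `.next()` method. A short sketch of the portable spelling, using the built-in `next()` (available since Python 2.6); the sample string is taken from the thread and the variable names are illustrative.

```python
words = "CAPSICUM RED fresh from Queensland".split(" ")

# Generator of indexes of words that are not all caps.
gen = (i for i, v in enumerate(words) if v.upper() != v)

# Built-in next() works on Python 2.6+ and Python 3, whereas gen.next()
# exists only on Python 2.
first_mixed = next(gen)
```

Here `first_mixed` is the index of "fresh", the first word that is not all caps.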

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread DJC
On 04/12/12 17:18, Alexander Blinne wrote:
Another neat solution with a little help from
http://stackoverflow.com/questions/1701211/python-return-the-index-of-the-first-element-of-a-list-which-makes-a-passed-fun

def split_product(p):
    w = p.split(" ")
    j = (i for i,v in enumer

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Neil Cerutti
On 2012-12-04, Nick Mellor wrote: > I love the way you guys can write a line of code that does the > same as 20 of mine :) > > I can turn up the heat on your regex by feeding it a null > description or multiple white space (both in the original > file.) I'm sure you'd adjust, but at the cost of a
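The regex being discussed is not shown in this excerpt, so the following is a separate, hypothetical pattern sketching how a regex approach might tolerate a null description and multiple whitespace. The names `_PRODUCT_RE` and `split_product_regex` are assumptions, and the pattern treats only ASCII `a-z` as lowercase.

```python
import re

# Group 1: zero or more whole words containing no ASCII lowercase letters,
# each followed by whitespace or end of string. Group 2: whatever remains.
_PRODUCT_RE = re.compile(r"((?:[^a-z\s]+(?=\s|$)\s*)*)(.*)")


def split_product_regex(s):
    """Split s into (product, description), normalising whitespace."""
    match = _PRODUCT_RE.match(s.strip())
    # Every part of the pattern is optional, so match is never None.
    product = " ".join(match.group(1).split())
    description = " ".join(match.group(2).split())
    return product, description
```

The lookahead `(?=\s|$)` is what keeps a capitalised word such as "Spring" out of the product: its leading "S" is not followed by whitespace, so the word as a whole is rejected.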

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Alexander Blinne
Another neat solution with a little help from
http://stackoverflow.com/questions/1701211/python-return-the-index-of-the-first-element-of-a-list-which-makes-a-passed-fun

>>> def split_product(p):
...     w = p.split(" ")
...     j = (i for i,v in enumerate(w) if v.upper() != v).next()
...     retu

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Neil Cerutti
On 2012-12-04, Nick Mellor wrote: > Hi Neil, > > Nice! But fails if the first word of the description starts > with a capital letter. Darn edge cases. -- Neil Cerutti

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Nick Mellor
Hi Neil, Nice! But fails if the first word of the description starts with a capital letter. Nick On Wednesday, 5 December 2012 01:23:34 UTC+11, Neil Cerutti wrote: > On 2012-12-04, Nick Mellor wrote: > > > I have a file full of things like this: > > > > > > "CAPSICUM RED fresh from Queens

Re: Good use for itertools.dropwhile and itertools.takewhile

2012-12-04 Thread Neil Cerutti
On 2012-12-04, Nick Mellor wrote: > I have a file full of things like this: > > "CAPSICUM RED fresh from Queensland" > > Product names (all caps, at start of string) and descriptions > (mixed case, to end of string) all muddled up in the same > field. And I need to split them into two fields. Note