Nick Mellor writes:
> I came across itertools.dropwhile only today, then shortly afterwards
> found Raymond Hettinger wondering, in 2007, whether to drop [sic]
> dropwhile and takewhile from the itertools module
> Almost nobody else of the 18 respondents seemed to be using them.
What? I'm am
2012/12/6 Neil Cerutti :
> On 2012-12-05, Vlastimil Brom wrote:
>> ... PARSNIP, certified organic
>
> I'm not sure on this one.
>
>> ('PARSNIP', ', certified organic')
>
> --
> Neil Cerutti
> --
Well, I wasn't either, when I noticed this item, but given the specification:
"2. Retain punctuation a
Am 05.12.2012 18:04, schrieb Nick Mellor:
> Sample data
Well let's see what
def split_product(p):
    p = p.strip()
    w = p.split(" ")
    try:
        # index of the first word that is not all caps
        j = next(i for i,v in enumerate(w) if v.upper() != v)
    except StopIteration:
        # every word is all caps: no description
        return p, ''
    return " ".join(w[:j]), " ".join(w[j:])
On 2012-12-05, Vlastimil Brom wrote:
> ... PARSNIP, certified organic
I'm not sure on this one.
> ('PARSNIP', ', certified organic')
--
Neil Cerutti
--
http://mail.python.org/mailman/listinfo/python-list
2012/12/5 Nick Mellor :
> Neil,
>
> Further down the data, found another edge case:
>
> "Spring ONION from QLD"
>
> Following the spec, the whole line should be description (description starts
> at first word that is not all caps.) This case breaks the latest groupby.
>
> N
On 2012-12-05, Nick Mellor wrote:
> Neil,
>
> Further down the data, found another edge case:
>
> "Spring ONION from QLD"
>
> Following the spec, the whole line should be description
> (description starts at first word that is not all caps.) This
> case breaks the latest groupby.
A-ha! I did chec
Neil,
Further down the data, found another edge case:
"Spring ONION from QLD"
Following the spec, the whole line should be description (description starts at
first word that is not all caps.) This case breaks the latest groupby.
N
On 2012-12-05, Nick Mellor wrote:
> Hi Neil,
>
> Here's some sample data. The live data is about 300 minor
> variations on the sample data, about 20,000 lines.
Thanks, Nick.
This slight variation on my first groupby try seems to work for
the test data.
def prod_desc(s):
prod = []
desc =
On 2012-12-05 13:45, Chris Angelico wrote:
On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote:
takewhile mines for gold at the start of a sequence, dropwhile drops the dross
at the start of a sequence.
When you're using both over the same sequence and with the same
condition, it seems odd t
On 2012-12-05 17:04, Nick Mellor wrote:
Hi Neil,
Here's some sample data. The live data is about 300 minor variations on the
sample data, about 20,000 lines.
[snip]
You have a duplicate:
CELERY Mornington Peninsula IPM grower
CELERY Mornington Peninsula IPM grower
Hi Neil,
Here's some sample data. The live data is about 300 minor variations on the
sample data, about 20,000 lines.
Nick
Notes:
1. Whitespace is only used for word boundaries. Surplus whitespace is not
significant and can be stripped
2. Retain punctuation and parentheses
3. Product is zer
On Wed, Dec 5, 2012 at 6:45 AM, Chris Angelico wrote:
> On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote:
>>
>> takewhile mines for gold at the start of a sequence, dropwhile drops the
>> dross at the start of a sequence.
>
> When you're using both over the same sequence and with the same
> co
On 2012-12-05, Ian Kelly wrote:
> On Wed, Dec 5, 2012 at 7:34 AM, Neil Cerutti wrote:
>> Well, shoot! Then this is a job for groupby, not takewhile.
>
> The problem with groupby is that you can't just limit it to two groups.
>
> >>> prod_desc("CAPSICUM RED fresh from QLD")
> ['QLD', 'fresh from']
On 05/12/2012 13:45, Chris Angelico wrote:
I tested it on Python 3.2 (yeah, time I upgraded, I know).
Bad move, fancy wanting to go to the completely useless version of
Python that simply can't handle unicode properly :)
--
Cheers.
Mark Lawrence.
On Wed, Dec 5, 2012 at 7:34 AM, Neil Cerutti wrote:
> Well, shoot! Then this is a job for groupby, not takewhile.
The problem with groupby is that you can't just limit it to two groups.
>>> prod_desc("CAPSICUM RED fresh from QLD")
['QLD', 'fresh from']
Once you've got a false key from the group
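The behaviour shown above can be reproduced with a small sketch. This is only an illustration under assumptions: the original prod_desc is cut off in this archive, so prod_desc_naive and prod_desc_fixed below are hypothetical names and not Neil's or Ian's code, and str.isupper is assumed as the grouping key. The naive version keeps only the last group seen for each key, which is how 'QLD' ends up as the product. The second version treats everything after the first non-caps group as description.

```python
from itertools import groupby

def prod_desc_naive(s):
    # Hypothetical reconstruction: each new group overwrites the previous
    # group that had the same key, so a trailing all-caps word wins.
    prod, desc = [], []
    for is_caps, group in groupby(s.split(), key=str.isupper):
        if is_caps:
            prod = list(group)
        else:
            desc = list(group)
    return [" ".join(prod), " ".join(desc)]

def prod_desc_fixed(s):
    # Once a non-caps group has been seen, all later words belong to the
    # description, whatever their case.
    prod, desc = [], []
    seen_desc = False
    for is_caps, group in groupby(s.split(), key=str.isupper):
        if is_caps and not seen_desc:
            prod.extend(group)
        else:
            seen_desc = True
            desc.extend(group)
    return [" ".join(prod), " ".join(desc)]

print(prod_desc_naive("CAPSICUM RED fresh from QLD"))
print(prod_desc_fixed("CAPSICUM RED fresh from QLD"))
print(prod_desc_fixed("Spring ONION from QLD"))
```

With the flag in place, the "Spring ONION from QLD" edge case raised elsewhere in the thread yields an empty product and the whole line as description.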
On 2012-12-05, Chris Angelico wrote:
> On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote:
>>
>> takewhile mines for gold at the start of a sequence, dropwhile
>> drops the dross at the start of a sequence.
>
> When you're using both over the same sequence and with the same
> condition, it seems
On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor wrote:
>
> takewhile mines for gold at the start of a sequence, dropwhile drops the
> dross at the start of a sequence.
When you're using both over the same sequence and with the same
condition, it seems odd that you need to iterate over it twice.
Per
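To make the point about iterating twice concrete, here is a minimal sketch. It assumes the allcaps test that appears elsewhere in this thread; split_two_pass and split_one_pass are hypothetical names, not any poster's code. The first function runs takewhile and dropwhile over the same word list with the same condition. The second counts the leading all-caps words once and slices the list.

```python
from itertools import dropwhile, takewhile

def allcaps(word):
    # True when upper-casing leaves the word unchanged
    return word == word.upper()

def split_two_pass(s):
    words = s.split()
    # Both calls test the leading words against the same predicate.
    product = takewhile(allcaps, words)
    description = dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)

def split_one_pass(s):
    words = s.split()
    # Count the leading all-caps words once, then slice.
    n = sum(1 for _ in takewhile(allcaps, words))
    return " ".join(words[:n]), " ".join(words[n:])

print(split_two_pass("CAPSICUM RED fresh from QLD"))
print(split_one_pass("CAPSICUM RED fresh from QLD"))
```

Both versions rely on s.split() with no argument, which also discards surplus whitespace.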
On 2012-12-05, Nick Mellor wrote:
> Hi Terry,
>
> For my money, and especially in your versions, despite several
> expert solutions using other features, itertools has it. It
> seems to me to need less nutting out than the other approaches.
> It's short, robust, has a minimum of symbols, uses simp
Hi Terry,
For my money, and especially in your versions, despite several expert solutions
using other features, itertools has it. It seems to me to need less nutting out
than the other approaches. It's short, robust, has a minimum of symbols, uses
simple expressions and is not overly clever. If
On 12/4/2012 3:44 PM, Terry Reedy wrote:
If the original string has no excess whitespace, description is what
remains of s after product prefix is omitted. (Py 3 code)
from itertools import takewhile
def allcaps(word): return word == word.upper()
def split_product_itertools(s):
product =
Ian,
For the sanity of those of us reading this via Usenet using the Pan
newsreader, could you please turn off HTML emailing? It's very
distracting.
Thanks,
Steven
On Tue, 04 Dec 2012 12:37:38 -0700, Ian Kelly wrote:
[...]
> On Tue,
> Dec 4, 2012 at 11:48 AM, Alexander Blinne < href="mailto
2012/12/4 Nick Mellor :
> I love the way you guys can write a line of code that does the same as 20 of
> mine :)
> I can turn up the heat on your regex by feeding it a null description or
> multiple white space (both in the original file.) I'm sure you'd adjust, but
> at the cost of a more compl
On 12/4/2012 8:57 AM, Nick Mellor wrote:
I have a file full of things like this:
"CAPSICUM RED fresh from Queensland"
Product names (all caps, at start of string) and descriptions (mixed
case, to end of string) all muddled up in the same field. And I need
to split them into two fields. Note th
Am 04.12.2012 20:37, schrieb Ian Kelly:
> >>> def split_product(p):
> ... w = p.split(" ")
> ... j = next(i for i,v in enumerate(w) if v.upper() != v)
> ... return " ".join(w[:j]), " ".join(w[j:])
>
>
> It still fails if the product description is empty.
That's true..
On 2012-12-04 19:37, Ian Kelly wrote:
On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne wrote:
Am 04.12.2012 19:28, schrieb DJC:
(i for i,v in enumerate(w) if v.upper() != v).next()
> Traceback (most recent call last):
> File "", line 1, in
On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne wrote:
> Am 04.12.2012 19:28, schrieb DJC:
> (i for i,v in enumerate(w) if v.upper() != v).next()
> > Traceback (most recent call last):
> > File "", line 1, in
> > AttributeError: 'generator' object has no attribute 'next'
>
> Yeah, i saw
Am 04.12.2012 19:28, schrieb DJC:
(i for i,v in enumerate(w) if v.upper() != v).next()
> Traceback (most recent call last):
> File "", line 1, in
> AttributeError: 'generator' object has no attribute 'next'
Yeah, i saw this problem right after i sent the posting. It now is
supposed to read
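The corrected text is cut off in this archive. As a hedged illustration only, and not the poster's actual correction, the sketch below shows one way to avoid the AttributeError: generators in Python 3 have no .next() method, but the built-in next() works in both Python 2.6+ and Python 3. Passing a default to next() is an addition assumed here; it also covers the case where every word is all caps.

```python
def split_product(p):
    w = p.strip().split()
    # Built-in next() replaces the Python 2-only .next() method.
    # The default len(w) is returned when no mixed-case word exists,
    # so an empty description does not raise StopIteration.
    j = next((i for i, v in enumerate(w) if v.upper() != v), len(w))
    return " ".join(w[:j]), " ".join(w[j:])

print(split_product("CAPSICUM RED fresh from Queensland"))
print(split_product("CAPSICUM RED"))
```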
On 04/12/12 17:18, Alexander Blinne wrote:
Another neat solution with a little help from
http://stackoverflow.com/questions/1701211/python-return-the-index-of-the-first-element-of-a-list-which-makes-a-passed-fun
def split_product(p):
w = p.split(" ")
j = (i for i,v in enumer
On 2012-12-04, Nick Mellor wrote:
> I love the way you guys can write a line of code that does the
> same as 20 of mine :)
>
> I can turn up the heat on your regex by feeding it a null
> description or multiple white space (both in the original
> file.) I'm sure you'd adjust, but at the cost of a
Another neat solution with a little help from
http://stackoverflow.com/questions/1701211/python-return-the-index-of-the-first-element-of-a-list-which-makes-a-passed-fun
>>> def split_product(p):
... w = p.split(" ")
... j = (i for i,v in enumerate(w) if v.upper() != v).next()
... retu
On 2012-12-04, Nick Mellor wrote:
> Hi Neil,
>
> Nice! But fails if the first word of the description starts
> with a capital letter.
Darn edge cases.
--
Neil Cerutti
Hi Neil,
Nice! But fails if the first word of the description starts with a capital
letter.
Nick
On Wednesday, 5 December 2012 01:23:34 UTC+11, Neil Cerutti wrote:
> On 2012-12-04, Nick Mellor wrote:
>
> > I have a file full of things like this:
>
> >
>
> > "CAPSICUM RED fresh from Queens
On 2012-12-04, Nick Mellor wrote:
> I have a file full of things like this:
>
> "CAPSICUM RED fresh from Queensland"
>
> Product names (all caps, at start of string) and descriptions
> (mixed case, to end of string) all muddled up in the same
> field. And I need to split them into two fields. Note