On 5/6/2016 11:44 AM, Peter Otten wrote:
DFS wrote:
There are up to 4 levels of categorization:
http://www.usdirectory.com/cat/g0 shows 21 Level 1 categories, and 390
Level 2. To get the Level 3 and 4 you have to drill-down using the
hyperlinks.
How to do it in python code is beyond my ski
DFS wrote:
> There are up to 4 levels of categorization:
> http://www.usdirectory.com/cat/g0 shows 21 Level 1 categories, and 390
> Level 2. To get the Level 3 and 4 you have to drill-down using the
> hyperlinks.
>
> How to do it in python code is beyond my skills at this point. Get the
> hre
On 5/6/2016 9:58 AM, DFS wrote:
On 5/6/2016 3:45 AM, Peter Otten wrote:
DFS wrote:
Should've looked earlier. Their master list of categories
http://www.usdirectory.com/cat/g0 shows a few commas, a bunch of dashes,
and the ampersands we talked about.
"OFFICE SERVICES, SUPPLIES & EQUIPMENT" g
On 5/6/2016 3:45 AM, Peter Otten wrote:
DFS wrote:
Should've looked earlier. Their master list of categories
http://www.usdirectory.com/cat/g0 shows a few commas, a bunch of dashes,
and the ampersands we talked about.
"OFFICE SERVICES, SUPPLIES & EQUIPMENT" gets removed because of the comma.
On Thu, 05 May 2016 19:31:33 -0400, DFS wrote:
> On 5/5/2016 1:39 AM, Stephen Hansen wrote:
>
>> Given:
>>
> input = [u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs
> & Gymnasiums (42)', 'Health Fitness Clubs', 'Name', 'Atlanta city
> guide', 'edit address', 'Tweet', 'PHY
DFS wrote:
> On 5/5/2016 1:39 AM, Stephen Hansen wrote:
>
>> Given:
>>
> input = [u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs &
> Gymnasiums (42)', 'Health Fitness Clubs', 'Name', 'Atlanta city
> guide', 'edit address', 'Tweet', 'PHYSICAL FITNESS CONSULTANTS &
> TR
Steven D'Aprano writes:
> On Fri, 6 May 2016 04:27 am, Jussi Piitulainen wrote:
>
>> Random832's pattern is fine. You need to use re.fullmatch with it.
>
> py> re.fullmatch
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'fullmatch'
On Fri, 6 May 2016 04:27 am, Jussi Piitulainen wrote:
> Random832's pattern is fine. You need to use re.fullmatch with it.
py> re.fullmatch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'fullmatch'
--
Steven
--
https://mail.python
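
The AttributeError above is what an interpreter older than Python 3.4 reports, since re.fullmatch was added in 3.4. A minimal sketch of a fallback follows. The helper name fullmatch_compat is hypothetical and not from the thread, and the emulation (wrapping the pattern in a non-capturing group and appending \Z) is an approximation that should be checked against the patterns actually in use.

```python
import re

def fullmatch_compat(pattern, string, flags=0):
    """Hypothetical helper: act like re.fullmatch where it is missing."""
    if hasattr(re, "fullmatch"):
        return re.fullmatch(pattern, string, flags)
    # Older interpreters: group the pattern so alternations stay intact,
    # then require the end of the string with \Z.
    return re.match(r"(?:%s)\Z" % pattern, string, flags)

# The character class is the one Jussi Piitulainen posted in this thread.
print(bool(fullmatch_compat(r"[A-Z &]+", "HEALTH CLUBS & GYMNASIUMS")))
print(bool(fullmatch_compat(r"[A-Z &]+", "Health Clubs & Gymnasiums (42)")))
```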
On 5/5/2016 1:39 AM, Stephen Hansen wrote:
Given:
input = [u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs & Gymnasiums (42)', 'Health Fitness Clubs',
'Name', 'Atlanta city guide', 'edit address', 'Tweet', 'PHYSICAL FITNESS CONSULTANTS & TRAINERS', 'HEALTH CLUBS &
GYMNASIUMS', 'H
On 5/5/2016 2:56 PM, Stephen Hansen wrote:
On Thu, May 5, 2016, at 05:31 AM, DFS wrote:
You are out of your mind.
Whoa, now. I might disagree with Steven D'Aprano about how to approach
this problem, but there's no need to be rude.
Seriously not trying to be rude - more smart-alecky than anyt
On 5/5/2016 1:54 PM, Steven D'Aprano wrote:
On Thu, 5 May 2016 10:31 pm, DFS wrote:
You are out of your mind.
That's twice you've tried to put me down, first by dismissing my comments
about text processing with "Linguist much", and now an outright insult. The
first time I laughed it off and m
On Thu, May 5, 2016, at 11:03 AM, Steven D'Aprano wrote:
> - Nobody could possibly want to support non-ASCII text. (Apart from the
> approximately 6.5 billion people in the world that don't speak English of
> course, an utterly insignificant majority.)
Oh, I'd absolutely want to support non-ASCII
On Thu, May 5, 2016, at 05:31 AM, DFS wrote:
> You are out of your mind.
Whoa, now. I might disagree with Steven D'Aprano about how to approach
this problem, but there's no need to be rude. Everyone's trying to help
you, after all.
--
Stephen Hansen
m e @ i x o k a i . i o
--
https://mail.pyt
On Thu, May 5, 2016, at 10:43 AM, Steven D'Aprano wrote:
> On Thu, 5 May 2016 11:32 pm, Stephen Hansen wrote:
>
> > On Thu, May 5, 2016, at 12:36 AM, Steven D'Aprano wrote:
> >> Oh, a further thought...
> >>
> >> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> >> > I don't even care about
On Thu, May 5, 2016, at 14:27, Jussi Piitulainen wrote:
> Random832's pattern is fine. You need to use re.fullmatch with it.
Heh, in my previous post I said "and one could easily imagine an API
that implicitly anchors at the end". So easy to imagine it turns out
that someone already did, as it tur
On Thu, May 5, 2016, at 14:03, Steven D'Aprano wrote:
> You failed to anchor the string at the beginning and end of the string,
> an easy mistake to make, but that's the point.
I don't think anchoring is properly a concern of the regex itself -
.match is anchored implicitly at the beginning, and o
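
To make the anchoring point concrete, here is a small sketch comparing the three entry points on one unanchored pattern. The sample strings are made up for illustration; fullmatch needs Python 3.4 or later.

```python
import re

pattern = re.compile(r"[A-Z &]+")  # no ^ or $ in the pattern itself
text = "HEALTH CLUBS (42)"         # made-up sample string

# match: anchored at the start only, so a matching prefix is enough.
print(repr(pattern.match(text).group()))

# search: not anchored at all, so it may start anywhere in the string.
print(repr(pattern.search("x ABC").group()))

# fullmatch: anchored at both ends, so the trailing "(42)" rejects the string.
print(pattern.fullmatch(text))
```

With match, the caller still has to add $ or \Z to the pattern to reject trailing junk; with fullmatch the pattern can stay free of anchors.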
Steven D'Aprano writes:
> On Fri, 6 May 2016 03:49 am, Jussi Piitulainen wrote:
>
>> Steven D'Aprano writes:
>>
>>> I get something like this:
>>>
>>> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
>>>
>>>
>>> but it fails on strings like "AA & A & A". What am I doing wrong?
>>
>> It cannot spl
On Fri, 6 May 2016 03:49 am, Jussi Piitulainen wrote:
> Steven D'Aprano writes:
>
>> I get something like this:
>>
>> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
>>
>>
>> but it fails on strings like "AA & A & A". What am I doing wrong?
>
> It cannot split the string as (LETTERS & LETTERS)(L
On Thu, 5 May 2016 11:21 pm, Random832 wrote:
> On Thu, May 5, 2016, at 03:36, Steven D'Aprano wrote:
>> Putting non-ASCII letters aside for the moment, how would you match these
>> specs as a regular expression?
>
> Well, obviously *your* language (not the OP's), given the cases you
> reject, is
Steven D'Aprano writes:
> I get something like this:
>
> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
>
>
> but it fails on strings like "AA & A & A". What am I doing wrong?
It cannot split the string as (LETTERS & LETTERS)(LETTERS & LETTERS)
when the middle part is just one LETTER. That's som
On Thu, 5 May 2016 10:31 pm, DFS wrote:
> You are out of your mind.
That's twice you've tried to put me down, first by dismissing my comments
about text processing with "Linguist much", and now an outright insult. The
first time I laughed it off and made a joke about it. I won't do that
again.
Y
On Thu, 5 May 2016 11:32 pm, Stephen Hansen wrote:
> On Thu, May 5, 2016, at 12:36 AM, Steven D'Aprano wrote:
>> Oh, a further thought...
>>
>> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
>> > I don't even care about faster: Its overly complicated. Sometimes a
>> > regular expression rea
On Thu, 5 May 2016 11:13 pm, Random832 wrote:
> On Thu, May 5, 2016, at 04:41, Steven D'Aprano wrote:
>> > There's no situation where "&" and " " will exist in the given
>> > dataset, and recognizing that is important. You don't have to account
>> > for every bit of nonsense.
>>
>> Whenev
On Thu, 5 May 2016 06:17 pm, Peter Otten wrote:
>> I get something like this:
>>
>> r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"
>>
>>
>> but it fails on strings like "AA & A & A". What am I doing wrong?
> test("^A+( *& *A+)*$")
Thanks Peter, that's nice!
--
Steven
--
https://mail.pyt
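
A short sketch of why the two patterns behave differently. Steven's pattern is copied as posted. Peter's is written with [A-Z] substituted for his A, which is an assumption about how his test() helper expands it. The test strings other than "AA & A & A" are made up.

```python
import re

# Steven's attempt: repeats whole "LETTERS & LETTERS" units, so a middle
# word would have to belong to two units at once.
steven = re.compile(r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)")

# Peter's shape: one leading word, then any number of "& word" tails.
peter = re.compile(r"^[A-Z]+( *& *[A-Z]+)*$")

for s in ["AA", "AA & A", "AA & A & A", "AA &", "& A"]:
    print(repr(s), bool(steven.match(s)), bool(peter.match(s)))
```

The general shape "item (separator item)*" is the usual way to write a separated list as a regular expression.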
On 5/5/2016 9:32 AM, Stephen Hansen wrote:
On Thu, May 5, 2016, at 12:36 AM, Steven D'Aprano wrote:
Oh, a further thought...
On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
I don't even care about faster: Its overly complicated. Sometimes a
regular expression really is the clearest way to
On Thu, May 5, 2016, at 12:36 AM, Steven D'Aprano wrote:
> Oh, a further thought...
>
> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> > I don't even care about faster: Its overly complicated. Sometimes a
> > regular expression really is the clearest way to solve a problem.
>
> Putting no
On Thu, May 5, 2016, at 03:36, Steven D'Aprano wrote:
> Putting non-ASCII letters aside for the moment, how would you match these
> specs as a regular expression?
Well, obviously *your* language (not the OP's), given the cases you
reject, is "one or more sequences of letters separated by
space*-a
On Thu, May 5, 2016, at 04:41, Steven D'Aprano wrote:
> > There's no situation where "&" and " " will exist in the given
> > dataset, and recognizing that is important. You don't have to account
> > for every bit of nonsense.
>
> Whenever a programmer says "This case will never happen",
On 5/5/2016 1:53 AM, Jussi Piitulainen wrote:
Either way is easy to approximate with a regex:
import re
upper = re.compile(r'[A-Z &]+')
lower = re.compile(r'[^A-Z &]')
print([datum for datum in data if upper.fullmatch(datum)])
print([datum for datum in data if not lower.search(datum)])
This
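
The two approaches above are close but not identical. A minimal sketch with a made-up data list (not the OP's full list) shows one place where they differ: the empty string.

```python
import re

upper = re.compile(r'[A-Z &]+')   # whole string must consist of these
lower = re.compile(r'[^A-Z &]')   # any single character outside the set

# Hypothetical sample; the empty string is included on purpose.
data = ['HEALTH CLUBS & GYMNASIUMS', 'Tweet', 'Name', '']

kept_by_upper = [datum for datum in data if upper.fullmatch(datum)]
kept_by_lower = [datum for datum in data if not lower.search(datum)]

print(kept_by_upper)
print(kept_by_lower)
```

The first form requires at least one allowed character, so it drops the empty string; the second only looks for a forbidden character, so the empty string passes.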
On 5/5/2016 1:39 AM, Stephen Hansen wrote:
pattern = re.compile(r"^[A-Z\s&]+$")
output = [x for x in list if pattern.match(x)]
Holy Shr"^[A-Z\s&]+$" One line of parsing!
I was figuring a few list comprehensions would do it - this is better.
(note: the reason I specified 'spaces aroun
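
For reference, a sketch of that one-liner run against the part of the list that is visible in this thread (the original list is longer). The variable is named items here, rather than list, only to avoid shadowing the builtin; that renaming is not from the original post.

```python
import re

# Only the elements visible in the thread; the real list continues.
items = [u'Espa\xf1ol', 'Health & Fitness Clubs (36)',
         'Health Clubs & Gymnasiums (42)', 'Health Fitness Clubs', 'Name',
         'Atlanta city guide', 'edit address', 'Tweet',
         'PHYSICAL FITNESS CONSULTANTS & TRAINERS',
         'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS']

pattern = re.compile(r"^[A-Z\s&]+$")
output = [x for x in items if pattern.match(x)]
print(output)

# The character class does not check that "&" sits between spaces,
# so strings like these are accepted as well:
print(bool(pattern.match("&")), bool(pattern.match("A&B")))
```

Whether that looseness matters depends on the data, which is the point debated further up the thread.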
On 5/5/2016 2:04 AM, Steven D'Aprano wrote:
On Thursday 05 May 2016 14:58, DFS wrote:
Want to whittle a list like this:
[...]
Want to keep all elements containing only upper case letters or upper
case letters and ampersand (where ampersand is surrounded by spaces)
Start by writing a functi
On Thursday 05 May 2016 17:34, Stephen Hansen wrote:
> Meh. You have a pedantic definition of wrong. Given the inputs, it
> produced right output. Very often that's enough. Perfect is the enemy of
> good, it's said.
And this is a *perfect* example of why we have things like this:
http://www.bbc
Steven D'Aprano wrote:
> Oh, a further thought...
>
>
> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
>
>> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>>> Start by writing a function or a regex that will distinguish strings
>>> that match your conditions from those that don'
Oh, a further thought...
On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>> Start by writing a function or a regex that will distinguish strings that
>> match your conditions from those that don't. A regex might be faster, but
>> he
On Thu, May 5, 2016, at 12:04 AM, Steven D'Aprano wrote:
> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> > > On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
> >> Start by writing a function or a regex that will distinguish strings that
> >> match your conditions from those that do
On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>> Start by writing a function or a regex that will distinguish strings that
>> match your conditions from those that don't. A regex might be faster, but
>> here's a function version.
>>
On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
> Start by writing a function or a regex that will distinguish strings that
> match your conditions from those that don't. A regex might be faster, but
> here's a function version.
> ... snip ...
Yikes. I'm all for the idea that one should
On Thursday 05 May 2016 14:58, DFS wrote:
> Want to whittle a list like this:
[...]
> Want to keep all elements containing only upper case letters or upper
> case letters and ampersand (where ampersand is surrounded by spaces)
Start by writing a function or a regex that will distinguish strings
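
The function posted in the thread is snipped in the replies, so the following is not that function. It is a separate minimal sketch of a strict predicate for the stated rule. The names keep and is_word are hypothetical, and it assumes ASCII uppercase only, single spaces between tokens, and "&" allowed only between two words.

```python
import string

UPPER = set(string.ascii_uppercase)  # assumption: ASCII letters only

def is_word(token):
    """True for a non-empty run of ASCII uppercase letters."""
    return token != "" and all(ch in UPPER for ch in token)

def keep(s):
    """Hypothetical predicate: uppercase words, optionally joined by ' & '."""
    tokens = s.split(" ")
    for i, token in enumerate(tokens):
        if is_word(token):
            continue
        # A lone "&" is allowed only between two words.
        if (token == "&" and 0 < i < len(tokens) - 1
                and is_word(tokens[i - 1]) and is_word(tokens[i + 1])):
            continue
        return False
    return True

print(keep("HEALTH CLUBS & GYMNASIUMS"), keep("Health Fitness Clubs"),
      keep("A&B"), keep("&"))
```

It is longer than a regex, but each rule is spelled out and can be tested on its own.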
DFS writes:
. .
> Want to keep all elements containing only upper case letters or upper
> case letters and ampersand (where ampersand is surrounded by spaces)
>
> Is it easier to extract elements meeting those conditions, or remove
> elements meeting the following conditions:
>
> * elements with
On Wed, May 4, 2016, at 09:58 PM, DFS wrote:
> Want to whittle a list like this:
>
> [u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs &
> Gymnasiums (42)', 'Health Fitness Clubs', 'Name', 'Atlanta city guide',
> 'edit address', 'Tweet', 'PHYSICAL FITNESS CONSULTANTS & TRAINERS',
> '
Want to whittle a list like this:
[u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs &
Gymnasiums (42)', 'Health Fitness Clubs', 'Name', 'Atlanta city guide',
'edit address', 'Tweet', 'PHYSICAL FITNESS CONSULTANTS & TRAINERS',
'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS',
41 matches