Thanks Kent! Once more you go straight to the point!
Kent Johnson writes:
> On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga wrote:
>> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0)
>> Out[33]: 'a45453b. a325643b. '
>
> group(0) is the entire match so this returns what you expect. But what
OK, I realized my "solution" incorrectly strips whitespace from
multiword strings:
> Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.']
>
So here are some more gymnastics to get the correct result:
In [105]: newlist
Out[105]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|']
In [109]
As usual, Kent Johnson has swooped in and untangled the mess with a
clear explanation.
By the time a regex gets this complicated, I typically start thinking
of ways to simplify it or avoid regexes altogether.
Below is the code I came up with. It goes through some gymnastics and
can surely stand improvem
On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga wrote:
> Hi!
>
> I am trying to split some lists out of a single text file, and I am
> having a hard time. I have reduced the problem to the following one:
>
> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I wan
Serdar Tumgoren writes:
> Hey Tiago,
>
>> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>>
>> Of this line of text, I want to take out strings where all words start
>> with a, end with "b.". But I don't want a list of words. I want that:
>>
>> ["a2345b.", "a45453b. a325643b. a4356
Apologies, I just reread your post and it appears you also want to
capture the dot after each "b" ("b.").
In that case, you need to update the pattern to match the dot as well.
But because the dot is itself a regex metacharacter (it matches any
character), you have to escape it with a backslash:
In [23]: re.findall(r'a\w+b\.',text)
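To make the effect of the backslash concrete, here is a small sketch, assuming Python 3. The variable names and the short sample string "a12bX a34b." are made up for illustration and are not from the thread; the pattern and `text` are the ones quoted above.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# Escaped dot: only a literal "." may follow the final "b".
words = re.findall(r"a\w+b\.", text)
print(words)

# Hypothetical sample showing the difference: an unescaped "." matches
# any character, so "a12bX" slips through.
sample = "a12bX a34b."
loose = re.findall(r"a\w+b.", sample)
strict = re.findall(r"a\w+b\.", sample)
print(loose, strict)
```

Note that this still returns individual words, not the runs of adjacent words asked for in the original post.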
Hey Tiago,
> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
>
> Of this line of text, I want to take out strings where all words start
> with a, end with "b.". But I don't want a list of words. I want that:
>
> ["a2345b.", "a45453b. a325643b. a435643b."]
>
Are you saying you want a
Hi!
I am trying to split some lists out of a single text file, and I am
having a hard time. I have reduced the problem to the following one:
text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."
Of this line of text, I want to take out strings where all words start
with a, end with "b.".
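For reference, the grouping described here can be sketched without one large regex. This is only one possible approach, assuming Python 3 and that words are separated by whitespace. It is not the code any poster in this thread used, and the names `word_ok` and `groups` are made up for illustration.

```python
import re
from itertools import groupby

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# A word qualifies if, as a whole, it starts with "a" and ends with "b.".
word_ok = re.compile(r"a\w+b\.").fullmatch

# groupby() collects consecutive words that share the same key, so each
# run of qualifying words becomes one group; non-qualifying runs are dropped.
groups = [
    " ".join(run)
    for ok, run in groupby(text.split(), key=lambda w: bool(word_ok(w)))
    if ok
]
print(groups)
```

Splitting first and testing each word separately keeps the regex small, at the cost of normalizing the whitespace between words to a single space.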