Re: [Tutor] re module / separator

2009-06-25 Thread Tiago Saboga
Thanks Kent! Once more you go straight to the point! Kent Johnson writes: > On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga wrote: >> In [33]: re.search("(a[^.]*?b\.\s?){2}", text).group(0) >> Out[33]: 'a45453b. a325643b. ' > > group(0) is the entire match so this returns what you expect. But what
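
A minimal sketch of the group(0) behavior quoted above, run against the sample text from the original question. The pattern is written as a raw string here, which is an adjustment to the quoted session. The group(1) line is not taken from the truncated message; it only illustrates standard `re` behavior, where a repeated capturing group keeps its last repetition.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# Two consecutive "a...b." words, each optionally followed by one whitespace char.
m = re.search(r"(a[^.]*?b\.\s?){2}", text)

whole = m.group(0)  # the entire match
last = m.group(1)   # only the final repetition of the capturing group

print(repr(whole))  # 'a45453b. a325643b. '
print(repr(last))   # 'a325643b. '
```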

Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Ok -- realized my "solution" incorrectly strips white space from multiword strings: > Out[92]: ['a2345b.', 'a45453b.a325643b.a435643b.'] > So here are some more gymnastics to get the correct result: In [105]: newlist Out[105]: ['a2345b.', '|', 'a45453b.', 'a325643b.', 'a435643b.', '|'] In [109]

Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
As usual, Kent Johnson has swooped in and untangled the mess with a clear explanation. By the time a regex gets this complicated, I typically start thinking of ways to simplify or avoid them altogether. Below is the code I came up with. It goes through some gymnastics and can surely stand improvem
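
The code from this message is cut off in the preview, so the following is not Serdar's solution. It is only a sketch of one regex-free way to group consecutive matching words, assuming words are separated by whitespace. The helper name `is_wanted` and the use of `itertools.groupby` are choices made for this illustration.

```python
from itertools import groupby

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

def is_wanted(word):
    """Hypothetical helper: True for words that start with 'a' and end with 'b.'."""
    return word.startswith("a") and word.endswith("b.")

# groupby collects consecutive words that share the same is_wanted() result;
# keep only the runs of wanted words and join each run back into one string.
runs = [
    " ".join(group)
    for wanted, group in groupby(text.split(), key=is_wanted)
    if wanted
]

print(runs)  # ['a2345b.', 'a45453b. a325643b. a435643b.']
```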

Re: [Tutor] re module / separator

2009-06-24 Thread Kent Johnson
On Wed, Jun 24, 2009 at 2:24 PM, Tiago Saboga wrote: > Hi! > > I am trying to split some lists out of a single text file, and I am > having a hard time. I have reduced the problem to the following one: > > text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b." > > Of this line of text, I wan

Re: [Tutor] re module / separator

2009-06-24 Thread Tiago Saboga
Serdar Tumgoren writes: > Hey Tiago, > >> text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b." >> >> Of this line of text, I want to take out strings where all words start >> with a, end with "b.". But I don't want a list of words. I want that: >> >> ["a2345b.", "a45453b. a325643b. a4356

Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Apologies -- I just reread your post and it appears you also want to capture the dot after each "b" ( "b." ). In that case, you need to update the pattern to match the dot. But because the dot is itself a metacharacter, you have to escape it with a backslash: In [23]: re.findall(r'a\w+b\.',text)
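
For reference, a self-contained version of the call above, using the sample text from the original question. It returns the individual words, not the grouped strings Tiago asked for.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# \. matches a literal dot; an unescaped . would match any character.
words = re.findall(r'a\w+b\.', text)

print(words)  # ['a2345b.', 'a45453b.', 'a325643b.', 'a435643b.']
```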

Re: [Tutor] re module / separator

2009-06-24 Thread Serdar Tumgoren
Hey Tiago, > text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b." > > Of this line of text, I want to take out strings where all words start > with a, end with "b.". But I don't want a list of words. I want that: > > ["a2345b.", "a45453b. a325643b. a435643b."] > Are you saying you want a

[Tutor] re module / separator

2009-06-24 Thread Tiago Saboga
Hi! I am trying to split some lists out of a single text file, and I am having a hard time. I have reduced the problem to the following one: text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b." Of this line of text, I want to take out strings where all words start with a, end with "b.".
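
One possible way to get the grouped result described in the question, offered as a sketch and not as the answer the thread settled on. It assumes that words are separated by whitespace and that a "word" consists of `\w` characters followed by a dot. The pattern is an assumption made for this illustration.

```python
import re

text = "a2345b. f325. a45453b. a325643b. a435643b. g234324b."

# One or more consecutive words that start with "a" and end with "b.",
# each followed by optional whitespace. \b keeps a match from starting mid-word.
pattern = re.compile(r"(?:\ba\w*b\.\s*)+")

# Each match is one run of adjacent words; strip the trailing whitespace.
groups = [m.group(0).strip() for m in pattern.finditer(text)]

print(groups)  # ['a2345b.', 'a45453b. a325643b. a435643b.']
```

A non-capturing group is used so that each match object carries the whole run in group(0), without a capturing group that would keep only the last repetition.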