Re: Regular Expression

MRAB Sun, 12 Apr 2015 18:06:07 -0700

On 2015-04-13 01:25, Pippo wrote:

On Sunday, 12 April 2015 20:06:08 UTC-4, MRAB  wrote:

On 2015-04-13 00:47, Pippo wrote:
> On Sunday, 12 April 2015 19:44:05 UTC-4, Pippo  wrote:
>> On Sunday, 12 April 2015 19:28:44 UTC-4, MRAB  wrote:
>> > On 2015-04-12 23:49, Pippo wrote:
>> > > I have a text as follows:
>> > >
>> > > "#D{#C[Health] #P[Information] -
>> > > means any information, including #ST[genetic information],
>> > > whether #C[oral | (recorded in (any form | medium))], that
>> > > (1)#C[Is created or received by] a
>> > > #A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];
>> > > (2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] |
>> > > #C[the provision of health care to an individual] |
>> > > #C[the past, present, or future payment for the provision of health care to an individual].}"
>> > >
>> > > I want to get all elements that start with #C and are followed by [] and put them in an array, for example #C[Health]. I tried with a regex but it doesn't work:
>> > >
>> > "... it doesn't work"? In what way doesn't it work?
>> >
>> > > import re
>> > > import tkinter.filedialog
>> > > import readfile
>> > >
>> > >
>> > >
>> > > j = 0
>> > >
>> > > text = [ ]
>> > >
>> > >
>> > > content = readfile.pattread()
>> > >
>> > > while j < len(content):
>> > >
>> > There's a syntax error here:
>> >
>> > >      constraint = re.compile(r'(#C\[\w*\]'))
>> > >      result = constraint.search(content[j],re.MULTILINE)
>> > >      text.append(result)
>> > >      print(text)
>> > >      j = j+1
>> > >
>>
>> result is empty, although it should have content!
>>
>> What is the syntax error?
>
> I fixed the syntax error but the result shows:
>
>>>>
> [None]
> [None, None]
> [None, None, None]
> [None, None, None, None]
> [None, None, None, None, None]
> [None, None, None, None, None, None]
> [None, None, None, None, None, None, None]
> [None, None, None, None, None, None, None, None]
>>>>
>
>
> No error, but if I don't use the content I posted above and use this as the content instead:
> #content = "#C[Health] #P[Information]"
>
> result gives me #C[Health]
>
What does 'readfile.pattread()' return? Does it return a list of
strings? I'm guessing it does.


Yes, it reads a file of strings similar to the one I posted above.


Try printing each string you're trying to match using 'repr', i.e.:

     print(repr(content[j]))

Do any look like they should match?


  print(repr(content[j])) gives me the following:

[None]
'#D{#C[Health] #P[Information] - \n'
[None, None]
'means any information, including #ST[genetic information], \n'
[None, None, None]
'whether #C[oral | (recorded in (any form | medium))], that \n'
[None, None, None, None]
'(1)#C[Is created or received by] a \n'
[None, None, None, None, None]
'#A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse];  \n'
[None, None, None, None, None, None]
'(2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] | \n'
[None, None, None, None, None, None, None]
'#C[the provision of health care to an individual] | \n'
[None, None, None, None, None, None, None, None]
'#C[the past, present, or future payment for the provision of health care to an individual].}\n'

Shouldn't it match "#C[Health]" in the first row? If not, what is the best way
to fetch these items into an array?


If one doesn't, but you think it should, post it here so that someone
can tell you why it doesn't! :-)

It took me a while to spot the problem...

You're passing re.MULTILINE as the second argument of
'constraint.search', but look at the help text:

>>> help(constraint.search)
Help on built-in function search:

search(...) method of _sre.SRE_Pattern instance
    search(string[, pos[, endpos]]) -> match object or None.
    Scan through string looking for a match, and return a corresponding
    match object instance. Return None if no position in the string matches.


The second argument is the starting position for the search, _not_
flags.
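
To make that concrete, here is a minimal sketch using the first line from the
repr output above. re.MULTILINE has the integer value 8, so passing it to
'search' makes the scan start at index 8, which is already past the
"#C[Health]" at index 3. The names 'line', 'start', 'wrong' and 'right' are
only for illustration, and the pattern is the one from the original script
with the stray parenthesis removed, which is my guess at how it was fixed.

```python
import re

# First line of the file, exactly as shown by print(repr(content[j])).
line = '#D{#C[Health] #P[Information] - \n'

# Assumed repaired version of the original pattern.
constraint = re.compile(r'#C\[\w*\]')

# The flag is an integer-valued constant, so 'search' accepts it as 'pos'.
start = int(re.MULTILINE)

# Scans from index 8 onwards, so the match at index 3 is never seen.
wrong = constraint.search(line, re.MULTILINE)

# Scans from the beginning of the string.
right = constraint.search(line)

print(start)               # starting position actually used
print(line.find('#C['))    # where the wanted text begins
print(wrong)
print(right.group())
```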

That flag should've been passed as the second argument of re.compile
(not that it's needed by that pattern, anyway).

Actually, there's no point compiling the same regex every time; you
might as well compile it once, outside (just before) the loop.
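
Putting those points together, the loop might look something like the sketch
below. It is not the original script: 'content' here is a hard-coded sample
standing in for whatever readfile.pattread() returns, and the second pattern
('broader') is my own suggestion rather than anything posted in this thread.
It is there because \w* stops at spaces, so the original pattern can only
match single-word items such as "#C[Health]".

```python
import re

# Sample lines from the thread, standing in for readfile.pattread().
content = [
    '#D{#C[Health] #P[Information] - \n',
    'means any information, including #ST[genetic information], \n',
    '(1)#C[Is created or received by] a \n',
]

# Compile once, before the loop. Flags belong here, as the second argument
# of re.compile (re.MULTILINE is not actually needed by this pattern).
constraint = re.compile(r'#C\[\w*\]', re.MULTILINE)

text = []
for line in content:
    result = constraint.search(line)
    if result is not None:
        # Store the matched text instead of the match object (or None).
        text.append(result.group())

# Assumption: to also catch items containing spaces, match anything up to
# the next closing bracket. findall returns every match on a line.
broader = re.compile(r'#C\[[^\]]*\]')
all_items = [item for line in content for item in broader.findall(line)]

print(text)
print(all_items)
```

Iterating over the list directly also removes the need for the 'j' counter
and the 'while' loop.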

--
https://mail.python.org/mailman/listinfo/python-list
