On Sunday, 12 April 2015 20:06:08 UTC-4, MRAB wrote:
On 2015-04-13 00:47, Pippo wrote:
> On Sunday, 12 April 2015 19:44:05 UTC-4, Pippo wrote:
>> On Sunday, 12 April 2015 19:28:44 UTC-4, MRAB wrote:
>> > On 2015-04-12 23:49, Pippo wrote:
>> > > I have the following text:
>> > >
>> > > "#D{#C[Health] #P[Information] -
>> > > means any information, including #ST[genetic information],
>> > > whether #C[oral | (recorded in (any form | medium))], that
>> > > (1)#C[Is created or received by] a
>> > > #A[health care provider | health plan | public health authority |
>> > > employer | life insurer | school | university | or health care clearinghouse];
>> > > (2)#C[Relates to] #C[the past, present, or future physical | mental
>> > > health | condition of an individual] |
>> > > #C[the provision of health care to an individual] |
>> > > #C[the past, present, or future payment for the provision of health care to
>> > > an individual].}"
>> > >
>> > > I want to get all elements that start with #C and are enclosed in [], and put
>> > > them in an array, for example #C[Health]. I tried with a regex but it doesn't work:
>> > >
>> > "... it doesn't work"? In what way doesn't it work?
>> >
>> > > import re
>> > > import tkinter.filedialog
>> > > import readfile
>> > >
>> > >
>> > >
>> > > j = 0
>> > >
>> > > text = [ ]
>> > >
>> > >
>> > > content = readfile.pattread()
>> > >
>> > > while j < len(content):
>> > >
>> > There's a syntax error here:
>> >
>> > >     constraint = re.compile(r'(#C\[\w*\]'))
>> > >     result = constraint.search(content[j],re.MULTILINE)
>> > >     text.append(result)
>> > >     print(text)
>> > >     j = j+1
>> > >
>>
>> result is empty, although it should have some content.
>>
>> What is the syntax error?
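For reference, the syntax error MRAB points at is the extra closing parenthesis after the string literal in `re.compile(r'(#C\[\w*\]'))`. If you delete only that `)`, an unbalanced `(` is still left inside the pattern, and `re` rejects it at compile time. Here is a minimal sketch of the corrected line. It keeps the original pattern and drops only the stray group parenthesis:

```python
import re

# Original: re.compile(r'(#C\[\w*\]'))  -> SyntaxError (one ')' too many).
# Removing just that ')' still leaves an unclosed '(' group in the regex:
try:
    re.compile(r'(#C\[\w*\]')
except re.error as exc:
    print("re.error:", exc)  # the regex itself is unbalanced

# Dropping the stray '(' gives a valid pattern.
constraint = re.compile(r'#C\[\w*\]')
print(constraint.search('#C[Health] #P[Information]').group())
```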
>
> I fixed the syntax error but the result shows:
>
>>>>
> [None]
> [None, None]
> [None, None, None]
> [None, None, None, None]
> [None, None, None, None, None]
> [None, None, None, None, None, None]
> [None, None, None, None, None, None, None]
> [None, None, None, None, None, None, None, None]
>>>>
>
>
> There is no error, but if instead of the content I posted above I use this
> as the content:
> content = "#C[Health] #P[Information]"
>
> result gives me #C[Health]
>
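A likely reason for the `None`s, even after the syntax fix, is the second positional argument of a compiled pattern's `search()` method. That argument is `pos`, the index at which to start searching. It is not a flags argument: flags belong to `re.compile()` or to the module-level `re.search(pattern, string, flags)`. `re.MULTILINE` has the integer value 8, so `constraint.search(content[j], re.MULTILINE)` starts scanning at index 8. That skips the `#C[Health]` that begins at index 3 of the first line. A small sketch using that first line:

```python
import re

constraint = re.compile(r'#C\[\w*\]')
line = '#D{#C[Health] #P[Information] - \n'  # first line of the file

# Pattern.search(string, pos, endpos): re.MULTILINE (== 8) is taken as pos,
# so the search starts at index 8 and '#C[Health]' (at index 3) is skipped.
print(constraint.search(line, re.MULTILINE))  # None
print(line.index('#C['))                      # 3

# Without the stray argument the search starts at index 0 and succeeds.
print(constraint.search(line).group())        # #C[Health]

# If a flag were needed, it would go to re.compile(pattern, flags) instead.
```

`re.MULTILINE` only changes how `^` and `$` behave, and this pattern uses neither, so it can simply be dropped.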
What does 'readfile.pattread()' return? Does it return a list of
strings? I'm guessing it does.
Yes, it reads a file of strings similar to the one I posted above.
Try printing each string you're trying to match using 'repr', i.e.:
print(repr(content[j]))
Do any look like they should match?
print(repr(content[j])) gives me the following:
[None]
'#D{#C[Health] #P[Information] - \n'
[None, None]
'means any information, including #ST[genetic information], \n'
[None, None, None]
'whether #C[oral | (recorded in (any form | medium))], that \n'
[None, None, None, None]
'(1)#C[Is created or received by] a \n'
[None, None, None, None, None]
'#A[health care provider | health plan | public health authority | employer | life insurer | school | university | or health care clearinghouse]; \n'
[None, None, None, None, None, None]
'(2)#C[Relates to] #C[the past, present, or future physical | mental health | condition of an individual] | \n'
[None, None, None, None, None, None, None]
'#C[the provision of health care to an individual] | \n'
[None, None, None, None, None, None, None, None]
'#C[the past, present, or future payment for the provision of health care to an individual].}\n'
Shouldn't it match "#C[Health]" in the first line? If not, what is the best way
to collect these items into an array?
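A second problem shows up in the other lines. `\w*` matches only letters, digits and underscores, so it stops at the first space or `|`. It can therefore never reach the closing `]` of `#C[Is created or received by]` or `#C[oral | (recorded in ...)]`. Below is a sketch of one way to collect every `#C[...]` element into a list. It assumes, as in the sample, that square brackets are never nested. It makes three changes:

* It matches "anything except `]`" with `[^\]]*`.
* It uses `findall()` to get every match on a line, not just the first.
* It extends the list with the matched strings rather than appending match objects.

The sample lines are taken from the `repr` output above. The `for` loop replaces the manual `j` counter.

```python
import re

# '[^\]]*' = any run of characters other than ']' (spaces, '|', '(' included).
C_ELEMENT = re.compile(r'#C\[[^\]]*\]')

def collect_c_elements(lines):
    """Return every '#C[...]' element found in an iterable of strings."""
    found = []
    for line in lines:
        found.extend(C_ELEMENT.findall(line))  # findall: all matches, as strings
    return found

content = [
    '#D{#C[Health] #P[Information] - \n',
    'means any information, including #ST[genetic information], \n',
    'whether #C[oral | (recorded in (any form | medium))], that \n',
    '(1)#C[Is created or received by] a \n',
    '(2)#C[Relates to] #C[the past, present, or future physical | mental health '
    '| condition of an individual] | \n',
]
for element in collect_c_elements(content):
    print(element)
```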
If one doesn't, but you think it should, post it here so that someone
can tell you why it doesn't! :-)
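If an element can be wrapped across two physical lines of the file, per-line matching would miss it. An alternative sketch reads the whole file as one string. Because `[^\]]*` also matches newlines, a wrapped element is still found. A capture group makes `findall()` return only the text inside the brackets. The helper name `read_c_contents`, its `path` parameter and the UTF-8 default are hypothetical. They stand in for the poster's own `readfile.pattread()`, whose details aren't shown.

```python
import re

# One capture group: findall() will return the bracket contents only.
C_INNER = re.compile(r'#C\[([^\]]*)\]')

def read_c_contents(path, encoding='utf-8'):
    """Hypothetical helper: return the text inside each '#C[...]' in a file."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    # Collapse any line break (and runs of whitespace) inside an element.
    return [' '.join(inner.split()) for inner in C_INNER.findall(text)]
```

Usage would be something like `items = read_c_contents('definitions.txt')`, where the file name is only an example.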