I have the following text:

"Hawaii volcano generates toxic gas plume called laze PAHOA: The eruption of 
Kilauea volcano in Hawaii sparked new safety warnings about toxic gas on the 
Big Island's southern coastline after lava began flowing into the ocean and 
setting off a chemical reaction. Lava haze is made of dense white clouds of 
steam, toxic gas and tiny shards of volcanic glass. Janet Babb, a geologist 
with the Hawaiian Volcano Observatory, says the plume "looks innocuous, but 
it's not." "Just like if you drop a glass on your kitchen floor, there's some 
large pieces and there are some very, very tiny pieces," Babb said. "These 
little tiny pieces are the ones that can get wafted up in that steam plume." 
Scientists call the glass Limu O Pele, or Pele's seaweed, named after the 
Hawaiian goddess of volcano and fire"

and I want the tagged output to look like this:

"Hawaii/TAG volcano generates toxic gas plume called laze PAHOA/TAG: The 
eruption of Kilauea/TAG volcano/TAG in Hawaii/TAG sparked new safety warnings 
about toxic gas on the Big Island's southern coastline after lava began flowing 
into the ocean and setting off a chemical reaction. Lava haze is made of dense 
white clouds of steam, toxic gas and tiny shards of volcanic glass. Janet/TAG 
Babb/TAG, a geologist with the Hawaiian/TAG Volcano/TAG Observatory/TAG, says 
the plume "looks innocuous, but it's not." "Just like if you drop a glass on 
your kitchen floor, there's some large pieces and there are some very, very 
tiny pieces," Babb/TAG said. "These little tiny pieces are the ones that can 
get wafted up in that steam plume." Scientists call the glass Limu/TAG O/TAG 
Pele/TAG, or Pele's seaweed, named after the Hawaiian goddess of volcano and 
fire"

To do this, I keep a separate list of the words to tag:

Hawaii
PAHOA
Kilauea 
volcano 
Janet 
Babb
Hawaiian 
Volcano 
Observatory
Babb 
Limu 
O 
Pele

and run the following simple code:

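from __future__ import print_function  # lets print() work on Python 2.7 and 3

def tag_text():
    # Read the untagged text and split it into whitespace-separated tokens
    with open("/python27/volcanotxt.txt", "r") as f:
        corpus = f.read().split()
    # Read the tag list; a set makes each membership test fast
    with open("/python27/taglist.txt", "r") as f:
        wordlist = set(f.read().split())
    # Append "/TAG" to every token that appears verbatim in the tag list
    tagged = [word + "/TAG" if word in wordlist else word for word in corpus]
    tagged_text = " ".join(tagged)
    print(tagged_text)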

I then take the results and repair the unwanted tags by hand, such as those in
"Hawaiian/TAG goddess of volcano/TAG".
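Part of the repair work comes from `split()` keeping punctuation attached to
words: tokens such as "PAHOA:", "Babb," and "Observatory," never equal any
entry in the list, so they are left untagged. A minimal sketch of a fix,
assuming the tag list is passed in as a set of words (the name `tag_words` is
hypothetical), matches runs of word characters with `re.sub` and leaves
punctuation, spacing and line breaks untouched:

```python
import re

def tag_words(text, wordlist):
    # Visit each run of letters/digits/underscores (the \w+ pattern), so a
    # trailing comma or colon no longer prevents a match
    def tag(match):
        word = match.group(0)
        return word + "/TAG" if word in wordlist else word
    # Everything between the matched words is copied through unchanged
    return re.sub(r"\w+", tag, text)

print(tag_words("PAHOA: Janet Babb, a geologist", {"PAHOA", "Babb", "Observatory"}))
# PAHOA/TAG: Janet Babb/TAG, a geologist
```

Like the original, this still tags "Hawaiian" and "volcano" wherever they
occur, because a single-word list cannot see the surrounding context.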

I am looking for a better coding approach so that I do not have to spend time
on manual repairs.
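Most of the unwanted tags come from words that should only be tagged inside a
longer name: the expected output tags "volcano" only in "Kilauea volcano" and
"Hawaiian" only in "Hawaiian Volcano Observatory". One possible approach,
sketched below under the assumption that the tag list is rewritten to hold
whole phrases rather than single words, is to build one regular expression
from the phrases, try longer phrases first, require word boundaries so that
"Hawaii" does not match inside "Hawaiian", and then tag every word of each
matched phrase. The names `build_pattern`, `tag_phrases` and `PHRASES` are
illustrative only.

```python
import re

def build_pattern(phrases):
    # Try longer phrases first: at any position, a longer entry such as
    # "Kilauea volcano" is preferred over a shorter one such as "Kilauea"
    alternatives = []
    for phrase in sorted(phrases, key=len, reverse=True):
        # Escape each word and allow any whitespace, including a line break,
        # between the words of a phrase
        words = [re.escape(word) for word in phrase.split()]
        alternatives.append(r"\s+".join(words))
    # \b on both sides stops "Hawaii" from matching inside "Hawaiian"
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")

def tag_phrases(text, phrases):
    pattern = build_pattern(phrases)

    def tag_match(match):
        # Append "/TAG" to every word of the matched phrase, keeping the
        # original whitespace between the words
        return re.sub(r"\S+", lambda w: w.group(0) + "/TAG", match.group(0))

    return pattern.sub(tag_match, text)

# Hypothetical phrase list: multi-word names are kept together, and
# ambiguous words ("volcano", "Hawaiian") are not listed on their own
PHRASES = [
    "Hawaii", "PAHOA", "Kilauea volcano", "Janet Babb",
    "Hawaiian Volcano Observatory", "Babb", "Limu O Pele",
]

print(tag_phrases("Janet Babb, a geologist with the Hawaiian Volcano Observatory", PHRASES))
# Janet/TAG Babb/TAG, a geologist with the Hawaiian/TAG Volcano/TAG Observatory/TAG
print(tag_phrases("named after the Hawaiian goddess of volcano and fire", PHRASES))
# named after the Hawaiian goddess of volcano and fire
```

The list still has to be curated by hand, but the context decisions then live
in the list itself instead of in repairs to every output.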

Here, the corpus (volcanotxt.txt) is the untagged text given first, and the
wordlist (taglist.txt) is the list of words given just above the code.
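Reading the tag list with `read().split()` always breaks it into single words,
so a multi-word entry such as "Kilauea volcano" could never be kept together.
If the list file is instead written with one entry (word or phrase) per line,
a small loader like the hypothetical `load_phrases` below keeps each line
intact, trims stray spaces, and skips blank lines and repeats. It uses
`io.open` with an explicit encoding, which behaves the same on Python 2.7 and
3; UTF-8 is an assumption about how the file was saved.

```python
import io

def parse_phrases(lines):
    # Turn lines of a tag file into a list of entries (words or phrases)
    phrases = []
    seen = set()
    for line in lines:
        # Trim both ends and collapse repeated inner spaces
        phrase = " ".join(line.split())
        # Skip blank lines and repeated entries (the list above has "Babb" twice)
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases

def load_phrases(path, encoding="utf-8"):
    # io.open accepts an encoding argument on both Python 2.7 and Python 3
    with io.open(path, "r", encoding=encoding) as f:
        return parse_phrases(f)

print(parse_phrases(["Hawaii\n", "Kilauea volcano \n", "\n", "Babb\n", "Babb \n"]))
# ['Hawaii', 'Kilauea volcano', 'Babb']
```

For example, `load_phrases("/python27/taglist.txt")` would return the entries
of that file once it has been rewritten with one entry per line.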

I am using Python 2.7.15 on MS Windows 7.

I would be grateful if anyone could suggest a solution.

Thanks in advance. 





-- 
https://mail.python.org/mailman/listinfo/python-list
