Hey Bryan,

It seems the text you are trying to parse is similar to XML/HTML, so I'd use BeautifulSoup[1] if I were you :)
Here's some sample code for your scraping case:

<python>
from BeautifulSoup import BeautifulSoup

# assume the s variable has your text
s = "whatever xml or html here"

# turn it into a tasty & parsable soup :)
soup = BeautifulSoup(s)

# for every element tag in the soup
for el in soup.findAll("element"):
    # print out its tag & name attribute plus its inner value!
    print el["tag"], el["name"], el.string
</python>

That's it!

[1] http://www.crummy.com/software/BeautifulSoup/

On 8 mar, 14:49, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> I have a large file that has many lines like this,
>
> <element tag="300a,0014" vr="CS" vm="1" len="4"
> name="DoseReferenceStructureType">SITE</element>
>
> I would like to identify the line by the tag (300a,0014) and then grab
> the name (DoseReferenceStructureType) and value (SITE).
>
> I would like to create a file that would have the structure,
>
> DoseReferenceStructureType = Site
> ...
> ...
>
> Also, there is a possibility that there are multiple lines with the
> same tag, but different values. These all need to be recorded.
>
> So far, I have a little bit of code to look at everything that is
> available,
>
> for line in open(str(sys.argv[1])):
>     i_line = line.split()
>     if i_line:
>         if i_line[0] == "<element":
>             a = i_line[1]
>             b = i_line[5]
>             print "%s | %s" %(a, b)
>
> but do not see a clever way of doing what I would like.
>
> Any help or guidance would be appreciated.
>
> Bryan
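
P.S. If installing BeautifulSoup isn't an option, here is a rough sketch of the same idea using only the standard library's html.parser module (Python 3 syntax). It is a swapped-in technique, not BeautifulSoup itself. The names ElementCollector and extract are made up for illustration, and only the first sample line comes from the quoted message; the other two are invented test data. It keeps every match in file order, so repeated tags with different values are all recorded.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collect (tag, name, value) triples from <element ...>value</element> lines."""

    def __init__(self):
        super().__init__()
        self.records = []     # one (tag, name, value) tuple per element, in file order
        self._current = None  # attributes of the <element> being read, if any
        self._text = []       # text chunks seen inside the current element

    def handle_starttag(self, tag, attrs):
        if tag == "element":
            self._current = dict(attrs)
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <element> ... </element> pair.
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "element" and self._current is not None:
            value = "".join(self._text).strip()
            self.records.append(
                (self._current.get("tag"), self._current.get("name"), value)
            )
            self._current = None


def extract(text, wanted_tag=None):
    """Return 'name = value' lines, optionally only for one tag such as '300a,0014'."""
    parser = ElementCollector()
    parser.feed(text)
    parser.close()
    return [
        f"{name} = {value}"
        for tag, name, value in parser.records
        if wanted_tag is None or tag == wanted_tag
    ]


# Sample input: the first line is from the original message, the rest is invented.
sample = (
    '<element tag="300a,0014" vr="CS" vm="1" len="4" '
    'name="DoseReferenceStructureType">SITE</element>\n'
    '<element tag="300a,0014" vr="CS" vm="1" len="6" '
    'name="DoseReferenceStructureType">VOLUME</element>\n'
    '<element tag="300a,0016" vr="LO" vm="1" len="4" '
    'name="DoseReferenceDescription">Boost</element>\n'
)

lines = extract(sample, wanted_tag="300a,0014")
print("\n".join(lines))
```

To get the output file you described, you could read the input with open(sys.argv[1]).read(), pass it to extract(), and write the returned lines to a file opened in "w" mode, one per line.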