On Mon, 18 Aug 2008 13:40:13 -0700, Alexnb wrote:

> Lets say I have a text file. The contents look like this, only there is
> A LOT of the same thing.
>
> () A registry mark given by underwriters (as at Lloyd's) to ships in
> first-class condition. Inferior grades are indicated by A 2 and A 3. ()
> The first three letters of the alphabet, used for the whole alphabet. ()
> In church or chapel style; -- said of compositions sung in the old
> church style, without instrumental accompaniment; as, a mass a capella,
> i. e., a mass purely vocal.
> () Astride; with a part on each side; -- used specif. in designating the
> position of an army with the wings separated by some line of
> demarcation, as a river or road.
>
> Now, I am talking 1000's of these. I need to do something like this. I
> will have a number, and what I want to do is go through this text file,
> just like the example. The trick is this, those "()'s" are what I need
> to match, so if the number is 245 I need to find the 245th () and then
> get the all the text from after it until the next (). If you have an
> idea about the best way to do this I would love your help. If you made
> it all the way through thanks! ;)
If I take your description of the problem literally, then the solution is:

    text = "() A registry mark given ..."  # lots and lots of text
    blocks = text.split("()")  # use a literal "()" as a delimiter
    answer = blocks[n]  # whichever number you want, starting counting at 0

I suspect that the problem is more complicated than you are saying. I
guess that in your actual data, the brackets () probably have something
inside them. It looks like you are quoting definitions from a dictionary.

Alex, a word of advice for you: we really don't like playing guessing
games. If you get a reputation for describing your problem inaccurately,
incompletely or cryptically, you will find fewer and fewer people willing
to answer your questions. I recommend that you spend a few minutes now
reading this page and save yourself a lot of grief later:

http://www.catb.org/~esr/faqs/smart-questions.html

Now, back to your problem. If my guess is right, and the brackets
actually have text inside them, then my simple solution above will not
work. You will need a more complicated solution using a regular
expression or a parser. That solution will depend on whether or not you
can get nested brackets "(ab (123 (fee fi fum) 456) cd ef)" or arbitrary
single brackets without the matching pair.

Your question also sounds suspiciously like homework. I don't do people's
homework, but here's something to get you started. It's not a solution,
but it can be used as the first step towards a solution.

    text = "() A registry mark given ..."  # lots and lots of text
    level = 0
    blocks = []
    for c in text:  # process text one character at a time
        if c == '(':
            print("Found an opening bracket")
            level += 1  # one deeper in brackets
        elif c == ')':
            level -= 1
            if level < 0:
                print("Found a close bracket without matching open bracket")
                level = 0  # reset so later text is not treated as bracketed
            else:
                print("Found a closing bracket")
        else:  # any other character
            # here's where you do the real work
            if level == 0:
                print("Not inside a bracket")
                blocks.append(c)
            else:
                print("Inside a bracket")
    if level > 0:
        print("Missing close bracket")
    text_minus_bracketed_words = ''.join(blocks)

-- 
Steven
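
A footnote on the literal reading above, where the marker really is an empty "()": the split can be wrapped in a small helper that counts from 1, so that asking for 245 returns the text after the 245th marker. This is only a sketch under that assumption. The names nth_block and sample are made up for illustration, and the 1-based numbering is a guess at what "the 245th ()" means. Splitting on the literal marker leaves ordinary parentheticals such as "(as at Lloyd's)" inside a definition untouched.

```python
def nth_block(text, n, marker="()"):
    """Return the text that follows the n-th marker, counting from 1."""
    blocks = text.split(marker)
    # blocks[0] is whatever precedes the first marker (often empty),
    # so the text after the n-th marker sits at index n.
    if not 1 <= n < len(blocks):
        raise IndexError("there is no marker number %d" % n)
    return blocks[n].strip()


# Hypothetical sample data, shortened from the quoted example.
sample = (
    "() A registry mark given by underwriters (as at Lloyd's) to ships. "
    "() The first three letters of the alphabet. "
    "() In church or chapel style."
)

print(nth_block(sample, 2))
```

For a real file the text would first have to be read into a string, for example with open(...).read(); for thousands of short entries that should be small enough to hold in memory.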