On Nov 11, 1:59 pm, André <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> I would like to implement a parser for a mini-language
> and would appreciate some pointers.  The type of
> text I would like to parse is an extension of:
>
> http://www.websequencediagrams.com/examples.html
>
> For those that don't want to go to the link, consider
> the following, *very* simplified, example:
> =======
>
> programmer Guido
> programmer "Fredrik Lundh" as effbot
> programmer "Alex Martelli" as martellibot
> programmer "Tim Peters" as timbot
> note left of effbot: cutting sense of humor
> note over martellibot:
>     Offers detailed note, explaining a problem,
>     accompanied by culinary diversion
>     to the delight of the reader
> note over timbot: programmer "clever" as fox
> timbot -> Guido: I give you doctest
> Guido --> timbot: Have you checked my time machine?
>
> =======
> From this, I would like to be able to extract
> ("programmer", "Guido")
> ("programmer as", "Fredrik Lundh", "effbot")
> ...
> ("note left of", "effbot", "cutting sense of humor")
> ("note over", "martellibot", "Offers...")
> ("note over", "timbot", 'programmer "clever" as fox')
>

Even if you choose not to use pyparsing, a pyparsing example might
give you some insights into your problem.  See how the grammar is
built up from separate pieces.  Parse actions in pyparsing implement
callbacks to do parse-time conversion - in this case, the multiline
note body is converted from the parsed list of separate strings into
a single newline-separated string.

Here is the pyparsing example:

from pyparsing import Suppress, Combine, LineEnd, Word, alphas, alphanums,\
    quotedString, Keyword, Optional, oneOf, restOfLine, indentedBlock, \
    removeQuotes, empty, OneOrMore, Group

# used to manage indentation levels when parsing indented blocks
indentstack = [1]

# define some basic punctuation and terminal words
COLON = Suppress(":")
ARROW = Combine(Word('-') + '>')
NL = LineEnd().suppress()
ident = Word(alphas, alphanums + "-_")
quotedString.setParseAction(removeQuotes)

# programmer definition
progDefn = Keyword("programmer") + Optional(quotedString("alias") + \
    Optional("as")) + ident("name")

# new pyparsing idiom - embed simple asserts to verify bits of the
# overall grammar in isolation
assert "programmer Guido" == progDefn
assert 'programmer "Tim Peters" as timbot' == progDefn

# note specification - only complicated part is the indented block
# form of the note; we use a pyparsing parse action to convert the
# nested token lists into a multiline string
OF = Optional("of")
notelocn = oneOf("over under") | "left" + OF | "right" + OF
notetext = restOfLine.setName("notetext")
noteblock = indentedBlock(notetext, indentstack).setName("noteblock")
noteblock.setParseAction(lambda t: '\n'.join(tt[0] for tt in t[0]))
note = Keyword("note") + notelocn("location") + ident("subject") + COLON + \
    (~NL + empty + notetext("note") | noteblock("note"))
assert 'note over timbot: programmer "clever" as fox ' == note

# message definition
msg = ident("from") + ARROW + ident("to") + COLON + empty + notetext("note")
assert 'Guido --> timbot: Have you checked my time machine?' == msg

# a seqstatement is one of these 3 types of statements
seqStatement = progDefn | note | msg

# parse the sample text (seqtext holds the sample from the original post)
parsedStatements = OneOrMore(Group(seqStatement)).parseString(seqtext)

# print out token/field dumps for each statement
for s in parsedStatements:
    print(s.dump())

Prints:

['programmer', 'Guido']
- name: Guido
['programmer', 'Fredrik Lundh', 'as', 'effbot']
- alias: Fredrik Lundh
- name: effbot
['programmer', 'Alex Martelli', 'as', 'martellibot']
- alias: Alex Martelli
- name: martellibot
['programmer', 'Tim Peters', 'as', 'timbot']
- alias: Tim Peters
- name: timbot
['note', 'left', 'of', 'effbot', 'cutting sense of humor ']
- location: left
- note: cutting sense of humor
- subject: effbot
['note', 'over', 'martellibot', 'Offers ...']
- location: over
- note: Offers detailed note, explaining a problem,
accompanied by culinary diversion
to the delight of the reader
- subject: martellibot
['note', 'over', 'timbot', 'programmer "clever" as fox ']
- location: over
- note: programmer "clever" as fox
- subject: timbot
['timbot', '->', 'Guido', 'I give you doctest ']
- from: timbot
- note: I give you doctest
- to: Guido
['Guido', '-->', 'timbot', 'Have you checked my time machine?']
- from: Guido
- note: Have you checked my time machine?
- to: timbot

Best of luck in your project,
-- Paul
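
For comparison, if you would rather avoid a third-party library, the same
three statement types can be roughly sketched with the standard re module.
This is only an illustration under some assumptions, not a replacement for
the pyparsing grammar above: the name parse_seq, the three regular
expressions, and the tuple layout for messages (arrow, sender, receiver,
text) are hypothetical, and the block form of a note is assumed to be any
run of indented lines following a "note ...:" line with nothing after the
colon.

```python
import re

# Hypothetical patterns, one per statement type in the sample text.
PROG_RE = re.compile(r'programmer\s+(?:"([^"]*)"\s+as\s+)?([\w-]+)\s*$')
NOTE_RE = re.compile(
    r'note\s+(left of|right of|over|under)\s+([\w-]+)\s*:\s*(.*)$')
MSG_RE = re.compile(r'([\w-]+)\s*(-+>)\s*([\w-]+)\s*:\s*(.*)$')


def parse_seq(text):
    """Return one tuple per statement; indented lines extend a block note."""
    results = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.strip():
            continue  # skip blank lines between statements
        m = PROG_RE.match(line)
        if m:
            alias, name = m.groups()
            if alias is None:
                results.append(("programmer", name))
            else:
                results.append(("programmer as", alias, name))
            continue
        m = NOTE_RE.match(line)
        if m:
            where, subject, body = m.groups()
            body = body.strip()
            if not body:
                # Block form: gather the indented lines that follow and
                # join them into a single newline-separated string.
                block = []
                while i < len(lines) and lines[i][:1] in (" ", "\t"):
                    block.append(lines[i].strip())
                    i += 1
                body = "\n".join(block)
            results.append(("note " + where, subject, body))
            continue
        m = MSG_RE.match(line)
        if m:
            sender, arrow, receiver, body = m.groups()
            results.append((arrow, sender, receiver, body.strip()))
            continue
        raise ValueError("unrecognized statement: %r" % line)
    return results


if __name__ == "__main__":
    demo = 'programmer Guido\ntimbot -> Guido: I give you doctest\n'
    for stmt in parse_seq(demo):
        print(stmt)
```

The join step for block notes plays the same role as the parse action on
noteblock in the pyparsing version. A regex approach like this gets
awkward quickly as the mini-language grows, which is where building the
grammar from separate pieces pays off.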