On Thu, May 21, 2009 at 7:03 PM, John Fouhy <j...@fouhy.net> wrote:
> 2009/5/22 Eduardo Vieira <eduardo.su...@gmail.com>:
>> I will be looking for lines like these:
>> Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23
>>
>> So, references in different chapters are separated by a semicolon. My
>> main challenge would be to make the program guess that 10:12 refers to
>> the previous book. 15-20 means verses 15 thru 20 inclusive. I'm afraid
>> that will take more than Regex and I never studied anything about
>> parser tools, really.
>
> Well, pyparsing is one of the standard python parsing modules. It's
> not that bad, really :-)
>
> Here's some code I knocked out:
>
> from pyparsing import *
>
> SingleVerse = Word(nums)
> VerseRange = SingleVerse + '-' + SingleVerse
> Verse = VerseRange | SingleVerse
> Verse = Verse.setResultsName('Verse').setName('Verse')
> Verses = Verse + ZeroOrMore(Suppress(',') + Verse)
> Verses = Verses.setResultsName('Verses').setName('Verses')
>
> ChapterNum = Word(nums)
> ChapterNum = ChapterNum.setResultsName('Chapter').setName('Chapter')
> ChapVerses = ChapterNum + ':' + Verses
> SingleChapter = Group(ChapVerses | ChapterNum)
>
> Chapters = SingleChapter + ZeroOrMore(Suppress(';') + SingleChapter)
> Chapters = Chapters.setResultsName('Chapters').setName('Chapters')
>
> BookName = (CaselessLiteral('Acts') | CaselessLiteral('Psalm') |
>             CaselessLiteral('John'))
> BookName = BookName.setResultsName('Book').setName('Book')
>
> Book = Group(BookName + Chapters)
> Books = Book + ZeroOrMore(Suppress(';') + Book)
> Books = Books.setResultsName('Books').setName('Books')
>
> All = CaselessLiteral('Lesson Text:') + Books + LineEnd()
>
> s = 'Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23'
> res = All.parseString(s)
>
> for b in res.Books:
>     for c in b.Chapters:
>         if c.Verses:
>             for v in c.Verses:
>                 print('Book', b[0], 'Chapter', c[0], 'Verse', v)
>         else:
>             print('Book', b[0], 'Chapter', c[0])
>
> ######
>
> Hopefully you can get the idea of most of it from looking at the code.
>
> Suppress() means "parse this token, but don't include it in the results".
>
> Group() is necessary for getting access to a list of things -- you can
> experiment by taking it out and seeing what you get.
>
> Obviously you'll need to add more names to the BookName element.
>
> Obviously also, there is a bit more work to be done on Verses. You
> might want to look into the concept of "parse actions". A really
> simple parse action might be this:
>
> def convertToNumber(string_, location, tokens):
>     """ Used in setParseAction to make numeric parsers return numbers. """
>
>     return [int(tokens[0])]
>
> SingleVerse.setParseAction(convertToNumber)
> ChapterNum.setParseAction(convertToNumber)
>
> That should get you python integers instead of strings. You can
> probably do more with parseActions to, for instance, turn something
> like '15-20' into [15,16,17,18,19,20].
>
> HTH!
>
> --
> John.
>

Thanks for the thorough example. I guess I really should get into this
parsing thing somehow.

To W W: I guess that approach can work too. I will study both things and,
if I get stumped, I'll try the list again. It will take a while for me to
really delve into the task, but I want to do it for a good friend of mine.
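
For when I come back to this: here is a rough sketch of the parse action
John hints at for turning something like '15-20' into
[15, 16, 17, 18, 19, 20]. The name expand_verse_range is made up for
illustration. I gave it the same (string_, location, tokens) signature as
convertToNumber on the assumption that it could be attached with
VerseRange.setParseAction(...), but I have only tried it as a plain
function with a hand-built token list, not inside the grammar.

```python
def expand_verse_range(string_, location, tokens):
    """Expand the tokens of a verse range, e.g. ['15', '-', '20'],
    into every verse number from the first to the last, inclusive.

    Hypothetical helper: the name and the error handling are my own,
    not part of pyparsing or of John's code.
    """
    # int() accepts both strings and integers, so this works whether or
    # not the individual verses were already converted to numbers.
    first = int(tokens[0])
    last = int(tokens[-1])
    if last < first:
        raise ValueError('verse range runs backwards: %d-%d' % (first, last))
    return list(range(first, last + 1))


# Called directly, with a hand-built token list standing in for what
# the parser would hand to a parse action.
print(expand_verse_range('15-20', 0, ['15', '-', '20']))
```

Reading only the first and last token means the '-' in the middle is
ignored, so it should not matter whether the dash is kept or suppressed
in the grammar.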
Eduardo