Hi,
I need to parse various legal documents syntactically in order to identify and extract each article and sub-paragraph. The documents look like this:

    Act. 123 Bla Bla Bla
    Art. 1 (Article title)
    Article body with sub-paragraphs

There are at most three levels of sub-paragraph, identified by numbers (1, 2, 3, ...), letters (a, b, c, ...) and Roman numerals (i, ii, iii, iv, etc.).

Unfortunately, real life is a bit tougher than this: some documents use the string "Art." and others "Article"; sometimes the article title is present and sometimes it is not; and so on.

Do you think ANTLR can help generate a parser that identifies and extracts the parts of these legal documents, labelling each part with its proper place in the hierarchy?

So far I am building a prototype in Perl, but given all the variations that turn up in the plethora of documents I have to "ingest", coding every exception by hand looks quite cumbersome.

Thanks for your support.

Regards,
Marco Bagni

List: http://www.antlr.org/mailman/listinfo/antlr-interest
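For comparison, here is a minimal sketch of the hierarchy described in the question, written as a small hand-rolled Python parser rather than an ANTLR grammar. It is not ANTLR output, and everything in it is an assumption: each heading or marker starts its own line; headings are "Art. N" or "Article N" with an optional "(title)"; sub-paragraph markers look like "1)", "a)" or "ii)" (or use "."); and Roman numerals use only i, v and x. The names `Node`, `classify`, `parse` and `outline` are hypothetical. A stack holds the path of open nodes, and each new marker pops everything at the same or deeper level. The letter "i" versus Roman "i" ambiguity is resolved by a heuristic. Any tool, including an ANTLR grammar, would have to resolve that ambiguity from context in some way.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed heading forms: "Art. 1" or "Article 1", optionally "(title)" and trailing text.
ARTICLE_RE = re.compile(r"^\s*(?:Art\.|Article)\s+(\d+)\s*(?:\((.*?)\))?\s*(.*)$", re.IGNORECASE)
# Assumed sub-paragraph markers at line start: "1)", "a)", "ii)" (or with "."), then text.
MARKER_RE = re.compile(r"^\s*\(?([0-9]+|[a-z]|[ivx]+)[.)]\s+(.*)$")
ROMAN_RE = re.compile(r"^[ivx]+$")
# Depth of each node kind; a new node closes every open node at the same or deeper level.
LEVEL = {"document": -1, "article": 0, "number": 1, "letter": 2, "roman": 3}


@dataclass
class Node:
    kind: str
    label: str
    title: Optional[str] = None
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def classify(marker: str, top: Node) -> str:
    """Decide whether a marker is a number, a letter or a Roman numeral."""
    if marker.isdigit():
        return "number"
    if ROMAN_RE.match(marker):
        # Heuristic: a one-letter marker directly following the previous
        # letter (e.g. "i" after "h") is a letter, not a Roman numeral.
        if (top.kind == "letter" and len(marker) == len(top.label) == 1
                and ord(marker) == ord(top.label) + 1):
            return "letter"
        if top.kind in ("letter", "roman"):
            return "roman"
    return "letter"


def parse(text: str) -> Node:
    root = Node("document", "")
    stack = [root]  # path from the root to the innermost open node
    for line in text.splitlines():
        if not line.strip():
            continue
        m = ARTICLE_RE.match(line)
        if m:
            node = Node("article", m.group(1), m.group(2), m.group(3))
            del stack[1:]  # a new article closes every open sub-paragraph
            root.children.append(node)
            stack.append(node)
            continue
        m = MARKER_RE.match(line)
        if m and len(stack) > 1:  # markers only count inside an article
            kind = classify(m.group(1), stack[-1])
            node = Node(kind, m.group(1), None, m.group(2))
            while LEVEL[stack[-1].kind] >= LEVEL[kind]:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
            continue
        # Any other line continues the text of the innermost open node.
        top = stack[-1]
        top.text = (top.text + " " + line.strip()).strip()
    return root


def outline(node: Node, depth: int = 0) -> List[str]:
    lines = []
    for child in node.children:
        title = f" ({child.title})" if child.title else ""
        lines.append(("  " * depth + f"{child.kind} {child.label}{title}: {child.text}").rstrip())
        lines.extend(outline(child, depth + 1))
    return lines


SAMPLE = """Act. 123 Bla Bla Bla
Art. 1 (Scope)
1) This Act applies to:
a) contracts;
i) written ones;
ii) oral ones;
b) deeds.
2) Second paragraph.
Article 2
Body without a title.
"""

if __name__ == "__main__":
    print("\n".join(outline(parse(SAMPLE))))
```

Run on the sample, this prints an indented outline (article, then number, letter, Roman numeral). Text before the first article, such as the "Act. 123" line, stays on the document root. Each new variant, such as another heading spelling, becomes one more pattern rather than another special case scattered through the code. A line that merely starts with "Article" inside a body would still be misread as a heading, so real documents would need tighter rules.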