On Jul 17, 8:47 am, Xah Lee <xah...@gmail.com> wrote: > 2011-07-16 > > folks, this one will be interesting one. > > the problem is to write a script that can check a dir of text files > (and all subdirs) and reports if a file has any mismatched matching > brackets. > > • The files will be utf-8 encoded (unix style line ending). > > • If a file has mismatched matching-pairs, the script will display the > file name, and the line number and column number of the first > instance where a mismatched bracket occures. (or, just the char number > instead (as in emacs's “point”)) > > • the matching pairs are all single unicode chars. They are these and > nothing else: () {} [] “” ‹› «» 【】 〈〉 《》 「」 『』 > Note that ‘single curly quote’ is not consider matching pair here. > > • You script must be standalone. Must not be using some parser tools. > But can call lib that's part of standard distribution in your lang. > > Here's a example of mismatched bracket: ([)], (“[[”), ((, 】etc. (and > yes, the brackets may be nested. There are usually text between these > chars.) > > I'll be writing a emacs lisp solution and post in 2 days. Ι welcome > other lang implementations. In particular, perl, python, php, ruby, > tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp > (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure, > Scala, C, C++, or any others, are all welcome, but i won't be able to > eval it. javascript implementation will be very interesting too, but > please indicate which and where to install the command line version. > > I hope you'll find this a interesting “challenge”. This is a parsing > problem. I haven't studied parsers except some Wikipedia reading, so > my solution will probably be naive. I hope to see and learn from your > solution too. > > i hope you'll participate. Just post solution here. Thanks. > > Xah
Parsing technology based on BNF enables an elegant solution. First take a basic bracket balancing program which parenthesises the contents of the input. e.g. in Shen-YACC (defcc <br> "(" <br> ")" <br$> := [<br> | <br$>]; <item> <br>; <e> := [];) (defcc <br$> <br>;) (defcc <item> -*- := (if (element? -*- ["(" ")"]) (fail) [-*-]);) Given (compile <br> ["(" 1 2 3 ")" 4]) the program produces [[1 2 3] 4]. When this program is used to parse the input, whatever residue is left indicates where the parse has failed. In Shen-YACC (define tellme Stuff -> (let Br (<br> (@p Stuff [])) Residue (fst Br) (if (empty? Residue) (snd Br) (error "parse failure at position ~A~%" (- (length Stuff) (length Residue)))))) e.g. (tellme ["(" 1 2 3 ")" "(" 4]) parse failure at position 5 (tellme ["(" 1 2 3 ")" "(" ")" 4]) [[1 2 3] [] 4] The extension of this program to the case described is fairly simple. Qi-YACC is very similar. Nice problem. I do not have further time to correspond right now. Mark -- http://mail.python.org/mailman/listinfo/python-list