I'm trying to pack to leave town for a week. I'll try to post something, but Sunday will be the earliest since I'll be spending most of tomorrow in a small metal tube at 20,000 feet in the air.
Jim Howard Lewis Ship wrote: > I've been struggling to create a monadic parser. I'm parsing XML > documents into a DOM tree. Step one is to use a SAX parser to create > a stream of XML tokens; each a struct map. The token types are > :start-element, :attribute, :text and :end-element. > > Next I'm feeding this list of tokens into a monadic parser I've > created following > http://intensivesystems.net/tutorials/monads_101.html. The output > should be a DOM tree structure; element nodes will contain an > :attributes key (containing attribute tokens) and a :body key > containing child text and elements nodes. > > I've had to make some minor changes to the tutorial's parser-m monad, > to allow for the fact that I'm parsing tokens, not character strings. > > As I parse :attribute elements, I take the original element node and > add the attribute token to the :attributes list inside the element > node struct, creating the vector as necessary. > > I have this working for the simple case, where I only allow for > at-most one attribute token. > > (defmonad parser-m > [m-result (fn [x] > (fn [tokens] > (list x tokens))) > > m-bind (fn [parser action] > (fn [tokens] > (let [result (parser tokens)] > (when-not (nil? result) > ((action (first result)) > (second result)))))) > > m-zero (fn [tokens] > nil) > > m-plus (fn [& parsers] > (fn [tokens] > (first > (drop-while nil? > (map #(% tokens) parsers)))))]) > > ; And some actions and parser generators > > (defn any-token > "Fundamental parser action: returns [first, rest] if tokens is not > empty, nil otherwise." > [tokens] > (if (empty? tokens) > nil > ; This is what actually "consumes" the tokens seq > (list (first tokens) (rest tokens)))) > > ; Utilities that will likely move elsewhere > > (defn add-to-key-list > "Updates the map adding the value to the list stored in the indicated key." > [map key value] > (update-in map [key] #(conj (or % []) value))) > > (with-monad > parser-m > > (defn token-test > "Parser factory using a predicate. When a token matches the > predicate, it becomes > the new result." > [pred] > (domonad > [t any-token :when (pred t)] > ; return the matched token > t)) > > (defn match-type > "Parser factory that matches a particular token type (making the matched > token the result), or returns nil." > [type] > (token-test #(= (% :type) type))) > > (defn optional [parser] > (m-plus parser (m-result nil))) > > (declare element one-or-more) > > (defn none-or-more [parser] > (optional (one-or-more parser))) > > (defn one-or-more [parser] > (domonad [a parser > as (none-or-more parser)] > (cons a as))) > > (defn attribute > "Parser generator that matches attribute tokens, adding them to > :attributes key of the element-node, and returning the new > element-node." > [element-node] > (domonad [attribute-token (match-type :attribute)] > (add-to-key-list element-node :attributes attribute-token))) > > (defn text > "Parser generator that matches text tokens, adding them to the :body > key of the element-node, and returning the new element-node." > [element-node] > (domonad [text-token (match-type :text)] > (add-to-key-list element-node :body text-token))) > > (def match-first m-plus) > > (defn body-token > [element-node] > (match-first > (attribute element-node) > (text element-node) > (element element-node))) > > (defn process-body > [element-node] > (domonad [modified-element (optional (body-token element-node))] > (or modified-element element-node))) > > (defn element > "Parses an element token into an element-node, and then adds the > fully constructed element-node to the > :body of the containing element node." > [container-element-node] > (domonad [token (match-type :start-element) > element-node (m-result (struct element-node :element token)) > assembled-node (process-body element-node) > _ (match-type :end-element)] > (add-to-key-list container-element-node :body assembled-node))) > > > (def template > ; Creates a psuedo-element (an empty map) and parses the root > element into it, then extracts the root element. > ; Will eventually have to expand to support text and comments, > etc., around the root element." > (domonad [root (element {})] > (first (root :body)))) > ) > > > Here's the problem: I need to be able to handle multiple attribute > tokens, as well as :text and sub-elements. To me, that means that > process-body should be recursive, but we need to pass each > modification of the element on each iteration (to accumlate additions > to the :attributes and :body keys). > > However, when I do the following: > > (defn process-body > [element-node] > (domonad [modified-element (optional (body-token element-node)) > :when modified-element > final-element (process-body modified-element)] > (or final-element modified-element element-node))) > > Which, to me, says "parse a body token and apply changes to the > original element-node, then recurse; if the current token is not a > body token, then abort, returning the initial element unchanged." > > Unfortunately, this breaks the parser entirely, with nothing parsing, > not even a trivial document of just a root element with no attributes. > > I've tried variations with (none-or-more), but what happens is the > initial element-node appears to be passed to the (body-token) parser > repeatedly with the end result being that only the final change (the > last attribute node in my current test) is recorded, and the prior > changes are lost. > > Perhaps I'm misunderstanding what the :when guard does? > > I think I'm missing something subtle here (well, obviously, it doesn't > work!) but my head keeps spinning a little. Within a parser monad, > it's easy to get tripped up by parsers (monadic functions: accept > state and return actions), actions (monadic values: functions that > accept state and return result and new state) and parser generators > (that return parsers). > > Thanks! > > -- > Howard M. Lewis Ship > > Creator of Apache Tapestry > Director of Open Source Technology at Formos --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---