I'm trying to pack to leave town for a week. I'll try to post
something, but Sunday will be the earliest since I'll be spending most
of tomorrow in a small metal tube at 20,000 feet in the air.

Jim

Howard Lewis Ship wrote:
> I've been struggling to create a monadic parser.  I'm parsing XML
> documents into a DOM tree.  Step one is to use a SAX parser to create
> a stream of XML tokens; each a struct map.  The token types are
> :start-element, :attribute, :text and :end-element.
>
> Next I'm feeding this list of tokens into a monadic parser I've
> created following
> http://intensivesystems.net/tutorials/monads_101.html. The output
> should be a DOM tree structure; element nodes will contain an
> :attributes key (containing attribute tokens) and a :body key
> containing child text and elements nodes.
>
> I've had to make some minor changes to the tutorial's parser-m monad,
> to allow for the fact that I'm parsing tokens, not character strings.
>
> As I parse :attribute elements, I take the original element node and
> add the attribute token to the :attributes list inside the element
> node struct, creating the vector as necessary.
>
> I have this working for the simple case, where I only allow for
> at-most one attribute token.
>
> (defmonad parser-m
>           [m-result (fn [x]
>                         (fn [tokens]
>                             (list x tokens)))
>
>            m-bind (fn [parser action]
>                       (fn [tokens]
>                           (let [result (parser tokens)]
>                                (when-not (nil? result)
>                                          ((action (first result))
> (second result))))))
>
>            m-zero (fn [tokens]
>                       nil)
>
>            m-plus (fn [& parsers]
>                       (fn [tokens]
>                           (first
>                             (drop-while nil?
>                                         (map #(% tokens) parsers)))))])
>
> ; And some actions and parser generators
>
> (defn any-token
>   "Fundamental parser action: returns [first, rest] if tokens is not
> empty, nil otherwise."
>   [tokens]
>   (if (empty? tokens)
>       nil
>       ; This is what actually "consumes" the tokens seq
>       (list (first tokens) (rest tokens))))
>
> ; Utilities that will likely move elsewhere
>
> (defn add-to-key-list
>   "Updates the map adding the value to the list stored in the indicated key."
>   [map key value]
>   (update-in map [key] #(conj (or % []) value)))
>
> (with-monad
>   parser-m
>
>   (defn token-test
>     "Parser factory using a predicate. When a token matches the
> predicate, it becomes
>   the new result."
>     [pred]
>     (domonad
>       [t any-token :when (pred t)]
>       ; return the matched token
>       t))
>
>   (defn match-type
>     "Parser factory that matches a particular token type (making the matched
> token the result), or returns nil."
>     [type]
>     (token-test #(= (% :type) type)))
>
>   (defn optional [parser]
>     (m-plus parser (m-result nil)))
>
>   (declare element one-or-more)
>
>   (defn none-or-more [parser]
>     (optional (one-or-more parser)))
>
>   (defn one-or-more [parser]
>     (domonad [a parser
>               as (none-or-more parser)]
>              (cons a as)))
>
>   (defn attribute
>     "Parser generator that matches attribute tokens, adding them to
>     :attributes key of the element-node, and returning the new
>     element-node."
>     [element-node]
>     (domonad [attribute-token (match-type :attribute)]
>              (add-to-key-list element-node :attributes attribute-token)))
>
>   (defn text
>     "Parser generator that matches text tokens, adding them to the :body
> key of the element-node, and returning the new element-node."
>     [element-node]
>     (domonad [text-token (match-type :text)]
>              (add-to-key-list element-node :body text-token)))
>
>   (def match-first m-plus)
>
>   (defn body-token
>     [element-node]
>       (match-first
>         (attribute element-node)
>         (text element-node)
>         (element element-node)))
>
>   (defn process-body
>     [element-node]
>     (domonad [modified-element (optional (body-token element-node))]
>              (or modified-element element-node)))
>
>   (defn element
>     "Parses an element token into an element-node, and then adds the
> fully constructed element-node to the
>     :body of the containing element node."
>     [container-element-node]
>     (domonad [token (match-type :start-element)
>               element-node (m-result (struct element-node :element token))
>               assembled-node (process-body element-node)
>               _ (match-type :end-element)]
>              (add-to-key-list container-element-node :body assembled-node)))
>
>
>   (def template
>     ; Creates a psuedo-element (an empty map) and parses the root
> element into it, then extracts the root element.
>     ; Will eventually have to expand to support text and comments,
> etc., around the root element."
>     (domonad [root (element {})]
>              (first (root :body))))
>   )
>
>
> Here's the problem: I need to be able to handle multiple attribute
> tokens, as well as :text and sub-elements. To me, that means that
> process-body should be recursive, but we need to pass each
> modification of the element on each iteration (to accumlate additions
> to the :attributes and :body keys).
>
> However, when I do the following:
>
>   (defn process-body
>     [element-node]
>     (domonad [modified-element (optional (body-token element-node))
>               :when modified-element
>               final-element (process-body modified-element)]
>              (or final-element modified-element element-node)))
>
> Which, to me, says "parse a body token and apply changes to the
> original element-node, then recurse; if the current token is not a
> body token, then abort, returning the initial element unchanged."
>
> Unfortunately, this breaks the parser entirely, with nothing parsing,
> not even a trivial document of just a root element with no attributes.
>
> I've tried variations with (none-or-more), but what happens is the
> initial element-node appears to be passed to the (body-token) parser
> repeatedly with the end result being that only the final change (the
> last attribute node in my current test) is recorded, and the prior
> changes are lost.
>
> Perhaps I'm misunderstanding what the :when guard does?
>
> I think I'm missing something subtle here (well, obviously, it doesn't
> work!) but my head keeps spinning a little. Within a parser monad,
> it's easy to get tripped up by parsers (monadic functions: accept
> state and return actions), actions (monadic values: functions that
> accept state and return result and new state) and parser generators
> (that return parsers).
>
> Thanks!
>
> --
> Howard M. Lewis Ship
>
> Creator of Apache Tapestry
> Director of Open Source Technology at Formos
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to