After further investigation, I modified my copy of the XML parsing code to
wrap either an InputStream or a Reader in an org.xml.sax.InputSource (and
changed the type hint to match):

(defn ^:private sax-parse-fn
  [xml-input content-handler]
  (let [input-source (cond
                       (or (instance? InputStream xml-input)
                           (instance? Reader xml-input))
                       (org.xml.sax.InputSource. xml-input)

                       (instance? org.xml.sax.InputSource xml-input)
                       xml-input

                       :else (throw (ex-info "sax-parse-fn: xml-input must be one of InputStream, Reader, or org.xml.sax.InputSource"
                                      {:type  (type xml-input)
                                       :class (class xml-input)})))]
    (it-> (SAXParserFactory/newInstance)
      (doto it
        (.setValidating false)
        (.setFeature "http://xml.org/sax/features/external-general-entities" false)
        (.setFeature "http://xml.org/sax/features/external-parameter-entities" false))
      (.newSAXParser it)
      (doto it
        (.setProperty "http://xml.org/sax/properties/lexical-handler" content-handler))
      (.parse it
        ^org.xml.sax.InputSource             input-source
        ^org.xml.sax.helpers.DefaultHandler  content-handler))))

(s/defn parse       ; #todo fix docstring
  ([xml-input] (parse xml-input sax-parse-fn))
  ([xml-input parse-fn]
    (let [result-atom     (atom (xml-zip {:type :document :content nil}))
          content-handler (handler result-atom)]
      (parse-fn xml-input content-handler)
      ; #todo document logic vvv using xkcd & plain xml example
      (let [parsed-data (it-> @result-atom
                          (first it)
                          (:content it)
                          (drop-if #(= :dtd (:type %)) it)
                          (drop-if #(string? %) it)
                          (only it))]
        parsed-data))))
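
In case it helps anyone following along, here is a minimal standalone sketch of
the same wrapping idea using only the JDK SAX classes. The element-names helper
is hypothetical (it is not part of my library); it just shows that once a
Reader is wrapped in an InputSource, the (InputSource, DefaultHandler) overload
of SAXParser.parse can be selected with type hints:

```clojure
(import '(java.io StringReader)
        '(javax.xml.parsers SAXParserFactory)
        '(org.xml.sax InputSource)
        '(org.xml.sax.helpers DefaultHandler))

;; Hypothetical helper for illustration only: collects element names
;; in document order from an already-wrapped InputSource.
(defn element-names
  [^InputSource input-source]
  (let [names   (atom [])
        ;; DefaultHandler supplies no-op callbacks; override startElement only.
        handler (proxy [DefaultHandler] []
                  (startElement [uri local-name q-name attrs]
                    (swap! names conj q-name)))
        ;; The default factory is not namespace-aware, so q-name carries the tag name.
        parser  (.newSAXParser (SAXParserFactory/newInstance))]
    ;; Both hints together pick the (InputSource, DefaultHandler) overload.
    (.parse parser input-source ^DefaultHandler handler)
    @names))

;; A Reader (here a StringReader) must be wrapped before it reaches .parse.
(element-names (InputSource. (StringReader. "<a><b/><c/></a>")))
;; => ["a" "b" "c"]
```

An InputStream can be wrapped the same way, since InputSource has a constructor
for each.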

On Sat, Mar 2, 2019 at 4:11 PM Matching Socks <phill.w...@gmail.com> wrote:

> Does this need adjusting in clojure.xml too?  The code looks pretty
> similar:
>
> (defn startparse-sax [s ch]
>   (.. SAXParserFactory (newInstance) (newSAXParser) (parse s ch)))
>
> The reflection on "parse" is convenient. There are multiple
> SAXParser.parse methods with unique capabilities. The String method allows
> you not to know how the data is encoded.  To open an InputStream, you have
> to know the encoding.  It's pretty hard to open an XML instance correctly!
> And the InputSource method allows you to parse from a Reader, e.g., a
> StringReader.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

