I suggest StAX instead of SAX for this kind of transformation.
It provides both reader and writer APIs, so it can be used for both parsing
and generation/serialization.
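As a rough sketch of what that could look like, the following uses only the
StAX classes in javax.xml.stream (bundled with Java 6 and later) to reproduce
the NekoHTML example from the quoted message. Every event is copied from an
XMLEventReader to an XMLEventWriter, and a <p> element is opened right after
each <div> start tag and closed right before the matching end tag, so arbitrary
Java code can run inside the loop. The class and method names are made up for
illustration, and it assumes well-formed XML input: StAX is an XML parser, so
it will not accept arbitrary HTML the way NekoHTML does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxTransformExample {

    // True if the element name's local part is "div" (case-insensitive).
    static boolean isDiv(QName name) {
        return name.getLocalPart().equalsIgnoreCase("div");
    }

    // Copies the document event by event, wrapping the content of each <div> in a <p>.
    static String transform(String input) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(input));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory factory = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument() || event.isEndDocument()) {
                continue; // skipped so that no XML declaration is written
            }
            if (event.isEndElement() && isDiv(event.asEndElement().getName())) {
                writer.add(factory.createEndElement("", "", "p")); // close <p> before </div>
            }
            writer.add(event); // pass the original event through unchanged
            if (event.isStartElement() && isDiv(event.asStartElement().getName())) {
                writer.add(factory.createStartElement("", "", "p")); // open <p> after <div>
            }
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println("RESULT:" + transform("<div>text</div>") + ":");
    }
}
```

Because nothing is buffered beyond the current event, this keeps the same
streaming, low-memory behavior the original message asks for.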


----------------------------------------
David A. Lee
d...@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Dawid Chodura [mailto:dawid.chod...@gmail.com] 
Sent: Tuesday, November 23, 2010 6:12 AM
To: j-users@xerces.apache.org
Subject: Transforming the stream of SAX events


Hello,
   I want to transform an XML document, but I can't use XSLT because
I need to invoke Java code inside the transformation. If I understand
correctly, Xalan is not an option. I don't need to keep the whole XML
document in memory for the transformation, so I decided to use a SAX
parser instead of DOM. I need to create new elements in the
transformation.
   I read the sample xerces-2_10_0/samples/sax/Writer.java and it
generates the output document manually:

public void startElement(String uri, String local, String raw,
        Attributes attrs) throws SAXException {
    //...
    fOut.print('<');
    fOut.print(raw);
    //...
    fOut.print('>');
    fOut.flush();
}

   I don't want to generate the output manually.
   I wrote my own example, which uses
org.cyberneko.html.parsers.SAXParser from the NekoHTML parser:

package saxtransformexample;

import org.apache.xerces.util.AugmentationsImpl;
import org.apache.xerces.util.XMLAttributesImpl;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLDocumentFilter;
import org.cyberneko.html.filters.DefaultFilter;
import org.cyberneko.html.parsers.SAXParser;
import org.xml.sax.InputSource;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

public class SAXTransformExample {

    public static void main(String[] args) throws Exception {
        String inputString = "<div></div>";
        StringWriter out = new StringWriter();
        StreamResult result = new StreamResult(out);

        SAXTransformerFactory transformerFactory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();

        Transformer transformer = transformerFactory.newTransformer();

        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "html");

        XMLDocumentFilter[] filters = {new DefaultFilter() {

            @Override
            public void startElement(QName element, XMLAttributes attributes,
                    Augmentations augs) throws XNIException {
                if (!element.localpart.toLowerCase().equals("div")) {
                    super.startElement(element, attributes, augs);
                } else {
                    super.startElement(element, attributes, augs);
                    super.startElement(new QName("", "p", "p", null),
                            new XMLAttributesImpl(), new AugmentationsImpl());
                }
            }

            @Override
            public void endElement(QName element, Augmentations augs)
                    throws XNIException {
                if (!element.localpart.toLowerCase().equals("div")) {
                    super.endElement(element, augs);
                } else {
                    super.endElement(new QName("", "p", "p", null),
                            new AugmentationsImpl());
                    super.endElement(element, augs);
                }
            }
        }};
        SAXParser parser = new SAXParser();
        parser.setFeature("http://xml.org/sax/features/namespaces", false);
        parser.setFeature(
                "http://cyberneko.org/html/features/balance-tags/document-fragment",
                true);
        parser.setProperty("http://cyberneko.org/html/properties/filters",
                filters);
        parser.setProperty("http://cyberneko.org/html/properties/names/elems",
                "lower");

        transformer.transform(new SAXSource(parser,
                new InputSource(new StringReader(inputString))), result);

        System.out.println("RESULT:" + out.getBuffer().toString() + ":");
    }
}

   It prints out:
RESULT:<div><p></p></div>

   The problem is that it uses XNI, and since I'm not writing a parser,
I think I shouldn't use XNI at all.
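For comparison, here is a sketch of the same filter written against plain SAX
only, with no XNI: an org.xml.sax.helpers.XMLFilterImpl subclass that opens a
<p> after each <div> start tag and closes it before the matching end tag. It
assumes well-formed XML and uses the JDK's default SAX parser instead of
NekoHTML; the class names are illustrative. Like the NekoHTML example above, it
serializes with TransformerFactory.newTransformer() called without a
stylesheet, which is an identity transform, so no XSLT is involved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxFilterExample {

    // SAX-only counterpart of the NekoHTML DefaultFilter: wraps the content of each <div> in a <p>.
    static class WrapDivContentFilter extends XMLFilterImpl {
        WrapDivContentFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, localName, qName, atts);
            if ("div".equalsIgnoreCase(localName)) {
                super.startElement("", "p", "p", new AttributesImpl()); // new element, no attributes
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("div".equalsIgnoreCase(localName)) {
                super.endElement("", "p", "p"); // close <p> before </div>
            }
            super.endElement(uri, localName, qName);
        }
    }

    static String transform(String input) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // makes the parser report localName
        XMLReader filter = new WrapDivContentFilter(spf.newSAXParser().getXMLReader());

        // No stylesheet argument: an identity transform that only serializes the events.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(input))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("RESULT:" + transform("<div>text</div>") + ":");
    }
}
```

The filter only forwards events to super, so the identity transformer receives
an ordinary, well-nested SAX stream and never needs to know a filter exists.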

There is an example at:
http://book.javanb.com/xml-and-java-developing-web-applications-2nd/0201770040_ch05lev1sec2.html

       OutputFormat format = new OutputFormat("xml", "UTF-8", false);
       format.setPreserveSpace(true);
       ContentHandler handler = new XMLSerializer(System.out, format);
       XMLReader parser =
           XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
       XMLReader filter = new MailFilter(parser);
       filter.setContentHandler(handler);
       filter.parse(argv[0]);

MailFilter extends org.xml.sax.helpers.XMLFilterImpl.
The example uses org.apache.xml.serialize.XMLSerializer, which has been
deprecated since Xerces 2.9.0. The API documentation says:

Deprecated. This class was deprecated in Xerces 2.9.0. It is
recommended that new applications use the DOM Level 3 LSSerializer or
JAXP's Transformation API for XML (TrAX) for serializing XML. See the
Xerces documentation for more information.
http://xerces.apache.org/xerces2-j/javadocs/other/org/apache/xml/serialize/XMLSerializer.html
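A sketch of how the book's pipeline might look without the deprecated class,
assuming the JDK's built-in TrAX implementation:
SAXTransformerFactory.newTransformerHandler() with no stylesheet returns a
TransformerHandler, which is a ContentHandler that serializes whatever events
it receives, so it can be set as the filter's content handler exactly where
XMLSerializer was. The book's MailFilter is not reproduced here; a plain
pass-through XMLFilterImpl stands in for it. The output properties are an
assumption about the closest equivalents of the OutputFormat settings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class TransformerHandlerExample {

    static String serialize(String input) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TrAX implementation has no SAX support");
        }
        // Identity TransformerHandler: a ContentHandler that writes events out as XML.
        TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();
        Transformer serializer = handler.getTransformer();
        // Assumed rough equivalents of new OutputFormat("xml", "UTF-8", false).
        serializer.setOutputProperty(OutputKeys.METHOD, "xml");
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "no"); // no indenting; whitespace passes through
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        // Pass-through stand-in for the book's MailFilter (not the real class).
        XMLReader filter = new XMLFilterImpl(parser);
        filter.setContentHandler(handler);
        filter.parse(new InputSource(new StringReader(input)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<mail><to>someone</to></mail>"));
    }
}
```

One caveat with this wiring: XMLFilterImpl implements ContentHandler but not
LexicalHandler, so comments and CDATA boundaries are not passed on to the
TransformerHandler.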

   If I don't want to use DOM, I assume I can't use DOM Level 3
LSSerializer.
   If I don't want to use XSLT, I assume I can't use JAXP's
Transformation API for XML (TrAX) for serializing XML.

   What is the proper way to transform the stream of SAX events into
another stream of SAX events, so that I don't need to write my own
parser or my own serializer?

Best regards,
   Dawid Chodura

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
For additional commands, e-mail: j-users-h...@xerces.apache.org

