One possibility is that the DTD named by the URI in the DOCTYPE is being
loaded from the web.  You can make a local copy and register it in an
XML catalog, or otherwise supply it through a custom LSResourceResolver
set on the DOMConfiguration attached to the LSParser.  (Even if you are
not validating against the DTD, the parser needs to access it because it
may define entities, such as &apos; used in XHTML to represent the
apostrophe character, which might appear in the document.)  If the delay
is caused by the retrieval of the DTD from the web, it should go away
once the URI is resolved to a local copy.
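
A minimal sketch of the resolver approach follows, using only the
standard org.w3c.dom.ls interfaces.  The class name LocalDtdDemo, the
STAND_IN_DTD string and the resolvedLocally list are made up for
illustration; the stand-in DTD declares a single entity so the example
runs without touching the network.  In a real setup you would hand the
parser a stream over your downloaded copy of xhtml1-strict.dtd instead,
and this is untested against your program, so treat it as a starting
point rather than a drop-in fix.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

class LocalDtdDemo {
    // Hypothetical stand-in for a local copy of the DTD. A real setup would
    // read the downloaded xhtml1-strict.dtd via LSInput.setByteStream(...).
    static final String STAND_IN_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Records the system IDs that were answered locally (for illustration).
    static final List<String> resolvedLocally = new ArrayList<String>();

    static Document parse(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl =
                    (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

            LSResourceResolver resolver = new LSResourceResolver() {
                public LSInput resolveResource(String type, String namespaceURI,
                        String publicId, String systemId, String baseURI) {
                    if (systemId != null && systemId.endsWith("xhtml1-strict.dtd")) {
                        LSInput input = impl.createLSInput();
                        input.setPublicId(publicId);
                        input.setSystemId(systemId);
                        input.setStringData(STAND_IN_DTD);
                        resolvedLocally.add(systemId);
                        return input;
                    }
                    return null; // anything else: let the parser resolve it as usual
                }
            };

            // Both settings go through the DOMConfiguration, not a SAX cast.
            parser.getDomConfig().setParameter("resource-resolver", resolver);
            parser.getDomConfig().setParameter("validate", Boolean.FALSE);

            LSInput docInput = impl.createLSInput();
            docInput.setStringData(xml);
            return parser.parse(docInput);
        } catch (Exception e) {
            throw new RuntimeException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>a&nbsp;b</p></body></html>";
        Document doc = parse(xml);
        System.out.println("root element: " + doc.getDocumentElement().getTagName());
        System.out.println("resolved locally: " + resolvedLocally);
    }
}
```

Note that the real xhtml1-strict.dtd pulls in three entity files
(xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent), so a resolver
for the genuine DTD would need local copies of those as well.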

Jeff

On 8/2/2014 7:17 AM, John Jones wrote:
> Hi,
> I am using Java 1.7 with Xerces 2.11.0 in Eclipse, am inexperienced
> with Xerces and XML, and have a simple program (which seems to work as
> intended) to pick out some required details from an XML document.
> 
> The problem I have is that there is a delay of around 20 seconds before
> the parsing completes. There is no significant network or cpu activity
> during the parse which suggested a network time-out. I have checked that
> the http references all exist.
> Consulting the web faqs suggested trying to ensure validation is turned off.
> I have tried to do this but it leads to an impossible cast.
> 
> So I now have three questions:
> The traffic on this mailing list seems quite light. Is this question
> appropriate for this mailing list? Can you think of a more appropriate
> forum?
> Can you suggest other likely reasons why there is a delay in processing
> a short XML document?
> Can you explain where I am going wrong in turning off validation?
> 
> The document is an html bank statement with just two transactions. It is
> suitable for input to Excel and there is no delay when opening it with
> Excel.
> The first line of the xml was added manually when xerces reported a bad
> UTF-8 character. It doesn't appear to affect Excel.
> The start of the document is:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" lang="es"
> xml:lang="es"><head><STYLE type="text/css">
> #CabeceraCuerpo {font-size:10pt;font-family:arial;}
> #CabeceraTitulo {font-size:14.0pt;font-family:Arial;text-align:left;}
> #CabeceraFechaDescarga {font-size:9pt;font-family:arial}
> #CabeceraCuerpoNoCuenta {font-size:12pt;font-family:arial;}
> #CabeceraSubTitulo {font-size:11pt;font-family:arial;}
> #CuerpoDetalleTitulo {font-size:9pt;font-family:arial;}
> #CuerpoDetalleDoble{width:60pt;font-size:10pt;font-family:arial;text-align:left;border-bottom:.5pt
> hairline silver}
> #CuerpoDetalle
> {font-size:10pt;font-family:arial;text-align:center;border-bottom:.5pt
> hairline silver}
> #TDNoWrappedLeft {text-align:left;white-space:nowrap;width:20pt}
> #TDNoWrappedLeftDoble {text-align:left;white-space:nowrap;width:100pt}
> #TDNoWrappedRight {text-align:right;white-space:nowrap}
> #TDNoWrappedTituloBorderBottom
> {text-align:left;white-space:nowrap;border-bottom:1pt solid windowtext}
> #TDNoWrappedColCuerpoTituloBorderBottom
> {text-align:center;white-space:nowrap;border-bottom:.5pt solid windowtext}
> #TDNoWrappedSubTituloBorderBottom
> {text-align:left;white-space:nowrap;border-bottom:.5pt solid windowtext}
> #TDSeparadorInicial {width:10pt}
> #TDSeparadorDoble {width:100pt; border-bottom:.5pt hairline silver}
> #TDTituloSeparadorBorderBotton {width:10pt;border-bottom:1pt solid
> windowtext}
> #TDTituloBorderBotton {border-bottom:1pt solid windowtext}
> #TDSubTituloSeparadorBorderBotton {width:10pt;border-bottom:.5pt solid
> windowtext}
> #TDSubTituloBorderBotton {border-bottom:.5pt solid windowtext}
> #TDListadoBorderBottom {border-bottom:.5pt hairline silver}
> </STYLE><meta content="text/html; charset=iso-8859-1"
> http-equiv="Content-Type" /></head><body><table><tr><td /><td
> style="TDSeparadorDoble">Transactions</td></tr><tr
> style="vertical-align:middle"><td id="TDSeparadorInicial" /><td
> id="TDNoWrappedLeftDoble"><font id="CabeceraCuerpo">XXXX XXXX XXXX 4271:
> </font></td><td /><td><font id="CuerpoDetalle">01/02/2014
> to
> 01/08/2014</font></td>
> 
> 
> The start of the  Java for opening and parsing the document was obtained
> from the web several years ago and I don't fully understand what it is
> doing but it works with a different program!   It is :
> 
> public class BaseXML
> {
> 
>     /** Default namespaces support (true). */
>     protected static final boolean DEFAULT_NAMESPACES = true;
> 
>     /** Default validation support (false). */
>     protected static final boolean DEFAULT_VALIDATION = false;
> 
>     /** Default Schema validation support (false). */
>     protected static final boolean DEFAULT_SCHEMA_VALIDATION = false;
> 
>     //Set false in the first constructor called. N.B not synchronised
> etc so can be fooled.
>     static boolean firstCaller=true;
> 
>     LSParser builder=null;
>     DOMImplementationRegistry registry = null;
>     DOMImplementationLS impl = null;
>     DOMConfiguration config = null;
>     DOMErrorHandler errorHandler = null;
>     LSParserFilter filter = null;
>     
>     HashMap<Object, Object> bookShelf;
>             
> 
>   public BaseXML()
>   {     if( ! firstCaller )
>         { System.err.println(" XML work class already initialised.
> (Fatal)");
>           System.exit(1);
>         }
>         firstCaller=false;
>         try {
>             // get DOM Implementation using DOM Registry
>            
> System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMXSImplementationSourceImpl");
>          //
> System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMImplementationSourceImpl");
>             System.out.println("DOM Impl");
>            
> System.out.print(System.getProperty(DOMImplementationRegistry.PROPERTY ));
> 
>             registry = DOMImplementationRegistry.newInstance();
> 
>             impl = (DOMImplementationLS)registry.getDOMImplementation("LS");
> 
>             // create DOMBuilder
>             builder =
> impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
>             
>             config = builder.getDomConfig();
> 
>             // create Error Handler
>             errorHandler = new Handlers();
>             // create filter
>             filter = new Handlers();
>             
>             builder.setFilter(filter);
>             try { ((org.apache.xerces.parsers.SAXParser)
> builder).setFeature("http://xml.org/sax/features/validation", false);
>             } catch (org.xml.sax.SAXException e) 
>             { 
>             System.out.println("error in setting up parser feature");
>             e.toString();
>             }
>             // set error handler
>             config.setParameter("error-handler", errorHandler);
> 
>             // set validation feature
>             config.setParameter("validate",Boolean.TRUE);
>             
>             // set schema language
>             config.setParameter("schema-type",
> "http://www.w3.org/2001/XMLSchema");
>           
>         } catch ( Exception ex ) {
>             ex.printStackTrace();
>         }
>         bookShelf=new HashMap<Object, Object>();
>   }
> 
> 
> A bit later on is the routine I call to parse the input document:
> 
>   public boolean openXMLFile(String bookName, String FileName)
>   {
>     Document doc=null;
>     try
>     { doc = builder.parseURI(FileName);
>     }catch (Exception e)
>     { System.err.println(" Exception during parse of
> "+bookName+":"+e.getMessage());
>     };
>     if( doc != null )
>     { bookShelf.put(bookName,doc);
>       return true;
>     }else
>     { return false;
>     }
>   }
> 
> 
> The delay is during the call to parseURI.
> The exception that arises when trying to turn off validation as
> suggested on the web is :
> 
> org.apache.xerces.dom.DOMXSImplementationSourceImpljava.lang.ClassCastException:
> org.apache.xerces.parsers.DOMParserImpl cannot be cast to
> org.apache.xerces.parsers.SAXParser
> at BaseXML.<init>(BaseXML.java:80)
> at SanPost.<clinit>(SanPost.java:45)
> 
> 
> Regards
> 
> John

