Hi,

I am using Java 1.7 with xerces 2.11.0 in Eclipse, and I am inexperienced with xerces and XML. I have a simple program (which seems to work as intended) to pick out some required details from an XML document.
The problem I have is that there is a delay of around 20 seconds before the parsing completes. There is no significant network or CPU activity during the parse, which suggests a network time-out. I have checked that the HTTP references all exist. Consulting the web FAQs suggested trying to ensure validation is turned off. I have tried to do this but it leads to an impossible cast.

So I now have three questions:

1. The traffic on this mailing list seems quite light. Is this question appropriate for this mailing list? Can you think of a more appropriate forum?
2. Can you suggest other likely reasons why there is a delay in processing a short XML document?
3. Can you explain where I am going wrong in turning off validation?

The document is an HTML bank statement with just two transactions. It is suitable for input to Excel and there is no delay when opening it with Excel. The first line of the XML was added manually when xerces reported a bad UTF-8 character. It doesn't appear to affect Excel.

The start of the document is:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" " http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" lang="es" xml:lang="es"><head><STYLE type="text/css">
    #CabeceraCuerpo {font-size:10pt;font-family:arial;}
    #CabeceraTitulo {font-size:14.0pt;font-family:Arial;text-align:left;}
    #CabeceraFechaDescarga {font-size:9pt;font-family:arial}
    #CabeceraCuerpoNoCuenta {font-size:12pt;font-family:arial;}
    #CabeceraSubTitulo {font-size:11pt;font-family:arial;}
    #CuerpoDetalleTitulo {font-size:9pt;font-family:arial;}
    #CuerpoDetalleDoble{width:60pt;font-size:10pt;font-family:arial;text-align:left;border-bottom:.5pt hairline silver}
    #CuerpoDetalle {font-size:10pt;font-family:arial;text-align:center;border-bottom:.5pt hairline silver}
    #TDNoWrappedLeft {text-align:left;white-space:nowrap;width:20pt}
    #TDNoWrappedLeftDoble {text-align:left;white-space:nowrap;width:100pt}
    #TDNoWrappedRight {text-align:right;white-space:nowrap}
    #TDNoWrappedTituloBorderBottom {text-align:left;white-space:nowrap;border-bottom:1pt solid windowtext}
    #TDNoWrappedColCuerpoTituloBorderBottom {text-align:center;white-space:nowrap;border-bottom:.5pt solid windowtext}
    #TDNoWrappedSubTituloBorderBottom {text-align:left;white-space:nowrap;border-bottom:.5pt solid windowtext}
    #TDSeparadorInicial {width:10pt}
    #TDSeparadorDoble {width:100pt; border-bottom:.5pt hairline silver}
    #TDTituloSeparadorBorderBotton {width:10pt;border-bottom:1pt solid windowtext}
    #TDTituloBorderBotton {border-bottom:1pt solid windowtext}
    #TDSubTituloSeparadorBorderBotton {width:10pt;border-bottom:.5pt solid windowtext}
    #TDSubTituloBorderBotton {border-bottom:.5pt solid windowtext}
    #TDListadoBorderBottom {border-bottom:.5pt hairline silver}
    </STYLE><meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type" /></head><body><table>
    <tr><td /><td style="TDSeparadorDoble">Transactions</td></tr>
    <tr style="vertical-align:middle"><td id="TDSeparadorInicial" /><td id="TDNoWrappedLeftDoble"><font id="CabeceraCuerpo">XXXX XXXX XXXX 4271: </font></td><td /><td><font id="CuerpoDetalle">01/02/2014 to 01/08/2014</font></td>

The start of the Java for opening and parsing the document was obtained from the web several years ago. I don't fully understand what it is doing, but it works with a different program! It is:

    public class BaseXML {
        /** Default namespaces support (true). */
        protected static final boolean DEFAULT_NAMESPACES = true;
        /** Default validation support (false). */
        protected static final boolean DEFAULT_VALIDATION = false;
        /** Default Schema validation support (false). */
        protected static final boolean DEFAULT_SCHEMA_VALIDATION = false;
        //Set false in the first constructor called. N.B not synchronised etc so can be fooled.
        static boolean firstCaller=true;
        LSParser builder=null;
        DOMImplementationRegistry registry = null;
        DOMImplementationLS impl = null;
        DOMConfiguration config = null;
        DOMErrorHandler errorHandler = null;
        LSParserFilter filter = null;
        HashMap<Object, Object> bookShelf;

        public BaseXML() {
            if( ! firstCaller ) {
                System.err.println(" XML work class already initialised. (Fatal)");
                System.exit(1);
            }
            firstCaller=false;
            try {
                // get DOM Implementation using DOM Registry
                System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMXSImplementationSourceImpl");
                // System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMImplementationSourceImpl");
                System.out.println("DOM Impl");
                System.out.print(System.getProperty(DOMImplementationRegistry.PROPERTY ));
                registry = DOMImplementationRegistry.newInstance();
                impl = (DOMImplementationLS)registry.getDOMImplementation("LS");
                // create DOMBuilder
                builder = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
                config = builder.getDomConfig();
                // create Error Handler
                errorHandler = new Handlers();
                // create filter
                filter = new Handlers();
                builder.setFilter(filter);
                try {
                    ((org.apache.xerces.parsers.SAXParser) builder).setFeature("http://xml.org/sax/features/validation", false);
                } catch (org.xml.sax.SAXException e) {
                    System.out.println("error in setting up parser feature");
                    e.toString();
                }
                // set error handler
                config.setParameter("error-handler", errorHandler);
                // set validation feature
                config.setParameter("validate",Boolean.TRUE);
                // set schema language
                config.setParameter("schema-type", " http://www.w3.org/2001/XMLSchema");
            } catch ( Exception ex ) {
                ex.printStackTrace();
            }
            bookShelf=new HashMap<Object, Object>();
        }

A bit later on is the routine I call to parse the input document:

        public boolean openXMLFile(String bookName, String FileName) {
            Document doc=null;
            try {
                doc = builder.parseURI(FileName);
            }catch (Exception e) {
                System.err.println(" Exception during parse of "+bookName+":"+e.getMessage());
            };
            if( doc != null ) {
                bookShelf.put(bookName,doc);
                return true;
            }else {
                return false;
            }
        }

The delay is during the call to parseURI.

The exception that arises when trying to turn off validation as suggested on the web is:

    org.apache.xerces.dom.DOMXSImplementationSourceImpljava.lang.ClassCastException: org.apache.xerces.parsers.DOMParserImpl cannot be cast to org.apache.xerces.parsers.SAXParser
        at BaseXML.<init>(BaseXML.java:80)
        at SanPost.<clinit>(SanPost.java:45)

Regards
John