André Warnier wrote on 13.11.2008 at 00:08:05 (+0100):
> [...] on the same machine I have a text search and retrieval
> application that can sift through a full-text index of 100,000
> documents (1 Gb of text) and retrieve the ones I want in couple of
> seconds. It has a 10 Mb memory footprint. That's why the 500 Mb
> footprint of Tomcat (with the app) and the 5 minute delay in starting
> the app over 25 Mb of XML so struck me.

The app vendor should (a) separate XML data analysis (which is what it
seems to be doing) from application server startup, for both time and
memory reasons, (b) make the result of the analysis available to the
server in a compact and easily digestible form, and (c) make data
analysis run in 10 seconds instead of five minutes, which may well be
feasible.

> I also have learned (separately, and confirmed here several times)
> that XML parsing is a hog, and that is not only in Java. Particularly
> the DOM-style of parsing exhibits exponential time behaviour in
> relation to document size. Large text fields are absolute killers,
> and making them CDATA only partly alleviates that.

I don't believe it's the parsing that takes so much time here. There
must be some subsequent inefficient processing going on.

> One alternative to XML in feeding that application with data is CSV
> files (the text version of spreadsheet). I had discarded it until now
> as old-fashioned, "passé", limited etc.. XML is so much more "in". But
> I am having second thoughts now, and I will give it a try.

If your data is indeed tabular, CSV is probably fine.

> Some reflexes remain for a lifetime.

Fortunately!

Michael Ludwig
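
To illustrate points (a) and (b): a minimal sketch of an offline analysis step that streams through the XML once, keeps only what the server needs, and writes it out in a compact form that can be loaded quickly at startup. It is in Python for brevity rather than the Java the app presumably uses. The <record>/<title> layout, the sample data and the function names are all invented for illustration; nothing here is known about the vendor's actual data or code.

```python
import io
import json
import xml.etree.ElementTree as ET


def analyse(xml_source):
    """Stream through the XML once and keep only what the server needs.

    Assumes a hypothetical layout of <record id="..."> elements, each
    with a <title> child; the real data will look different.
    """
    summary = {}
    # iterparse yields elements as they are completed, so the whole
    # document never has to be held in memory as a DOM tree.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "record":
            summary[elem.get("id")] = elem.findtext("title", default="")
            elem.clear()  # discard the record's children and text
    return summary


def save_summary(summary, out):
    """Write the analysis result in a compact form (JSON, no padding)."""
    json.dump(summary, out, separators=(",", ":"))


def load_summary(inp):
    """What the server would do at startup: load the precomputed result."""
    return json.load(inp)


# Made-up sample standing in for the 25 Mb of XML.
SAMPLE = b"""<records>
  <record id="1"><title>Alpha</title><body>long text ...</body></record>
  <record id="2"><title>Beta</title><body>more long text ...</body></record>
</records>"""

summary = analyse(io.BytesIO(SAMPLE))   # offline step
buf = io.StringIO()
save_summary(summary, buf)
loaded = load_summary(io.StringIO(buf.getvalue()))  # server startup
```

The analysis would run whenever the XML changes, not whenever the server starts, so startup only pays for reading the small summary.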