I highly appreciate your replies, Michael. No, I don't hit the OOME if I comment out the call to getHTMLTitle. The heap behaves perfectly.
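For reference, here is a minimal sketch of how the title could be pulled out without the demo HTMLParser, so that no extra thread is started and only the head of the file is read. It is not Lucene code: TitleSniffer, MAX_CHARS and the method names are made up for illustration, the UTF-8 charset is an assumption, and a regex is only a rough stand-in for a real HTML parser (it assumes the <title> element sits within the first 8K characters).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Lucene or its demo package. */
final class TitleSniffer {
    // Case-insensitive; DOTALL lets the title text span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumption: the <title> element appears within the first 8K characters.
    private static final int MAX_CHARS = 8192;

    /** Returns the whitespace-normalized title, or "" if none is found. */
    static String fromHtml(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).replaceAll("\\s+", " ").trim() : "";
    }

    /** Reads at most MAX_CHARS on the calling thread, then looks for the title. */
    static String extractTitle(Reader in) throws IOException {
        char[] buf = new char[MAX_CHARS];
        int total = 0;
        int n;
        while (total < buf.length
                && (n = in.read(buf, total, buf.length - total)) != -1) {
            total += n;
        }
        return fromHtml(new String(buf, 0, total));
    }

    /** Opens the file (UTF-8 assumed) and always closes it, even on error. */
    static String extractTitle(File htmlFile) throws IOException {
        try (Reader in = new InputStreamReader(
                new FileInputStream(htmlFile), StandardCharsets.UTF_8)) {
            return extractTitle(in);
        }
    }
}
```

In my case the call would be something like TitleSniffer.extractTitle(new File(htmlFileName)). Everything runs on the calling thread and the stream is closed in all cases, so nothing should be left waiting after the method returns.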
I completely agree with you: the thread count goes haywire the moment I call HTMLParser.getTitle(). I have seen a thread count of around 600 before I hit the OOME (with the getTitle() call on), and 90% of those threads are in the wait state. They are not doing anything, just sitting there forever. I am sure they are consuming heap and never giving it back. Does my hypothesis make sense?


Michael McCandless-2 wrote:
>
> Odd. I don't know of any memory leaks w/ the demo HTMLParser, hmm
> though it's doing some fairly scary stuff in its getReader() method.
> EG it spawns a new thread every time you run it. And, it's parsing
> the entire HTML document even though you only want the title.
>
> You may want to switch to better supported HTMLParsers, eg NekoHTML.
>
> Plus, it would be better if you extracted the title during indexing,
> and stored in the document, than doing all this work at search time.
> You want CPU at search time to be minimized (think of all the
> electricity...).
>
> But: if you increase the HEAP do you still eventually hit OOME?
>
> Mike
>
> Chetan Shah <chetankrs...@gmail.com> wrote:
>>
>> After some more researching I discovered that the following code snippet
>> seems to be the culprit. I have to call this to get the "title" of the
>> indexed html page. And this is called 10 times, as I display 10 results
>> on a page.
>>
>> Any suggestions on how to achieve this without the OOME issue?
>>
>>
>>     File f = new File(htmlFileName);
>>     FileInputStream fis = new FileInputStream(f);
>>     HTMLParser parser = new HTMLParser(fis);
>>     String title = parser.getTitle();
>>     /* following was added for my sanity :) */
>>     parser = null;
>>     fis.close();
>>     fis = null;
>>     f = null;
>>     /* till here */
>>     return title;
>>
>>
>> Chetan Shah wrote:
>>>
>>> I am initiating a simple search, and after profiling my application
>>> using NetBeans I see a constant heap consumption and eventually a
>>> server (tomcat) crash due to an "out of memory" error. The thread count
>>> also keeps on increasing, and most of the threads are in the "wait" state.
>>>
>>> Please let me know what I am doing wrong here so that I can avoid the
>>> server crash. I am using Lucene 2.4.0.
>>>
>>>
>>>     IndexSearcher indexSearcher =
>>>         IndexSearcherFactory.getInstance().getIndexSearcher();
>>>
>>>     //Create the query and search
>>>     QueryParser queryParser =
>>>         new QueryParser("contents", new StandardAnalyzer());
>>>     Query query = queryParser.parse(searchCriteria);
>>>
>>>
>>>     TermsFilter categoryFilter = null;
>>>
>>>     // Create the filter if it is needed.
>>>     if (filter != null) {
>>>         Term aTerm = new Term(Constants.WATCH_LIST_TYPE_TERM);
>>>         categoryFilter = new TermsFilter();
>>>         for (int i = 0; i < filter.length; i++) {
>>>             aTerm = aTerm.createTerm(filter[i]);
>>>             categoryFilter.addTerm(aTerm);
>>>         }
>>>     }
>>>
>>>     // Create sort criteria
>>>     SortField [] sortFields = new SortField[2];
>>>     SortField watchList =
>>>         new SortField(Constants.WATCH_LIST_TYPE_TERM, SortField.STRING);
>>>     SortField score = SortField.FIELD_SCORE;
>>>     if (sortByWatchList) {
>>>         sortFields[0] = watchList;
>>>         sortFields[1] = score;
>>>     } else {
>>>         sortFields[1] = watchList;
>>>         sortFields[0] = score;
>>>     }
>>>     Sort sort = new Sort(sortFields);
>>>
>>>     // Collect results
>>>     TopDocs topDocs = indexSearcher.search(query, categoryFilter,
>>>         Constants.MAX_HITS, sort);
>>>     ScoreDoc scoreDoc[] = topDocs.scoreDocs;
>>>     int numDocs = scoreDoc.length;
>>>     if (numDocs > 0) results = scoreDoc;
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Memory-Leak--tp22663917p22685294.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Memory-Leak--tp22663917p22686500.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org