I really appreciate your replies, Michael.

No, I don't hit the OOME if I comment out the call to getHTMLTitle(); the heap
behaves perfectly.

I completely agree with you: the thread count goes haywire the moment I call
HTMLParser.getTitle(). With the getTitle() call enabled, I have seen around 600
threads before hitting the OOME, and 90% of them are in a wait state. They are
not doing anything, just sitting there forever, and I am sure they are
consuming heap and never giving it back.
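To double-check those numbers without the profiler, live threads can be tallied by state with the standard Thread.getAllStackTraces() API. This is a minimal sketch. The class and method names (ThreadStateCensus, census, waitingFraction) are hypothetical and not part of Lucene or Tomcat. WAITING and TIMED_WAITING are counted together because a thread parked in Object.wait(timeout) reports TIMED_WAITING.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene or Tomcat): tallies live JVM threads by state.
public class ThreadStateCensus {

    /** Counts every live thread in this JVM, grouped by Thread.State. */
    public static Map<Thread.State, Integer> census() {
        Map<Thread.State, Integer> counts =
                new EnumMap<Thread.State, Integer>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            Thread.State state = t.getState();
            Integer c = counts.get(state);
            counts.put(state, c == null ? 1 : c + 1);
        }
        return counts;
    }

    /** Fraction of threads that are WAITING or TIMED_WAITING. */
    public static double waitingFraction(Map<Thread.State, Integer> counts) {
        int total = 0;
        int waiting = 0;
        for (Map.Entry<Thread.State, Integer> e : counts.entrySet()) {
            total += e.getValue();
            if (e.getKey() == Thread.State.WAITING
                    || e.getKey() == Thread.State.TIMED_WAITING) {
                waiting += e.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) waiting / total;
    }

    public static void main(String[] args) {
        Map<Thread.State, Integer> counts = census();
        System.out.println("threads by state: " + counts);
        System.out.println("waiting fraction: " + waitingFraction(counts));
    }
}
```

You could call census() before and after a batch of searches, for example from a debug-only servlet. That would show whether the waiting-thread count climbs with every page of results.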

Does my hypothesis make sense?
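Michael notes that getReader() starts a new thread on every call. That suggests one plausible mechanism, though I have not confirmed it against the demo's source. The parser thread pushes the extracted text into a pipe. If nobody ever reads the other end, the thread blocks once the pipe's buffer fills and never exits, so everything it references stays reachable.

The self-contained sketch below reproduces that pattern with java.io.PipedWriter/PipedReader. The class PipeLeakDemo and its method producerFinishes are hypothetical names. Without draining, the producer thread is still alive when the timeout expires. When the reader is drained to end-of-stream, the producer finishes.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

// Hypothetical demo (not Lucene code): a producer thread writes into a pipe,
// mimicking a parser thread that feeds its output to a Reader.
public class PipeLeakDemo {

    /**
     * Starts a producer that writes `chars` characters into a pipe and reports
     * whether that thread has terminated after waiting up to `waitMillis`.
     * If `drain` is false, nobody reads the pipe, so the producer blocks inside
     * write() once the pipe's small fixed-size buffer is full.
     */
    public static boolean producerFinishes(final int chars, boolean drain, long waitMillis)
            throws IOException, InterruptedException {
        PipedReader in = new PipedReader();
        final PipedWriter out = new PipedWriter(in);

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < chars; i++) {
                        out.write('x');
                    }
                    out.close(); // signals end-of-stream to the reader
                } catch (IOException e) {
                    // pipe closed or broken: just let the thread end
                }
            }
        });
        producer.setDaemon(true); // a stuck producer must not keep the JVM alive
        producer.start();

        if (drain) {
            // Read and discard everything until end-of-stream.
            char[] buf = new char[512];
            while (in.read(buf) != -1) {
                // discard
            }
        }

        producer.join(waitMillis);
        boolean finished = !producer.isAlive();
        in.close();
        return finished;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("not drained, producer finished: "
                + producerFinishes(10000, false, 500));
        System.out.println("drained, producer finished: "
                + producerFinishes(10000, true, 10000));
    }
}
```

The demo parser may behave this way. If so, reading the Reader from parser.getReader() to the end after getTitle() should let each parser thread terminate. Extracting the title at indexing time, as Michael suggests, would avoid the per-search thread entirely.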

Michael McCandless-2 wrote:
> 
> Odd.  I don't know of any memory leaks with the demo HTMLParser, hmm,
> though it's doing some fairly scary stuff in its getReader() method.
> E.g., it spawns a new thread every time you call it.  And it parses
> the entire HTML document even though you only want the title.
> 
> You may want to switch to a better-supported HTML parser, e.g. NekoHTML.
> 
> Plus, it would be better to extract the title during indexing and
> store it in the document, rather than doing all this work at search time.
> You want CPU at search time to be minimized (think of all the
> electricity...).
> 
> But: if you increase the HEAP do you still eventually hit OOME?
> 
> Mike
> 
> Chetan Shah <chetankrs...@gmail.com> wrote:
>>
>> After some more research I found that the following code snippet
>> seems to be the culprit. I have to call it to get the "title" of the
>> indexed HTML page, and it is called 10 times per request because I
>> display 10 results on a page.
>>
>> Any suggestions on how to achieve this without the OOME issue?
>>
>>
>>                File f = new File(htmlFileName);
>>                FileInputStream fis = new FileInputStream(f);
>>                try {
>>                    HTMLParser parser = new HTMLParser(fis);
>>                    return parser.getTitle();
>>                } finally {
>>                    // close the file even if parsing throws
>>                    fis.close();
>>                }
>>
>>
>> Chetan Shah wrote:
>>>
>>> I am running a simple search, and after profiling my application
>>> with NetBeans I see steadily increasing heap consumption and eventually
>>> a server (Tomcat) crash due to an "out of memory" error. The thread
>>> count also keeps increasing, with most of the threads in the "wait" state.
>>>
>>> Please let me know what I am doing wrong here so that I can avoid the
>>> server crash. I am using Lucene 2.4.0.
>>>
>>>
>>>        IndexSearcher indexSearcher =
>>>                IndexSearcherFactory.getInstance().getIndexSearcher();
>>>
>>>        // Create the query and search
>>>        QueryParser queryParser = new QueryParser("contents", new StandardAnalyzer());
>>>        Query query = queryParser.parse(searchCriteria);
>>>
>>>        TermsFilter categoryFilter = null;
>>>
>>>        // Create the filter if it is needed.
>>>        if (filter != null) {
>>>            Term aTerm = new Term(Constants.WATCH_LIST_TYPE_TERM);
>>>            categoryFilter = new TermsFilter();
>>>            for (int i = 0; i < filter.length; i++) {
>>>                aTerm = aTerm.createTerm(filter[i]);
>>>                categoryFilter.addTerm(aTerm);
>>>            }
>>>        }
>>>
>>>        // Create sort criteria
>>>        SortField[] sortFields = new SortField[2];
>>>        SortField watchList = new SortField(Constants.WATCH_LIST_TYPE_TERM, SortField.STRING);
>>>        SortField score = SortField.FIELD_SCORE;
>>>        if (sortByWatchList) {
>>>            sortFields[0] = watchList;
>>>            sortFields[1] = score;
>>>        } else {
>>>            sortFields[0] = score;
>>>            sortFields[1] = watchList;
>>>        }
>>>        Sort sort = new Sort(sortFields);
>>>
>>>        // Collect results
>>>        TopDocs topDocs = indexSearcher.search(query, categoryFilter, Constants.MAX_HITS, sort);
>>>        ScoreDoc[] scoreDoc = topDocs.scoreDocs;
>>>        int numDocs = scoreDoc.length;
>>>        if (numDocs > 0) results = scoreDoc;
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Memory-Leak--tp22663917p22685294.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Memory-Leak--tp22663917p22686500.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


