Odd. I don't know of any memory leaks with the demo HTMLParser, hmm,
though it's doing some fairly scary stuff in its getReader() method:
e.g., it spawns a new thread every time you run it. And it's parsing
the entire HTML document even though you only want the title.
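
If you do keep parsing at search time for now, here's a minimal sketch
that reads only as far as </title> on the calling thread, with no extra
parser thread. It's a naive regex scan, not the demo HTMLParser and not
NekoHTML. TitleSniffer and the 16K MAX_CHARS cap are made-up names and
values, and it ignores comments, scripts and entities (&amp; stays
as-is), so treat it as a stopgap rather than a real HTML parser:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Naive sketch (hypothetical class, not a Lucene API): reads characters only
 * until the closing </title> tag, on the calling thread, then stops.
 * NOT a real HTML parser: comments, scripts and entities are not handled.
 */
final class TitleSniffer {
    // Hypothetical limit so a page without a <title> is not read to the end.
    static final int MAX_CHARS = 16 * 1024;

    private static final Pattern TITLE = Pattern.compile(
            "<title(?:\\s[^>]*)?>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the whitespace-normalized title, or null if none was found. */
    static String extractTitle(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder();
        int c;
        while (buf.length() < MAX_CHARS && (c = in.read()) != -1) {
            buf.append((char) c);
            // Only re-check when a tag has just closed.
            if (c == '>') {
                Matcher m = TITLE.matcher(buf);
                if (m.find()) {
                    // Stop here: the rest of the document is never read.
                    return m.group(1).trim().replaceAll("\\s+", " ");
                }
            }
        }
        return null;
    }
}
```

For a file you'd pass something like new BufferedReader(new
InputStreamReader(fis, charset)) and close the stream in a finally
block, since the sketch reads one char at a time and never closes the
Reader itself.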

You may want to switch to a better-supported HTML parser, e.g. NekoHTML.

Plus, it would be better to extract the title during indexing and
store it in the document, rather than doing all this work at search
time. You want to minimize CPU at search time (think of all the
electricity...).
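
In Lucene 2.4 that would mean adding the title as a stored field when
you build each Document, e.g. new Field("title", title,
Field.Store.YES, Field.Index.NO), and reading it back per hit with
indexSearcher.doc(scoreDoc.doc).get("title"). So the example runs
without the Lucene jar, here is a plain-Java model of the same idea.
IndexTimeTitles and StoredDoc are hypothetical stand-ins, not Lucene
classes, and the title extractor is passed in so any parser can be
plugged in:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Plain-Java model of "extract at index time, store, look up at search time".
 * None of these classes are Lucene APIs.
 */
final class IndexTimeTitles {
    /** Hypothetical stand-in for a Lucene Document with stored fields. */
    static final class StoredDoc {
        final Map<String, String> fields = new HashMap<>();

        String get(String name) {
            return fields.get(name);
        }
    }

    private final List<StoredDoc> docs = new ArrayList<>();
    private final Function<String, String> titleExtractor;
    int extractions = 0; // how many times the expensive parse has run

    IndexTimeTitles(Function<String, String> titleExtractor) {
        this.titleExtractor = titleExtractor;
    }

    /** Index time: parse once and store the title with the document. */
    int add(String path, String html) {
        StoredDoc d = new StoredDoc();
        d.fields.put("path", path);
        d.fields.put("title", titleExtractor.apply(html));
        extractions++;
        docs.add(d);
        return docs.size() - 1; // position used as the doc id
    }

    /** Search time: a cheap stored-field lookup, no file I/O or parsing. */
    String title(int docId) {
        return docs.get(docId).get("title");
    }
}
```

The extractions counter grows with the number of documents indexed,
not with the number of result pages you serve.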

But: if you increase the heap, do you still eventually hit OOME?
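
One way to check: log heap usage and the live thread count every so
often (say, every 100 searches) and compare runs with different -Xmx
settings. If both keep climbing until OOME however large the heap,
something is being retained (for example, threads that never exit); if
they level off with a bigger heap, the heap was probably just too
small. A minimal sketch using the standard java.lang.management beans;
HeapWatch is a made-up name, and since "used" includes garbage not yet
collected, watch the trend rather than single readings:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: one-line snapshot of heap usage and thread count. */
final class HeapWatch {
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // Live threads, both daemon and non-daemon.
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long max = heap.getMax(); // -1 if no maximum is defined
        String maxText = max < 0 ? "undefined" : (max / (1024 * 1024)) + " MB";
        return "heap used=" + usedMb + " MB, max=" + maxText
                + ", live threads=" + threads;
    }
}
```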

Mike

Chetan Shah <chetankrs...@gmail.com> wrote:
>
> After some more research I discovered that the following code snippet
> seems to be the culprit. I have to call this to get the "title" of the
> indexed HTML page, and it is called 10 times because I display 10
> results on a page.
>
> Any suggestions on how to achieve this without the OOME issue?
>
>
>                File f = new File(htmlFileName);
>                FileInputStream fis = new FileInputStream(f);
>                try {
>                    HTMLParser parser = new HTMLParser(fis);
>                    return parser.getTitle();
>                } finally {
>                    // close the stream even if parsing throws
>                    fis.close();
>                }
>
>
> Chetan Shah wrote:
>>
>> I am running a simple search, and after profiling my application with
>> NetBeans I see steadily growing heap consumption and eventually a server
>> (Tomcat) crash due to an "out of memory" error. The thread count also
>> keeps increasing, with most of the threads in the "wait" state.
>>
>> Please let me know what I am doing wrong so that I can avoid the server
>> crash. I am using Lucene 2.4.0.
>>
>>
>>     IndexSearcher indexSearcher =
>>             IndexSearcherFactory.getInstance().getIndexSearcher();
>>
>>     // Create the query and search
>>     QueryParser queryParser =
>>             new QueryParser("contents", new StandardAnalyzer());
>>     Query query = queryParser.parse(searchCriteria);
>>
>>     TermsFilter categoryFilter = null;
>>
>>     // Create the filter if it is needed.
>>     if (filter != null) {
>>         Term aTerm = new Term(Constants.WATCH_LIST_TYPE_TERM);
>>         categoryFilter = new TermsFilter();
>>         for (int i = 0; i < filter.length; i++) {
>>             aTerm = aTerm.createTerm(filter[i]);
>>             categoryFilter.addTerm(aTerm);
>>         }
>>     }
>>
>>     // Create sort criteria
>>     SortField[] sortFields = new SortField[2];
>>     SortField watchList =
>>             new SortField(Constants.WATCH_LIST_TYPE_TERM, SortField.STRING);
>>     SortField score = SortField.FIELD_SCORE;
>>     if (sortByWatchList) {
>>         sortFields[0] = watchList;
>>         sortFields[1] = score;
>>     } else {
>>         sortFields[0] = score;
>>         sortFields[1] = watchList;
>>     }
>>     Sort sort = new Sort(sortFields);
>>
>>     // Collect results
>>     TopDocs topDocs = indexSearcher.search(query, categoryFilter,
>>             Constants.MAX_HITS, sort);
>>     ScoreDoc[] scoreDoc = topDocs.scoreDocs;
>>     int numDocs = scoreDoc.length;
>>     if (numDocs > 0) results = scoreDoc;
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Memory-Leak--tp22663917p22685294.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
