[ 
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5468:
--------------------------------

    Attachment: LUCENE-5468.patch

I think the change is ready. There are other improvements that can be done (for 
example, maybe an option for the factory to cache these things in case you use 
the same ones across multiple fields, more efficient affix handling against the 
FST, and so on), but those would be better handled in separate issues, I think.
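
To illustrate the caching idea mentioned above, here is a minimal sketch that
uses only the JDK. It is not part of the patch and not Lucene API:
SharedDictionaryCache, its loader function, and the string keys are all
hypothetical names chosen for the example. It assumes a loaded dictionary is
immutable once built, so one instance can safely be shared by every field
that names the same files.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Lucene class): loads each dictionary at most
 * once per key and hands the same instance to every caller.
 */
final class SharedDictionaryCache<D> {
    private final Map<String, D> cache = new ConcurrentHashMap<>();
    private final Function<String, D> loader;
    private final AtomicInteger loads = new AtomicInteger();

    /** The loader is whatever parses the affix/dictionary files for a key. */
    SharedDictionaryCache(Function<String, D> loader) {
        this.loader = loader;
    }

    /** Returns the cached dictionary for the key, loading it on first use. */
    D get(String dictionaryKey) {
        return cache.computeIfAbsent(dictionaryKey, key -> {
            loads.incrementAndGet(); // counts real loads, for illustration only
            return loader.apply(key);
        });
    }

    /** Number of times the loader actually ran. */
    int loadCount() {
        return loads.get();
    }
}
```

With something like this, two fields configured with the same dictionary
would pay the load cost once. When to evict entries is left open here.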

Here is a patch (from diff-sources). Sorry, it's not so useful, as I renamed 
some things. I tried making one from svn diff after reintegration, but it was 
equally useless. If you want, you can also review my commits on this issue to 
the branch.

Here is the CHANGES entry:

API Changes:

* LUCENE-5468: Move offline Sort (from suggest module) to OfflineSort. (Robert 
Muir)

Optimizations:

* LUCENE-5468: HunspellStemFilter uses 10 to 100x less RAM. It also loads
  all known openoffice dictionaries without error, and supports an additional
  longestOnly option for a less aggressive approach.  (Robert Muir)



> Hunspell very high memory use when loading dictionary
> -----------------------------------------------------
>
>                 Key: LUCENE-5468
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5468
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 3.5
>            Reporter: Maciej Lisiewski
>            Priority: Minor
>         Attachments: LUCENE-5468.patch, patch.txt
>
>
> Hunspell stemmer requires gigantic (for the task) amounts of memory to load 
> dictionary/rules files. 
> For example, loading a 4.5 MB Polish dictionary (with an empty index!) will 
> cause the whole core to crash with various out-of-memory errors unless you 
> set the max heap size close to 2GB or more.
> By comparison, Stempel using the same dictionary file works just fine with 
> 1/8 of that (and possibly lower values as well).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
