Re: Avoid automaton Memory Usage

Anna Björk Nikulásdóttir Thu, 08 Aug 2013 09:56:46 -0700

On 8.8.2013 at 12:37, Michael McCandless <luc...@mikemccandless.com> wrote:


> <snip>
>> What would help in my case as I use the same FST for both analyzers, if the 
>> same FST object could be shared among both analyzers. So what I am doing is 
>> to use AnalyzingSuggester.store() and use the stored file for 
>> AnalyzingSuggester.load() and FuzzySuggester.load().
> 
> That's interesting ... so you mean you sometimes want fuzzy
> suggestions and sometimes non-fuzzy ones, off the same built
> suggester?  I believe AnalyzingSuggester and FuzzySuggester in fact
> use the same FST (not certain) ... are you able to do
> FuzzySuggester.load from a previous AnalyzingSuggester.store and it
> works?  And that's still too much RAM?
> 

Yes, it works like a charm. I use it for auto-completion of non-English
terms. Often the typed beginning of a term can be used as is, and then
AnalyzingSuggester gives the best results, whereas FuzzySuggester would give
too many results that need a lot of post-processing. If the user is lazy,
cannot easily reach specific letters on the Android keyboard (e.g. 'æ', 'ä',
'ß'), or mistypes some letters, I use FuzzySuggester as a fallback when
AnalyzingSuggester doesn't yield appropriate results. It's a bit of a kludge
because FuzzySuggester doesn't boost the terms with minimal Levenshtein
distance.
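
The exact-first, fuzzy-fallback flow described above, with the fuzzy results
re-ranked by edit distance as post-processing, could be sketched roughly as
follows. This is an illustration only: FallbackSuggest, prefixDistance and the
two lookup functions are hypothetical stand-ins, not Lucene API. In practice
the two functions would wrap the AnalyzingSuggester and FuzzySuggester
lookups.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Lucene: exact lookup first, fuzzy fallback second.
final class FallbackSuggest {

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Distance between the typed text and the equally long prefix of a candidate. */
    static int prefixDistance(String typed, String candidate) {
        int end = Math.min(typed.length(), candidate.length());
        return levenshtein(typed, candidate.substring(0, end));
    }

    /**
     * Returns the exact results if there are at least minResults of them;
     * otherwise returns the fuzzy results, closest prefix match first.
     * The sort is stable, so ties keep the order the fuzzy lookup produced.
     */
    static List<String> suggest(String typed,
                                Function<String, List<String>> exact,
                                Function<String, List<String>> fuzzy,
                                int minResults) {
        List<String> hits = exact.apply(typed);
        if (hits.size() >= minResults) {
            return hits;
        }
        List<String> ranked = new ArrayList<>(fuzzy.apply(typed));
        ranked.sort(Comparator.comparingInt((String c) -> prefixDistance(typed, c)));
        return ranked;
    }
}
```

Comparing against the candidate's prefix rather than the whole term is one
possible choice for completion-style input; a different scoring may fit
better depending on the data.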

Performance-wise this is absolutely no problem on Android, but memory-wise it
means 2x the FST memory. At the moment one FST needs ~20MB. If, for example,
I wanted to support multiple languages simultaneously, it would not work this
way.

Ideally all this could be done on disk/flash only. But this then needs changes
along the lines of your earlier proposal via DirectByteBuffer. Do you think
going this way would yield acceptable performance? And does mapping a file
into memory not fill the DRAM with the complete content of the file over time?
Are "normal" Lucene indexes accessed this way?

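To make the mapping question concrete, here is a minimal sketch of reading a
stored file through a read-only memory mapping with plain java.nio. MappedRead
and its methods are hypothetical names, and nothing here is Lucene code. With
such a mapping the operating system loads pages on first access and may drop
clean pages again under memory pressure, so the data does not live on the Java
heap; how this behaves on a particular Android device is not something the
sketch can show.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene: read-only memory mapping of a file.
final class MappedRead {

    /**
     * Maps the whole file read-only. The mapping stays valid after the
     * channel is closed. Files larger than Integer.MAX_VALUE bytes cannot
     * be mapped in one piece and would need several mappings.
     */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    /** Absolute read; only the page holding this offset needs to be resident. */
    static byte byteAt(MappedByteBuffer buffer, int offset) {
        return buffer.get(offset);
    }
}
```
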

>> Unfortunately there is no immutable FST class, but as I do not use it in 
>> mulithreaded environment, that is probably not a problem, no ? A quick fix 
>> could be to copy the analyzer classes and change these to such behaviour and 
>> reuse the FST object. Does this make sense functional wise or do I have to 
>> expect problems ?
> 
> Sharing an FST across analyzing and fuzzy suggesters does seem
> worthwhile; it may "just work" today…
> 

I will try it then. Do you have any indication that it might stop working at
some point in the future?


>> Would a patch for such behaviour make sense for the existing analyzer 
>> classes or is this use case too specific ?
> 
> It might ... open an issue and we can discuss/iterate there?


If it works here, I will open an issue / provide a patch.


regards,
Anna.


