Re: PositionLengthAttribute

2013-09-06 Thread Robert Muir
On Fri, Sep 6, 2013 at 9:32 PM, Benson Margulies wrote: > On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir wrote: >> it's the latter. the way it's designed to work i think is illustrated >> best in kuromoji analyzer where it heuristically decompounds nouns: >> >> if it decompounds ABCD into AB + CD, the

Re: PositionLengthAttribute

2013-09-06 Thread Benson Margulies
On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir wrote: > it's the latter. the way it's designed to work i think is illustrated > best in kuromoji analyzer where it heuristically decompounds nouns: > > if it decompounds ABCD into AB + CD, then the tokens are AB and CD. > these both have posinc=1. > howev
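The encoding Benson asks about (N on the "mother" token, 1 on each of the parts) can be modeled without Lucene. The sketch below is a hypothetical, stdlib-only stand-in: `Tok` and `Decompound` are illustrative names, not Lucene classes, and stacking the compound on the first part with posInc=0 is one valid layout, not necessarily the exact order Kuromoji emits. Replaying the increments into [start, end) spans shows that ABCD ends at the same position node as CD.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token carrying a position increment and a position length.
final class Tok {
    final String term;
    final int posInc; // positions advanced relative to the previous token
    final int posLen; // number of positions this token spans

    Tok(String term, int posInc, int posLen) {
        this.term = term;
        this.posInc = posInc;
        this.posLen = posLen;
    }

    @Override
    public String toString() {
        return term + "(inc=" + posInc + ",len=" + posLen + ")";
    }
}

final class Decompound {
    // Emits each part with posLen=1 and stacks the compound on the first part (posInc=0),
    // spanning all N parts: N on the "mother" token, 1 on each of the "babies".
    static List<Tok> emit(String compound, List<String> parts) {
        List<Tok> out = new ArrayList<>();
        if (parts.isEmpty()) {
            out.add(new Tok(compound, 1, 1)); // nothing to split
            return out;
        }
        for (int i = 0; i < parts.size(); i++) {
            out.add(new Tok(parts.get(i), 1, 1));
            if (i == 0) {
                out.add(new Tok(compound, 0, parts.size()));
            }
        }
        return out;
    }

    // Replays the increments to get each token's [start, end) positions in the token graph.
    static List<int[]> spans(List<Tok> toks) {
        List<int[]> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            out.add(new int[] {pos, pos + t.posLen});
        }
        return out;
    }
}
```

For ABCD split into AB + CD this yields AB(inc=1,len=1), ABCD(inc=0,len=2), CD(inc=1,len=1); no fractional lengths are needed because the compound simply covers both part positions.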

Re: PositionLengthAttribute

2013-09-06 Thread Robert Muir
On Fri, Sep 6, 2013 at 8:03 PM, Benson Margulies wrote: > I'm confused by the comment about compound components here. > > If a single token fissions into multiple tokens, then what belongs in > the PositionLengthAttribute? I'm wanting to store a fraction in here! > Or is the idea to store N in the

Re: LookaheadTokenFilter

2013-09-06 Thread Benson Margulies
I think that the penny just dropped, and I should not be using this class. If I call peekToken 10 times while sitting at token 0, this class will stack up all 10 of these _at token position 0_. That's not really very helpful for what I'm doing. I need to borrow code from this class and not use it.

Re: LookaheadTokenFilter

2013-09-06 Thread Benson Margulies
Michael, I'm apparently not fully deconfused yet. I've got a very simple incrementToken function. It calls peekToken to stack up the tokens. afterPosition is never called; I expected it to be called as each of the peeked tokens gets next-ed back out. I assume that I'm missing something simple.

Re: Lucene Concurrent Search

2013-09-06 Thread Stephen Green
Mostly because it already handles all of the indexing and querying that I expect you'll want to be doing, and now with SolrCloud you can even scale search beyond one machine. If you're just looking to learn about this stuff, though, it is fun to roll your own! On Friday, September 6, 2013, David M

PositionLengthAttribute

2013-09-06 Thread Benson Margulies
I'm confused by the comment about compound components here. If a single token fissions into multiple tokens, then what belongs in the PositionLengthAttribute? I'm wanting to store a fraction in here! Or is the idea to store N in the 'mother' token and then '1' in each of the babies?

RE: Smart Chinese Analyzer Performance

2013-09-06 Thread Oliver Xu (Aigine Co)
You may want to try IKAnalyzer, which is what I have used for years. I have never seen the delay you mentioned with this analyzer. Oliver -----Original Message----- From: java-user-return-56896-oliver.xu=aigine@lucene.apache.org [mailto:java-user-return-56896-oliver.xu=aigine@lucene.apache.org] On Behalf Of Erick Erickson

Re: RE: Smart Chinese Analyzer Performance

2013-09-06 Thread Darren Hoffman
Thanks for the advice! On 9/6/13 3:08 PM, "Oliver Xu (Aigine Co)" wrote: >You may want to try IKAnalyzer, which is what I have used for years. I have never seen the delay you mentioned with this analyzer. > >Oliver > >-----Original Message----- >From: java-user-return-56896-oliver.xu=aigine@lucene.apache.org >[ma

Re: Lucene Concurrent Search

2013-09-06 Thread David Miranda
Why use Solr instead of Lucene for this kind of application? 2013/9/6 Stephen Green > Something like: > > public class SearchListener implements ServletContextListener { > > @Override > public void contextInitialized(ServletContextEvent sce) { > > ServletContext sc = sce.getServ

Re: Smart Chinese Analyzer Performance

2013-09-06 Thread Erick Erickson
Well, various people have measured between a 50% and 70+% reduction in memory used for identical data, so I'd say so. The CHANGES.txt is where I'd look to see if anything mentioned is worth your time. Not to mention SolrCloud... Erick On Fri, Sep 6, 2013 at 3:41 PM, Darren Hoffman wrote: > I

Re: Smart Chinese Analyzer Performance

2013-09-06 Thread Darren Hoffman
Thanks for the feedback. I'll keep pressing on then. BTW, I'm not using Solr; I am building an Android app. On 9/6/13 1:06 PM, "Erick Erickson" wrote: >Well, various people have measured between a 50% and 70+% reduction in >memory used for identical data, so I'd say so. The CHANGES.txt is where

Smart Chinese Analyzer Performance

2013-09-06 Thread Darren Hoffman
I am using the SmartChineseAnalyzer in v3.6, but accessing or instantiating it for the first time takes 10 to 15 seconds before it does anything. I do not see this huge delay with StandardAnalyzer. Is it loading a cache? Is there some way to speed it up? I am currently using Lucene 3.6 and am tryin

Re: Lucene Concurrent Search

2013-09-06 Thread Stephen Green
Something like: public class SearchListener implements ServletContextListener { @Override public void contextInitialized(ServletContextEvent sce) { ServletContext sc = sce.getServletContext(); String indexDir = sc.getInitParameter("indexDir"); SearcherManager sear

Re: Lucene Concurrent Search

2013-09-06 Thread Ian Lea
For the singleton technique that I use, the per-search code looks like import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.SearcherManager; SearcherManager sm = LuceneSearcherManagerCache.get(indexdir); IndexSearcher s = sm.acquire(); try { search(...); } finall
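SearcherManager's documented usage pairs every `acquire()` with a `release()` in a `finally` block, so a searcher that is swapped out by a refresh is only closed once the last in-flight search lets go of it. The toy model below illustrates that reference-counting discipline in plain Java; `Counted` and `ToyManager` are hypothetical classes, not Lucene's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted holder standing in for a searcher; not a Lucene class.
final class Counted {
    final String name;
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = the manager's own reference
    private volatile boolean closed;

    Counted(String name) {
        this.name = name;
    }

    void incRef() {
        refs.incrementAndGet();
    }

    void decRef() {
        // Close only when the last holder lets go.
        if (refs.decrementAndGet() == 0) {
            closed = true;
        }
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical manager mimicking the acquire/release discipline.
final class ToyManager {
    private Counted current;

    ToyManager(Counted initial) {
        current = initial;
    }

    synchronized Counted acquire() {
        current.incRef();
        return current;
    }

    void release(Counted c) {
        c.decRef();
    }

    // Swap in a new searcher (e.g. after a refresh); the old one closes
    // once every in-flight search has released it.
    synchronized void swap(Counted next) {
        Counted old = current;
        current = next;
        old.decRef();
    }
}
```

Usage mirrors the per-search pattern: `Counted s = m.acquire(); try { /* search */ } finally { m.release(s); }`. Skipping the `finally` leaks a reference, and the old searcher is then never closed.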

Re: LookaheadTokenFilter

2013-09-06 Thread Benson Margulies
On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless wrote: > > On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies wrote: > > I'm trying to work through the logic of reading ahead until I've seen > > a marker for the end of a sentence, then applying some analysis to all of the > > tokens of the sentenc

Re: LookaheadTokenFilter

2013-09-06 Thread Michael McCandless
On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies wrote: > I'm trying to work through the logic of reading ahead until I've seen > a marker for the end of a sentence, then applying some analysis to all of the > tokens of the sentence, and then changing some attributes of each token to > reflect the r
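Benson later concludes that he should borrow the buffering logic rather than use LookaheadTokenFilter. The pattern he describes (read ahead to a sentence-end marker, analyze the whole sentence, then emit the tokens one at a time) can be sketched over plain strings. `SentenceBuffer` and the `analyze` callback are hypothetical; in a real TokenFilter each buffered element would more likely be an `AttributeSource.State` obtained from `captureState()` and replayed with `restoreState()` before the attributes are modified.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

// Hypothetical, Lucene-free model of a filter that buffers one sentence at a time.
final class SentenceBuffer implements Iterator<String> {
    private final Iterator<String> input;
    private final String endMarker;
    // Sentence-level analysis: receives the whole sentence, returns the (possibly modified) tokens.
    private final UnaryOperator<List<String>> analyze;
    private final Deque<String> pending = new ArrayDeque<>();

    SentenceBuffer(Iterator<String> input, String endMarker, UnaryOperator<List<String>> analyze) {
        this.input = input;
        this.endMarker = endMarker;
        this.analyze = analyze;
    }

    @Override
    public boolean hasNext() {
        // Refill only when the previous sentence has been fully emitted.
        while (pending.isEmpty() && input.hasNext()) {
            fill();
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.removeFirst();
    }

    // Read ahead up to and including the end marker (or end of input), then analyze.
    private void fill() {
        List<String> sentence = new ArrayList<>();
        while (input.hasNext()) {
            String tok = input.next();
            sentence.add(tok);
            if (tok.equals(endMarker)) {
                break;
            }
        }
        pending.addAll(analyze.apply(sentence));
    }
}
```

Because the buffer is keyed by sentence rather than by position, peeking ahead never piles tokens up at the current position, which is the behavior Benson ran into with `peekToken`.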

Re: LookaheadTokenFilter

2013-09-06 Thread Michael McCandless
It's in test-framework only because no "real" TokenFilter uses it yet, and it's all very new code :) My intention was to eventually cut over tricky graph TokenFilters (like SynFilter), to simplify them, factoring out the common buffering of tokens by position into LookaheadTokenFilter, but I never

Re: Basic understanding and difference between getSuggestion and lookup method of InfixSuggester.

2013-09-06 Thread Michael McCandless
AnalyzingInfixSuggester will match based on tokens or prefix of tokens. So query "oz" will match "wizard of oz" and also "who is ozzy osbourne". The other suggesters are strictly prefix match, so "oz" can only match when the suggestion starts with oz, e.g. "ozzy osbourne". But FuzzySuggester all
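The difference Mike describes reduces to two matching rules. The sketch below is a simplification, assuming a single-term query and whitespace tokenization; `MatchDemo` is a hypothetical class, and the real AnalyzingInfixSuggester runs the text through an Analyzer and an index rather than splitting strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy comparison of the two matching rules; not Lucene's suggester code.
final class MatchDemo {
    // Infix style: the query may be a prefix of ANY token in the suggestion.
    static boolean infixMatch(String suggestion, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String tok : suggestion.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    // Strict prefix style: the whole suggestion must start with the query.
    static boolean prefixMatch(String suggestion, String query) {
        return suggestion.toLowerCase(Locale.ROOT).startsWith(query.toLowerCase(Locale.ROOT));
    }

    static List<String> filter(List<String> suggestions, String query, boolean infix) {
        List<String> out = new ArrayList<>();
        for (String s : suggestions) {
            if (infix ? infixMatch(s, query) : prefixMatch(s, query)) {
                out.add(s);
            }
        }
        return out;
    }
}
```

With the suggestions "wizard of oz", "who is ozzy osbourne" and "ozzy osbourne", the query "oz" matches all three under the infix rule but only "ozzy osbourne" under the strict prefix rule.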

Re: How to get hits coordinates in Lucene 4.4.0

2013-09-06 Thread Darren Hoffman
Lingviston, Can you tell me what IDE and process you are using to build your APK file? I am having issues with loading the Lucene42Codec and I see the code you are using is just like mine. However, when I try to run the app, I get an exception stating that it can't find the codec. I am using Int