Re: Questions about doing a full text search with numeric values

2013-07-03 Thread Ivan Krišto
ide more "notepad-like find" ability, as it can search for part of a word, but it will introduce more noise into the search results. It will also handle Erick Erickson's example: > That won't deal with this example though: 00123456. Regards, Ivan Krišto

Re: Compare the input string with stored string and Take decision.

2013-07-10 Thread Ivan Krišto
hashtable will do just fine). Use Lucene only if the comparison method is complicated (searching over tokens, which involves tokenization and normalization) and you have lots of strings (documents). Otherwise, it's overkill. Regards, Ivan Krišto
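A minimal sketch of the hashtable approach mentioned above, assuming the comparison is an exact match after trivial normalization. The class name StoredStrings and the trim-and-lowercase normalization are illustrative assumptions, not something taken from the thread.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps stored strings in a hash set for O(1) average lookup.
final class StoredStrings {
    private final Set<String> stored = new HashSet<>();

    // Assumed normalization: trim surrounding whitespace and lowercase.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Stores the normalized form of the string.
    void add(String s) {
        stored.add(normalize(s));
    }

    // Returns true if the normalized input equals a stored string.
    boolean contains(String input) {
        return stored.contains(normalize(input));
    }
}
```

If the decision depends on matching individual tokens rather than whole strings, this stops being enough, which is the case where the post suggests Lucene.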

Re: Compare the input string with stored string and Take decision.

2013-07-11 Thread Ivan Krišto
uted hashtable (some are also known as key-value stores). Notable products: Project Voldemort, Redis (extremely simple, with lots of bindings), Riak, ... Regards, Ivan Krišto > On 7/11/2013 11:59 AM, Ivan Krišto wrote: >> On 07/11/2013 08:04 AM, Ankit Murarka wrote: >> >

Re: ngrams in Lucene 4.3.0

2013-07-15 Thread Ivan Krišto
} }; } } If, for example, you want to remove stop words from a document before breaking it into n-grams, then you would need: reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter Regards, Ivan Krišto
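The order of the stages in that chain can be illustrated without Lucene. The sketch below is a plain-Java stand-in, not the Lucene API: tokenize, removeStopWords and charNGrams are hypothetical helpers that only mimic the roles of a tokenizer, StopFilter and NGramTokenFilter, assuming whitespace tokenization and fixed-length character n-grams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for: tokenizer -> stop-word filter -> n-gram filter.
final class NGramPipelineSketch {

    // Stage 1 (assumed): lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: drop stop words before any n-grams are produced.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Stage 3: character n-grams of length n for each remaining token.
    static List<String> charNGrams(List<String> tokens, int n) {
        List<String> grams = new ArrayList<>();
        for (String t : tokens) {
            for (int i = 0; i + n <= t.length(); i++) {
                grams.add(t.substring(i, i + n));
            }
        }
        return grams;
    }

    // Runs the three stages in the order given in the post.
    static List<String> analyze(String text, Set<String> stopWords, int n) {
        return charNGrams(removeStopWords(tokenize(text), stopWords), n);
    }
}
```

Because stop words are removed before the n-gram stage, no n-grams are generated from them, which is the point of placing the stop filter ahead of the n-gram filter.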

Re: Complete phrase Suggest Feature in Apache Lucene

2013-08-02 Thread Ivan Krišto
y "how to use lucene", you would index "how to use" and "to use lucene" as phrases); then you would "fix" the given query part by part. - To explore more solutions to this problem, search papers for "related query suggestion". - Twitter came to a similar idea as
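The phrase splitting described above ("how to use lucene" indexed as "how to use" and "to use lucene") can be sketched as overlapping word windows. This is a plain-Java illustration under the assumption of whitespace-separated words; PhraseShingles and its method name are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a query into overlapping phrases of `size` words.
final class PhraseShingles {
    static List<String> shingles(String query, int size) {
        String[] words = query.trim().split("\\s+");
        List<String> phrases = new ArrayList<>();
        // Slide a window of `size` words over the query, one word at a time.
        for (int i = 0; i + size <= words.length; i++) {
            phrases.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return phrases;
    }
}
```

Each resulting phrase could then be indexed or corrected separately, which corresponds to fixing the query part by part.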

Re: Complete phrase Suggest Feature in Apache Lucene

2013-08-06 Thread Ivan Krišto
) throws IOException { String[] suggestions = phraseRecommender.suggestSimilar(query, 5); if (suggestions.length > 0) return suggestions[0]; else return null; } } It prints: Lovely spam! Wonderful spam! This parrot is no more. That Rabbit's Dynamite!! Regards, Ivan Krišto

Re: lucene and ejb applications

2013-08-09 Thread Ivan Krišto
49 Hibernate Search is an easy way of integrating Lucene into a JEE application. Regards, Ivan Krišto

Re: Optimize Lucene 4.4 for CPU usage

2013-08-21 Thread Ivan Krišto
profiler that comes with the JDK) should do the trick. Just run the profiler against Lucene and check which methods take most of the CPU time. Maybe some serialization outside Lucene takes most of the CPU time. Regards, Ivan Krišto

Re: Lucene index customization

2013-08-24 Thread Ivan Krišto
would suggest you try alternatives, especially http://terrier.org/ (a flexible IR system whose main goal is to serve academic purposes). Regards, Ivan Krišto

Re: Lucene Text Similarity

2013-09-04 Thread Ivan Krišto
taining at least one uppercase letter (add a boost of 3 or 4; maybe skip the first word of a sentence) - break the search text into sentences, then search the index for each sentence (combine the results using Borda count or something similar) - do what Koji suggested Regards, Ivan Krišto
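Combining per-sentence result lists with a Borda count could look roughly like the sketch below. It assumes each result list is a ranking of document ids (best first) and that a document at position i in a list of n results earns n - i points; BordaCount and the scoring convention are illustrative choices, since the post does not specify them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges several ranked result lists using a Borda count.
final class BordaCount {
    static List<String> combine(List<List<String>> rankings) {
        Map<String, Integer> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            // Assumed scoring: the top result gets n points, the last gets 1.
            for (int i = 0; i < n; i++) {
                scores.merge(ranking.get(i), n - i, Integer::sum);
            }
        }
        List<String> merged = new ArrayList<>(scores.keySet());
        // Highest total first; ties broken by document id for a stable order.
        merged.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return merged;
    }
}
```

Documents that rank well for several sentences accumulate points from each list, so they move ahead of documents that match only one sentence strongly.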

Re: Which is the +best +fast HTML parser/tokenizer that I can use with Lucene for indexing HTML content today ?

2011-03-11 Thread Ivan Krišto
robust parser). But this parser is neither an event-based nor a tree-based parser (so, even automata theory can help us here). If you need something pretty specific, like extracting all links from a page, I would recommend using simple regular expressions.
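For the link-extraction case, a simple regular expression might look like the following sketch. It assumes links are anchor tags with a quoted href attribute; LinkExtractor and the pattern are illustrative, and the pattern will miss unquoted attributes and other irregular markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts href values from <a> tags with a regular expression.
final class LinkExtractor {
    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }
}
```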

Re: Lucene for Log file indexing and search

2013-09-19 Thread Ivan Krišto
oduct similar to Solr, also based on Lucene). Regards, Ivan Krišto

Re: Indexing Huge tree structure represented in a Text file

2014-04-15 Thread Ivan Krišto
ck (slides 48-77). Regards, Ivan Krišto On Tue, Apr 15, 2014 at 12:30 PM, kumagirish wrote: > Thanks Doug > > i have gone through SIREN DB Unfortunately i couldn't find enough > examples > which i could match to my requirement could you point me to any examp