ide more "notepad-like find" ability, as it can
search for part of a word, but it will introduce more noise into search
results.
Also, it will deal with Erick Erickson's example:
> That won't deal with this example though: 00123456.
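Here is a minimal in-memory sketch of character n-gram matching in plain Java that makes the trade-off concrete. It is not Lucene's API, and the class name and the gram sizes (2 to 3) are assumptions. Every stored term is split into its n-grams, so the partial query "123456" finds "00123456":

```java
import java.util.*;

// Hypothetical sketch, not Lucene classes: a tiny n-gram inverted index.
public class NGramSubstringDemo {
    // Assumed gram sizes for this sketch (Lucene's n-gram filter has similar min/max settings).
    static final int MIN_GRAM = 2;
    static final int MAX_GRAM = 3;

    // Every character n-gram of the text with length in [min, max].
    static List<String> ngrams(String text, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
        }
        return grams;
    }

    // Inverted index: gram -> stored terms that contain that gram.
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String term) {
        for (String gram : ngrams(term, MIN_GRAM, MAX_GRAM)) {
            postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(term);
        }
    }

    // Terms containing every gram of the query. This is a superset of true
    // substring matches (the grams need not be adjacent in the term), which
    // is where the extra noise comes from.
    Set<String> search(String query) {
        List<String> queryGrams = ngrams(query, MIN_GRAM, MAX_GRAM);
        if (queryGrams.isEmpty()) return Collections.emptySet();
        Set<String> result = null;
        for (String gram : queryGrams) {
            Set<String> hits = postings.getOrDefault(gram, Collections.emptySet());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits);
        }
        return result;
    }

    public static void main(String[] args) {
        NGramSubstringDemo idx = new NGramSubstringDemo();
        idx.add("00123456");
        idx.add("00999999");
        System.out.println(idx.search("123456")); // [00123456]
        System.out.println(idx.search("0099"));   // [00999999]
    }
}
```

A real index would verify candidates (or accept the noise); whole-word tokenization would miss "123456" entirely.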
Regards,
Ivan Krišto
hashtable
will do just fine).
Use Lucene only if the comparison method is complicated (searching over
tokens, which involves tokenization and normalization) and you have lots
of strings (documents).
Otherwise, it's overkill.
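When strings only need to be compared for (near-)exact equality, a plain hash map covers it. This is a minimal sketch, assuming that "equal after trimming and lower-casing" is the whole matching rule. The class and method names are hypothetical:

```java
import java.util.*;

// Sketch: exact-match lookup without Lucene. Assumed matching rule:
// two strings match if they are equal after trim + lower case.
public class ExactLookupDemo {
    // normalized key -> original string as it was added
    private final Map<String, String> byKey = new HashMap<>();

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    void add(String s) {
        byKey.put(normalize(s), s);
    }

    // Expected O(1) per lookup; returns the stored original, or null if absent.
    String find(String query) {
        return byKey.get(normalize(query));
    }

    public static void main(String[] args) {
        ExactLookupDemo d = new ExactLookupDemo();
        d.add("Apache Lucene");
        d.add("Redis");
        System.out.println(d.find("  apache lucene ")); // Apache Lucene
        System.out.println(d.find("Lucene"));           // null (no partial matching)
    }
}
```

As soon as partial, per-token or fuzzy matching is needed, the normalization rule stops being enough, and that is where an analyzer-based index starts to pay off.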
Regards,
Ivan Krišto
uted hashtable (some are also known as
key-value stores). Notable products: Project Voldemort, Redis (extremely
simple, with lots of bindings), Riak, ...
Regards,
Ivan Krišto
> On 7/11/2013 11:59 AM, Ivan Krišto wrote:
>> On 07/11/2013 08:04 AM, Ankit Murarka wrote:
>>
>
}
};
}
}
If, for example, you want to remove stop words from the document before
breaking it into n-grams, then you would need:
reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
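This plain-Java sketch imitates that chain to show why the order matters. It does not use Lucene's classes. The stop-word list, the split-on-non-alphanumerics tokenizer and the trigram size are assumptions:

```java
import java.util.*;

// Imitation of: reader(document) -> SomeTokenizer -> StopFilter -> NGramTokenFilter
// Plain Java only; these methods are hypothetical stand-ins, not Lucene classes.
public class AnalysisChainDemo {
    // Assumed stop list for the sketch.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "to");

    // "SomeTokenizer": lower-case, then split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String document) {
        List<String> tokens = new ArrayList<>();
        for (String t : document.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // "StopFilter": drop whole stop words before they are broken into grams.
    static List<String> stopFilter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) kept.add(t);
        }
        return kept;
    }

    // "NGramTokenFilter": replace each token with its character n-grams.
    static List<String> ngramFilter(List<String> tokens, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (String t : tokens) {
            for (int n = min; n <= max; n++) {
                for (int i = 0; i + n <= t.length(); i++) {
                    grams.add(t.substring(i, i + n));
                }
            }
        }
        return grams;
    }

    static List<String> analyze(String document) {
        return ngramFilter(stopFilter(tokenize(document)), 3, 3);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The art of Lucene")); // [art, luc, uce, cen, ene]
    }
}
```

Putting the stop filter after the n-gram step would instead delete grams such as "the" from inside words like "theory", which is why it has to come first.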
Regards,
Ivan Krišto
y "how to use lucene", you would index "how to use" and "to use
lucene" as phrases); then you would "fix" the given query part by part.
- To explore more solutions to this problem, search papers for "related
query suggestion".
- Twitter came to a similar idea as
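    ) throws IOException {
        // Ask phraseRecommender for up to 5 indexed phrases similar to the query;
        // return the best one, or null when nothing similar was found.
        String[] suggestions = phraseRecommender.suggestSimilar(query, 5);
        return suggestions.length > 0 ? suggestions[0] : null;
    }
}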
It prints:
Lovely spam! Wonderful spam!
This parrot is no more.
That Rabbit's Dynamite!!
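The phrase-indexing idea from the list above (indexing "how to use" and "to use lucene" for the query "how to use lucene") is essentially word n-grams, or "shingles"; Lucene's analysis module has a ShingleFilter for this. Below is a standalone sketch, assuming whitespace-separated words and a shingle size of 3 taken from that example. The class name is hypothetical:

```java
import java.util.*;

// Sketch: overlapping word n-grams ("shingles") of a query.
public class QueryShingles {
    // Returns all runs of `size` consecutive words; a query with no more
    // than `size` words is returned as a single phrase.
    static List<String> shingles(String query, int size) {
        String[] words = query.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        if (words.length <= size) {
            out.add(String.join(" ", words));
            return out;
        }
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("how to use lucene", 3)); // [how to use, to use lucene]
    }
}
```

Each shingle can then be looked up (or spell-checked) on its own, which is what "fixing the query part by part" amounts to.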
Regards,
Ivan Krišto
Hibernate Search is an easy way of integrating Lucene into a JEE application.
Regards,
Ivan Krišto
profiler that comes with the JDK) should do the
trick. Just run the profiler against Lucene and check which methods take
most of the CPU time. Maybe some serialization outside Lucene is what takes
most of the CPU time.
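A profiler remains the right tool. Before attaching one, though, a crude check is to measure the thread CPU time of each phase with the JDK's ThreadMXBean. The sketch below is a swapped-in technique (manual instrumentation, not profiling), and the two measured tasks are hypothetical placeholders for your real search call and serialization step:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Crude per-phase CPU timing on the current thread (not a profiler).
public class CpuTimeProbe {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    // CPU nanoseconds the current thread spent running task,
    // or -1 if thread CPU time is unsupported or disabled on this JVM.
    static long cpuNanos(Runnable task) {
        if (!MX.isCurrentThreadCpuTimeSupported() || !MX.isThreadCpuTimeEnabled()) return -1;
        long start = MX.getCurrentThreadCpuTime();
        task.run();
        return MX.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder: put the real Lucene search call here.
        long search = cpuNanos(() -> { /* searcher.search(...) */ });
        // Hypothetical placeholder standing in for serialization work.
        StringBuilder sink = new StringBuilder();
        long serialize = cpuNanos(() -> {
            for (int i = 0; i < 100_000; i++) sink.append(i).append(',');
        });
        System.out.printf("search: %d ns, serialization: %d ns%n", search, serialize);
    }
}
```

If the serialization number dominates, the bottleneck is outside Lucene, which is the case suspected above.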
Regards,
Ivan Krišto
would suggest trying alternatives, especially http://terrier.org/
(a flexible IR system whose main goal is to serve academic purposes).
Regards,
Ivan Krišto
taining at least one uppercase
letter (add a boost of 3 or 4; maybe skip the first word of a sentence)
- break the search text into sentences, then search the index for each sentence
(combine the results using Borda count or something similar)
- do what Koji suggested
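The "search per sentence, then combine with Borda count" suggestion can be sketched as follows. It uses one common Borda variant: in a ranked list of length n, the hit at 0-based rank r earns n - r points, and a document absent from a list earns nothing. The document ids and the class name are made up for illustration:

```java
import java.util.*;

// Sketch of Borda-count fusion of several ranked hit lists (e.g. one per sentence).
public class BordaFusion {
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Integer> score = new HashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int r = 0; r < n; r++) {
                // Rank 0 gets n points, rank 1 gets n - 1, ...
                score.merge(ranking.get(r), n - r, Integer::sum);
            }
        }
        List<String> docs = new ArrayList<>(score.keySet());
        // Highest total first; ties broken by id so the output is deterministic.
        docs.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return docs;
    }

    public static void main(String[] args) {
        List<List<String>> perSentence = List.of(
            List.of("d1", "d2", "d3"),  // hits for sentence 1
            List.of("d2", "d1"),        // hits for sentence 2
            List.of("d2", "d3"));       // hits for sentence 3
        System.out.println(fuse(perSentence)); // [d2, d1, d3]
    }
}
```

Here d2 scores 2 + 2 + 2 = 6, d1 scores 3 + 1 = 4 and d3 scores 1 + 1 = 2. Only ranks are used, so raw Lucene scores from different sentence queries never need to be compared directly.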
Regards,
Ivan Krišto
robust parser). But this parser is neither an event-based nor a tree-based
parser (so, even automata theory can help us here).
If you need something pretty specific, like extracting all links from a
page, I would recommend using simple regular expressions.
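For the "extract all links" case, a regular expression along these lines is usually enough. This is a quick-and-dirty sketch: it only handles quoted href values inside `<a>` tags and is not a general HTML parser. The class name and sample markup are illustrative:

```java
import java.util.*;
import java.util.regex.*;

// Quick-and-dirty link extraction; handles quoted href values in <a> tags only.
public class LinkExtractor {
    // <a, whitespace, any attributes, href = "..." or '...' (group 1 is the quote, group 2 the URL)
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> links(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://lucene.apache.org/\">Lucene</a> and "
                + "<A class='x' HREF='http://terrier.org/'>Terrier</A></p>";
        System.out.println(links(html)); // [http://lucene.apache.org/, http://terrier.org/]
    }
}
```

Unquoted attributes, links built by scripts, or `<base>` handling would need more than this. At that point a real parser is the better trade.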
oduct similar to Solr, also based on
Lucene).
Regards,
Ivan Krišto
ck (slides
48-77).
Regards,
Ivan Krišto
On Tue, Apr 15, 2014 at 12:30 PM, kumagirish wrote:
> Thanks Doug
>
> i have gone through SIREN DB Unfortunately i couldn't find enough
> examples
> which i could match to my requirement could you point me to any examp