Re: Lucene vs RDBMS indexing at scale

2013-02-06 Thread Andrew Gilmartin
Drew Kutcharian wrote: I'm trying to figure out what would be a better approach to indexing when it comes to a large number of records (say 1 billion) A rule of thumb is that if you want a list of exact matches, use a database. If you want a ranked list of matches, use Lucene. -- Andrew
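A toy sketch of the exact-match versus ranked-match distinction in the rule of thumb above. This is not Lucene's scoring and not a real database: the corpus, the function names, and the raw term-count score are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; the ids and text are invented for this sketch.
DOCS = {
    1: "lucene index search",
    2: "database index",
    3: "lucene lucene search ranking",
}

def exact_matches(field_value):
    """Database-style lookup: a record either equals the value or it does not."""
    return sorted(doc_id for doc_id, text in DOCS.items() if text == field_value)

def ranked_matches(query):
    """Search-style lookup: score each document by how often query terms occur."""
    terms = query.split()
    scores = {}
    for doc_id, text in DOCS.items():
        counts = Counter(text.split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

The exact lookup returns an unordered yes/no set, while the ranked lookup returns every partially matching document in score order. A real engine would use a more careful score than a raw term count.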

Re: How to find related words ?

2013-01-31 Thread Andrew Gilmartin
wgggfiy wrote: en, it seems nice, but I'm puzzled by you and Andrew Gilmartina above, what's the difference between you guys ? The difference is that similar documents do not give you similar terms. Similar documents can show a correlation of terms -- i.e., wherever Lucene is mentioned so is So
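One way to picture the term correlation mentioned in this reply is to count how often two terms appear in the same document. The following is a minimal sketch under that assumption. It uses plain Python rather than any Lucene API, and the function names and whitespace tokenization are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for each unordered pair of terms, how many documents contain both."""
    pairs = Counter()
    for text in docs:
        # Deduplicate and sort so each pair is counted once per document.
        terms = sorted(set(text.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs

def related_terms(docs, term, top=3):
    """Return the terms that most often share a document with `term`."""
    term = term.lower()
    scores = Counter()
    for (a, b), n in cooccurrence(docs).items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    # Most frequent partner first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top]]
```

Co-occurrence of this kind shows which terms travel together. It does not show that the terms mean similar things, which is the distinction the reply draws.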

Re: How to find related words ?

2013-01-30 Thread Andrew Gilmartin
wgggfiy wrote: In short, you put in a term like "Lucene", and The ideal output would be "solr", "index", "full-text search", and so on. How to make it ? to find the related words. thx My idea is to use FuzzyQuery, or MoreLikeThis, or calc the score with all the terms and then sort. Any idea ? T

Re: Custom Query Syntax/Parser

2013-01-28 Thread Andrew Gilmartin
Uwe Schindler wrote: there is no need to extend Lucene's QueryParser. Lucene by itself does not need a Query Parser at all and it does not use it, it is just a convenience class. If you have worked with Antlr to generate a grammar, just use it and build the final org.apache.lucene.search.Quer

Re: about isStored method

2013-01-26 Thread Andrew Gilmartin
honetic filter"? Appreciate if you could show an example? I don't have any experience with this side of Lucene. When Google fails to find something relevant, try a Delicious search. For example, https://delicious.com/search?p=lucene%2Cphonetic Good luck.

Re: Filtering top hits based on stored field? And Lucene 1.x -> 3.x for Dummies

2013-01-25 Thread Andrew Gilmartin
Ian Lea wrote: Thank you for the quick and helpful reply. I had forgotten that Lucene's change document was one of the best examples of change documents around. I will read it. On the specific question, calling doc() is still expensive. You could look at the FieldCache or the new DocValues stuff.

Filtering top hits based on stored field? And Lucene 1.x -> 3.x for Dummies

2013-01-25 Thread Andrew Gilmartin
redundant and learn Lucene and Solr anew? Or are there documents that can better direct my re-education? -- Andrew