[ANN] katta-0.1.0 release - distribute lucene indexes in a grid

2008-09-18 Thread Stefan Groschupf
After 5 months of work, we are happy to announce the first developer preview release of Katta. This release contains all the functionality needed to serve a large, sharded Lucene index on many servers. Katta stands on the shoulders of the giants Lucene, Hadoop and ZooKeeper. Main features: + Plays wel

Re: Caching Results

2005-11-29 Thread Stefan Groschupf
Well, this depends. If you have a small index of just a few million documents, then it makes no sense. But if you have some hundred million documents and may use distributed searching, it makes a lot of sense. Just check ehcache.sf.net; I found it very useful. HTH Stefan On 29.11.2005 at 1
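
The kind of result caching discussed in this thread can be sketched with the JDK alone. The example below is a minimal illustration, not ehcache's API: the class name `ResultCache` and the choice of a query string as the cache key are assumptions made for this sketch, and the JDK's `LinkedHashMap` is swapped in for a real cache library to get least-recently-used eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical in-memory LRU cache for search results, keyed by query string.
 * A stand-in for a real cache library; not thread-safe as written.
 */
class ResultCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    ResultCache(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each insertion: evict the least recently used entry
        // once the cache holds more than maxEntries items
        return size() > maxEntries;
    }
}
```

A caller would check `cache.get(queryString)` before running the search and `put` the result afterwards. Whether this pays off depends, as the message says, on how expensive a search is in the first place.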

Re: Is There Other Ports of Nutch?

2005-11-06 Thread Stefan Groschupf
No! Porting Nutch in general makes no sense, since Nutch is not a library like Lucene but a complete, ready-to-use application you can download and start. There is a kind of 'web service' (OpenSearch RSS) for integrating Nutch search results into third-party applications. Stefan ... and

Re: How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Stefan Groschupf
BTW, there are some cool free ad servers available as open source... On 25.10.2005 at 09:14, Sam Lee wrote: Hi, My network is designed to have a bunch of advertisers enter their ads with keywords. I am thinking of using MySQL to store those, and then using Lucene and part of Nutch to index them fr

Re: Can I Do Reverse Search?

2005-10-23 Thread Stefan Groschupf
hen # of query is 1 only. Huge difference! Any idea how to accomplish this? --- Stefan Groschupf <[EMAIL PROTECTED]> wrote: Index the keywords of your ads with Lucene. Extract all words from your page (ajax), remove stop words, build a query from the page words by connecting the words with

Re: Can I Do Reverse Search?

2005-10-23 Thread Stefan Groschupf
Index the keywords of your ads with Lucene. Extract all words from your page (ajax), remove stop words, build a query from the page words by connecting the words with OR, and you will find the best-matching ad. You may need to limit the words per page or set the maximum clauses to a much higher
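
The query-building step described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not code from the thread: the class `ReverseQueryBuilder`, the method `buildOrQuery`, the tiny stop-word list, and the `maxTerms` parameter are all hypothetical names chosen for this example. The resulting string would still have to be handed to a query parser and run against the ad index.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that turns page text into an OR query string. */
class ReverseQueryBuilder {
    // Tiny illustrative stop-word list (an assumption); a real list would be longer
    private static final Set<String> STOP_WORDS = new LinkedHashSet<>(
            Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is"));

    /** Joins up to maxTerms distinct non-stop words from the page with OR. */
    static String buildOrQuery(String pageText, int maxTerms) {
        // LinkedHashSet drops duplicate words while keeping first-seen order
        Set<String> terms = new LinkedHashSet<>();
        // Split on anything that is not a letter or digit
        for (String token : pageText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(token);
            if (terms.size() >= maxTerms) {
                break; // cap the number of clauses, as the message suggests
            }
        }
        return String.join(" OR ", terms);
    }
}
```

The cap matters because Lucene's `BooleanQuery` rejects queries with more clauses than its configured maximum (1024 by default), which is the limit the message refers to.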

Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???

2005-08-08 Thread Stefan Groschupf
Hi, I ran into the same problem some weeks ago as well. You can find the following in the Javadoc: "Note: this value is not stored directly with the document in the index. Documents returned from IndexReader.document(int) and Hits.doc(int) may thus not have the same value present as when this fiel

Re: URL search causes BooleanQuery TooManyClauses Excp

2005-05-23 Thread Stefan Groschupf
Andrew, the solution for RangeQueries will work for WildcardQueries as well. See: http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831 HTH Stefan On 23.05.2005 at 21:26, Andrew Boyd wrote: Hi All, I have an index with 4811 documents each of which

Re: boosting?

2005-03-22 Thread Stefan Groschupf
sible unless the length normalization is 1.0 (which is not usually a good idea). Erik On Mar 21, 2005, at 4:35 PM, Stefan Groschupf wrote: Hi there, how do I get the real boost value of a field or document? The Javadoc says that it _may_ not be returned correctly when reading a document w

boosting?

2005-03-21 Thread Stefan Groschupf
Hi there, how do I get the real boost value of a field or document? The Javadoc says that it _may_ not be returned correctly when reading a document with an index reader. Any hints on how to get the boost when reading a document? Thanks. Stefan