Re: Searching by bit masks

2006-11-09 Thread Erick Erickson
Let me see if I have a clue what you're trying to do. Warning: I'm a bit confused since "filter" has a very specific meaning in Lucene, so when you talk about filters I'm assuming that you're NOT talking about Lucene filters, but rather just a set of flags you're associating with each document, an

Searching by bit masks

2006-11-09 Thread Larry Taylor
Hello, I am currently evaluating Lucene to see if it would be appropriate to replace my company's current search software. So far everything has been looking great, however there is one requirement that I am not too certain about. What we need to do is to be able to store a bit mask specifying

Re: TerraCotta cluster Lucene

2006-11-09 Thread Steve Harris
Interesting that the article references a post I made here. In the end I did not end up needing that Lucene code change for two reasons. 1) I ended up just clustering the RAMDirectory itself, so the subclass of the collection ended up not being shared 2) In the coming release (early December) we'l

Re: PDF Highlighting Again

2006-11-09 Thread Daniel Naber
On Thursday 09 November 2006 19:55, Renaud Waldura wrote: > I'm thinking I might have to tokenize the document text (I have it), > then compute the intersection between the set of all terms and the set > of terms from the rewritten query. Blech. Sounds expensive. Any other > ideas? No faster, but

PDF Highlighting Again

2006-11-09 Thread Renaud Waldura
Greetings: I read the mailing-list archives about this topic and found the PDFBox solutions at: http://www.pdfbox.org/userguide/highlighting.html Basically there are 3 options: 1- append query parameters to the PDF URL 2- generate a highlight XML document that Acrobat Reader will download separa

TerraCotta cluster Lucene

2006-11-09 Thread karl wettin
Some people might find this interesting. I have personally not looked at it in depth: Engineers at TerraCotta have detailed a new way to cluster Lucene, the popular text search library from Apache. Their method involves implementing the Lucene RAMDirectory interface and using TerraCotta D

Multi Query MultiSearcher

2006-11-09 Thread Mark Miller
Okay, so no help with the JGuruMultisearcher... How about something more specific: It seems easy enough to just copy the JGuruMS method of keeping an array of Weights around and feeding a different one to each subsearcher... I am worried about the following method though... I am guessing that thi

RE: Google Coop - Lucene style

2006-11-09 Thread Vladimir Olenin
I think it's pretty straightforward: the 'custom search engine' is essentially the 'filter' that can also modify score weights of found documents. I'd say 'coop engine' + 'your query' should be relatively easily reduced into 'your extended query', once you substitute 'coop engine' with 'query p

Re: Specific Query on multiple fields

2006-11-09 Thread Patrick Turcotte
Hi, How do we use a specific query on multiple fields? E.g., I have to run a query "jakarta tomcat" (the string which I give in my textbox is in double quotes, as I have to get the string 'jakarta tomcat' together) on multiple fields like "content", "title", "examples" Take a look at org.a

Re: Specific Query on multiple fields

2006-11-09 Thread Erick Erickson
Well, I think you can create three PhraseQueries and combine them in a BooleanQuery with SHOULD. That is PhraseQuery pq1 = new PhraseQuery(); pq1.add(new Term("content", "jakarta tomcat")); PhraseQuery pq2 = new PhraseQuery(); pq2.add(new Term("title", "jakarta tomcat")); BooleanQuery bq = new BooleanQuery(); bq.add(pq1, Boolean

Specific Query on multiple fields

2006-11-09 Thread Krishnendra Nandi
Hi All, How do we use a specific query on multiple fields? E.g., I have to run a query "jakarta tomcat" (the string which I give in my textbox is in double quotes, as I have to get the string 'jakarta tomcat' together) on multiple fields like "content", "title", "examples" Also if I have a

Scoring depending on terms combination

2006-11-09 Thread Soeren Pekrul
How can I manipulate the score depending on the combination of query terms contained in the result document? A single term is not what matters; that could simply be boosted. What matters is the combination of terms. The user searches for the terms A, B, C and D. Of course, the document containing all terms

Re: whats the correct way to do normalisation?

2006-11-09 Thread Joe
Hi, : I want "Überraschung" to be found by : : Überr* : Ueberr* : : So the best I can do is to do the normalisation manually (not by an : analyzer) before the indexing/searching process? Or use an Analyzer at index time that puts both the UTF-8 version of the string and the Latin-1 version of the st
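
The idea of indexing two forms of each word, so that both "Überr*" and "Ueberr*" match, can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use any Lucene API, the class and method names (UmlautFolding, fold, indexTerms, prefixMatches) are hypothetical, and the ue/ae/oe transliteration is an assumed reading of the "second version" of the string mentioned in the thread.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: shows the "index both forms" idea.
public class UmlautFolding {

    // Transliterate German umlauts and sharp s into ASCII digraphs.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut
                case '\u00f6': sb.append("oe"); break; // o-umlaut
                case '\u00fc': sb.append("ue"); break; // u-umlaut
                case '\u00c4': sb.append("Ae"); break;
                case '\u00d6': sb.append("Oe"); break;
                case '\u00dc': sb.append("Ue"); break;
                case '\u00df': sb.append("ss"); break; // sharp s
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Terms that would be indexed for one word: the lowercased original
    // plus its folded form (the set drops the duplicate if they are equal).
    static Set<String> indexTerms(String word) {
        Set<String> terms = new LinkedHashSet<>();
        String lower = word.toLowerCase(Locale.GERMAN);
        terms.add(lower);
        terms.add(fold(lower));
        return terms;
    }

    // True if a prefix query such as "Ueberr*" matches any indexed form.
    static boolean prefixMatches(String word, String prefixQuery) {
        String prefix = prefixQuery.toLowerCase(Locale.GERMAN);
        if (prefix.endsWith("*")) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        for (String term : indexTerms(word)) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

In a real index the same folding would presumably live in the analysis chain at index time, so that prefix queries need no special handling at search time.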