Need help with XML Query Parser for search form

2009-12-30 Thread syedfa
Dear fellow Java developers: I am setting up an advanced search page that is very similar to Google's and Yahoo's. I have four text fields with the following labels: With all of the words: With the exact phrase: With at least one of the words:
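
A minimal sketch of how those boxes can map onto plain Lucene query objects; the field name "content" and the allWords/exactPhrase/anyWords strings are illustrative, not from the original post:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.TermQuery;

    // allWords, exactPhrase, anyWords: raw text of the three form fields (assumed).
    BooleanQuery query = new BooleanQuery();
    // "With all of the words": every term is required.
    for (String w : allWords.split("\\s+")) {
        query.add(new TermQuery(new Term("content", w)), BooleanClause.Occur.MUST);
    }
    // "With the exact phrase": a single required phrase query.
    PhraseQuery phrase = new PhraseQuery();
    for (String w : exactPhrase.split("\\s+")) {
        phrase.add(new Term("content", w));
    }
    query.add(phrase, BooleanClause.Occur.MUST);
    // "With at least one of the words": optional clauses wrapped in one required sub-query.
    BooleanQuery any = new BooleanQuery();
    for (String w : anyWords.split("\\s+")) {
        any.add(new TermQuery(new Term("content", w)), BooleanClause.Occur.SHOULD);
    }
    query.add(any, BooleanClause.Occur.MUST);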

Re: Question about many fields within a single index

2009-12-30 Thread Tom Hill
Hi - One thing to consider is field norms. If your fields aren't analyzed, this doesn't apply to you. But if you do have norms, I believe that it's one byte per field with norms x number of documents. It doesn't matter if the field occurs in a document or not, it's nTotalFields x nDocs. So, an ind
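
If the norms memory matters, a sketch of turning them off per field with the 2.9/3.0 API (field name and value are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // ANALYZED_NO_NORMS skips the per-document norms byte for this field;
    // scoring then ignores length normalization and index-time boosts for it.
    doc.add(new Field("title", "some text", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));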

Re: Copy and augment an indexed Document

2009-12-30 Thread tsuraan
> It's an open question whether this is more or less work than > re-parsing the document (I infer that you have the originals > available). Before trying to reconstruct the document I'd > ask how often you need to do this. The gremlins coming out > of the woodwork from reconstruction would consume

Re: Copy and augment an indexed Document

2009-12-30 Thread Erick Erickson
It is possible to reconstruct a document from the terms, but it's a lossy process. Luke does this (you can see from the UI, and the code is available). There's no utility that I know of to make this easy. It's an open question whether this is more or less work than re-parsing the document (I infer

Re: Copy and augment an indexed Document

2009-12-30 Thread Grant Ingersoll
On Dec 30, 2009, at 5:08 PM, tsuraan wrote: > Suppose I have a (useful) document stored in a Lucene index, and I > have a variant that I'd also like to be able to search. This variant > has the exact same data as the original document, but with some extra > fields. I'd like to be able to use an

Re: Different Analyzers

2009-12-30 Thread Max Lynch
> Alternatively, if one of the "regular" analyzers works for you *except* > for lower-casing, just use that one for your mixed-case field and > lower-case your input and send it to your lower-case field. > > Be careful to do the same steps when querying. > Thanks Erick, I didn't think about this.

Re: Different Analyzers

2009-12-30 Thread Erick Erickson
See PerFieldAnalyzerWrapper for an easy way to implement two fields in the same document processed with different analyzers. So basically you're copying the input to two fields that handle things slightly differently. As far as re-implementing stuff, no real re-implementing is necessary, just crea
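
A small sketch of that setup, assuming one case-preserving field and one lower-cased field (field names and the text variable are illustrative):

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // WhitespaceAnalyzer keeps case for "content_cs"; SimpleAnalyzer lower-cases "content_ci".
    PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new WhitespaceAnalyzer());
    analyzer.addAnalyzer("content_ci", new SimpleAnalyzer());

    // Index the same text into both fields; use the same wrapper at query time.
    Document doc = new Document();
    doc.add(new Field("content_cs", text, Field.Store.NO, Field.Index.ANALYZED));
    doc.add(new Field("content_ci", text, Field.Store.NO, Field.Index.ANALYZED));

SimpleAnalyzer is only a placeholder here for whichever lower-casing analyzer fits the data.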

Copy and augment an indexed Document

2009-12-30 Thread tsuraan
Suppose I have a (useful) document stored in a Lucene index, and I have a variant that I'd also like to be able to search. This variant has the exact same data as the original document, but with some extra fields. I'd like to be able to use an IndexReader to get the document that I stored, use th
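
A sketch of the idea as asked, assuming every original field was stored (analyzed-only fields cannot be recovered this way, as the replies note); reader, writer, and docId are assumed to exist:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.Fieldable;

    // Fetch the stored form of the original document and copy its fields.
    Document original = reader.document(docId);
    Document variant = new Document();
    for (Fieldable f : original.getFields()) {
        variant.add(f);
    }
    // Add the extra fields, then index the variant as a new document.
    variant.add(new Field("extra", "new value", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(variant);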

Lucene 2.9: IOException from IndexReader.reopen() - Real time search

2009-12-30 Thread Kumaravel Kandasami
I am getting an IOException when I am doing a "real-time" search, i.e. I am creating an index using the IndexWriter and also opening the index using an IndexReader (writer.getReader()) to make sure the document does not exist prior to adding it to the index. The code works perfectly fine multiple times ind
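
For reference, a sketch of the 2.9 near-real-time pattern being described (the writer and the id term are assumed); the old reader is closed only when reopen() actually returns a new instance:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    IndexReader reader = writer.getReader();
    // ... before each duplicate check, refresh the view of the index:
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close();          // only close the old reader once reopen() returned a new one
        reader = newReader;
    }
    IndexSearcher searcher = new IndexSearcher(reader);
    boolean exists = searcher.search(new TermQuery(new Term("id", "doc-42")), 1).totalHits > 0;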

Re: RES: Question about TokenStream lucene 3.0

2009-12-30 Thread AHMET ARSLAN
> System.out.println(typeAtt.type()); > ??? And this typeAtt? > > Thanks! > Yes. You can add the other attributes if you want. By the way, I forgot to remove the (TermAttribute) and (TypeAttribute) casts. You don't need them in 3.0.0. TermAttribute termAtt = tokenStream.getAttribute(TermAttribute.class);
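
A minimal sketch of the 3.0 consumption loop this reply describes (the analyzer and input text are illustrative):

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

    // analyzer: whatever Analyzer builds the token stream (assumed).
    TokenStream ts = analyzer.tokenStream("field", new StringReader("The quick brown foxes"));
    // No cast is needed: addAttribute/getAttribute already return the typed attribute.
    TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
    TypeAttribute typeAtt = ts.addAttribute(TypeAttribute.class);
    while (ts.incrementToken()) {
        System.out.println(termAtt.term() + " / " + typeAtt.type());
    }
    ts.end();
    ts.close();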

RE: Using the new tokenizer API from a jar file

2009-12-30 Thread Uwe Schindler
That would be good, if you could test it! Please checkout Lucene 2.9 branch from svn (http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9), compile the whole package (at least lucene-core.jar) and then replace the lucene jar files in solr's lib folder. Uwe - Uwe Schindler H.-H.-M

Re: Question about many fields within a single index

2009-12-30 Thread Renaud Delbru
Hi, just sharing some personal experiences in this domain. We performed some benchmarks in a similar setup (indexing millions of documents with thousands of fields) to measure the impact of a large number of fields on a Lucene index. We observed that the more fields you have, the more the dictionary wil

Re: Different Analyzers

2009-12-30 Thread Max Lynch
> I just want to see if it's safe to use two different analyzers for the > following situation: > > I have an index that I want to preserve case with so I can do > case-sensitive > searches with my WhitespaceAnalyzer. However, I also want to do case > insensitive searches. you should also make su

RES: Question about TokenStream lucene 3.0

2009-12-30 Thread Mário André
System.out.println(typeAtt.type()); ??? And this typeAtt? Thanks! - Mário André Instituto Federal de Educação, Ciência e Tecnologia de Sergipe - IFS Mestrando em MCC - Universidade Federal de Alagoas - UFAL http://www.marioandre.

Re: Question about TokenStream lucene 3.0

2009-12-30 Thread AHMET ARSLAN
> Using PorterStemFilter and removing the stopwords, but how > can I use > TokenStream in release 3.0 (print the result this method). > > I tried to use: > >     public static void main(String[] args) throws > IOException, > ParseException >     { >       StringReader sr = new > StringReader("T

Re: Highlighter doesn't highlight wildcard queries after updating to 2.9.1/3.0.0

2009-12-30 Thread Mohsen Saboorian
Yes I can (though I need some time, since I have my nested custom analyzers and filter). I'll try to write a test scenario to reproduce this issue. For now, can you tell me if these steps are correct for instantiating and using highlighter: IndexSearcher is = new IndexSearcher(indexReader); Quer
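
Roughly those steps, with one addition worth trying for wildcard queries: rewriting the query against the reader before handing it to QueryScorer, so the multi-term query is expanded into the concrete terms it matches. The field name, analyzer, query, and text variables below are assumptions, not the poster's code:

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

    IndexSearcher is = new IndexSearcher(indexReader);
    Query rewritten = query.rewrite(indexReader);   // expands "abc*" into the actual index terms
    Highlighter highlighter =
        new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(rewritten, "content"));
    TokenStream tokens = analyzer.tokenStream("content", new StringReader(text));
    String snippet = highlighter.getBestFragment(tokens, text);   // may return null if nothing matches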

Question about TokenStream lucene 3.0

2009-12-30 Thread Mário André
Hi, I have the method below: public final TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result = new LowerCaseTokenizer(reader); result = new StopFilter(true, result, stopWords, true); result = new PorterStemFilter(result); return re

Re: Multi-value (complex) field indexing

2009-12-30 Thread Leonid M.
* Yes, I understand the first part about two rows and querying. * The problem is, I'm not the one creating those Analyzers and storing documents into indexes. All I can say is "add this field to the document"; it's as simple as that. Luckily the system is built using Pico and OSGi, so I will try t

Re: Question about many fields within a single index

2009-12-30 Thread Erick Erickson
As far as I know, no problem. There's no penalty that I know of for having this kind of setup. Of course your mileage may vary, and a relevant question is "why do you care?" That is, if your total index is 100M in size, pretty much no matter how Lucene implements the internal data structures you wo

Re: Highlighter doesn't highlight wildcard queries after updating to 2.9.1/3.0.0

2009-12-30 Thread Mark Miller
Mohsen Saboorian wrote: > After updating to 2.9.x or 3.0, highlighter doesn't work on wildcard queries > like "abc*". I thought that it would be because of scoring, so I also set > myIndexSearcher.setDefaultFieldSortScoring(true, true) before searching. > I tested with both QueryScorer and QueryTer

Highlighter doesn't highlight wildcard queries after updating to 2.9.1/3.0.0

2009-12-30 Thread Mohsen Saboorian
After updating to 2.9.x or 3.0, highlighter doesn't work on wildcard queries like "abc*". I thought that it would be because of scoring, so I also set myIndexSearcher.setDefaultFieldSortScoring(true, true) before searching. I tested with both QueryScorer and QueryTermScorer. In my custom highligh

Re: Multi-value (complex) field indexing

2009-12-30 Thread Erick Erickson
You'll have one problem: if you can't return a different increment gap, you'll match across rows. Say you index row 1 with "aaa", "bbb", "ccc", then row 2 with "ddd", "eee", "fff". Just adding multiple rows to a single document, that document would match the phrase "ccc ddd". I don't understand wh
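
A sketch of returning a larger gap so phrases cannot match across values (2.9/3.0-style Analyzer; the delegate analyzer and gap size are illustrative):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;

    class RowAnalyzer extends Analyzer {
        private final Analyzer delegate = new WhitespaceAnalyzer();

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return delegate.tokenStream(fieldName, reader);
        }

        @Override
        public int getPositionIncrementGap(String fieldName) {
            // Large position gap between successive values of the same field,
            // so "ccc ddd" no longer matches across two added rows.
            return 100;
        }
    }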

Question about many fields within a single index

2009-12-30 Thread Jason Tesser
I have a situation where I might have 1000 different types of Lucene Documents each with 10 or so fields with different names that get indexed. I am wondering if this is bad to do within Lucene. I end up with 10,000 fields within the index although any given document has only 10 or so. I was hop

Re: Using the new tokenizer API from a jar file

2009-12-30 Thread Ahmed El-dawy
Thanks all for your interest, especially Uwe. I asked this question on solr-user at the beginning but I got no reply. That's why I re-asked the question at java-user. Thanks for your efforts. I will try it now. On Mon, Dec 28, 2009 at 12:02 PM, Uwe Schindler wrote: > I opened https://issues.apac