Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
Thanks Erick, that's what I thought. In my case no phrase queries are done, so it seems I am good to go. Any additional thoughts on the issue are welcome. Thanks Erick Erickson wrote: > > No, a phrase search will NOT match. Phrase semantics > requires that split tokens be adjacent (slop of 0

Re: In memory MultiSearcher

2007-05-21 Thread Erick Erickson
Why are you doing this in the first place? Do you actually have evidence that the default Lucene behavior (caching, etc) is inadequate for your needs? I'd *strongly* recommend, if you haven't, just using the regular FSDirectories rather than RAMDirectories and only getting complex if that's too s
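Erick's suggestion — start with a plain FSDirectory and only add complexity if it proves too slow — might be sketched like this (the index path is a placeholder, not from the original thread):

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class FsSearch {
    public static void main(String[] args) throws Exception {
        // FSDirectory leans on the OS file cache, so copying the whole
        // index into a RAMDirectory is often unnecessary overhead.
        IndexSearcher searcher =
            new IndexSearcher(FSDirectory.getDirectory("/path/to/index"));
        try {
            // ... run queries against searcher ...
        } finally {
            searcher.close();
        }
    }
}
```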

Re: getTermFreqVector atomicity

2007-05-21 Thread Erick Erickson
An IndexReader doesn't see changes in the index unless you close and reopen it, but if there is significant time between the time you fetch your docid and read its vector, that could be a problem. You can always use TermEnum/TermDocs to find the doc ID associated with a particular field you have
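The TermEnum/TermDocs lookup Erick describes might look like the following sketch: resolve your own unique-key field to a Lucene doc id, then fetch the vector (field names "myId" and "content" are illustrative, not from the thread):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermFreqVector;

public class VectorByKey {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        // Find the internal doc id for our own identifier field.
        TermDocs td = reader.termDocs(new Term("myId", "doc-42"));
        if (td.next()) {
            int docId = td.doc(); // internal Lucene doc id
            // Works only if the field was indexed with term vectors enabled
            // (Field.TermVector.YES or richer).
            TermFreqVector tfv = reader.getTermFreqVector(docId, "content");
            // tfv.getTerms() / tfv.getTermFrequencies() give terms and counts.
        }
        td.close();
        reader.close();
    }
}
```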

In memory MultiSearcher

2007-05-21 Thread Peter W.
Hello, I have been using a large, in memory MultiSearcher that is reaching the limits of my hardware RAM with this code: try { IndexSearcher[] searcher_a= { new IndexSearcher(new RAMDirectory(index_one_path)), new IndexSearcher(new RAMD

getTermFreqVector atomicity

2007-05-21 Thread Walter Ferrara
I'm interested in getting the term vector of a Lucene doc. The point is, it seems I have to give IndexReader.getTermFreqVector a doc ID, while I would like to know if there is a way to get the term vector by a doc identifier (not the Lucene doc id, but a field of my own). I know how to get the Lucene docid

Re: search result problem

2007-05-21 Thread Doron Cohen
Stefan Colella wrote: > I tried to only add the content of the page where that expression can be > found (instead of the whole document) and then the search works. > > Do i have to split my pdf text into more field? Or what could be the > problem? Perhaps indexWriter's setMaxFieldLength() is rel
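The setMaxFieldLength() hint might be applied as in this sketch (path, analyzer, and limit are illustrative): by default IndexWriter only indexes the first 10,000 terms of a field, so text deep inside a long PDF is silently dropped and never matches.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class LongFieldIndexing {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        // Raise the per-field term limit so the whole document body is indexed.
        writer.setMaxFieldLength(Integer.MAX_VALUE);
        // ... addDocument() calls ...
        writer.close();
    }
}
```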

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Erick Erickson
No, a phrase search will NOT match. Phrase semantics requires that split tokens be adjacent (slop of 0). So, since "mainstrasse" was split into two tokens at index time, the test for "is schöne right next to strasse" will fail because of the intervening (introduced) term "main". Whether this is
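The adjacency rule Erick describes can be sketched with PhraseQuery directly (field name "address" is illustrative): an exact phrase fails against the split token, while a slop of 1 tolerates the introduced term.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class SlopDemo {
    public static void main(String[] args) {
        // "schöne strasse" with the default slop of 0 will NOT match a doc
        // indexed as "schöne main strasse": "main" sits between the terms.
        PhraseQuery exact = new PhraseQuery();
        exact.add(new Term("address", "schöne"));
        exact.add(new Term("address", "strasse"));
        exact.setSlop(0); // terms must be adjacent

        // Allowing one position of slop lets it match despite "main".
        PhraseQuery sloppy = new PhraseQuery();
        sloppy.add(new Term("address", "schöne"));
        sloppy.add(new Term("address", "strasse"));
        sloppy.setSlop(1); // tolerates one intervening term
    }
}
```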

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
I will never have "mainstrasse" in my lucene index, since strasse is always replaced with " strasse" causing "mainstrasse" to be split to "main strasse". So the example you gave: "schöne strasse" will match "schöne mainstrasse", since in the lucene index I have "schöne main strasse". Daniel Nabe

Re: Implement a tokenizer

2007-05-21 Thread Chris Lu
Actually before you jump in, be warned that the "+" plus sign is also part of the query parser syntax. You cannot easily pass a query containing the "+" sign through the query parser in order to get a match. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application

RE: Implement a tokenizer

2007-05-21 Thread Mordo, Aviran (EXP N-NANNATEK)
What you need to do is to create your own tokenizer. Just copy the code from the StandardTokenizer to your XYZTokenizer and make your changes. Then you need to create your own Analyzer class (again copy the code from the StandardAnalyzer) and use your XYZTokenizer in the new XYZAnalyzer you create
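A skeleton of the suggested XYZAnalyzer might look like this; XYZTokenizer stands for your modified copy of StandardTokenizer (not shown), and the LowerCaseFilter is just one example of a filter the copied StandardAnalyzer chain would include:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class XYZAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Delegate to the copied-and-modified tokenizer that keeps "+",
        // then apply whatever filters your application needs.
        TokenStream stream = new XYZTokenizer(reader);
        stream = new LowerCaseFilter(stream);
        return stream;
    }
}
```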

Re: How to Update the Index once it is created

2007-05-21 Thread Chris Lu
Does it mandate you to pass data through Hibernate? This seems very similar to Compass' approach. I believe a more generic approach is to compare what's already indexed with what's changed or deleted, so you can use any framework to work with Lucene. And simply selecting the data and creating the

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Daniel Naber
On Monday 21 May 2007 22:53, bhecht wrote: > If someone searches for mainstrasse, my tools will split it again to > main and strasse, and then lucene will be able to find it. "strasse" will match "mainstrasse" but the phrase query "schöne strasse" will not match "schöne mainstrasse". However, th

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
Thanks Daniel, But when searching, I will run my "standardization" tools again before querying Lucene, so what you mentioned will not be a problem. If someone searches for mainstrasse, my tools will split it again to main and strasse, and then lucene will be able to find it. Daniel Naber-5 wrot

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Daniel Naber
On Monday 21 May 2007 22:05, bhecht wrote: > Is there any point for me to start creating custom analyzers with filter > for stop words, synonyms, and implementing my own "sub string" filter, > for separating tokens into "sub words" (like "mainstrasse"=> "main", > "strasse") Yes: I assume your doc

stop words, synonyms... what's in it for me?

2007-05-21 Thread bhecht
Hi there, I started using Lucene not long ago, with plans to replace the current SQL queries in my application with it. As I wasn't aware of Lucene before, I have implemented some tools (filters) similar to what Lucene includes. For example I have implemented a "stop word" tool. In my case I have

Implement a tokenizer

2007-05-21 Thread bhecht
Hi there, I was interested in changing the StandardTokenizer so it will not remove the "+" (plus) sign from my stream. Looking in the code and documentation, it reads: "If this tokenizer does not suit your application, please consider copying this source code directory to your project and maint

Re: How to Update the Index once it is created

2007-05-21 Thread bhecht
If you are using Oracle and Lucene, check out http://www.hibernate.org/410.html "Hibernate Search"; this will automatically update your Lucene index on any change to your database table. Erick Erickson wrote: > > You have to delete the old document and add it as a new one. > > See IndexModifier c

Re: Upgrade 2.0 -> 2.1

2007-05-21 Thread Svend Ole Nielsen
Hi Ian Well it worked. Thanks :) Wasn't aware that that could have fixed it, but after your suggestion it seemed like the most logical solution. /Svend Mon, 21 05 2007 at 14:30 +0100, Ian Lea wrote: > Hi > > > I saw this or something similar going from 2.0 to 2.1 when I hadn't > recompiled all

Re: Very odd behaviour of FrenchAnalyzer with strings in capital letters

2007-05-21 Thread Jolinar13
Hello, Thank you for your quick answer. I use Luke to examine the index, but since I switched to FrenchAnalyzer, it says 'Not a Lucene index'. If I open the index files in a text viewer, the strings are in UPPER case. I do use the same analyzer to index and search. So, do I have to specify the Fre

Re: documents with large numbers of fields

2007-05-21 Thread Steven Rowe
Mike Klaas wrote: > On 18-May-07, at 1:01 PM, charlie w wrote: >> Is there an upper limit on the number of fields comprising a document, >> and if so what is it? > > There is not. They are relatively costless if omitNorms=False Mike, I think you meant "relatively costless if omitNorms=True". St

Re: Upgrade 2.0 -> 2.1

2007-05-21 Thread Ian Lea
Hi I saw this or something similar going from 2.0 to 2.1 when I hadn't recompiled all my Lucene-related code. It went away when everything was recompiled, so I'd guess you've got an old class file lurking somewhere. -- Ian. On 5/21/07, Svend Ole Nielsen <[EMAIL PROTECTED]> wrote: Hi I have t

Re: Very odd behaviour of FrenchAnalyzer with strings in capital letters

2007-05-21 Thread Erick Erickson
First have you gotten a copy of Luke to examine your index to see what's actually indexed? The default behavior is usually to lowercase everything, but I'm not entirely sure if the French analyzer does this. But I suspect so. Searches are case sensitive. To get caseless searching, you need to pu
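Erick's point about caseless searching is that the same analysis must be applied on both sides. One common way to get that, sketched here under the assumption that FrenchAnalyzer lowercases at index time (field name "content" is illustrative), is to run user input through QueryParser with the same analyzer rather than building a TermQuery from the raw uppercase string:

```java
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class SameAnalyzerSearch {
    public static void main(String[] args) throws Exception {
        FrenchAnalyzer analyzer = new FrenchAnalyzer();
        // QueryParser analyzes the query text with the same analyzer used
        // at index time, so "VEHICLE" is lowercased/stemmed identically.
        QueryParser parser = new QueryParser("content", analyzer);
        Query q = parser.parse("VEHICLE");
        // ... pass q to an IndexSearcher ...
    }
}
```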

Upgrade 2.0 -> 2.1

2007-05-21 Thread Svend Ole Nielsen
Hi I have tried to upgrade from 2.0 -> 2.1 to overcome some NFS issues. It compiles just fine, but when I run the application and try to add a document it throws an exception stating NoSuchMethod. This happens when I try to add an object of type Field to a newly created empty Document. I have eras

Very odd behaviour of FrenchAnalyzer with strings in capital letters

2007-05-21 Thread Jolinar13
Hello, I tried org.apache.lucene.analysis.fr.FrenchAnalyzer and I got strange search results on strings in uppercase (example: VEHICLE). When I search for the string in lower case, I get no result. I get results if I use "vehicle*" or "vehiclE", or "vehicLe" etc. What is odd is that it affects on

Re: search result problem

2007-05-21 Thread Stefan Colella
hello, thanks for your reply. I used the explain method and I understand now why some documents are returned. I am using the same Analyzer for indexing and searching. I tried to only add the content of the page where that expression can be found (instead of the whole document) and then the search

Re: Optional terms in BooleanQuery

2007-05-21 Thread Soeren Pekrul
Peter Bloem wrote: [...] "+(A B) C D E" [...] In other words, Lucene considers all documents that have both A and B, and ranks them higher if they also have C D or E. Hello Peter, to my understanding "+(A B) C D E" means at least one of the terms "A" or "B" must be contained and the terms
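Soeren's reading of "+(A B) C D E" can be sketched programmatically (field name "f" is illustrative): the inner group is required as a whole, so at least one of A or B must match, while C, D and E are optional clauses that only raise the score.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class OptionalTerms {
    public static void main(String[] args) {
        // Inner (A B): SHOULD clauses, but the group itself is required,
        // so a matching doc needs at least one of A or B.
        BooleanQuery inner = new BooleanQuery();
        inner.add(new TermQuery(new Term("f", "a")), BooleanClause.Occur.SHOULD);
        inner.add(new TermQuery(new Term("f", "b")), BooleanClause.Occur.SHOULD);

        BooleanQuery outer = new BooleanQuery();
        outer.add(inner, BooleanClause.Occur.MUST); // the +(...) part
        // C, D, E: purely optional, contribute to scoring only.
        outer.add(new TermQuery(new Term("f", "c")), BooleanClause.Occur.SHOULD);
        outer.add(new TermQuery(new Term("f", "d")), BooleanClause.Occur.SHOULD);
        outer.add(new TermQuery(new Term("f", "e")), BooleanClause.Occur.SHOULD);
    }
}
```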