2.9 per segment searching/caching

2009-10-21 Thread Bill Au
Since Lucene 2.9 has per segment searching/caching, does query performance degrade less than it did before 2.9 as more segments are added to the index? Bill

Resolving Lucene Index error

2009-10-21 Thread mitu2009
Hi, why do I get an error like this in Lucene, and how do I resolve it? Could not find file 'C:\Indexes_z3_1.del'. Thanks.

Handling + as a special character in Lucene search

2009-10-21 Thread mitu2009
Hi, how do I make sure Lucene gives me back relevant search results when my input string contains terms like c++? Lucene seems to ignore the ++ characters. Thanks

Re: JDBC access to a Lucene index

2009-10-21 Thread Jukka Zitting
Hi, On Mon, Oct 19, 2009 at 10:43 PM, Marcelo Ochoa wrote: > This is a similar approach to Lucene Domain Index: > http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg > But Lucene Domain Index is a specific implementation for Oracle > Databases 10g/11g which is integrated through the ODCI API and

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Robert Muir
there is some information on this topic in the pkg summary: http://lucene.apache.org/java/2_9_0/api/contrib-analyzers/org/apache/lucene/analysis/compound/package-summary.html in short, for a large list (there is no limit in the code), you will want to make use of a hyphenation grammar as well: Hy

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Paul Libbrecht
Great, now the next question: which dictionary do you guys use? How big can it be? Is 5 words acceptable? paul On 21 Oct 2009, at 21:23, Robert Muir wrote: Paul, i think in general scoring should take care of this too, it's all about your dictionary, same as the previous example.

XorReader?

2009-10-21 Thread Karl Wettin
Hi people, I have an application in which the users are allowed to make changes to the database, changes visible only to that user. I.e. they don't modify the original data, they create a clone of the original. When the user requests the instance I retrieve the modified clone rather than t

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Robert Muir
Paul, i think in general scoring should take care of this too, it's all about your dictionary, same as the previous example. this is because überwachungsgesetz matches 3 tokens: überwachungsgesetz, überwachung, gesetz but überwachung gesetz only matches 2. überwachungsgesetz 0.37040412 = (MATCH) su

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Robert Muir
just add them to the dictionary, the compound filter will do this automatically. if you want to tweak it even further, you can also tell compounds to NOT emit the subwords if they form a bigger compound with the onlyLongestMatch parameter i spoke of earlier. I haven't played with this option much

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Paul Libbrecht
Can the dictionary have weights? überwachungsgesetz alone probably needs a higher rank than überwachung and gesetz, right? paul On 21 Oct 2009, at 21:09, Benjamin Douglas wrote: OK, that makes sense. So I just need to add all of the sub-compounds that are real words at posIncr=0, even if th

RE: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Benjamin Douglas
OK, that makes sense. So I just need to add all of the sub-compounds that are real words at posIncr=0, even if they are combinations of other sub-compounds. Thanks! -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, October 21, 2009 11:49 AM To: java-user@lu

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Robert Muir
yes, your dictionary :) if überwachungsgesetz is a real word, add it to your dictionary. for example, if your dictionary is { "Rind", "Fleisch", "Draht", "Schere", "Gesetz", "Aufgabe", "Überwachung" }, and you index Rindfleischüberwachungsgesetz, then all 3 queries will have the same score. but i
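The dictionary example above can be tried out with a small plain-Java sketch. This is not Lucene's compound filter, only a simplified stand-in: it lowercases the token and reports every dictionary word that occurs as a substring. The class and method names, the subword size limits, and the way the `onlyLongestMatch` flag is modelled here (keep only the longest hit per start position) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified stand-in for dictionary-based decompounding (NOT the Lucene API).
class DecompoundSketch {

    // Returns the dictionary words found inside the token, scanning left to right.
    // minSubwordSize and maxSubwordSize bound the substring lengths that are tried.
    static List<String> subwords(String token, Set<String> dictionary,
                                 int minSubwordSize, int maxSubwordSize,
                                 boolean onlyLongestMatch) {
        String lower = token.toLowerCase(Locale.ROOT);
        Set<String> dict = new HashSet<>();
        for (String word : dictionary) {
            dict.add(word.toLowerCase(Locale.ROOT));
        }
        List<String> found = new ArrayList<>();
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                if (dict.contains(candidate)) {
                    if (onlyLongestMatch) {
                        longest = candidate; // later hits at this start are longer
                    } else {
                        found.add(candidate);
                    }
                }
            }
            if (longest != null) {
                found.add(longest);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // \u00DC and \u00FC are the upper- and lowercase U-umlaut.
        Set<String> dictionary = new HashSet<>(Arrays.asList(
                "Rind", "Fleisch", "Draht", "Schere", "Gesetz", "Aufgabe", "\u00DCberwachung"));
        System.out.println(subwords(
                "Rindfleisch\u00FCberwachungsgesetz", dictionary, 2, 15, false));
    }
}
```

With the dictionary from the message above, the sketch finds rind, fleisch, überwachung and gesetz inside Rindfleischüberwachungsgesetz. Adding a longer real word to the dictionary makes it show up as an extra subword, which is the effect the advice "add it to your dictionary" relies on.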

RE: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Benjamin Douglas
Thanks for all of the answers so far! Paul's question is similar to another aspect I am curious about: Given the way the sample word is analyzed, is there anything in the scoring mechanism that would rank "überwachungsgesetz" higher than "gesetzüberwachung" or "fleischgesetz"? -Original Me

Re: Parsing Error while indexing in Lucene WordNet package

2009-10-21 Thread Robert Muir
Hi, thanks again for reporting this. I created an issue here: http://issues.apache.org/jira/browse/LUCENE-2001 On Wed, Oct 21, 2009 at 2:05 AM, parag dave wrote: > While using the Lucene WordNet package, we found that the Syns2Index > program > indexes the Synsets wrongly. For example, looking u

Re: singular and plural search

2009-10-21 Thread Matthew Hall
If I recall correctly the highlighter also has an analyzer passed to it. Ensure that this is the same one as well. Matt m.harig wrote: Thanks erick, it works fine if i use the same analyzer (code snippet found from nabble) for both indexing & querying. But the highlighter has gone f

Re: singular and plural search

2009-10-21 Thread m.harig
Thanks Erick, it works fine if I use the same analyzer (the code snippet found on Nabble) for both indexing and querying. But the highlighter no longer works for plural words. I guess I need to search more; I'll come back to you if I can't figure it out. Thanks again, Erick.

Re: singular and plural search

2009-10-21 Thread m.harig
Thanks Erick, > A little more information would help here. 1> Are you using the same analyzer at both index and query time? No, sorry, I am using StandardAnalyzer at index time, and at query time I am using the code snippet found on Nabble. > 2> Assuming <1> is "yes", did you re-index your data
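Why a different analyzer at index time and at query time can make plural searches miss can be shown without Lucene. The following is a minimal plain-Java sketch, not Lucene code: `plainAnalyze` and `stemmingAnalyze` are hypothetical stand-ins for the two analyzers, and the "strip one trailing s" rule is a toy assumption rather than a real stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration of an index-time/query-time analyzer mismatch (NOT the Lucene API).
class AnalyzerMismatchSketch {

    // Hypothetical stand-in for an analyzer that only lowercases and splits on whitespace.
    static List<String> plainAnalyze(String text) {
        return new ArrayList<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Hypothetical stand-in for a stemming analyzer: strips one trailing "s" (toy rule).
    static List<String> stemmingAnalyze(String text) {
        List<String> stemmed = new ArrayList<>();
        for (String term : plainAnalyze(text)) {
            boolean plural = term.length() > 3 && term.endsWith("s");
            stemmed.add(plural ? term.substring(0, term.length() - 1) : term);
        }
        return stemmed;
    }

    // A query term matches only if exactly the same term was stored at index time.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        Set<String> index = new HashSet<>(indexedTerms);
        for (String term : queryTerms) {
            if (index.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String document = "two red cars";
        String query = "cars";
        // Different analyzers: the index holds "cars", the query asks for "car".
        System.out.println(matches(plainAnalyze(document), stemmingAnalyze(query))); // false
        // Same analyzer on both sides: both hold "car".
        System.out.println(matches(stemmingAnalyze(document), stemmingAnalyze(query))); // true
    }
}
```

The same reasoning applies to any component that re-analyzes text, which is why re-indexing with the query-time analyzer is the usual first thing to check.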

Re: singular and plural search

2009-10-21 Thread Erick Erickson
A little more information would help here. 1> Are you using the same analyzer at both index and query time? 2> Assuming <1> is "yes", did you re-index your data after you created this analyzer? 3> What are the results of query.toString()? Looking at that might help you pinpoint what's going on. 4> H

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Robert Muir
Paul, there are two implementations in compounds, one is dictionary-based, the other is hyphenation-grammar + dictionary (it restricts the decompounding based on hyphenation rules). You could also subclass the compound base class and implement your own. I haven't seen any user-measures (relevance,

Re: Parsing Error while indexing in Lucene WordNet package

2009-10-21 Thread Robert Muir
thanks, this sounds like a bug, I'll play with this today. On Wed, Oct 21, 2009 at 2:05 AM, parag dave wrote: > While using the Lucene WordNet package, we found that the Syns2Index > program > indexes the Synsets wrongly. For example, looking up the synsets for the > word "king", we get: > > java

singular and plural search

2009-10-21 Thread m.harig
Hello all, I have a question about plural and singular word searching. I got this code snippet from the Nabble forum: private static Analyzer createEnglishAnalyzer() { return new Analyzer() { public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result =

Re: Lucene 1.4.3 "Already closed" IOException

2009-10-21 Thread Michael McCandless
Make sure you are not closing the IndexSearcher while still using a Hits object obtained from it in the past. Hits goes back and re-runs the search if you iterate deep enough... Mike On Wed, Oct 21, 2009 at 5:39 AM, Ian Lea wrote: > 1.4.3?  How old is that?  Maybe time to consider an upgrade. >

Re: Lucene 1.4.3 "Already closed" IOException

2009-10-21 Thread Ian Lea
1.4.3? How old is that? Maybe time to consider an upgrade. Anyway, if you're getting that exception when creating a searcher I guess you are using a constructor that takes an IndexReader and a further guess would be that something has closed it. -- Ian. On Tue, Oct 20, 2009 at 6:41 PM, Zhang,

Re: Using org.apache.lucene.analysis.compound

2009-10-21 Thread Paul Libbrecht
I'm interested in this analyzer. It had escaped me and solves an old problem! Could you report about its usage: - did you have to feed words in a dictionary? - does anyone have user-measures already? ... and the last question for the research fun: is there any approach towards preferring Üb