Re: Query scoring

2009-04-16 Thread liat oren
Thanks for the answer. In Luke, I used the WhitespaceAnalyzer as well. The scores AND the explain method worked perfectly. In my application, I checked the query: it contains the numbers split into separate term queries, so it is built correctly. The scoring is also good. However the explain

readModifiedUTF8String stuck

2009-04-16 Thread MakMak
Please have a look at the following 2 stack traces: "[STUCK] ExecuteThread: '21' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=1 tid=0x0001018de580 nid=0x62 runnable [0xf

Re: Best way for paging with TopDocs class?

2009-04-16 Thread AlexElba
Why don't you extend HitCollector and put all the logic you need into it? Ivan Vasilev-2 wrote: > > Hi All, > > As the Hits class was deprecated in current Lucene and is expected to be > excluded from Lucene 3.0, we decided to change our code to use the > TopDocs class. > Our app provides p

Re: Lucene SnowBall unexpected behavior for some terms

2009-04-16 Thread AlexElba
I looked through the source code for Snowball. I think this bug exists in previous versions as well. I asked on their mailing list, with no response so far. Their demo page has the same issue: http://snowball.tartarus.org/demo.php I was trying to find the pattern for words which will not get lemm

Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford

Re: Lucene SnowBall unexpected behavior for some terms

2009-04-16 Thread Chris Hostetter
: little bit about back compatibility with regards to contrib. ...which were noted in contrib/CHANGES.txt -Hoss

A Challenge!: Combining 2 searches into a single resultset?

2009-04-16 Thread rufus1
Hi! I am trying to do something a little unique... I have 90k text documents that I am trying to search. Search A: indexes and searches the documents using regular relevancy search. Search B: indexes and searches the documents using a smaller subset of "key" words that I have chosen. This gives

Re: Best way for paging with TopDocs class?

2009-04-16 Thread Ivan Vasilev
OK guys, thanks for your help. I really think that paging without caching will be best in our case. I think in most cases users find results on the first page. When not, I think they would not go through more than 2-3 more pages, or will just narrow the search by adding more filter

Re: Lucene index sizes and performance

2009-04-16 Thread Michael Stoppelman
On Sat, Jul 7, 2007 at 8:19 PM, Chun Wei Ho wrote: > We are currently running a search service with a single Lucene index > of about 10 GB. We would like to find out: > > (a) What is the usual index size of everyone else? How large have > Lucene indexes gone in production environments, and is there

Google's search Appliance relevance ranking

2009-04-16 Thread Vasudevan Comandur
Hi, The question that I am posting in this group may be inappropriate and I want to apologize for that. My question is, what is the relevance ranking algorithm which is used in Google Search Appliance (GSA) because the search is predominantly on documents rather than web pages. I app

RE: Need help : SpanNearQuery

2009-04-16 Thread Steven A Rowe
Hi Radha, On 4/16/2009 at 8:35 AM, Radhalakshmi Sreedharan wrote: > I have a question related to SpanNearQuery. > > I need a hit even if there are 2/3 terms found with the span being > applied for those 2 terms. > > Is there any custom implementation in place for this? I checked > SrndQuery but t

Re: Best way for paging with TopDocs class?

2009-04-16 Thread Erick Erickson
Well, under the covers, the old Hits object *was* reloading the first N pages to get page N + 1, you just didn't see it. Hits also had other, undesirable behaviors. But "loading docs N-1 times" is not as expensive as you perhaps fear. To get a sorted list, you must sort the entire set of documen
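A minimal sketch of the paging-without-caching idea discussed in this thread, written in plain Java with no Lucene dependency. The class `TopNPaging`, its methods `topN` and `page`, and the integer hit list are hypothetical stand-ins for illustration, not Lucene API; in a real application `topN` would presumably be replaced by a search call that returns the best n hits. To show page p (0-based), the sketch asks for the best (p + 1) * pageSize hits again and keeps only the last slice.

```java
import java.util.ArrayList;
import java.util.List;

class TopNPaging {
    /** Hypothetical stand-in for a search call: returns the best n hits, best first. */
    static List<Integer> topN(List<Integer> rankedHits, int n) {
        return new ArrayList<>(rankedHits.subList(0, Math.min(n, rankedHits.size())));
    }

    /** Re-runs the "search" for the top (pageIndex + 1) * pageSize hits and keeps the last page. */
    static List<Integer> page(List<Integer> rankedHits, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        int wanted = (pageIndex + 1) * pageSize;
        List<Integer> top = topN(rankedHits, wanted);
        // Skip the hits that belong to earlier pages; clamp when the page is past the end.
        int start = Math.min(pageIndex * pageSize, top.size());
        return new ArrayList<>(top.subList(start, top.size()));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            hits.add(i); // pretend these are doc ids already ordered by relevance
        }
        System.out.println(page(hits, 2, 10)); // third page of a 25-hit result
    }
}
```

The cost of a page grows with its index, which matches the observation in the thread that most users stay on the first few pages.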

RE: Best way for paging with TopDocs class?

2009-04-16 Thread Uwe Schindler
Hi Ivan, We had this discussion some time ago on this list, too. I explained to someone else a possible approach to TopDocs caching. Just search in the list archives! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

Re: Query scoring

2009-04-16 Thread Erick Erickson
Hmmm, try query.toString() and/or query.explain(). Also, try using Luke to see what is actually in the document. BTW, what analyzer did you use in Luke? Luke also has an explain (tab?) that will show you what Luke does, which may be useful. The default operator should be "OR", but looking at the

Best way for paging with TopDocs class?

2009-04-16 Thread Ivan Vasilev
Hi All, As the Hits class was deprecated in current Lucene and is expected to be excluded from Lucene 3.0, we decided to change our code to use the TopDocs class. Our app provides paging and now we are wondering what is the best way to do it with TopDocs. I can see only this possibility: 1.

Need help : SpanNearQuery

2009-04-16 Thread Radhalakshmi Sreedharan
Hi, I have a question related to SpanNearQuery. As of now, the SpanNearQuery has the constraint that all the terms need to be present in the document. E.g.: if my SpanNearQuery terms are (ab, bc, cd), all of them need to be found within a span of "n", unordered. But my requirement is similar t

Re: semi-infinite loop during merging

2009-04-16 Thread Michael McCandless
One question: are you using IndexWriter.close(false)? I wonder if there's some path whereby the merges fail to abort (and simply keep retrying) if you do that... More inlined below... On Thu, Apr 16, 2009 at 5:42 AM, Christiaan Fluit wrote: > I spent a lot of time on getting the stacktraces, bu

Re: Query scoring

2009-04-16 Thread liat oren
I also wanted to add that I index it tokenized, and that when I use Luke to do this search, it gives the correct results. Should I run the query differently than the way I do? 2009/4/16 liat oren > Hi, > > I try to understand why the following query gives the scoring below: > > document 1 : a b

RE: IndexReader and numDocs

2009-04-16 Thread Uwe Schindler
You must call close() or commit() on the IndexWriter first. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: cbodeguilla [mailto:cbodegui...@gmail.com] > Sent: Thursday, April 16, 2009 1:31 PM > To: java-user@lucen

IndexReader and numDocs

2009-04-16 Thread cbodeguilla
Hello! My problem is that when I check the number of documents through IndexWriter, it reports 1 document. If I open an IndexReader, numDocs says there are no documents. Why? Can you help me, please? writerFS.addDocument(indexa.indexaFichero(archivo)); Sys

Query scoring

2009-04-16 Thread liat oren
Hi, I try to understand why the following query gives the scoring below: document 1 : a b c document 2 : g k a h u c 0.0 = (NON-MATCH) product of: 0.0 = (NON-MATCH) sum of: 0.0 = coord(0/3) 0.06155877 The query code is: IndexSearcher searcher = new IndexSearcher(path); Analyzer analy

FW: Binary indexing / query efficiency

2009-04-16 Thread Eger, Patrick
Resending, I think this got dropped by the list for some reason - Hi, was recently looking to incorporate Lucene for a simple "parametric"/"faceted" type search. The documents are very small, roughly 15 fields of short length (5-15 characters, generally strings and padded integers). When

Re: semi-infinite loop during merging

2009-04-16 Thread Christiaan Fluit
I spent a lot of time on getting the stacktraces, but JET seems to make this impossible. Ctrl-Break, connecting with JConsole, even a "Dump Threads" button in my UI that uses Thread.getAllStackTraces were not able to produce a dump of all threads. I just got an additional confirmation that th

Re: BitSet Filter ArrayIndexOutOfBoundsException?

2009-04-16 Thread Michael McCandless
On Wed, Apr 15, 2009 at 8:35 PM, Ryan McKinley wrote: > I have an operation that is quite expensive that I am hoping to run only > once for each time the index changes.  Is the Couldn't you simply run your expensive operation, on the IndexReader passed into getDocIdSet? Ie, just move your curre