get all tokens from index

2009-09-08 Thread m.harig
Hello all, is there any way to get all tokens from my index? Any suggestions would be appreciated. -- View this message in context: http://www.nabble.com/get-all-tokens-from-index-tp25359411p25359411.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Extending Sort/FieldCache

2009-09-08 Thread Shai Erera
I didn't say we won't need CSF, but that at least conceptually, CSF and my sort-by-payload are the same. If however it turns out that CSF performs better, then I'll definitely switch my sort-by-payload package to use it. I thought that CSF is going to be implemented using payloads, but perhaps I'm

Re: Is there way to get complete start end matches to be first in the list ?

2009-09-08 Thread Shai Erera
I can think of a way where you rely solely on scores, so there is still a chance of getting results that are not ordered the way you want, but you can try it: run the query [foo bar OR "foo bar"^10]. That way, your first result should be scored by [foo], [bar] and ["foo bar"]. Also, the phrase is added
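The boosted query suggested above, [foo bar OR "foo bar"^10], can be assembled from raw user input roughly as follows. This is only a sketch: the class and method names are hypothetical, it produces a query string for the classic query parser syntax rather than calling any Lucene API, and it only strips double quotes. Other query-syntax characters in the input would still need escaping (Lucene's QueryParser.escape is one option).

```java
// Hypothetical helper, not part of Lucene: builds the query string
//   <terms> OR "<terms>"^<boost>
// so that documents containing the exact phrase score higher.
class BoostedPhraseQuery {

    static String build(String userInput, int boost) {
        // Replace double quotes so the input cannot break out of the phrase,
        // then collapse runs of whitespace into single spaces.
        String terms = userInput.replace("\"", " ").trim().replaceAll("\\s+", " ");
        return terms + " OR \"" + terms + "\"^" + boost;
    }
}
```

The resulting string would then be handed to the query parser as usual. As the reply notes, ordering still depends purely on scores, so an exact match is favoured but not guaranteed to come first.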

Re: Extending Sort/FieldCache

2009-09-08 Thread Yonik Seeley
On Sun, Sep 6, 2009 at 4:42 AM, Shai Erera wrote:
>> I've resisted using payloads for this purpose in Solr because it felt
>> like an interim hack until CSF is implemented.
>
> I don't see it as a hack, but as a proper use of a great feature in Lucene.
It's proper use for an application perhaps, b

Re: Is there way to get complete start end matches to be first in the list ?

2009-09-08 Thread Paul Taylor
Michael Barbarelli wrote: What I do is run each entry in the hits collection through a home-rolled Levenshtein distance algorithm to obtain a score. Then I sort by score. On Sep 8, 2009 9:44 PM, "Paul Taylor" wrote: Is there way to get complete start end matc

Re: Extending Sort/FieldCache

2009-09-08 Thread Chris Hostetter
: Thanks Mike. I did not phrase well my understanding of Cache reload. I
: didn't mean literally as part of the reopen, but *because* of the reopen.
: Because FieldCache is tied to an IndexReader instance, after reopen it gets
: refreshed. If I keep my own Cache, I'll need to code that logic, and

Re: Filtering question/advice

2009-09-08 Thread Chris Hostetter
: Hi
: I include a testcase to show what I am trying to do. Testcase number 3
: fails.
the mailing list is finicky about attachments ... the best thing to do is to include your test case directly in the body of your email as plain text.
: > I created a test case to test this solution and it wo

Re: Is there way to get complete start end matches to be first in the list ?

2009-09-08 Thread Michael Barbarelli
What I do is run each entry in the hits collection through a home-rolled Levenshtein distance algorithm to obtain a score. Then I sort by score. On Sep 8, 2009 9:44 PM, "Paul Taylor" wrote: Is there way to get complete start end matches to be first in the list We use Lucene to search song albums
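The post-processing step described above (score each hit by Levenshtein distance, then sort) might look roughly like the sketch below. It is not the poster's actual code: the class and method names are hypothetical, and it works on plain title strings assumed to have been read from the hits already, with no Lucene calls.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: re-orders already-retrieved titles
// so that the ones closest to the query (smallest edit distance) come first.
class EditDistanceRerank {

    // Standard dynamic-programming Levenshtein distance, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a new list sorted by case-insensitive distance to the query.
    // The sort is stable, so ties keep their original (Lucene score) order.
    static List<String> rerank(String query, List<String> titles) {
        String q = query.toLowerCase();
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparingInt(t -> levenshtein(q, t.toLowerCase())));
        return sorted;
    }
}
```

With this approach an exact title match has distance 0 and always sorts first. The cost is one distance computation per hit, so it is best applied only to the top hits rather than the whole result set.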

Is there way to get complete start end matches to be first in the list ?

2009-09-08 Thread Paul Taylor
Is there a way to get complete start-to-end matches to be first in the list? We use Lucene to search song album titles, typically one to ten words long. If the user enters something like 'foo bar', everything that contains foo bar is returned with the max score. That's fine, but it would be better if an ex

Re: New "Stream closed" exception with Java 6

2009-09-08 Thread Mark Miller
Chris Hostetter wrote:
> : I'm coming to the same conclusion - there must be >1 threads accessing this
> index at the same time. Better go figure it out ... :-)
>
> careful about your assumptions ... you could get this same type of
> exception even with only one thread, the stream that's being

RE: New "Stream closed" exception with Java 6

2009-09-08 Thread Chris Hostetter
: I'm coming to the same conclusion - there must be >1 threads accessing this index at the same time. Better go figure it out ... :-)
careful about your assumptions ... you could get this same type of exception even with only one thread, the stream that's being closed isn't internal to Luce

Combining hits from multiple documents into a single hit

2009-09-08 Thread Adrian Banks
[I originally posted this to the Lucene.net mailing list, but it was suggested that I might have more luck here] I am trying to get a particular search to work and it is proving problematic. The actual source data is quite complex but can be summarised by the following example: I have arti

Re: large document with multiple fields performance

2009-09-08 Thread Anshum
Hey Steve, I'd suggest you go with the 20-field (non-normalized) model. I've used much larger models and they work just fine. There wouldn't be any point in increasing the complexity. Hope that clarifies things a little at least :) -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts

How to avoid huge index files

2009-09-08 Thread Dvora
Hello, I'm using Lucene 2.4. I'm developing a web application that uses Lucene (via Compass) to do the searches. I intend to deploy the application on Google App Engine (http://code.google.com/appengine/), which limits file sizes to less than 10MB. I've read about the various policie

RE: large document with multiple fields performance

2009-09-08 Thread Stephen Greene
Hi Anshum, Thank you for your reply. I have two options I am considering. One would be:
Document {
    String projectID;
    String generalComment;
    String workHistoryComment;
    String environmentalComment;
    String claimsComment;
    ...
}
And the document may cont

Re: large document with multiple fields performance

2009-09-08 Thread Anshum
Hi Stephen, Could you clarify the requirement a bit more? Do you intend to have the data in the index as:
Document {
    String Comment;
    String CommentId;
    String ProjectId;
}
How do you intend to index it, as in the doc structure? Is there a primary key there? What would you search on? What would you want

Re: Best way to understand the "*.frq" file?

2009-09-08 Thread Grant Ingersoll
On Sep 7, 2009, at 12:10 PM, 関 磊 wrote: Hello dears, I am studying the index format of Lucene, but I really cannot understand the format of the "*.frq" file. http://lucene.apache.org/java/2_4_1/fileformats.html -- Grant Ingersoll http://www.lucidimagination.com/ Searc

large document with multiple fields performance

2009-09-08 Thread Stephen Greene
Hello, I am new to Lucene and am building an application that requires documents with many fields to be searched. A "project" id is being stored (not_analyzed), and all matching project ids will be returned to be used to join other data from a database. Will it provide better performance to stor