Re: How to give a score for all documents?

2009-08-21 Thread prashant ullegaddi
If you want to modify the way Lucene scores documents, I guess you need to extend Similarity class and provide your own implementation. Take a look at: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DefaultSimilarity.html http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/

Re: Lucene index question

2009-08-21 Thread Anshum
Hi Marquin, So you have a field that you want to sort on, well, that's pretty much a straightforward task in Lucene. Sort sort = new Sort(); sort.setSort(, true/false); http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Sort.html * It would not really be taxing and frequent updates could b

Using HitCollector to Collect First N Hits

2009-08-21 Thread Len Takeuchi
Hello, I’m using Lucene 2.4.1 and I’m trying to use a custom HitCollector to collect only the first N hits (not the best hits) for performance. I saw another e-mail in this group where they mentioned writing a HitCollector which throws an exception after N hits to do this. So I tried this approa
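
The "throw an exception after N hits" idea mentioned above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: `FirstNCollector`, `LimitReachedException` and `FirstNDemo` are hypothetical names, and the loop in `firstN` merely stands in for a searcher that calls `collect(int doc, float score)` once per matching document, which is the callback shape of `HitCollector` in Lucene 2.4.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical unchecked signal used to abort a scan once enough hits are collected. */
class LimitReachedException extends RuntimeException {
    LimitReachedException() {
        // No message, no cause, suppression off, no stack trace: cheap to throw.
        super(null, null, false, false);
    }
}

/** Hypothetical collector that keeps the first N docs it sees, in arrival order. */
class FirstNCollector {
    private final int limit;
    private final List<Integer> docs = new ArrayList<Integer>();

    FirstNCollector(int limit) {
        this.limit = limit;
    }

    /** Called once per matching document; the score is ignored here. */
    void collect(int doc, float score) {
        if (docs.size() >= limit) {
            throw new LimitReachedException();
        }
        docs.add(doc);
    }

    List<Integer> docs() {
        return docs;
    }
}

class FirstNDemo {
    /** Stand-in for a search loop: feeds matching docs to the collector until it aborts. */
    static List<Integer> firstN(int[] matchingDocs, int n) {
        FirstNCollector collector = new FirstNCollector(n);
        try {
            for (int doc : matchingDocs) {
                collector.collect(doc, 1.0f);
            }
        } catch (LimitReachedException stop) {
            // Expected: the collector already holds N hits.
        }
        return collector.docs();
    }

    public static void main(String[] args) {
        System.out.println(firstN(new int[] {3, 7, 8, 15, 21}, 3));
    }
}
```

The caller has to catch the exception around the whole search call, and the hits collected so far are read from the collector afterwards. Whether this is actually faster than collecting the top N by score depends on the query and index, so it is worth measuring.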

Lucene index question

2009-08-21 Thread marquinhocb
Hello everyone. New here to Lucene, amazing product, but takes some time to wrap my head around it all heh. So I have a basic question regarding indices. I have a "document" with various fields (content, post date, last viewed). The majority of these fields are quite constant, but clearly the

Re: Lucene SORT does a sort on entire index..how do I filter SORT?

2009-08-21 Thread Jason Rutherglen
Only the results from the query should be sorted. The field caches do get loaded for all values of a field though, is that what you're seeing? On Fri, Aug 21, 2009 at 4:09 PM, javaguy44 wrote: > > Hi Jason, > > Thanks for the advice. > > However I was just working through the example.  I actually

Re: Lucene SORT does a sort on entire index..how do I filter SORT?

2009-08-21 Thread javaguy44
Hi Jason, Thanks for the advice. However I was just working through the example. I actually don't want to search on numbers / dates / geo etc and was looking at custom sorting. It appears that custom sorting, or even sorting for that matter is not useful if every document will have sorting app

Re: Are there any non-alpha/numeric character that StandardAnalyzer won't treat as break?

2009-08-21 Thread K. M. McCormick
Hey Jim: I'm not sure if Standard Analyzer would do this for any such character, unless you qualified the pre-pended string and a period so it fits Standard Analyzer's definition for HOST. However, if you want, you could add a Filter that you use after the Standard Analyzer's Filters are done tha

Are there any non-alpha/numeric character that StandardAnalyzer won't treat as break?

2009-08-21 Thread ohaya
Hi, This is a kind of followup to a thread a couple of weeks ago. In my indexer, I want to pre-pend a string to certain terms to make it easier to search. So for example, if I have a string "XXX", I want to add, say, "field1" to it, to get "field1XXX" before I index it. To make it easier to s

Re: Have I done this right, Patch Submission for Lucene

2009-08-21 Thread Simon Willnauer
Generally you should add javadoc, add/extend/modify tests and eventually run all tests (they should pass ;) ). Once you are there you can upload a patch and add all necessary info to the issue. If you are not a committer you cannot assign the issue to yourself afaik. Make sure you meet compat polici

Have I done this right, Patch Submission for Lucene

2009-08-21 Thread Paul Taylor
Hi I have just submitted my first patch for lucene for the following issue https://issues.apache.org/jira/browse/LUCENE-1787 , I built from ant build does this mean the (flex code would have use the correct java version) and updated test. Do I have to reassign the issue to myself or to someon

How to give a score for all documents?

2009-08-21 Thread Fabrício Raphael
How to give a customized score for all documents independent of the vector model? I already know how to give a customized score, but I want to give this customized score for all documents in the collection, regardless of what is relevant to the vector model. How to do this? Thanks in advance! -- Fabríc

Re: Lucene SORT does a sort on entire index..how do I filter SORT?

2009-08-21 Thread Jason Rutherglen
Take a look at contrib/spatial. On Fri, Aug 21, 2009 at 7:00 AM, javaguy44 wrote: > > Hi, > > I'm currently looking at sorting in lucene, and to get started I took a look > at the distance sorting example from the Lucene in Action book. > > Working through the test DistanceSortingTest, I've notice

Lucene SORT does a sort on entire index..how do I filter SORT?

2009-08-21 Thread javaguy44
Hi, I'm currently looking at sorting in lucene, and to get started I took a look at the distance sorting example from the Lucene in Action book. Working through the test DistanceSortingTest, I've noticed that performing the SORT ends up sorting the whole index! To test this I did the following:

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Simon Willnauer
On Fri, Aug 21, 2009 at 2:18 PM, Valery wrote: > > > Simon Willnauer wrote: >> >> I already responded... again... >> > sorry, I've been in answering and seen your post right after sending. > > > Simon Willnauer wrote: >> >> Tokenizer splits the input stream into tokens (Token.java) and >> TokenFilt

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
Simon Willnauer wrote: > > I already responded... again... > sorry, I've been in answering and seen your post right after sending. Simon Willnauer wrote: > > Tokenizer splits the input stream into tokens (Token.java) and > TokenFilter subclasses operate on those. I expect from a Tokenizer >

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Robert Muir
Valery, FWIW, to answer this question, I think the answer is still "it depends". I agree with John, I think it is much easier for your tokenizer to create tokens that contain all the context you need for the downstream filters to do their job. I don't think you can put some exact specification on

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
Simon Willnauer wrote: > > you could do > the whole job in a Tokenizer but this would not be a good separation > of concerns right!? > right, it wouldn't be a good separation of concerns. That's why I wanted to know what you consider as "Tokenizer's job". -- View this message in context:

Re: Merge Exception in Lucene 2.4

2009-08-21 Thread Michael McCandless
That code looks fine... What OS/filesystem are you using? Can you make a small test case that shows the issue? Mike On Thu, Aug 20, 2009 at 7:41 AM, Sumanta Bhowmik wrote: > We put all the IndexWriters in an array which is defined by > > final Directory[] finalDir; > > We also declare an indexe

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Simon Willnauer
On Fri, Aug 21, 2009 at 12:51 PM, Valery wrote: > > Hi John, > > (aren't you the same John Byrne who is a key contributor to the great > OpenSSI project?) > > > John Byrne-3 wrote: >> >> I'm inclined to disagree with the idea that a token should not be split >> again downstream. I think that is act

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread John Byrne
Valery wrote: Hi John, (aren't you the same John Byrne who is a key contributor to the great OpenSSI project?) Nope, never heard of him! But with a great name like that I'm sure he'll go a long way :) John Byrne-3 wrote: I'm inclined to disagree with the idea that a token should not be

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
Hi John, (aren't you the same John Byrne who is a key contributor to the great OpenSSI project?) John Byrne-3 wrote: > > I'm inclined to disagree with the idea that a token should not be split > again downstream. I think that is actually a much easier way to handle > it. I would have the to

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Simon Willnauer
On Fri, Aug 21, 2009 at 10:26 AM, Valery wrote: > > Hi Simon, > > > Simon Willnauer wrote: >> >> Valery, have you tried to use whitespaceTokenizer / CharTokenizer and >> [...]?! >> >> simon >> > > yes, I did, please find the info in the initial message. Here are the > excerpts: > > > Valery wrote:

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
Hi Simon, Simon Willnauer wrote: > > Valery, have you tried to use whitespaceTokenizer / CharTokenizer and > [...]?! > > simon > yes, I did, please find the info in the initial message. Here are the excerpts: Valery wrote: > > 2) WhitespaceTokenizer gives me a lot of lexems that are act

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread John Byrne
Hi Valery, I'm inclined to disagree with the idea that a token should not be split again downstream. I think that is actually a much easier way to handle it. I would have the tokenizer return the longest match, and then split it in a token filter. In fact I have done this before and it has wo
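
The "longest match first, split later in a filter" approach can be sketched in plain Java, independent of Lucene's `Tokenizer` and `TokenFilter` classes. Everything here is an assumption made for illustration: the class name `TwoStageTokenizer`, the `PROTECTED` term list, splitting on whitespace in stage 1, and splitting on non-alphanumerics in stage 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TwoStageTokenizer {
    // Hypothetical list of terms that must survive intact (lower-cased).
    static final Set<String> PROTECTED =
            new HashSet<String>(Arrays.asList("c++", "c#", ".net"));

    /** Stage 1: coarse "longest match" tokens, split on whitespace only. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    /** Stage 2: keep protected tokens whole, split the rest on non-alphanumerics. */
    static List<String> splitFilter(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (PROTECTED.contains(lower)) {
                out.add(lower);
                continue;
            }
            for (String part : lower.split("[^a-z0-9]+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitFilter(tokenize("C++ and C# with foo-bar")));
    }
}
```

In this sketch a token such as "C++," with trailing punctuation would miss the lookup and be split down to "c", so a real filter would need to strip surrounding punctuation before checking the protected list.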