If you want to modify the way Lucene scores documents, I believe you need to
extend the Similarity class and provide your own implementation. Take a look at:
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/DefaultSimilarity.html
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/
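To make the idea concrete, here is a minimal standalone sketch of the scoring hooks you would typically override when extending DefaultSimilarity. Note this file has no Lucene dependency: the method names and formulas mirror the real 2.4 API (tf, idf, lengthNorm), but the class itself and the specific tweaks (flat term frequency, disabled length norm) are illustrative assumptions, not the one true customization.

```java
// Standalone sketch of the hooks you would override when extending
// Lucene's DefaultSimilarity. Method names mirror the real 2.4 API,
// but this file deliberately has no Lucene dependency.
public class FlatTfSimilarity {

    // DefaultSimilarity returns sqrt(freq); here we flatten term
    // frequency so repeated terms do not score a document higher.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Same formula DefaultSimilarity uses for inverse document frequency.
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log((double) numDocs / (docFreq + 1)) + 1.0);
    }

    // Disable length normalization so long and short fields score alike
    // (DefaultSimilarity returns 1/sqrt(numTokens)).
    public float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;
    }
}
```

In real code you would extend `org.apache.lucene.search.DefaultSimilarity` and install the instance with `IndexSearcher.setSimilarity(...)`.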
Hi Marquin,
So you have a field that you want to sort on; that's a pretty
straightforward task in Lucene.
Sort sort = new Sort();
sort.setSort("yourField", true); // field name, reverse = true
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Sort.html
* It would not really be taxing, and frequent updates could be…
Hello,
I'm using Lucene 2.4.1 and I'm trying to use a custom HitCollector to collect
only the first N hits (not the best hits) for performance. I saw another
e-mail in this group where they mentioned writing a HitCollector which throws
an exception after N hits to do this. So I tried this approach…
Hello everyone. I'm new here to Lucene; amazing product, but it takes some time to
wrap my head around it all, heh.
So I have a basic question regarding indices. I have a "document" with
various fields (content, post date, last viewed). The majority of these
fields are quite constant, but clearly the…
Only the results from the query should be sorted. The field
caches do get loaded for all values of a field though, is that
what you're seeing?
On Fri, Aug 21, 2009 at 4:09 PM, javaguy44 wrote:
>
> Hi Jason,
>
> Thanks for the advice.
>
> However I was just working through the example. I actually
Hi Jason,
Thanks for the advice.
However I was just working through the example. I actually don't want to
search on numbers / dates / geo etc and was looking at custom sorting.
It appears that custom sorting, or even sorting for that matter, is not
useful if every document will have sorting applied…
Hey Jim:
I'm not sure if StandardAnalyzer would do this for any such character,
unless you qualified the pre-pended string with a period so that it fits
StandardAnalyzer's definition of HOST.
However, if you want, you could add a Filter that you use after the Standard
Analyzer's filters are done that…
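The suggested post-analyzer filter can be sketched as a plain term rewrite. In real Lucene 2.4 this would be a TokenFilter overriding `next(Token)` and chained after StandardAnalyzer's own filters; the class below (name and all) is a hypothetical stand-in that only models what happens to each token's text.

```java
// Standalone sketch of the filter suggested above: after the standard
// tokenization has run, prepend a marker string to the term text
// ("field1" + "XXX" -> "field1XXX", as in the earlier message).
public class PrependMarkerFilter {

    private final String marker;

    public PrependMarkerFilter(String marker) {
        this.marker = marker;
    }

    // What the real TokenFilter would do to each token's term text.
    public String rewrite(String term) {
        return marker + term;
    }
}
```

Because the rewrite happens after analysis, the marker never has to survive StandardAnalyzer's HOST/NUM heuristics, which sidesteps the problem discussed above.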
Hi,
This is a kind of followup to a thread a couple of weeks ago.
In my indexer, I want to pre-pend a string to certain terms to make it easier
to search. So for example, if I have a string "XXX", I want to add, say,
"field1" to it, to get "field1XXX" before I index it.
To make it easier to search…
Generally you should add javadoc, add/extend/modify tests, and eventually run
all tests (they should pass ;) ). Once you are there you can upload a patch
and add all necessary info to the issue.
If you are not a committer you cannot assign the issue to yourself, AFAIK.
Make sure you meet the compatibility policies…
Hi
I have just submitted my first patch for Lucene, for the following issue:
https://issues.apache.org/jira/browse/LUCENE-1787
I built it with the ant build (does this mean the flex code would have used
the correct Java version?) and updated the tests. Do I have to reassign the
issue to myself or to someone…
How can I give a custom score to all documents, independent of the vector
model?
I already know how to give a custom score, but I want to give this
custom score to all documents in the collection, regardless of what is
relevant to the vector model.
How can I do this?
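One way to think about this: compute the score from a per-document value and ignore the vector-space score entirely. In Lucene 2.4 the usual route is the `org.apache.lucene.search.function` package (CustomScoreQuery / ValueSourceQuery); the standalone sketch below only loosely mirrors that API's `customScore` hook, and the field names and sample values are assumptions for illustration.

```java
// Sketch of scoring every document with a query-independent value,
// ignoring the vector-model relevance score.
public class StaticScorer {

    private final float[] docBoosts; // one precomputed value per docId

    public StaticScorer(float[] docBoosts) {
        this.docBoosts = docBoosts;
    }

    // Loosely mirrors CustomScoreQuery's customScore hook: drop the
    // relevance score entirely and return only the stored per-doc value.
    public float customScore(int doc, float subQueryScore) {
        return docBoosts[doc];
    }
}
```

With the real CustomScoreQuery you would wrap a MatchAllDocsQuery so that every document in the collection receives the custom value.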
Thanks!
--
Fabríc
Take a look at contrib/spatial.
On Fri, Aug 21, 2009 at 7:00 AM, javaguy44 wrote:
>
> Hi,
>
> I'm currently looking at sorting in lucene, and to get started I took a look
> at the distance sorting example from the Lucene in Action book.
>
> Working through the test DistanceSortingTest, I've noticed…
Hi,
I'm currently looking at sorting in lucene, and to get started I took a look
at the distance sorting example from the Lucene in Action book.
Working through the test DistanceSortingTest, I've noticed that performing
the SORT ends up sorting the whole index!
To test this I did the following:
On Fri, Aug 21, 2009 at 2:18 PM, Valery wrote:
>
>
> Simon Willnauer wrote:
>>
>> I already responded... again...
>>
> sorry, I was in the middle of answering and saw your post right after sending.
>
>
> Simon Willnauer wrote:
>>
>> Tokenizer splits the input stream into tokens (Token.java) and
>> TokenFilter subclasses operate on those…
Simon Willnauer wrote:
>
> I already responded... again...
>
sorry, I was in the middle of answering and saw your post right after sending.
Simon Willnauer wrote:
>
> Tokenizer splits the input stream into tokens (Token.java) and
> TokenFilter subclasses operate on those. I expect from a Tokenizer…
>
Valery,
FWIW, I think the answer is still "it depends".
I agree with John: I think it is much easier for your tokenizer to
create tokens that contain all the context you need for the downstream
filters to do their job.
I don't think you can put an exact specification on…
Simon Willnauer wrote:
>
> you could do
> the whole job in a Tokenizer but this would not be a good separation
> of concerns right!?
>
right, it wouldn't be a good separation of concerns.
That's why I wanted to know what you consider as "Tokenizer's job".
That code looks fine...
What OS/filesystem are you using?
Can you make a small test case that shows the issue?
Mike
On Thu, Aug 20, 2009 at 7:41 AM, Sumanta
Bhowmik wrote:
> We put all the IndexWriters in an array which is defined by
>
> final Directory[] finalDir;
>
> We also declare an indexe…
On Fri, Aug 21, 2009 at 12:51 PM, Valery wrote:
>
> Hi John,
>
> (aren't you the same John Byrne who is a key contributor to the great
> OpenSSI project?)
>
>
> John Byrne-3 wrote:
>>
>> I'm inclined to disagree with the idea that a token should not be split
>> again downstream. I think that is actually…
Valery wrote:
Hi John,
(aren't you the same John Byrne who is a key contributor to the great
OpenSSI project?)
Nope, never heard of him! But with a great name like that I'm sure he'll
go a long way :)
John Byrne-3 wrote:
I'm inclined to disagree with the idea that a token should not be split…
Hi John,
(aren't you the same John Byrne who is a key contributor to the great
OpenSSI project?)
John Byrne-3 wrote:
>
> I'm inclined to disagree with the idea that a token should not be split
> again downstream. I think that is actually a much easier way to handle
> it. I would have the tokenizer…
On Fri, Aug 21, 2009 at 10:26 AM, Valery wrote:
>
> Hi Simon,
>
>
> Simon Willnauer wrote:
>>
>> Valery, have you tried to use whitespaceTokenizer / CharTokenizer and
>> [...]?!
>>
>> simon
>>
>
> yes, I did, please find the info in the initial message. Here are the
> excerpts:
>
>
> Valery wrote:
Hi Simon,
Simon Willnauer wrote:
>
> Valery, have you tried to use whitespaceTokenizer / CharTokenizer and
> [...]?!
>
> simon
>
yes, I did, please find the info in the initial message. Here are the
excerpts:
Valery wrote:
>
> 2) WhitespaceTokenizer gives me a lot of lexemes that are actually…
Hi Valery,
I'm inclined to disagree with the idea that a token should not be split
again downstream. I think that is actually a much easier way to handle
it. I would have the tokenizer return the longest match, and then split
it in a token filter. In fact I have done this before and it has worked…
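The longest-match-then-split approach can be sketched in plain Java. Real code would pair a Tokenizer with a TokenFilter; here the "tokenizer" is whitespace splitting and the downstream "filter" re-splits on hyphens, keeping the original token so both forms remain searchable. The split character and the keep-original policy are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the approach described above: the tokenizer emits
// the longest match, and a downstream filter splits those tokens again.
public class SplitDownstream {

    // Stage 1: "tokenizer" returns the longest matches
    // (whitespace-delimited chunks stand in for the real rules).
    public static List<String> tokenize(String input) {
        List<String> out = new ArrayList<String>();
        for (String t : input.split("\\s+")) {
            if (t.length() > 0) out.add(t);
        }
        return out;
    }

    // Stage 2: "token filter" splits each token further on '-', keeping
    // the original token as well so both forms are searchable.
    public static List<String> splitFilter(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            out.add(t);
            if (t.indexOf('-') >= 0) {
                for (String part : t.split("-")) {
                    if (part.length() > 0) out.add(part);
                }
            }
        }
        return out;
    }
}
```

For example, "wi-fi router" tokenizes to [wi-fi, router], and the filter expands that to [wi-fi, wi, fi, router]. This keeps the separation of concerns discussed earlier: the tokenizer only finds boundaries, the filter owns the re-splitting policy.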