I don't know of any classes that would be suitable, but if they are ordered
queries, simple code could easily be written.
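Here is a rough illustration of the "simple code" idea. It is a minimal sketch that assumes each stored query is just a set of terms that must all occur (a conjunction), and that documents are tokenized by lowercasing and splitting on non-alphanumeric characters. QueryMatcher, register and match are hypothetical names. This is plain Java, not a Lucene API, and it ignores analyzers, phrases and scoring.

```java
import java.util.*;

// Hypothetical helper, not part of Lucene: checks each incoming document
// against a fixed set of conjunctive term queries as it is being indexed.
final class QueryMatcher {
    // query name -> terms that must all appear in the document
    private final Map<String, Set<String>> queries = new LinkedHashMap<>();

    void register(String name, String... terms) {
        Set<String> required = new HashSet<>();
        for (String t : terms) {
            required.add(t.toLowerCase(Locale.ROOT));
        }
        queries.put(name, required);
    }

    // Returns the names of all registered queries whose terms all occur in the text.
    List<String> match(String text) {
        Set<String> tokens = new HashSet<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : queries.entrySet()) {
            if (tokens.containsAll(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        QueryMatcher m = new QueryMatcher();
        m.register("lucene-perf", "lucene", "performance");
        m.register("solr", "solr");
        // In practice, match() would be called on each document just before it is indexed.
        System.out.println(m.match("Lucene performance tuning tips")); // [lucene-perf]
    }
}
```

Because every document is checked once, when it arrives, new hits are known immediately and no repeated searches against the index are needed. The cost grows with the number of registered queries.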
On Mon, Feb 22, 2010 at 9:59 PM, Nigel wrote:
> I'd like to scan documents as they're being indexed, to find out immediately
> if any of them match certain queries. The goal i
What sorts of rules would govern which one should be
kept? Say you were adding three indexes and there
was a document in each that was identical. Which one
should be kept?
I suspect any rule would be wrong at least part of
the time.
FWIW
Erick
On Mon, Feb 22, 2010 at 5:02 PM, Michael McCandle
addIndexes doesn't make this possible.
Maybe add the indexes but then make a 2nd pass to dedup?
Mike
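Below is a rough sketch of the suggested second pass. It models only the selection logic, in plain Java. It assumes the id field is unique per logical document and that the newest copy should win, as jchang describes. It also assumes that documents brought in later by addIndexes end up with higher document numbers; the thread does not state this. DedupPass and docsToDelete are hypothetical names. On a real index, the returned document numbers would then be deleted, for example with IndexReader.deleteDocument(int) on a non-read-only reader.

```java
import java.util.*;

// Plain-Java model of a dedup pass: for each id, keep the document with the
// highest document number and report all older duplicates for deletion.
final class DedupPass {
    // ids[i] is the (hypothetical) unique-id field value of document number i.
    static List<Integer> docsToDelete(String[] ids) {
        Map<String, Integer> newest = new HashMap<>();
        for (int doc = 0; doc < ids.length; doc++) {
            newest.put(ids[doc], doc); // later document numbers overwrite earlier ones
        }
        List<Integer> toDelete = new ArrayList<>();
        for (int doc = 0; doc < ids.length; doc++) {
            if (newest.get(ids[doc]) != doc) { // not the newest copy of this id
                toDelete.add(doc);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "a", "c", "b"};
        System.out.println(docsToDelete(ids)); // [0, 1]
    }
}
```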
On Mon, Feb 22, 2010 at 4:26 PM, jchang wrote:
When I call IndexWriter.addIndexes, is there anything I can do to make it
filter out duplicates based on a certain field (or group of fields)? If I
know that the id field of the document is unique, can I make addIndexes know
that if it finds a new document with the same id, the new one is valid and
I'm pretty sure there are flushes and segment merges going on, but as you
said, that shouldn't affect the version increment. I'll see what I can do to
get infoStream output.
Thanks,
Peter
On Mon, Feb 22, 2010 at 2:30 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
Well I'm at a loss then. The version should only increment on commit.
Can you make it all happen when infoStream is on, and post back?
Mike
On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan wrote:
Only one writer thread and one writer process.
I'm calling IndexWriter(Directory d, Analyzer a, boolean create,
MaxFieldLength mfl), which sets autocommit=false.
Peter
On Mon, Feb 22, 2010 at 12:24 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
That's curious.
It's only on prepareCommit (or commit, if you didn't first prepare,
since that will call prepareCommit internally) that this version
should increase.
Is there only one thread doing this?
Oh, and are you passing false for autoCommit?
Mike
On Mon, Feb 22, 2010 at 11:43 AM, Peter
Peter,
Perhaps other concurrent operations?
Jason
On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan wrote:
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at
regular intervals:
1. FSDirectory dir = FSDirectory.open(java.io.File);
2. dir.setLockFactory(new SingleInstanceLockFactory());
3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
4. writer.getReader(
I'd like to scan documents as they're being indexed, to find out immediately
if any of them match certain queries. The goal is to find out if there are
any new hits for these queries as soon as possible, without re-searching the
index over and over (which would be inefficient and add latency).
Could you back up a step and tell us what upper-level
task you're trying to accomplish? That is, why does the partner
want the number?
Because the raw score in Lucene is only relevant within that
single query, and then only for ranking. The normalized score
*is* in a fixed range already, betwee
> I have observed that even if we change boosting
> drastically, scores are being normalized at the end because of
> queryNorm value. Is there anything (regarding the queryNorm) that
> we can rely on?
Dunno.
> like score will always be under 10
No.
> or some fixed value ?
I think not.
>
I still don't understand why a simple sort, as suggested by Ian, wouldn't work.
It'd be a lot more reliable than fiddling with doc scores if you want a strict
ordering on a particular field (make sure it's untokenized, though).
Erick
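Here is a simplified numeric model of why changing all boosts drastically disappears after normalization. It assumes a flat BooleanQuery of TermQuerys under DefaultSimilarity. In that setting each clause's raw weight is idf * boost, queryNorm = 1/sqrt(sum of the squared raw weights), and the normalized clause weight is idf * boost * queryNorm * idf. The model ignores tf, lengthNorm, field boosts and coord. QueryNormDemo is a hypothetical name, and the idf values are made up.

```java
// Simplified model (not Lucene code) of how DefaultSimilarity's queryNorm
// rescales clause weights in a flat BooleanQuery of TermQuerys.
// Ignores tf, lengthNorm, field boosts and coord.
final class QueryNormDemo {
    // Returns the normalized weight idf_i * boost_i * queryNorm * idf_i for each clause.
    static double[] normalizedWeights(double[] idf, double[] boost) {
        double sumOfSquaredWeights = 0.0;
        for (int i = 0; i < idf.length; i++) {
            double w = idf[i] * boost[i]; // raw clause weight
            sumOfSquaredWeights += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);
        double[] out = new double[idf.length];
        for (int i = 0; i < idf.length; i++) {
            out[i] = idf[i] * boost[i] * queryNorm * idf[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] idf = {1.5, 3.0};
        System.out.println(java.util.Arrays.toString(normalizedWeights(idf, new double[] {1, 1})));
        // Multiplying every boost by 100 gives the same weights as above...
        System.out.println(java.util.Arrays.toString(normalizedWeights(idf, new double[] {100, 100})));
        // ...but raising one clause's boost relative to the other changes them.
        System.out.println(java.util.Arrays.toString(normalizedWeights(idf, new double[] {10, 1})));
    }
}
```

A common factor on every boost cancels in queryNorm, but relative boosts between clauses still change the ranking. Per-document scores also include tf and norms, so this model does not imply any fixed upper bound on a score.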
On Mon, Feb 22, 2010 at 8:19 AM, pdaures wrote:
>
> It WORKS !
>
Patch is in JIRA: LUCENE-2272
On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan wrote:
> Yes, I will provide a patch. Our new proxy server has broken my access to
> the svn repository, though :-(
>
> On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll wrote:
>> That sounds reasonable. Patch?
It WORKS!
Thank you so much, I spent a lot of time trying to do that. Thank you again!
Uwe Schindler wrote:
The simple fix for that is to wrap the subQuery as new
ConstantScoreQuery(new QueryWrapperFilter(query)). After that, its score is
constant and only the ValueSource determines the score.
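Here is a simplified model of the fix above. CustomScoreQuery's default customScore() multiplies the sub-query score by the ValueSource score. Wrapping the sub-query in ConstantScoreQuery(new QueryWrapperFilter(query)) gives every matching document the same sub-query score, so the ordering follows the indexed boost values alone. For simplicity the sketch treats that constant as 1.0; any constant gives the same order. The per-document values are made up, ConstantSubScoreDemo and rank are hypothetical names, and no Lucene classes are involved.

```java
import java.util.*;

// Simplified model (not Lucene code): score = subQueryScore * boostFieldValue,
// comparing a normal sub-query with one wrapped to give a constant score.
final class ConstantSubScoreDemo {
    // Returns document names ordered by combined score, highest first.
    static List<String> rank(Map<String, float[]> docs, boolean constantSubQuery) {
        List<String> names = new ArrayList<>(docs.keySet());
        names.sort((x, y) -> Float.compare(score(docs.get(y), constantSubQuery),
                                           score(docs.get(x), constantSubQuery)));
        return names;
    }

    // doc[0] = text relevance of the sub-query, doc[1] = value of the indexed boost field
    private static float score(float[] doc, boolean constantSubQuery) {
        float subQueryScore = constantSubQuery ? 1.0f : doc[0];
        return subQueryScore * doc[1];
    }

    public static void main(String[] args) {
        Map<String, float[]> docs = new LinkedHashMap<>();
        docs.put("A", new float[] {4.0f, 2.0f}); // strong text match, small boost
        docs.put("B", new float[] {1.0f, 5.0f}); // weak text match, large boost
        System.out.println(rank(docs, false)); // [A, B]  (4*2 = 8 beats 1*5 = 5)
        System.out.println(rank(docs, true));  // [B, A]  (1*5 = 5 beats 1*2 = 2)
    }
}
```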
I recommend using NumericField for indexing this boost (no storing needed,
only indexing, precisionStep=Integer.MAX_V
boostField needs to be indexed to be used in the FieldScoreQuery.
Are you now using one of the latest releases that Uwe mentioned,
with fixes for CustomScoreQuery?
And unless you provide your own implementation of
CustomScoreQuery.customScore(), I think that you are still not
guaranteed to get
Hi!
Thank you for your help.
I think I'm not using CustomScoreQuery correctly when I do a "search".
BooleanQuery combinedQuery = new BooleanQuery();
combinedQuery.add(textQuery, Occur.MUST);
combinedQuery.add(titleQuery, Occur.MUST);
CustomScoreQuery customQuery = new CustomScoreQuery(combinedQue
Hello,
I have observed that even if we change boosting
drastically, scores are normalized at the end because of the
queryNorm value. Is there anything (regarding the queryNorm) that
we can rely on, like the score always being under 10 or some other fixed
value? The main objective is to p
It's CustomScoreQuery in 2.9 and 3.0.
Please wait for 2.9.2 and 3.0.1, which include an important API change needed for this
experimental query type to work correctly with the new per-segment search! You
can test the release artifacts of both new versions here:
http://people.apache.org/~uschindler/staging-area/luce
Can't you simply sort by descending score (your score, not Lucene's)?
Seems to me that would give you what you are asking for.
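This plain-Java illustration sorts by your own score instead of Lucene's relevance. It assumes each hit carries an application-defined integer score. The field, the sample names other than John Smith, and all numbers are hypothetical. In Lucene 2.9/3.0 the equivalent would be to search with a Sort on an untokenized field, for example new Sort(new SortField("score", SortField.INT, true)), where "score" is again a hypothetical field name.

```java
import java.util.*;

// Plain-Java model of "sort by your score, not Lucene's": each hit carries
// Lucene's relevance score and an application-defined score.
final class SortByOwnScore {
    static final class Hit {
        final String name;
        final float luceneScore; // relevance computed by Lucene; ignored for ordering
        final int ownScore;      // hypothetical value indexed in an untokenized field
        Hit(String name, float luceneScore, int ownScore) {
            this.name = name;
            this.luceneScore = luceneScore;
            this.ownScore = ownScore;
        }
    }

    // Orders hits by the application's own score, highest first.
    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.ownScore).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("John Smith", 2.7f, 12),
            new Hit("Student B", 0.9f, 18),
            new Hit("Student C", 1.4f, 15));
        System.out.println(order(hits)); // [Student B, Student C, John Smith]
    }
}
```

Unlike setBoost(), the order here does not depend on how well each document matched the text query.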
The setBoost() method is unlikely to work consistently because it only
influences the score rather than setting it. If your John Mickeal doc
happens to have a higher lucen
Hi,
I know that there are many topics about scoring issues, but I didn't find an
answer in them.
This is the problem:
Imagine I'm a teacher, and I have to index all the results, comments and
scores for students.
Student:
String name (e.g. John Smith)
String comments (e.g. John is a good