date:20100222

Re: Scanning docs at index time

2010-02-22 Thread Apoorv Sharma

I don't know of any classes that would be suitable, but if they are ordered queries, simple code could easily be written. On Mon, Feb 22, 2010 at 9:59 PM, Nigel wrote: > I'd like to scan documents as they're being indexed, to find out > immediately > if any of them match certain queries. The goal i

Re: can IndexWriter.addIndexes de-dupe documents?

2010-02-22 Thread Erick Erickson

What sorts of rules would govern which one should be kept? Say you were adding three indexes and there was a document in each that was identical. Which one should be kept? I suspect any rule would be wrong at least part of the time FWIW Erick On Mon, Feb 22, 2010 at 5:02 PM, Michael McCandle

Re: can IndexWriter.addIndexes de-dupe documents?

2010-02-22 Thread Michael McCandless

addIndexes doesn't make this possible. Maybe add the indexes but then make a 2nd pass to dedup? Mike On Mon, Feb 22, 2010 at 4:26 PM, jchang wrote: > > When I call IndexWriter.addIndexes, is there anything I can do to make it > filter out duplicates based on a certain field (or group of fields)?

can IndexWriter.addIndexes de-dupe documents?

2010-02-22 Thread jchang

When I call IndexWriter.addIndexes, is there anything I can do to make it filter out duplicates based on a certain field (or group of fields)? If I know that the id field of the document is unique, can I make addIndexes know that if it finds a new document with the same id, the new one is valid and
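addIndexes does not de-duplicate, so Mike's suggested second pass has to apply jchang's rule itself: for each id, keep only the most recently added document. Below is a minimal plain-Java sketch of that rule over an in-memory list. It is not Lucene code. The `Doc` and `Dedup` classes are hypothetical, and "most recent" is assumed to mean "later in the input order". Against a real index, the same effect is usually achieved by adding documents with `IndexWriter.updateDocument(Term, Document)`, which deletes any existing document containing the id term before adding the new one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stored document: a unique id plus some content.
final class Doc {
    final String id;
    final String body;
    Doc(String id, String body) { this.id = id; this.body = body; }
}

final class Dedup {
    // Keeps the LAST document seen for each id; ids stay in first-seen order
    // because re-putting an existing key does not move it in a LinkedHashMap.
    static List<Doc> dedupKeepLast(List<Doc> docsInAddOrder) {
        Map<String, Doc> byId = new LinkedHashMap<>();
        for (Doc d : docsInAddOrder) {
            byId.put(d.id, d); // a later doc with the same id replaces the earlier one
        }
        return new ArrayList<>(byId.values());
    }
}
```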

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan

I'm pretty sure there are flushes and segment merges going on, but as you said, that shouldn't affect the version increment. I'll see what I can do to get infoStream output. Thanks, Peter On Mon, Feb 22, 2010 at 2:30 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Well I'm at a loss

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Michael McCandless

Well I'm at a loss then. The version should only increment on commit. Can you make it all happen when infoStream is on, and post back? Mike On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan wrote: > Only one writer thread and one writer process. > I'm calling IndexWriter(Directory d, Analyzer a,

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan

Only one writer thread and one writer process. I'm calling IndexWriter(Directory d, Analyzer a, boolean create, MaxFieldLength mfl), which sets autocommit=false. Peter On Mon, Feb 22, 2010 at 12:24 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > That's curious. > > It's only on prep

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Michael McCandless

That's curious. It's only on prepareCommit (or, commit, if you didn't first prepare, since that will call prepareCommit internally) that this version should increase. Is there only 1 thread doing this? Oh, and, are you passing false for autoCommit? Mike On Mon, Feb 22, 2010 at 11:43 AM, Peter

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Jason Rutherglen

Peter, Perhaps other concurrent operations? Jason On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan wrote: > Using Lucene 2.9.1, I have the following pseudocode which gets repeated at > regular intervals: > > 1. FSDirectory dir = FSDirectory.open(java.io.File); > 2. dir.setLockFactory(new SingleIn

IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan

Using Lucene 2.9.1, I have the following pseudocode which gets repeated at regular intervals: 1. FSDirectory dir = FSDirectory.open(java.io.File); 2. dir.setLockFactory(new SingleInstanceLockFactory()); 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen) 4. writer.getReader(

Scanning docs at index time

2010-02-22 Thread Nigel

I'd like to scan documents as they're being indexed, to find out immediately if any of them match certain queries. The goal is to find out if there are any new hits for these queries as soon as possible, without re-searching the index over and over (which would be inefficient, and higher latency).
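What Nigel describes is often handled with "standing" queries: instead of re-running the queries against the index, each stored query is run against each new document. Lucene's contrib `MemoryIndex`, which holds a single document in memory so a `Query` can be run against it, is one tool aimed at this. The sketch below only shows the control flow in plain Java and rests on heavy simplifying assumptions. Every standing query is a hypothetical AND-of-terms set, and a naive lowercase/split tokenizer stands in for a real `Analyzer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical standing-query matcher: each query is a set of terms that must ALL appear.
final class StandingQueries {
    private final Map<String, Set<String>> queries = new LinkedHashMap<>();

    void register(String queryId, String... requiredTerms) {
        Set<String> terms = new HashSet<>();
        for (String t : requiredTerms) terms.add(t.toLowerCase());
        queries.put(queryId, terms);
    }

    // Call once per document at index time; returns ids of the queries it satisfies.
    List<String> match(String docText) {
        // Naive tokenizer standing in for an Analyzer: lowercase, split on non-alphanumerics.
        Set<String> docTerms = new HashSet<>(Arrays.asList(docText.toLowerCase().split("[^a-z0-9]+")));
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : queries.entrySet()) {
            if (docTerms.containsAll(e.getValue())) hits.add(e.getKey());
        }
        return hits;
    }
}
```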

Re: range of scores : queryNorm()

2010-02-22 Thread Erick Erickson

Could you back up a step and tell us what the upper-level task you're trying to accomplish is? That is, why the partner wants the number? Because the raw score in Lucene is only relevant within that single query, and then only for ranking. The normalized score *is* in a fixed range already, betwee

Re: range of scores : queryNorm()

2010-02-22 Thread Ian Lea

> I have observed that even if we change boosting > drastically, scores are being normalized at the end because of > queryNorm value. Is there anything (regarding the queryNorm) that > we can rely on? Dunno. > like score will always be under 10 No. > or some fixed value? I think not. >

Re: Boost Problem (again), need example !

2010-02-22 Thread Erick Erickson

I still don't understand why a simple sort as suggested by Ian wouldn't work. It'd be a lot more reliable than fiddling with doc scores if you want a strict ordering on a particular field (make sure it's untokenized though). Erick On Mon, Feb 22, 2010 at 8:19 AM, pdaures wrote: > > It WORKS ! >

Re: PayloadNearSpanScorer explain method

2010-02-22 Thread Peter Keegan

Patch is in JIRA: LUCENE-2272 On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan wrote: > Yes, I will provide a patch. Our new proxy server has broken my access to > the svn repository, though :-( > > > On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll wrote: > >> That sounds reasonable. Patch? >> >>

RE: Boost Problem (again), need example !

2010-02-22 Thread pdaures

It WORKS! Thank you so much; I spent a lot of time trying to do that. Thank you again! Uwe Schindler wrote: > > The simple fix for that is to wrap the subQuery using: new > ConstantScoreQuery(new QueryWrapperFilter(query)); after that its score > is constant and only the ValueSource contributes to the score.

RE: Boost Problem (again), need example !

2010-02-22 Thread Uwe Schindler

The simple fix for that is to wrap the subQuery using: new ConstantScoreQuery(new QueryWrapperFilter(query)); after that its score is constant and only the ValueSource contributes to the score. I recommend using NumericField for indexing this boost (no storing needed, only indexing, precisionStep=Integer.MAX_V
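Plain arithmetic shows why Uwe's wrapping helps. Assume `CustomScoreQuery`'s default combination multiplies the subquery score by the value-source score. Then a document with strong text relevance can outrank one with a higher boost value. Once the subquery is wrapped so that its score is the same constant for every document, the ranking follows the boost field alone. The sketch below is not Lucene code: `CustomScoreDemo`, `combine`, and `winner` are hypothetical, and the scores in the check are illustrative inputs.

```java
// Plain-Java illustration, not Lucene: models score = subQueryScore * valSrcScore.
final class CustomScoreDemo {
    // Assumed default combination: subquery (text) score times value-source (boost field) score.
    static float combine(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    // Returns the index of the top-scoring document. If constantSubQuery is true, every
    // document gets the same subquery score (1.0f here for simplicity), mimicking the effect
    // of wrapping the subquery in a ConstantScoreQuery.
    static int winner(float[] textScores, float[] boostValues, boolean constantSubQuery) {
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < textScores.length; i++) {
            float sub = constantSubQuery ? 1.0f : textScores[i];
            float score = combine(sub, boostValues[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

For example, with text scores {3, 1} and boost values {2, 5}, the combined scores are 6 and 5, so document 0 wins. With a constant subquery score they become 2 and 5, so document 1 wins.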

Re: Boost Problem (again), need example !

2010-02-22 Thread Ian Lea

boostField needs to be indexed to be used in the FieldScoreQuery. Are you now using one of the latest releases that Uwe mentioned, with fixes for CustomScoreQuery? And unless you provide your own implementation of CustomScoreQuery.customScore() I think that you are still not guaranteed to get

RE: Boost Problem (again), need example !

2010-02-22 Thread pdaures

Hi! Thank you for your help. I think I'm not using CustomScoreQuery correctly when I do a "search". BooleanQuery combinedQuery = new BooleanQuery(); combinedQuery.add(textQuery, Occur.MUST); combinedQuery.add(titleQuery, Occur.MUST); CustomScoreQuery customQuery = new CustomScoreQuery(combinedQue

range of scores : queryNorm()

2010-02-22 Thread Smith G

Hello, I have observed that even if we change boosting drastically, scores are normalized at the end because of the queryNorm value. Is there anything (regarding the queryNorm) that we can rely on, such as the score always being under 10 or some other fixed value? The main objective is to p
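The effect Smith G describes follows from how the query norm is computed. Assuming `DefaultSimilarity`, `queryNorm(sumOfSquaredWeights)` is `1 / sqrt(sumOfSquaredWeights)`. Multiplying every clause boost by the same factor is therefore cancelled out, while changing boosts relative to each other still matters. The plain-Java sketch below uses a simplified clause weight of idf times boost (real Lucene weights involve more factors), and `QueryNormDemo` and its methods are hypothetical names.

```java
// Plain-Java illustration of DefaultSimilarity-style query normalization (simplified, not Lucene code).
final class QueryNormDemo {
    // queryNorm = 1 / sqrt(sum of squared clause weights).
    static double queryNorm(double[] clauseWeights) {
        double sumSq = 0.0;
        for (double w : clauseWeights) sumSq += w * w;
        return 1.0 / Math.sqrt(sumSq);
    }

    // Simplified clause weight: idf * boost, then scaled by the shared queryNorm.
    static double[] normalizedWeights(double[] idfs, double[] boosts) {
        double[] raw = new double[idfs.length];
        for (int i = 0; i < idfs.length; i++) raw[i] = idfs[i] * boosts[i];
        double norm = queryNorm(raw);
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) out[i] = raw[i] * norm;
        return out;
    }
}
```

With idfs {2, 1}, boosts {1, 1} and {10, 10} give identical normalized weights. Boosts {5, 1} shift weight toward the first clause.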

RE: Boost Problem (again), need example !

2010-02-22 Thread Uwe Schindler

It's CustomScoreQuery in 2.9 and 3.0. Please wait for 2.9.2 and 3.0.1, which contain an important API change that makes this experimental query type work correctly with the new per-segment search! You can test the release artifacts of both new versions here: http://people.apache.org/~uschindler/staging-area/luce

Re: Boost Problem (again), need example !

2010-02-22 Thread Ian Lea

Can't you simply sort by descending score (your score, not Lucene's)? Seems to me that would give you what you are asking for. The setBoost() method is unlikely to work consistently because it only influences the score rather than setting it. If your John Mickeal doc happens to have a higher lucen
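Ian's suggestion amounts to ordering matches by the application's own grade field rather than by Lucene's relevance score. In Lucene 2.9/3.0 that would mean passing a `Sort` built from `new SortField("grade", SortField.INT, true)` to `Searcher.search(Query, Filter, int, Sort)`. The field name `grade` is hypothetical, and, as Erick notes, the field must be indexed untokenized. The plain-Java sketch below, with hypothetical `StudentHit` and `GradeSort` classes, only shows the resulting ordering rule, not the Lucene call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical match: the student's name plus the application's own grade (not Lucene's score).
final class StudentHit {
    final String name;
    final int grade;
    StudentHit(String name, int grade) { this.name = name; this.grade = grade; }
}

final class GradeSort {
    // Orders matches strictly by grade, highest first; text relevance plays no part.
    static List<StudentHit> byGradeDescending(List<StudentHit> matches) {
        List<StudentHit> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((StudentHit h) -> h.grade).reversed());
        return sorted;
    }
}
```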

Boost Problem (again), need example !

2010-02-22 Thread pdaures

Hi, I know that there are many topics about scoring issues, but I didn't find an answer in them. This is the problem: Imagine I'm a teacher, and I have to index all the results, comments and scores for students. Student : String name (eg : John Smith) String comments : (eg: John is a good
