Lucene write locks

2008-07-22 Thread Sandeep K
Hi all, I have a question about the write locks created by Lucene. I use Lucene 2.3.2. Will this newer version create locks while indexing as the older ones did? Or is there another way that Lucene handles its operations? My other question is that I use JMS for Lucene indexing. My App server w

Re: How to avoid duplicate records in lucene

2008-07-22 Thread Sebastin
Erick, example, IndexWriter writer = new IndexWriter("C:/index",new StandardAnalyzer(),true); String records = "Lucene" +" " +"action"+" "+"book" ; Document doc = new Document(); doc.add(new Field("contents",records,Field.Store.YES,Field.Index.TOKENIZED)); writer.addDocument(doc); writer.op

Re: storing the contents of a document in the lucene index

2008-07-22 Thread Erick Erickson
<<>> This is not strictly true. For instance, stop words aren't even indexed. Reconstructing a document from the index is very expensive (see Luke for examples of how this is done). You can get the text back verbatim if you store it in your index. See Field.Store.YES (or Field.Store.COMPRESS). Stora

Re: Interrupting a query

2008-07-22 Thread Grant Ingersoll
You can't with that call. You have to make one that uses a HitCollector, and your hit collector needs to be interruptible and it probably needs to handle your sorting. Sounds like a nice contribution/patch. Sorry, I can't offer a better solution. -Grant On Jul 22, 2008, at 2:48 PM, Paul
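A minimal sketch of the idea Grant describes, for illustration only: a collector that checks the thread's interrupt flag on every hit and also keeps what it needs to sort afterwards. To stay self-contained it does not extend Lucene's HitCollector; the class names InterruptibleCollectorSketch, InterruptibleCollector, ScoredDoc and SearchInterruptedException are hypothetical, and only the collect(int doc, float score) shape is borrowed from HitCollector. Sorting by descending score is an assumption; a real replacement for search(query, sortOrder) would have to apply the requested sort instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in, not part of Lucene.
final class InterruptibleCollectorSketch {

    /** Unchecked so it can escape a callback that declares no exceptions. */
    static final class SearchInterruptedException extends RuntimeException {
        SearchInterruptedException() {
            super("search interrupted");
        }
    }

    /** One collected hit: document number plus its score. */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static final class InterruptibleCollector {
        private final List<ScoredDoc> hits = new ArrayList<>();

        /** Same shape as HitCollector.collect; aborts once the thread is interrupted. */
        void collect(int doc, float score) {
            if (Thread.currentThread().isInterrupted()) {
                throw new SearchInterruptedException();
            }
            hits.add(new ScoredDoc(doc, score));
        }

        /** Assumption: sort by descending score; substitute your own sort order. */
        List<ScoredDoc> sortedHits() {
            List<ScoredDoc> copy = new ArrayList<>(hits);
            copy.sort(Comparator.comparingDouble((ScoredDoc h) -> h.score).reversed());
            return copy;
        }
    }
}
```

The caller would run the search on a worker thread, call interrupt() on that thread to cancel, and catch SearchInterruptedException around the search call.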

Re: Fastest way to get just the "bits" of matching documents

2008-07-22 Thread eks dev
no, at the moment you can not make pure boolean queries. But 1.5 seconds on 10Mio document sounds a bit too much (we have well under 200mS on 150Mio collection) what you can do: 1. use Filter for high frequency terms, e.g. via ConstantScoreQuery as much as you can, but you have to cache them (C

Fastest way to get just the "bits" of matching documents

2008-07-22 Thread Robert Stewart
I need to execute a boolean query and get back just the bits of all the matching documents. I do additional filtering (date ranges and entitlements) and then do my own sorting later on. I know that using QueryFilter.Bits() will still compute scores for all matching documents. I do not want to

Re: Interrupting a query

2008-07-22 Thread Paul J. Lucas
If I'm calling: IndexSearcher.search( query, sortOrder ); how, exactly, can I do what you suggest? *That* call is what I want to interrupt. - Paul On Jul 18, 2008, at 3:51 AM, Grant Ingersoll wrote: True, but I think the approach is similar, in that you need to have the hit col

storing the contents of a document in the lucene index

2008-07-22 Thread starz10de
Could anyone please tell me how to print the content of the document after reading the index? For example, if I want to print the index terms, then I do: IndexReader ir = IndexReader.open(index); TermEnum termEnum = ir.terms(); while (termEnum.next()) { TermDocs dok =

Re: How to avoid duplicate records in lucene

2008-07-22 Thread Erick Erickson
NP, if my original reply had included my second one, then you'd have known what I was talking about ... I *love* it when I unknowingly demonstrate the issue I'm trying to clarify . Best Erick On Tue, Jul 22, 2008 at 2:09 PM, mark harwood <[EMAIL PROTECTED]> wrote: > >>Well, the point of my ques

Re: How to avoid duplicate records in lucene

2008-07-22 Thread mark harwood
>>Well, the point of my question was to insure that we were all using common >>terms. Sorry, Erick. I thought your "define duplicate" question was asking me about DuplicateFilter's concept of duplicates rather than asking the original poster about his notion of what a duplicate document meant t

Re: Storing information

2008-07-22 Thread Grant Ingersoll
You may also want a Document cache and or even a Query cache, depending on your situation. -Grant On Jul 21, 2008, at 11:49 PM, Yonik Seeley wrote: On Mon, Jul 21, 2008 at 11:27 PM, blazingwolf7 <[EMAIL PROTECTED]> wrote: I am using Lucene to perform searching. I have certain information

RE: Opposite to StopFilter. Anything already implemented out there?

2008-07-22 Thread mpermar
Absolutely! Thanks Steven. Best Regards, Martin Steven A Rowe wrote: > > Hi Martin, > > On 07/22/2008 at 5:48 AM, mpermar wrote: >> I want to index some incoming text. In this case what I want >> to do is just detect keywords in that text. Therefore I want >> to discard everything that is n

Re: escaping logical operators such as OR AND

2008-07-22 Thread Erick Erickson
<> I haven't ever tried, so I don't know ... But my poor memory doesn't bring any to mind Best Erick On Tue, Jul 22, 2008 at 9:53 AM, <[EMAIL PROTECTED]> wrote: > lower-casing worked...tx...but is there a way of escaping them like we use > escape characters in java! > > Regards, > Aravind R

Re: Memory leaks during indexing.

2008-07-22 Thread Michael McCandless
Can you post the Python sources of the Lucene part of your application? One thing to check is how the JRE is being instantiated from Python, ie, what the equivalent setting is for -Xmx (= max heap size). It's possible the 140 MB consumption is actually "OK" as far as the JRE is concerned,

RE: Opposite to StopFilter. Anything already implemented out there?

2008-07-22 Thread Steven A Rowe
Hi Martin, On 07/22/2008 at 5:48 AM, mpermar wrote: > I want to index some incoming text. In this case what I want > to do is just detect keywords in that text. Therefore I want > to discard everything that is not in the keywords set. This > sounds to me pretty much like the reverse of using stop
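To make the "reverse of stop words" idea concrete, here is a rough sketch of the filtering logic only, in plain Java. It is not Steven's answer (which is cut off above) and it is not a Lucene TokenFilter; KeepWordsSketch and keepOnly are hypothetical names, and the case-insensitive match is an assumption. Inside Lucene the same test would sit in a TokenFilter that skips every token not found in the accepted set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class KeepWordsSketch {

    /**
     * Returns only the tokens whose lower-cased form is in the accepted set.
     * The accepted set is expected to hold lower-case keywords.
     */
    static List<String> keepOnly(List<String> tokens, Set<String> accepted) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (accepted.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

A stop filter drops a token when the set contains it; this keeps a token only when the set contains it, so the condition is simply inverted.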

Re: escaping logical operators such as OR AND

2008-07-22 Thread Aravind . Yarram
Lower-casing worked, thanks, but is there a way of escaping them, as we do with escape characters in Java? Regards, Aravind R Yarram Enabling Technologies Equifax Information Services LLC 1525 Windward Concourse, J42E Alpharetta, GA 30005 desk: 770 740 6951 email: [EMAIL PROTECTED] "Erick Erickso

Parametric/faceted Searching

2008-07-22 Thread WY-LAC
I am looking for sample code that would do the following: On the first page a parametric Fields Topics ALL Births, Marriages and Death (1200) - Major Category - Divorces in Canada (750) - sub category - Deaths (450)

Re: escaping logical operators such as OR AND

2008-07-22 Thread Erick Erickson
Have you tried lower-casing them? To be treated as an operator, they must be upper cased. But be careful that, when you lower-case them, your query analyzer doesn't treat them as stop words Best Erick On Tue, Jul 22, 2008 at 9:28 AM, <[EMAIL PROTECTED]> wrote: > helo all, > > In my project,
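A small sketch of the lower-casing workaround, assuming the user-typed value is pre-processed before it is put into the query string. OperatorWordsSketch and lowerCaseOperators are hypothetical names, and the operator list (AND, OR, NOT) is an assumption about which upper-case words the query parser reserves. As noted above, the lower-cased word can still be dropped if the analyzer treats it as a stop word.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class OperatorWordsSketch {

    // Assumption: these upper-case words are the ones parsed as operators.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    /** Lower-cases any whitespace-separated word that exactly matches an operator. */
    static String lowerCaseOperators(String userInput) {
        StringBuilder out = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(OPERATORS.contains(word) ? word.toLowerCase(Locale.ROOT) : word);
        }
        return out.toString();
    }
}
```

For the state example, the query string would be built as "state:" + OperatorWordsSketch.lowerCaseOperators("OR"), which yields state:or.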

Re: How to avoid duplicate records in lucene

2008-07-22 Thread Erick Erickson
Well, the point of my question was to insure that we were all using common terms. For all we know, the original questioner considered "duplicate" records ones that had identical, or even similar text. Nothing in the original question indicated any de-dup happening. I've often found that assumption

escaping logical operators such as OR AND

2008-07-22 Thread Aravind . Yarram
Hello all, In my project, we are indexing the US states. When we try to search on Oregon (state:OR), the search on OR throws an error. I know OR is a logical operator in Lucene. Is there a way to escape such keywords? Thanks! Regards, Aravind R Yarram Enabling Technologies Equifax Information Services LL

Memory leaks during indexing.

2008-07-22 Thread Antony Joseph
Hi all, I am using *Lucene* 2.3.1 and JCC 1.6 to create an *index* for my Python-based application (for searching). Everything is working fine. After some time (3 hours later) I found that my Python memory consumption had grown too high. When I started the application (indexing), the Python consumption was 40 m

Opposite to StopFilter. Anything already implemented out there?

2008-07-22 Thread mpermar
Hi All, I want to index some incoming text. In this case what I want to do is just detect keywords in that text. Therefore I want to discard everything that is not in the keywords set. This sounds to me pretty much like the reverse of using stop words, that is, I want to use a set of "accepted