Hi all,
I had a question about the write locks created by Lucene.
I use Lucene 2.3.2. Does this newer version create locks while indexing, as
the older ones did?
Or does Lucene now handle its operations some other way?
My other question: I use JMS for Lucene indexing.
My App server w
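For context, Lucene (including 2.3.x) still guards the index with a write lock, by default a write.lock file in the index directory that only one IndexWriter may hold at a time. A minimal standalone sketch of the idea, using hypothetical names and not Lucene's actual LockFactory API:

```java
import java.io.File;
import java.io.IOException;

// Toy model of Lucene-style write locking: an atomically created lock file.
// Hypothetical class, for illustration only.
public class WriteLockSketch {
    private final File lockFile;

    public WriteLockSketch(File indexDir) {
        this.lockFile = new File(indexDir, "write.lock");
    }

    /** Try to acquire the lock; returns false if another writer already holds it. */
    public boolean obtain() throws IOException {
        // createNewFile is atomic: exactly one caller can create the file.
        return lockFile.createNewFile();
    }

    /** Release the lock so another writer can obtain it. */
    public void release() {
        lockFile.delete();
    }
}
```

This is why a second IndexWriter opened on the same directory fails until the first one is closed (or a stale lock is removed).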
Erick,
For example:
IndexWriter writer = new IndexWriter("C:/index", new StandardAnalyzer(), true);
String records = "Lucene" + " " + "action" + " " + "book";
Document doc = new Document();
doc.add(new Field("contents", records, Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.optimize();
writer.close();
This is not strictly true. For instance, stop words aren't even indexed.
Reconstructing a document from the index is very expensive
(see Luke for examples of how this is done).
You can get the text back verbatim only if you store it in your index. See
Field.Store.YES (or Field.Store.COMPRESS). Stora
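To illustrate the store-vs-index distinction the reply describes, here is a toy model, entirely hypothetical and independent of Lucene's API: tokens are searchable either way, but the verbatim text is recoverable only when it was stored (the analogue of Field.Store.YES):

```java
import java.util.*;

// Toy model (not Lucene) of indexing vs. storing a field.
public class StoreVsIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> doc ids
    private final Map<Integer, String> stored = new HashMap<>();        // doc id -> verbatim text

    public void addDocument(int docId, String text, boolean store) {
        // Tokens always go into the inverted index, so the doc is searchable.
        for (String token : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
        if (store) {
            stored.put(docId, text); // analogous to Field.Store.YES
        }
    }

    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    /** The original text comes back only when it was stored; otherwise null. */
    public String getStored(int docId) {
        return stored.get(docId);
    }
}
```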
You can't with that call. You have to make one that uses a
HitCollector, and your HitCollector needs to be interruptible; it
probably also needs to handle your sorting. Sounds like a nice
contribution/patch.
Sorry, I can't offer a better solution.
-Grant
On Jul 22, 2008, at 2:48 PM, Paul
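The interruptible-collector idea suggested above can be sketched without Lucene's HitCollector class; this standalone version (hypothetical names, for illustration only) just shows a collect() callback that enforces a time budget:

```java
import java.util.*;

// Sketch of a collector that can be interrupted by a deadline
// while gathering matching doc ids. Not Lucene's HitCollector API.
public class TimeLimitedCollector {
    private final long deadlineMillis;
    private final List<Integer> hits = new ArrayList<>();

    public TimeLimitedCollector(long budgetMillis) {
        this.deadlineMillis = System.currentTimeMillis() + budgetMillis;
    }

    /** Called once per matching document; throws when the budget is exhausted. */
    public void collect(int docId) {
        if (System.currentTimeMillis() > deadlineMillis) {
            throw new RuntimeException("search interrupted: time budget exceeded");
        }
        hits.add(docId);
    }

    public List<Integer> getHits() {
        return hits;
    }
}
```

The search loop calls collect() for every hit, so the exception unwinds the search as soon as the deadline passes.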
No, at the moment you cannot make pure Boolean queries. But 1.5 seconds on
a 10M-document collection sounds a bit too slow (we see well under 200 ms on a
150M-document collection). What you can do:
1. Use a Filter for high-frequency terms, e.g. via ConstantScoreQuery, as much as
you can, but you have to cache them (C
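The caching suggested in point 1 can be sketched as a per-term BitSet cache (a hypothetical helper, not Lucene's CachingWrapperFilter): each high-frequency term's matching doc ids are materialized once and then reused across queries:

```java
import java.util.*;

// Sketch of caching filter bits per term. Hypothetical class, not a Lucene API.
public class FilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Map<String, Set<Integer>> postings; // term -> matching doc ids
    private final int maxDoc;

    public FilterCache(Map<String, Set<Integer>> postings, int maxDoc) {
        this.postings = postings;
        this.maxDoc = maxDoc;
    }

    /** Returns the cached bits for a term, computing them on first use. */
    public BitSet bits(String term) {
        return cache.computeIfAbsent(term, t -> {
            BitSet bits = new BitSet(maxDoc);
            for (int doc : postings.getOrDefault(t, Collections.<Integer>emptySet())) {
                bits.set(doc);
            }
            return bits;
        });
    }
}
```

Intersecting a query's results with a cached BitSet avoids re-scoring the high-frequency term every time.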
I need to execute a boolean query and get back just the bits of all the
matching documents. I do additional filtering (date ranges and entitlements)
and then do my own sorting later on. I know that using QueryFilter.Bits() will
still compute scores for all matching documents. I do not want to
If I'm calling:
IndexSearcher.search( query, sortOrder );
how, exactly, can I do what you suggest? *That* call is what I want
to interrupt.
- Paul
On Jul 18, 2008, at 3:51 AM, Grant Ingersoll wrote:
True, but I think the approach is similar, in that you need to have
the hit col
Could anyone please tell me how to print the content of a document after
reading the index?
For example, if I want to print the index terms I do:
IndexReader ir = IndexReader.open(index);
TermEnum termEnum = ir.terms();
while (termEnum.next()) {
    TermDocs dok = ir.termDocs(termEnum.term());
NP, if my original reply had included my second one, then you'd have
known what I was talking about ...
I *love* it when I unknowingly demonstrate the issue I'm trying to clarify
.
Best
Erick
On Tue, Jul 22, 2008 at 2:09 PM, mark harwood <[EMAIL PROTECTED]>
wrote:
>>Well, the point of my question was to ensure that we were all using common
>>terms.
Sorry, Erick. I thought your "define duplicate" question was asking me about
DuplicateFilter's concept of duplicates rather than asking the original poster
about his notion of what a duplicate document meant t
You may also want a Document cache and/or even a Query cache,
depending on your situation.
-Grant
On Jul 21, 2008, at 11:49 PM, Yonik Seeley wrote:
On Mon, Jul 21, 2008 at 11:27 PM, blazingwolf7
<[EMAIL PROTECTED]> wrote:
I am using Lucene to perform searching. I have certain information
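A Document or Query cache like the one Grant suggests can be as simple as an access-ordered LinkedHashMap with LRU eviction; a minimal sketch (the class name is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small LRU cache for loaded documents (or parsed queries).
// Hypothetical helper, not a Lucene class.
public class DocumentCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public DocumentCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true makes iteration order LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we exceed capacity.
        return size() > capacity;
    }
}
```

Wrapping IndexReader.document(docId) lookups with such a cache avoids re-reading hot documents from disk on every search.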
Absolutely!
Thanks Steven.
Best Regards,
Martin
Steven A Rowe wrote:
>
> Hi Martin,
>
> On 07/22/2008 at 5:48 AM, mpermar wrote:
>> I want to index some incoming text. In this case what I want
>> to do is just detect keywords in that text. Therefore I want
>> to discard everything that is n
I haven't ever tried, so I don't know... but my poor
memory doesn't bring any to mind.
Best
Erick
On Tue, Jul 22, 2008 at 9:53 AM, <[EMAIL PROTECTED]> wrote:
> Lower-casing worked, thanks. But is there a way of escaping them, like we
> use escape characters in Java?
>
> Regards,
> Aravind R
Can you post the Python sources of the Lucene part of your application?
One thing to check is how the JRE is being instantiated from Python,
i.e., what the equivalent setting is for -Xmx (= max heap size). It's
possible the 140 MB consumption is actually "OK" as far as the JRE is
concerned,
Hi Martin,
On 07/22/2008 at 5:48 AM, mpermar wrote:
> I want to index some incoming text. In this case what I want
> to do is just detect keywords in that text. Therefore I want
> to discard everything that is not in the keywords set. This
> sounds to me pretty much like the reverse of using stop
Lower-casing worked, thanks. But is there a way of escaping them, like we use
escape characters in Java?
Regards,
Aravind R Yarram
Enabling Technologies
Equifax Information Services LLC
1525 Windward Concourse, J42E
Alpharetta, GA 30005
desk: 770 740 6951
email: [EMAIL PROTECTED]
"Erick Erickso
I am looking for sample code that would do the following:
On the first page, parametric fields:
Topics
ALL
Births, Marriages and Death (1200) - Major Category
- Divorces in Canada (750) - sub category
- Deaths (450)
Have you tried lower-casing them? To be treated as operators, they
must be upper-case.
But be careful that, when you lower-case them, your query analyzer doesn't
treat them as stop words.
Best
Erick
On Tue, Jul 22, 2008 at 9:28 AM, <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> In my project,
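Erick's caution can be sketched as a small pre-processing step (a hypothetical helper, not part of Lucene's QueryParser): quote, or lower-case, user terms that collide with the AND/OR/NOT keywords before handing the string to the parser:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: neutralize query-parser operator keywords in user input.
// Hypothetical helper for illustration, not a Lucene API.
public class OperatorEscape {
    private static final Set<String> OPERATORS =
        new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    /** Quote a bare operator keyword so it is parsed as a term, not an operator. */
    public static String neutralize(String term) {
        // Quoting preserves the original case; lower-casing also works,
        // provided the analyzer won't then drop the word as a stop word.
        return OPERATORS.contains(term) ? "\"" + term + "\"" : term;
    }
}
```

So a state query like state:OR would be rewritten to state:"OR" before parsing, while ordinary terms pass through untouched.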
Well, the point of my question was to ensure that we were all using common
terms. For all we know, the original questioner considered "duplicate"
records to be ones that had identical, or even merely similar, text. Nothing
in the original question indicated any de-duplication happening.
I've often found that assumption
Hello all,
In my project, we are indexing the US states. When we try to search on
Oregon (state:OR), the search on OR throws an error. I know OR is a logical
operator in Lucene. Is there a way to escape such keywords?
Thanks!
Regards,
Aravind R Yarram
Enabling Technologies
Equifax Information Services LL
Hi all,
I am using Lucene 2.3.1 and JCC 1.6 to create an index for my
Python-based application (for searching). Everything is working fine. After
some time (3 hours later) I found my Python memory consumption had grown
very high. When I started the application (indexing), the Python consumption
was 40 m
Hi All,
I want to index some incoming text. In this case, what I want to do is just
detect keywords in that text. Therefore I want to discard everything that is
not in the keywords set. This sounds to me pretty much like the reverse of
using stop words; that is, I want to use a set of "accepted
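The "reverse of stop words" idea can be sketched independently of Lucene's TokenFilter API as a simple keep-list check (later Lucene versions ship a KeepWordFilter in contrib that does this properly inside an analyzer chain):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of a "keep words" filter: the inverse of a stop filter.
// Standalone illustration, not Lucene's TokenFilter API.
public class KeepWordsSketch {
    /** Keep only tokens whose lower-cased form is in the accepted keyword set. */
    public static List<String> filter(List<String> tokens, Set<String> keywords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (keywords.contains(token.toLowerCase())) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Inside a real analyzer this logic would live in a TokenFilter's next()/incrementToken() method, discarding any token not found in the accepted set.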