Chris Hostetter wrote:
if you are using a HitCollector, then any re-evaluation is going to
happen in your code using whatever mechanism you want -- once your collect
method is called on a docid, Lucene is done with that docid and no longer
cares about it ... it's only whatever storage you may b
Well, I thought to use the PerFieldAnalyzerWrapper, which contains as its
default the SnowballAnalyzer with English stopwords, and to use the
SnowballAnalyzer with language-specific stopwords for the fields which will be
in different languages. But I'm
seeing that in your MemoryIndexTest you commented the use of
This may be of interest:
http://issues.apache.org/jira/browse/LUCENE-474
Cheers
Mark
- Original Message
From: Ryan McKinley <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 23 March, 2007 3:25:02 AM
Subject: Re: How can I index Phrases in Lucene?
Is there any way to fi
"SK R" <[EMAIL PROTECTED]> wrote:
> If I set MergeFactor = 100 and MaxBufferedDocs = 250, then the first 100
> segments will be merged in RAMDir when 100 docs have arrived. At the end of
> the 350th doc added to the writer, RAMDir has 2 merged segment files + 50
> separate segment files not merged together
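The buffering arithmetic above can be played with in a toy model. This is a hypothetical simulation, not Lucene's actual merge policy: it ignores maxBufferedDocs and flushing entirely, and simply merges every mergeFactor single-doc segments into one larger segment, so its counts (it gives 3 merged + 50 single segments after 350 docs with mergeFactor 100) need not match real IndexWriter behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSim {
    // Toy model of in-RAM buffering: each added doc starts life as a 1-doc
    // segment; whenever mergeFactor single-doc segments have accumulated,
    // they are merged into one mergeFactor-doc segment.
    static List<Integer> segmentsAfter(int docs, int mergeFactor) {
        List<Integer> segments = new ArrayList<>(); // doc count per segment
        for (int d = 0; d < docs; d++) {
            segments.add(1);
            int singles = 0;
            for (int s : segments) {
                if (s == 1) singles++;
            }
            if (singles == mergeFactor) {
                // merge all buffered single-doc segments into one
                segments.removeIf(s -> s == 1);
                segments.add(mergeFactor);
            }
        }
        return segments;
    }
}
```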
On 23 Mar 2007, at 09:57, Melanie Langlois wrote:
Well, I thought to use the PerFieldAnalyzerWrapper, which contains as
its default the SnowballAnalyzer with English stopwords, and to use the
SnowballAnalyzer with language-specific stopwords for the fields
which will be in different languages. But I'm seeing t
Please clarify the following.
1. When will the segments in RAMDirectory be moved (flushed) into the
FSDirectory?
2. Segment creation driven by maxBufferedDocs occurs in RAMDir. Where does the
merge driven by MergeFactor happen: in RAMDir or in FSDir?
Thanks in Advance
RSK
On 3/23/07, Michael McCandless <[EM
I haven't used it yet, but I've seen several references to
IndexWriter.ramSizeInBytes() and using it to control when the writer
flushes the RAM. This seems like a more deterministic way of
making things efficient than trying various combinations of
maxBufferedDocs, MergeFactor, etc., all of which
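The flush-by-RAM-usage idea can be sketched as a loop. This is a stand-alone sketch, not Lucene code: the comments name `ramSizeInBytes()` and `flush()` only to show where the real calls would sit, and the byte counter is a hypothetical stand-in for the writer's internal buffer.

```java
public class RamFlushSim {
    // Toy model of flushing by RAM usage: buffer incoming document sizes and
    // "flush" whenever the buffered total crosses the threshold. Returns how
    // many flushes a stream of document sizes triggers.
    static int flushesFor(long[] docSizes, long maxRamBytes) {
        long buffered = 0;
        int flushes = 0;
        for (long size : docSizes) {
            buffered += size;              // writer.addDocument(...)
            if (buffered >= maxRamBytes) { // check writer.ramSizeInBytes()
                flushes++;                 // writer.flush()
                buffered = 0;
            }
        }
        return flushes;
    }
}
```

The appeal is exactly what the mail says: the trigger is the measured buffer size, not a guess at how many docs of unknown size fit in RAM.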
Bear in mind that the million queries you run on the MemoryIndex can be
shortlisted if you place those queries in a RAMIndex and use the source
document's terms to "query the queries". The list of unique terms for your
document is readily available in the MemoryIndex's TermEnum.
You can take thi
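The "query the queries" shortlisting can be illustrated with plain collections. This is a toy stand-in, not MemoryIndex code: each stored query is reduced to its set of required terms, and the term-to-query-ids map below plays the role of the RAMDirectory index of queries; the document's unique term set stands in for what you would read from the MemoryIndex's TermEnum.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class QueryShortlist {
    // A query is a candidate for a document only if at least one of its terms
    // occurs in the document, so most of the million queries are skipped.
    static Set<String> shortlist(Map<String, Set<String>> queryTerms,
                                 Set<String> docTerms) {
        // invert: term -> ids of queries mentioning that term
        Map<String, Set<String>> byTerm = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : queryTerms.entrySet()) {
            for (String t : e.getValue()) {
                byTerm.computeIfAbsent(t, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // walk the document's unique terms and collect matching query ids
        Set<String> candidates = new TreeSet<>();
        for (String t : docTerms) {
            candidates.addAll(byTerm.getOrDefault(t, Collections.emptySet()));
        }
        return candidates;
    }
}
```

Only the shortlisted queries would then be run for real against the MemoryIndex.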
"SK R" <[EMAIL PROTECTED]> wrote:
> 1. When will the segments in RAMDirectory be moved (flushed) into the
> FSDirectory?
This is maxBufferedDocs. Right now, every added doc creates its own
segment in the RAMDir. After maxBufferedDocs, all of these single
documents are merged and flushed to a s
"Erick Erickson" <[EMAIL PROTECTED]> wrote:
> I haven't used it yet, but I've seen several references to
> IndexWriter.ramSizeInBytes() and using it to control when the writer
> flushes the RAM. This seems like a more deterministic way of
> making things efficient than trying various combinations
Hi
I am seeking to make use of the new lazy field loading in Lucene 2.1.
I store the original bytes of a document, say a PDF file for example, in a
special untokenized field in the index. Though there are enough facilities in
the IndexReader class for lazy field loading, the search API in IndexSea
Hello,
I am planning to index Word 2003 files. I read I have to use Apache Jakarta
POI, but I also read on the POI site that their work with .doc files is at an
early stage.
Is POI advisable? Or are there better alternatives?
Please give some advice.
Regards,
Erik
Hi
My experience has not been very satisfactory. It breaks very easily on many files.
On 3/23/07, [EMAIL PROTECTED] <
[EMAIL PROTECTED]> wrote:
Hello,
I am planning to index Word 2003 files. I read I have to use Apache
Jakarta POI, but I also read on the POI site that their work with .doc files is
in an
please read the answer i gave you the last time you asked this question...
http://www.nabble.com/Re%3A-Lazy-field-loading-in-p9604064.html
: Hi
: I am seeking to make use of the new lazy field loading in Lucene 2.1.
: I store the original bytes of a document, say a PDF file for example, in
Sorry if the question is trivial but why not a Hits.doc(int,FieldSelector)
method?
On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
please read the answer i gave you the last time you asked this question...
http://www.nabble.com/Re%3A-Lazy-field-loading-in-p9604064.html
: Hi
: I am se
: Sorry if the question is trivial but why not a Hits.doc(int,FieldSelector)
: method?
As i said before...
>> Lazy loading stored fields is really about performance tweaking ... if
>> you are that concerned about performance, you shouldn't be using Hits at
>> all.
...there is a lot of info in
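What a FieldSelector buys you can be shown with a toy model. This is a hypothetical sketch, not the Lucene API: the stored document is faked as a field-to-value map, and `loadFields` mimics what `IndexReader.document(int, FieldSelector)` enables, namely loading only the fields you ask for instead of eagerly pulling a huge stored field (like the original PDF bytes) along with every hit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LazyFieldSim {
    // Toy selective loading: given a stored document (field -> value) and the
    // set of fields the caller actually needs, materialize only those fields,
    // skipping expensive ones such as a stored-original-bytes field.
    static Map<String, String> loadFields(Map<String, String> stored,
                                          Set<String> wanted) {
        Map<String, String> loaded = new HashMap<>();
        for (String f : wanted) {
            if (stored.containsKey(f)) {
                loaded.put(f, stored.get(f));
            }
        }
        return loaded;
    }
}
```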
Hello All,
We allow our users to search through our index with a simple text field.
The search phrase has "content" as its default field. This allows them
to search quickly through content, but then when they type "to:blah AND
from:foo AND content:boogie" it will know to parse, etc.
What I wa
I don't believe there's anything built into Lucene that helps you out here
because you're really saying "do special things for my problem space
in these situations".
So about the only thing you can do that I know of is to construct the
query yourself by making a series of additions to BooleanQuer
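One way to sketch that construction is as query-string expansion. The helper below is hypothetical, in the spirit of the advice above: a bare term is expanded into an OR across the known fields, while an explicit "field:term" clause is left alone. Real code would add TermQuery clauses to a BooleanQuery rather than building a string, but the branching logic is the same.

```java
public class MultiFieldExpand {
    // Expand unqualified tokens across all fields; pass qualified ones through.
    static String expand(String userInput, String[] fields) {
        StringBuilder out = new StringBuilder();
        for (String tok : userInput.trim().split("\\s+")) {
            if (tok.isEmpty()) continue;
            if (out.length() > 0) out.append(" AND ");
            if (tok.contains(":")) {
                out.append(tok); // already field-qualified, keep as-is
            } else {
                out.append('(');
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) out.append(" OR ");
                    out.append(fields[i]).append(':').append(tok);
                }
                out.append(')');
            }
        }
        return out.toString();
    }
}
```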
Thank you,
Are there other solutions?
From: jafarim [mailto:[EMAIL PROTECTED]
Sent: Fri 23-03-2007 18:55
To: java-user@lucene.apache.org
Subject: Re: index word files ( doc )
Hi
My experience has not been very satisfactory. It breaks very easily on many files
I think the code from Lucene in Action has examples that use POI and the
Textmining.org API. Check manning.com/hatcher2 for the code.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
- Original Message
From: "[
: One final note, it may be much easier for you to throw all the
: fields into a single uber-field and search that rather than implement
: all four separate clauses, but it's a trade off between simplicity and
: size.
this would be a very simple way to get the behavior you describe straight
f
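The uber-field trade-off mentioned above can be sketched at index time. This is a hypothetical helper, not code from the thread: besides indexing to/from/content separately, you would also concatenate their values into one catch-all field, so an unqualified search needs only a single clause against that field (at the cost of index size and of losing per-field boosts).

```java
import java.util.List;
import java.util.Map;

public class UberField {
    // Build the text of a catch-all field by concatenating the values of the
    // individual fields in a fixed order, skipping missing/empty ones.
    static String uberFieldText(Map<String, String> fields, List<String> order) {
        StringBuilder sb = new StringBuilder();
        for (String f : order) {
            String v = fields.get(f);
            if (v == null || v.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(v);
        }
        return sb.toString();
    }
}
```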
www.textmining.org, but the site is no longer accessible. Check Nutch, which
has a Word parser - it seems to be the original textmining.org Word6+POI
parser. Pre-Word6 and "fast-saved" files will not work. I've not found a
solution for those
Antony
[EMAIL PROTECTED] wrote:
Thank you,
Are
Antony Bowesman wrote:
>> Are there other solutions?
There's also antiword [1], which can convert your .doc to plain text or
PS; not sure how good it is.
--
Sami Siren
[1] http://www.winfield.demon.nl/