I am doing some research on vertical search. I have defined weights for
several keywords in my corpus that express a certain theme. How can I use
these weights to compute the similarity with a given web page (passed by
URL to the compute method)? I saw the source code of
Similarity.java in Lu
One approach is to use dynamic fields, making the value of the second field
part of the name of the first field. So for example, you would have:
doc.Add (new Field ("Field1_A", "C", Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.Add (new Field ("Field1_B", "D", Field.Store.YES,
Field.Index.UN_TO
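To illustrate the dynamic-field idea above, here is a minimal sketch in plain Java with no Lucene dependency. The class PairedFieldDoc and its methods are hypothetical names invented for this illustration. The point is only that folding the paired value into the field name means "Field1_A" can match "C" but never "D".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a document; this is not a Lucene class.
final class PairedFieldDoc {
    // Maps a dynamic field name such as "Field1_A" to the values stored under it.
    private final Map<String, Set<String>> fields = new HashMap<>();

    // Stores the pair (first, second) as field "<baseName>_<first>" with value <second>.
    void addPair(String baseName, String first, String second) {
        fields.computeIfAbsent(baseName + "_" + first, k -> new HashSet<>()).add(second);
    }

    // True only if <first> and <second> were added together as one pair.
    boolean matches(String baseName, String first, String second) {
        Set<String> values = fields.get(baseName + "_" + first);
        return values != null && values.contains(second);
    }
}
```

With the pairs (A, C) and (B, D) added, matches("Field1", "A", "C") is true and matches("Field1", "A", "D") is false.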
Dragon Fly wrote:
I'd like to get a hit if I do:
Field1:A AND Field2:C
This is fine because that's how Lucene works. However, I do not want to get a
hit if I do:
Field1:A AND Field2:D
The reason that I don't want a hit is because A is the first element in Field1
and D is the second el
Well, you could index with your index as part of the value...
doc.Add (new Field ("Field1", "1A", Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.Add (new Field ("Field1", "2B", Field.Store.YES,
Field.Index.UN_TOKENIZED));
// Add 2 values to Field2.
doc.Add (new Field ("Field2", "1C", Field.Store
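A rough sketch of the position-prefix idea in plain Java, without Lucene. The class PositionPrefix and both method names are hypothetical. It tags each value with its 1-based position and then checks whether two values sit at the same position, which is what a query such as Field1:1A AND Field2:1C would express.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
final class PositionPrefix {
    // Prepends the 1-based position to each value: [A, B] -> [1A, 2B].
    static List<String> tag(List<String> values) {
        List<String> tagged = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            tagged.add((i + 1) + values.get(i));
        }
        return tagged;
    }

    // True if some position p holds <first> in field1 and <second> in field2.
    static boolean matchesAtSamePosition(List<String> field1, List<String> field2,
                                         String first, String second) {
        List<String> t1 = tag(field1);
        List<String> t2 = tag(field2);
        int n = Math.min(t1.size(), t2.size());
        for (int p = 1; p <= n; p++) {
            if (t1.contains(p + first) && t2.contains(p + second)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that the bare prefix becomes ambiguous if the values themselves can start with digits; a separator character between position and value would avoid that.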
On Wed, Feb 11, 2009 at 8:50 AM, Michael McCandless
wrote:
> Hmm -- OK I just fixed that FAQ entry. Thanks for raising this!
>
Cool.
> If you know the doc doesn't exist already, you gain some performance by
> using add instead of update. But if performance is already fast enough, it
> may be si
> But you'd have to do result consolidation.
That's what I'm trying to avoid. I could get a lot of hits (e.g. 100,000 hits)
and will have to load all the documents to remove the duplicates.
> Subject: RE: Fields with multiple values...
> Date: Wed, 11 Feb 2009 18:06:28 -0500
> From: sar...@syr
(I posted this question to the "solr user" forum but didn't get a clear
answer, so I am re-posting it here.)
Our index size is about 60G. Most of the time, the optimization works fine.
But this morning, the optimization kept creating new segment files until all
the free disk space (300G) was used up.
Here
Hi Dragon Fly,
You could split the original document into multiple Lucene Documents,
one for each array index, all sharing the same "DocID" field value.
Then your queries "just work". But you'd have to do result
consolidation, removing duplicate original docs when you get matches at
multiple arr
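The result consolidation mentioned above could look roughly like this in plain Java. It assumes the shared "DocID" value of each hit has already been read into a list, in rank order. HitConsolidator is a hypothetical name, not a Lucene class.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene.
final class HitConsolidator {
    // Collapses per-array-index hits to one entry per original document,
    // keeping the order in which each DocID was first seen.
    static List<String> consolidate(List<String> hitDocIds) {
        Set<String> seen = new LinkedHashSet<>(hitDocIds);
        return new ArrayList<>(seen);
    }
}
```

This still needs the DocID of every hit, so the cost grows with the number of matches.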
Hi,
Let's say I have a single document with 2 fields (namely Field1 and Field2). 2
values are added to each field like below.
// Add 2 values to Field1.
doc.Add (new Field ("Field1", "A", Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add (new Field ("Field1", "B", Field.Store.YES, Field.Ind
Akshay wrote:
The use case is in the context of replication. The master has a newly
created empty index. When the slave requests data, don't do anything if
it's a newly created index.
OK.
Hmm, but would master need to tell slave "I created a new index", even if
new index happen to be create
The use case is in the context of replication. The master has a newly created
empty index. When the slave requests data, don't do anything if it's a newly
created index.
On Wed, Feb 11, 2009 at 11:17 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> What exactly do you mean by "fresh index
Hi all,
I want to use Lucene in my project but I have the following problem:
My goal is to store metadata in the index.
Here is an example of the stuff I need to index
1; My; determiner;
2; project; noun; 1
3; uses; verb; 1
4; Lucene; noun; 1
6; I; noun;
7; need;
What exactly do you mean by "fresh index created for the first time"?
Ie, does opening an IndexWriter with create=true over a Directory that
previously had a Lucene index not count as "fresh" for some reason?
(If so, then it sounds like generation==1 is the test you want).
What's the use
Is there a way, without the knowledge of how IndexWriter was used, by which
we can say that an empty index currently open is a really fresh index
created for the first time?
Thanks.
On Wed, Feb 11, 2009 at 9:54 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> If you create IndexWri
Noble Paul നോബിള് नोब्ळ् wrote:
Is it reasonable to assume that the generation of a commit point is
always '1' when an empty index is opened?
It depends what "empty index" means.
EG I can make a new index, add docs, do lots of commits, etc., then
open a new IndexWriter with create=true on t
Is it reasonable to assume that the generation of a commit point is
always '1' when an empty index is opened?
On Wed, Feb 11, 2009 at 9:54 PM, Michael McCandless
wrote:
>
> If you create IndexWriter with create=true in a directory that has no Lucene
> index, segments_1 is created.
>
> If you do t
Vinubalaji Gopal wrote:
On Wed, Feb 11, 2009 at 2:56 AM, Michael McCandless
wrote:
IndexWriter can in fact delete documents, by Term or by Query. It also has
updateDocument, which under-the-hood simply calls deleteDocuments then
addDocument.
Awesome, that FAQ entry confused me and I did
If you create IndexWriter with create=true in a directory that has no
Lucene index, segments_1 is created.
If you do the same, but in a directory that already has a Lucene
index, segments_(N+1) is created (where N was the last generation of
the current index in that directory).
But... t
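For the segments_N naming described above, a small plain-Java sketch of reading the generation back out of a file name. It assumes the suffix N is written in base 36, which is my understanding of how Lucene names these files and should be checked against your version. SegmentsGeneration is a hypothetical helper, not a Lucene class.

```java
// Hypothetical helper for illustration; not part of Lucene.
final class SegmentsGeneration {
    private static final String PREFIX = "segments_";

    // Parses the generation from a name such as "segments_1".
    // Assumption: the suffix is encoded in base 36 (Character.MAX_RADIX).
    static long fromFileName(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a segments_N file: " + fileName);
        }
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    // Name of the file for generation N+1, under the same base-36 assumption.
    static String nextFileName(String current) {
        return PREFIX + Long.toString(fromFileName(current) + 1, Character.MAX_RADIX);
    }
}
```

A check such as fromFileName(name) == 1 only says that the directory held no earlier index when the writer was created.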
On Wed, Feb 11, 2009 at 2:56 AM, Michael McCandless
wrote:
> IndexWriter can in fact delete documents, by Term or by Query. It also has
> updateDocument, which under-the-hood simply calls deleteDocuments then
> addDocument.
Awesome, that FAQ entry confused me and I didn't look at IndexWriter
java
Hi List,
How can I find out whether an empty Lucene index has been created for the
very first time? Is the generation number 1 enough to determine this?
--
Regards,
Akshay K. Ukey.
In the beginning of the development, I was also facing a choice to mirror the
documents in DB/index.
But when the number of rows reached the 7 million mark, a query like
"select count(id) from documentz"
(using PostgreSQL) would take ages (OK, about 10 minutes!), it became
clear t
IndexWriter can in fact delete documents, by Term or by Query. It
also has updateDocument, which under-the-hood simply calls
deleteDocuments then addDocument.
Mike
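A toy model of that delete-then-add behaviour, in plain Java. ToyWriter is a hypothetical in-memory stand-in, not IndexWriter, and a plain id string plays the role of the Term. It only shows why addDocument can leave duplicates while updateDocument does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in; this is not Lucene's IndexWriter.
final class ToyWriter {
    // Each document is stored as {id, body}.
    private final List<String[]> docs = new ArrayList<>();

    void addDocument(String id, String body) {
        docs.add(new String[] {id, body});
    }

    // Removes every document whose id matches.
    void deleteDocuments(String id) {
        docs.removeIf(d -> d[0].equals(id));
    }

    // Delete first, then add, so at most one document per id remains.
    void updateDocument(String id, String body) {
        deleteDocuments(id);
        addDocument(id, body);
    }

    // Number of stored documents with the given id.
    int count(String id) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(id)) {
                n++;
            }
        }
        return n;
    }
}
```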
Vinubalaji Gopal wrote:
Hi all,
I am a new Lucene user and got started really quickly!
It's been really nice and