How to compute the similarity of a web page?

2009-02-11 Thread renavatior
I am doing some research in vertical search. I have defined weights for several keywords in my corpus that express a certain theme. How can I use these to compute the similarity with a given web page (passed by URL to the compute method)? I saw the source code of Similarity.java in Lu
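One common way to score a page against a set of weighted theme keywords is cosine similarity between the weight vector and the page's term-frequency vector. The sketch below is only an illustration of that idea, not Lucene's Similarity class: the class name ThemeSimilarity, its methods, and the sample weights are all hypothetical, and it assumes the page has already been fetched and reduced to plain text, with theme keywords given in lowercase.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: scores plain page text against
// a map of theme keywords (lowercase) to weights using cosine similarity.
final class ThemeSimilarity {

    /** Counts lowercase word occurrences in already-extracted page text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        // Split on anything that is not a letter or a decimal digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between the theme weights and the page term counts. */
    static double cosine(Map<String, Double> themeWeights, Map<String, Integer> pageTf) {
        double dot = 0.0;
        double themeNorm = 0.0;
        double pageNorm = 0.0;
        for (Map.Entry<String, Double> e : themeWeights.entrySet()) {
            double w = e.getValue();
            themeNorm += w * w;
            dot += w * pageTf.getOrDefault(e.getKey(), 0);
        }
        for (int count : pageTf.values()) {
            pageNorm += (double) count * count;
        }
        if (themeNorm == 0.0 || pageNorm == 0.0) {
            return 0.0; // empty theme or empty page: define similarity as zero
        }
        return dot / (Math.sqrt(themeNorm) * Math.sqrt(pageNorm));
    }

    public static void main(String[] args) {
        Map<String, Double> theme = new HashMap<>();
        theme.put("lucene", 0.8);   // illustrative weights only
        theme.put("index", 0.5);
        theme.put("search", 0.3);
        String pageText = "Lucene builds an index so that search is fast.";
        System.out.println(cosine(theme, termFrequencies(pageText)));
    }
}
```

Because every page word contributes to the page norm, long pages with few theme words score low; whether that is desirable, or whether only theme keywords should be counted, is a design choice the original question leaves open.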

Re: Fields with multiple values...

2009-02-11 Thread Mark Ferguson
One approach is to use dynamic fields, making the value of the second field part of the name of the first field. So for example, you would have: doc.Add (new Field ("Field1_A", "C", Field.Store.YES, Field.Index.UN_TOKENIZED)); doc.Add (new Field ("Field1_B", "D", Field.Store.YES, Field.Index.UN_TO
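The dynamic-field idea above can be modelled without Lucene to see why it keeps paired values together. This is a rough sketch under stated assumptions: PairedFieldNames, NamedValue, pair and matches are hypothetical names, and a list of name/value pairs stands in for the fields of one document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the "value of one field becomes part of the other
// field's name" approach; no Lucene classes are used here.
final class PairedFieldNames {

    /** Stand-in for one untokenized field: a name and a single value. */
    static final class NamedValue {
        final String name;
        final String value;

        NamedValue(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Pairs first.get(i) with second.get(i) as field "baseName_first" = second. */
    static List<NamedValue> pair(String baseName, List<String> first, List<String> second) {
        if (first.size() != second.size()) {
            throw new IllegalArgumentException("both value lists must have the same length");
        }
        List<NamedValue> fields = new ArrayList<>();
        for (int i = 0; i < first.size(); i++) {
            fields.add(new NamedValue(baseName + "_" + first.get(i), second.get(i)));
        }
        return fields;
    }

    /** True only if firstValue and secondValue were added at the same position. */
    static boolean matches(List<NamedValue> fields, String baseName,
                           String firstValue, String secondValue) {
        String wantedName = baseName + "_" + firstValue;
        for (NamedValue f : fields) {
            if (f.name.equals(wantedName) && f.value.equals(secondValue)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<NamedValue> doc = pair("Field1", Arrays.asList("A", "B"), Arrays.asList("C", "D"));
        System.out.println(matches(doc, "Field1", "A", "C")); // A and C share a position
        System.out.println(matches(doc, "Field1", "A", "D")); // A and D do not
    }
}
```

In a real index the equivalent query would be a single term lookup on the combined field name, so the number of distinct field names grows with the number of distinct first values.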

Re: Fields with multiple values...

2009-02-11 Thread Paul Cowan
Dragon Fly wrote: I'd like to get a hit if I do: Field1:A AND Field2:C This is fine because that's how Lucene works. However, I do not want to get a hit if I do: Field1:A AND Field2:D The reason that I don't want a hit is because A is the first element in Field1 and D is the second el

Re: Fields with multiple values...

2009-02-11 Thread Erick Erickson
Well, you could index with your index as part of the value... doc.Add (new Field ("Field1", "1A", Field.Store.YES, Field.Index.UN_TOKENIZED)); doc.Add (new Field ("Field1", "2B", Field.Store.YES, Field.Index.UN_TOKENIZED)); // Add 2 values to Field2. doc.Add (new Field ("Field2", "1C", Field.Store

Re: deletes when the writer is open and autocommit is set to false

2009-02-11 Thread Vinubalaji Gopal
On Wed, Feb 11, 2009 at 8:50 AM, Michael McCandless wrote: > Hmm -- OK I just fixed that FAQ entry. Thanks for raising this! > Cool. > If you know the doc doesn't exist already, you gain some performance by > using add instead of update. But if performance is already fast enough, it > may be si

RE: Fields with multiple values...

2009-02-11 Thread Dragon Fly
> But you'd have to do result consolidation. That's what I'm trying to avoid. I could get a lot of hits (e.g. 100,000 hits) and will have to load all the documents to remove the duplicates. > Subject: RE: Fields with multiple values... > Date: Wed, 11 Feb 2009 18:06:28 -0500 > From: sar...@syr

optimization problem

2009-02-11 Thread Qingdi
(I posted this question to "solr user" forum, but didn't get a clear answer. So re-post it here.) Our index size is about 60G. Most of the time, the optimization works fine. But this morning, the optimization kept creating new segment files until all the free disk space (300G) was used up. Here

RE: Fields with multiple values...

2009-02-11 Thread Steven A Rowe
Hi Dragon Fly, You could split the original document into multiple Lucene Documents, one for each array index, all sharing the same "DocID" field value. Then your queries "just work". But you'd have to do result consolidation, removing duplicate original docs when you get matches at multiple arr

Fields with multiple values...

2009-02-11 Thread Dragon Fly
Hi, Let's say I have a single document with 2 fields (namely Field1 and Field2). 2 values are added to each field like below. // Add 2 values to Field1. doc.Add (new Field ("Field1", "A", Field.Store.YES, Field.Index.UN_TOKENIZED)); doc.Add (new Field ("Field1", "B", Field.Store.YES, Field.Ind

Re: Creation of index for the first time

2009-02-11 Thread Michael McCandless
Akshay wrote: The use case is in context of replication. The master has a newly created empty index. When slave requests for data, don't do anything if it's a newly created index. OK. Hmm, but would master need to tell slave "I created a new index", even if new index happen to be create

Re: Creation of index for the first time

2009-02-11 Thread Akshay
The use case is in context of replication. The master has a newly created empty index. When slave requests for data, don't do anything if it's a newly created index. On Wed, Feb 11, 2009 at 11:17 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > What exactly do you mean by "fresh index

What's the best way to store metadata?

2009-02-11 Thread Mandy Günther
Hi all, I want to use Lucene in my project but I have the following problem: My goal is to store metadata in the index. Here is an example of the stuff I need to index 1; My; determiner; 2; project; noun; 1 3; uses; verb; 1 4; Lucene; noun; 1 6; I; noun; 7; need;

Re: Creation of index for the first time

2009-02-11 Thread Michael McCandless
What exactly do you mean by "fresh index created for the first time"? Ie, does opening an IndexWriter with create=true over a Directory that previously had a Lucene index not count as "fresh" for some reason? (If so, then it sounds like generation==1 is the test you want). What's the use

Re: Creation of index for the first time

2009-02-11 Thread Akshay
Is there a way, without the knowledge of how IndexWriter was used, by which we can say that an empty index currently open is a really fresh index created for the first time? Thanks. On Wed, Feb 11, 2009 at 9:54 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > If you create IndexWri

Re: Creation of index for the first time

2009-02-11 Thread Michael McCandless
Noble Paul നോബിള്‍ नोब्ळ् wrote: is it reasonable to assume that the generation of a commit point is always '1' when an empty index is opened? It depends what "empty index" means. EG I can make a new index, add docs, do lots of commits, etc., then open a new IndexWriter with create=true on t

Re: Creation of index for the first time

2009-02-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
is it reasonable to assume that the generation of a commit point is always '1' when an empty index is opened? On Wed, Feb 11, 2009 at 9:54 PM, Michael McCandless wrote: > > If you create IndexWriter with create=true in a directory that has no Lucene > index, segments_1 is created. > > If you do t

Re: deletes when the writer is open and autocommit is set to false

2009-02-11 Thread Michael McCandless
Vinubalaji Gopal wrote: On Wed, Feb 11, 2009 at 2:56 AM, Michael McCandless wrote: IndexWriter can in fact delete documents, by Term or by Query. It also has updateDocument, which under-the-hood simply calls deleteDocuments then addDocument. Awesome that FAQ entry confused me and I did

Re: Creation of index for the first time

2009-02-11 Thread Michael McCandless
If you create IndexWriter with create=true in a directory that has no Lucene index, segments_1 is created. If you do the same, but in a directory that already has a Lucene index, segments_(N+1) is created (where N was the last generation of the current index in that directory). But... t
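The generation discussed here can be read off the segments_N file names in the index directory. The sketch below is a plain-Java illustration, not a Lucene API: SegmentsGeneration and its methods are hypothetical names, and it relies on Lucene writing the generation suffix in base 36 (Character.MAX_RADIX). As the thread points out, a latest generation of 1 only shows that segments_1 is the newest commit file; it does not by itself prove how the index came to be.

```java
import java.io.File;

// Hypothetical helper: extracts the commit generation from "segments_N"
// file names, where N is written in base 36.
final class SegmentsGeneration {

    private static final String PREFIX = "segments_";

    /** Returns the generation encoded in fileName, or -1 if it is not a segments_N file. */
    static long parseGeneration(String fileName) {
        if (!fileName.startsWith(PREFIX) || fileName.length() == PREFIX.length()) {
            return -1; // also rejects names such as "segments.gen"
        }
        try {
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the highest generation among the given file names, or -1 if none qualify. */
    static long latestGeneration(String[] fileNames) {
        long latest = -1;
        if (fileNames == null) {
            return latest;
        }
        for (String name : fileNames) {
            latest = Math.max(latest, parseGeneration(name));
        }
        return latest;
    }

    public static void main(String[] args) {
        // The index directory is an assumption; pass it as the first argument.
        String[] names = new File(args.length > 0 ? args[0] : ".").list();
        System.out.println("latest generation: " + latestGeneration(names));
    }
}
```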

Re: deletes when the writer is open and autocommit is set to false

2009-02-11 Thread Vinubalaji Gopal
On Wed, Feb 11, 2009 at 2:56 AM, Michael McCandless wrote: > IndexWriter can in fact delete documents, by Term or by Query. It also has > updateDocument, which under-the-hood simply calls deleteDocuments then > addDocument. Awesome that FAQ entry confused me and I didn't look at IndexWriter java

Creation of index for the first time

2009-02-11 Thread Akshay
Hi List, How to find if an empty Lucene index has been created for the very first time? Is the generation number 1 enough to determine this? -- Regards, Akshay K. Ukey.

Best Practice for Lucene Search

2009-02-11 Thread Konstantyn Smirnov
At the beginning of development, I was also facing the choice of mirroring the documents in DB/index. But when the number of rows reached the mark of 7 million, a query like "select count(id) from documentz" (using PostgreSQL) would take ages (ok, about 10 minutes!!! ), it became clear t

Re: deletes when the writer is open and autocommit is set to false

2009-02-11 Thread Michael McCandless
IndexWriter can in fact delete documents, by Term or by Query. It also has updateDocument, which under-the-hood simply calls deleteDocuments then addDocument. Mike Vinubalaji Gopal wrote: Hi all, I am a new Lucene user and got started in a really quick time! It's been really nice and