Hey All,
I am very interested in indexing a 3NF data structure. Is there any
advice that someone can provide on this? From what I have seen, Lucene
is typically a flat "First Normal Form" (flat) data structure. The
only way I can see to combine the relational links between multiple
indexe
"Karl Koch" <[EMAIL PROTECTED]> wrote:
> For the documents Lucene employs
> its norm_d_t which is explained as:
>
> norm_d_t : square root of number of tokens in d in the same field as t
Actually (by default) it is:
1 / sqrt(#tokens in d with same field as t)
> basically just the square root
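Here is a minimal plain-Java sketch of the default length normalisation given in the correction above, 1 / sqrt(#tokens in d with same field as t). The class and method names are hypothetical, and this is not Lucene's own code; it only evaluates the formula. Lucene also multiplies field and document boosts into the norm and stores the result in a single byte, so the value used at search time is coarser than what this prints.

```java
// Hypothetical sketch (not Lucene's API): evaluates the default length norm,
// 1 / sqrt(number of tokens in the field), for a few field lengths.
class LengthNormSketch {
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    public static void main(String[] args) {
        for (int numTokens : new int[] {1, 4, 100}) {
            // Longer fields get a smaller norm, so each match in them counts for less.
            System.out.println(numTokens + " tokens -> norm " + lengthNorm(numTokens));
        }
    }
}
```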
Hi,
I managed to delete the document based on a term, but that is just one part. I
wonder whether Lucene supports pulling out the info that I have
indexed and placing it into another index file. Is the only way
to use IndexWriter to perform indexing again with all the necessary fie
you have to search against something known. You simply (as has been
mentioned many times) cannot rely on the document IDs.
So, I'd store the full path (untokenized) of the file. When you move a file,
search for the path in the appropriate field in your index that the file was
originally stored in
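To see why internal document IDs are unreliable while a stored path is not, here is a toy model in plain Java (not Lucene's API; all names hypothetical). A document's ID is just its position in a list, much as Lucene's internal document numbers are positions that can shift when deleted documents are compacted away during merges or an optimize. In real Lucene, the equivalent of findByPath would be a lookup on the untokenized path field, and deletion would go through something like IndexReader.deleteDocuments(Term) on that field.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Lucene's API: a document's id is its position in a list,
// and deleting a document compacts the list, shifting later ids down.
class StableKeySketch {
    private final List<String> paths = new ArrayList<>();

    // Returns a position-based id, comparable to an internal document number.
    int add(String path) {
        paths.add(path);
        return paths.size() - 1;
    }

    // Removes the document at this position; every later document moves up one.
    void deleteAndCompact(int id) {
        paths.remove(id);
    }

    // Finds a document by its stored, untokenized path; returns -1 if absent.
    int findByPath(String path) {
        return paths.indexOf(path);
    }

    int size() {
        return paths.size();
    }

    public static void main(String[] args) {
        StableKeySketch index = new StableKeySketch();
        index.add("/docs/a.txt");
        index.add("/docs/b.txt");
        int savedId = index.add("/docs/c.txt"); // id remembered at indexing time
        index.deleteAndCompact(0);               // delete a.txt and compact
        System.out.println("saved id " + savedId + " still valid? " + (savedId < index.size()));
        System.out.println("c.txt found by path at id " + index.findByPath("/docs/c.txt"));
    }
}
```

The saved ID goes stale after the compaction, while the path lookup still finds the document.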
Hi,
When I delete a document based on its Id, is the Id the unique key, and
will deleting by Id also delete all the related info? If so, how can I
find out the document Id? Thanks.
regards,
Wooi Meng
Hi,
I'm just wondering whether there is any unique key that I can use to delete
a particular document. How can I check the position of a particular document
inside the index file? Is there any example that I can refer to on how to delete
documents by a term?
For the second scenario, the reason why I'm doing
I've implemented the zero boost solution and it seems to be doing what I
want. Thanks to everyone who had suggestions.
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 11:45 AM
To: java-user@lucene.apache.org
Subject: Re: de-boosting fiel
Karl Koch wrote:
> Is there any other paper that actually shows the benefit of doing
> this particular normalisation with coord_q_d? I am not suggesting
> here that it is not useful, I am just looking for evidence how the
> idea developed.
I think it's a mischaracterization to call coordination a
It appears that you may have multiple copies of the Lucene code base in
your classpath.
: $ java org.apache.lucene.demo.IndexFiles ../data/medline/docs/
: Indexing to directory 'index'...
: adding ../data/medline/docs/1.txt
: Exception in thread "main" java.lang.IncompatibleClassChangeError: fie
spinergywmy <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
>I have asked this question before, but maybe the question wasn't clear.
>
>How can I delete a particular document that I want to and keep the rest? For
> instance, I have indexed document Id, date, user Id and contents, and my
> question is does t
Hi,
Yes, you can't get to the "stored" field content in your Hits because you are
using a (File)Reader.
Otis
- Original Message
From: abdul aleem <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, December 12, 2006 8:22:54 AM
Subject: Indexing large files
Hi There,
I
Hello Steven,
I looked up the paper and read the relevant part. The text you quoted
is from the introduction. I believe that quote refers to the basic purpose of an
information retrieval system in general, or at least to the purpose of a vector
space model IR system.
If this is the theore
Try this. It returns
Found a term Austria in doc 7
Found a term Botswana in doc 6
Found a term New in doc 3
Found a term Tennessee, in doc 1
Found a term US in doc 0
Found a term US in doc 1
Found a term US in doc 3
Found a term US in doc 4
Found a term Utah, in doc 4
Found a term Virginia, in do
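Output like the listing above comes from walking an inverted index in term order. The following toy version in plain Java (not Lucene; the sample documents are hypothetical) builds a sorted term-to-documents map using naive whitespace tokenisation, which is why trailing punctuation survives in terms such as "Utah,". Lucene's own term dictionary is likewise kept in sorted order, which is what makes this kind of ordered walk possible.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (NOT Lucene): maps each whitespace-separated token to the
// sorted set of document numbers that contain it.
class InvertedIndexSketch {
    static TreeMap<String, TreeSet<Integer>> build(String[] docs) {
        TreeMap<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < docs.length; doc++) {
            // Whitespace splitting keeps punctuation, so "Utah," stays "Utah,".
            for (String token : docs[doc].split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(doc);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = {"US", "Utah, US", "Austria"}; // hypothetical sample documents
        // Walk terms in sorted order, then the documents for each term.
        for (Map.Entry<String, TreeSet<Integer>> e : build(docs).entrySet()) {
            for (int doc : e.getValue()) {
                System.out.println("Found a term " + e.getKey() + " in doc " + doc);
            }
        }
    }
}
```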
Karl Koch wrote:
> The coord(q,d) normalisation is "a score factor based on how many of
> the query terms are found in the specified document." and described
> here:
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
>
> Does this have a theoretical
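For reference, the default implementation of this factor (DefaultSimilarity.coord) returns overlap / maxOverlap, i.e. the fraction of the query's terms that the document matches. Here is a minimal plain-Java sketch of that formula with a hypothetical class name:

```java
// Sketch of the coord(q,d) factor as the default similarity defines it:
// the fraction of the query's terms that occur in the document.
class CoordSketch {
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        // A three-term query: documents matching more terms get a larger factor.
        for (int matched = 1; matched <= 3; matched++) {
            System.out.println(matched + " of 3 terms -> coord " + coord(matched, 3));
        }
    }
}
```

For a three-term query, a document matching only one term has its score multiplied by 1/3, which pushes documents matching more of the query terms up the ranking.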
Erick,
Sorry for replying to a somewhat old topic.
> TermDocs.seek(new Term("specific_field", ""));
>
> Note that the "" as the value of the term gets all the terms. Then use
> TermDocs.next until it returns false. At each point, TermDocs.doc() will
> give you the Lucene ID of a document containing t
Well, searching documents for text is what Lucene is *made* for. So,
yes, this would be a fine thing to use Lucene for. You'll have to deal with
coordinating between when a document is added to the directory and when it's
added to the Lucene index.
Also, give some thought to what form your docum
Hi,
I am building a portal where users are able to maintain personal docs, placed
in a database or a directory on the server.
I want the users to be able to search all other users' docs for keywords,
e.g. give me the top ten documents containing the word "bike".
Is Lucene useful for this? Or do you
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:
> However, what exactly is the advantage of using square root instead
> of log?
Speaking anecdotally, I wouldn't say there's an advantage. There's a
predictable effect: very long documents are rewarded, since the
damping factor is not as strong.
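To make the damping comparison concrete, this sketch prints Lucene's default tf(freq) = sqrt(freq) next to a log-based alternative, 1 + ln(freq). The log variant is an assumption chosen for illustration, not something Lucene uses by default. For large frequencies the square root grows much faster, which is the weaker damping described above.

```java
// Compares the default tf(freq) = sqrt(freq) with an illustrative log damping.
class TfDampingSketch {
    // Lucene's default term-frequency factor.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // Illustrative alternative (an assumption, not Lucene's default): 1 + ln(freq).
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 10, 100, 1000}) {
            System.out.printf("freq=%4d  sqrt=%7.3f  1+ln=%7.3f%n", freq, sqrtTf(freq), logTf(freq));
        }
    }
}
```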
I have successfully compiled, installed, and run Lucene-based applications
on several machines, but I am currently trying to get Lucene to run on a
server that I do not administer and am having an odd problem... perhaps
someone can decipher it?
if i try, for instance, to run the basic lucene de
Hi There,
I have been working with the Lucene API for the past day.
We are in the process of building a log viewer tool;
this is how the log file looks:
[2006-12-11 01:52:40.179] [lon0571xus] [DEBUG] [TIE
heartbeat monitor (monitor.heartbeat.fxstreamrates)]
[unknown] [] [] ActiveRateServerIdList -
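One possible approach for a log like this is to split each line into its leading bracketed fields, so that each can go into its own Lucene field (for example, the level as a single untokenized term), with the free text indexed separately. The sketch below does only the splitting, in plain Java. It assumes every line starts with consecutive [ ... ] groups and that the wrapped sample above is a single line joined with single spaces. It returns fields by position rather than by name, since the meaning of each bracket is a guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits a log line into its leading [bracketed] fields plus the remaining text.
// Assumption: bracketed fields are consecutive and come first on every line.
class LogLineSketch {
    // \G anchors each match where the previous one ended, so only the leading
    // run of "[...]" groups (each followed by optional whitespace) is consumed.
    private static final Pattern FIELD = Pattern.compile("\\G\\[([^\\]]*)\\]\\s*");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        int end = 0;
        while (m.find()) {
            out.add(m.group(1)); // contents between the brackets, possibly empty
            end = m.end();
        }
        out.add(line.substring(end)); // the free-text message after the fields
        return out;
    }

    public static void main(String[] args) {
        String line = "[2006-12-11 01:52:40.179] [lon0571xus] [DEBUG] "
                + "[TIE heartbeat monitor (monitor.heartbeat.fxstreamrates)] "
                + "[unknown] [] [] ActiveRateServerIdList -";
        List<String> parts = fields(line);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println(i + ": " + parts.get(i));
        }
    }
}
```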
I need to apply a set of custom filters to my query.
One of the filters, which optionally can be applied, is a filter by
date range.
For the moment I'm using a BooleanQuery approach for this.
I know that it is not the best approach from either a score-accuracy or a performance
point of view, and I want to change th
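Whatever replaces the BooleanQuery, Lucene's range queries and range filters (such as RangeFilter in Lucene 2.x) compare indexed terms as strings. Dates therefore have to be indexed in a fixed-width, most-significant-first form such as yyyyMMdd for a range to follow chronological order; Lucene's DateTools produces strings of this kind. The plain-Java sketch below (helper names hypothetical) shows why: zero-padded yyyyMMdd strings sort in date order, while a format like M/d/yyyy does not.

```java
import java.util.Locale;

// Hypothetical helpers illustrating lexicographic date ranges in plain Java.
class DateRangeSketch {
    // Zero-padded, year-first encoding: string order equals chronological order.
    static String toTerm(int year, int month, int day) {
        return String.format(Locale.ROOT, "%04d%02d%02d", year, month, day);
    }

    // Inclusive range check by plain string comparison, as a term range would do.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = toTerm(2006, 12, 1);
        String upper = toTerm(2006, 12, 31);
        String day = toTerm(2006, 12, 11);
        System.out.println(day + " in December 2006? " + inRange(day, lower, upper));
        // Month/day/year strings break: "10/1/2006" sorts before "9/12/2006".
        System.out.println("10/1/2006".compareTo("9/12/2006") < 0);
    }
}
```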
Hello Karl,
I’m very interested in the details of Lucene’s scoring as well.
Karl Koch wrote:
For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with
norm_q : sqrt(sum_t((tf_q*idf_t)^2))
which is also called cosine normalisation. This is a technique that
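Here is a small plain-Java sketch of the norm_q formula above, applied the way Lucene applies it: as a multiplier, 1 / sqrt(sum of squared weights). The per-term weights and raw document scores are made-up numbers. Because the same factor multiplies every document's score for a given query, it cannot change the ranking; Lucene's Similarity documentation describes its purpose as making scores from different queries comparable.

```java
// Sketch of query normalisation: 1 / sqrt(sum_t (tf_q * idf_t)^2).
class QueryNormSketch {
    static double queryNorm(double[] termWeights) {
        double sumOfSquares = 0.0;
        for (double w : termWeights) {
            sumOfSquares += w * w; // (tf_q * idf_t)^2 for each query term
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] weights = {3.0, 4.0};  // hypothetical tf_q * idf_t for two query terms
        double norm = queryNorm(weights); // 1 / sqrt(9 + 16)
        double rawA = 2.0, rawB = 1.5;    // hypothetical unnormalised document scores
        System.out.println("queryNorm = " + norm);
        // Both scores shrink by the same factor, so A still outranks B.
        System.out.println("A = " + rawA * norm + ", B = " + rawB * norm);
    }
}
```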
Hello group,
The coord(q,d) normalisation is "a score factor based on how many of the query
terms are found in the specified document." and described here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
Does this have a theoretical base? On what b
Hi,
I have a question about the current Lucene scoring algorithm. In this scoring
algorithm, the term frequency is calculated using the square root of the
number of occurrences of the term, as described in
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf
Havi
Thanks for the instant reply,
I see that what Rajesh advises is something like what MultiReader does.
That would be my last resort because of the complexity it would introduce
in developing the business case I have.
Any pointer other than that would be appreciated.
On Monday 11 December 2
Hello Doron (and all the others who read here):),
thank you for your effort and your time. I really appreciate it. :)
I understand why normalisation is done in general: mainly, to offset the
bias toward oversized documents. In the literature I have read so far, there is
usually a high effort on
Hi,
I have asked this question before, but maybe the question wasn't clear.
How can I delete a particular document that I want to and keep the rest? For
instance, I have indexed document Id, date, user Id and contents; my
question is whether that particular content will be deleted if I just
sp