I've been looking at the highlighter examples. All of them seem to deal with
fragments. I need to highlight an entire document as it is displayed (i.e.,
highlight all of the keywords in it). Can someone point me to some examples of
this or does the highlighter code not do this?
Thanks
Sco
lucene user wrote:
Am I being clear?
Now you are.
I don't know what you mean by "PERSON_ANNOTATION works for Google".
I suppose I meant annotations in the sense GATE and UIMA refer to
annotations. They are like a highlighter pen marking a particular
section of a document and adding me
I was considering not using nutch for indexing web documents. I was thinking
either extracting the full HTML document or through the use of some kind of
web scraper html parser utility extracting only the text content from a web
page and then indexing that.
I know it is strange, but I feel I have
Sorry, if you mean the java code then it's as below:
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryPar
Hi Erick,
I indexed only files in pdf format so I cannot put them inline here in
email. I did use Luke and put the same query into it and the same thing
happened.
Is there any chance I can send the two PDF files that cause the error to you
to see if the error can be reproduced?
Best,
Ng
On Nov
Attachments often do not come through, at least they aren't visible to
me using Gmail. So you might want to re-send them inline.
But another thing you can do is get a copy of luke and examine
your index to see if the actual contents of doc1 and doc2 are what
you expect. You can even run queries
Hi all,
I am having a problem with Lucene 2.2.0 with regard to the contents of the
Explanation objects after a PhraseQuery search. I indexed two documents, doc1
and doc2, and then issued a Boolean OR query consisting of two PhraseQuery
objects, pq1 and pq2.
Apparently, the details of the Explanation object
These annotations are private to a specific user and they can change at any
time. What are the challenges associated with this fact? What are the best
ways to address these challenges? There are likely to be lots of small
changes to these annotations. Can we delete and re-insert these annotation
do
One approach would be to take advantage of Lucene's ability to handle
different kinds of documents in a single index. You could put the
annotations in the same index as the main articles, but with extra
fields, like this:
Article document:
Id: article1
Type: article
Text: blah blah blah
Annotati
These annotations are not positional within the underlying article.
They are just comments the user associates with the entire underlying
document, i.e., "This article gets the facts wrong about the real
reasons the US went into Iraq." Could be a sentence or a few sentences
about the entire underly
OK I opened this JIRA issue to track this:
https://issues.apache.org/jira/browse/LUCENE-1069
Mike
"Michael McCandless" <[EMAIL PROTECTED]> wrote:
>
> Woops! You are right, this is a silly bug in the CheckIndex tool. It is not
> properly taking into account deletions. I will open an issue
It works. The solution is to implement the SortComparatorSource
interface.
Thanks Chris,
On Nov 26, 2007 10:17 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : I think you have a couple of problems here. First, you'll have to
> : normalize the scores to get *any* of them to be the sa
What analyzers are you using both at index time and
query time? StandardAnalyzer will, for instance, split
the words at the hyphen.
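To make the hyphen point concrete, here is a rough sketch. It does not use Lucene at all: `splitOnPunctuation` and `splitOnWhitespace` are hypothetical toy tokenizers standing in for two different analyzers, and the sample text is made up. It only shows how a term can go missing when index-time and query-time tokenization disagree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HyphenTokens {
    // Toy stand-in (NOT Lucene's StandardAnalyzer): lowercases the text and
    // splits on any run of characters that are not ASCII letters or digits.
    public static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Toy stand-in for an analyzer that only splits on whitespace.
    public static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Index time": the hyphenated word is broken into separate terms.
        List<String> indexed = splitOnPunctuation("state-of-the-art search");
        // "Query time": a different tokenizer keeps the hyphenated word whole.
        List<String> queried = splitOnWhitespace("state-of-the-art");

        System.out.println(indexed);                      // [state, of, the, art, search]
        System.out.println(queried);                      // [state-of-the-art]
        System.out.println(indexed.containsAll(queried)); // false: the query term was never indexed
    }
}
```

Using the same tokenizer on both sides makes the query terms line up with the indexed terms again.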
I would recommend that you get a copy of Luke (google
lucene luke) and examine both the contents of your index,
and the query produced by using various analyzers. Als
There is a constructor in the RAMDirectory that already does that.
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/store/RAMDirectory.html
I don't think it is worth modifying Lucene's internal code to achieve
an extra bit of performance...
What would you do on the next version? Modify it agai
RAMDirectory has a constructor that takes in another Directory and
loads it into memory. No Serialization necessary. Just index to a
FSDirectory using Lucene's normal indexing methods (it takes care of
buffering them internally) and then load the FSDirectory into a
RAMDirectory. Have a l
Woops! You are right, this is a silly bug in the CheckIndex tool. It is not
properly taking into account deletions. I will open an issue & fix it.
Thanks for testing & reporting this, and sorry about that.
Mike
"Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I tried to use the Check
Hi,
I tried to use the CheckIndex tool (the latest svn code) and I was surprised
to notice that all my indexes from production (around 30) are corrupt. This
is highly unlikely because they have been running for about a year and I have
had no exceptions during search so far.
One recurring pattern I observe
You can serialize the RAMDirectory object to disk. When the application
starts, it reads the .ser file and loads the object into memory.
Loading the .ser file is much faster.
You need to change some Lucene classes: add "implements Serializable" to
those classes.
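A minimal sketch of the save/load round trip, using only standard Java serialization. `InMemoryIndex` is a hypothetical stand-in class, not RAMDirectory or any other Lucene class, and the file name is a temp file chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SerRoundTrip {
    // Hypothetical stand-in for an in-memory index (NOT a Lucene class).
    // Every field reachable from it must itself be serializable.
    public static class InMemoryIndex implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Map<String, List<Integer>> postings = new HashMap<>();

        public void add(String term, int docId) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
    }

    // Write the whole object graph to a .ser file.
    public static void save(InMemoryIndex index, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(index);
        }
    }

    // Read the object graph back at application start-up.
    public static InMemoryIndex load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (InMemoryIndex) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        InMemoryIndex index = new InMemoryIndex();
        index.add("lucene", 1);
        index.add("lucene", 2);

        Path file = Files.createTempFile("index", ".ser");
        save(index, file);
        InMemoryIndex restored = load(file);

        System.out.println(restored.postings.get("lucene")); // [1, 2]
        Files.delete(file);
    }
}
```

Note that a .ser file written this way is tied to the class layout, so it may stop loading after the class changes.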
On Nov 27, 2007 4:28 AM, Chhab
Do the annotations have positions ?
Do you want to do things like phrase-search e.g.
"PERSON_ANNOTATION works for Google"
Or is your idea of an annotation more simply a del.icio.us-style tag associated
with the whole document?
Cheers
Mark
- Original Message
From: lucene user <[
I'd be VERY grateful for your help, folks! Thanks! I really need some
insight on this. THANKS!!
On Nov 26, 2007 6:43 PM, lucene user <[EMAIL PROTECTED]> wrote:
> Here are the three options that seem practical to us right now.
>
> (1) Do the annotation search in Postgres using LIKE or the
>post
20 matches