Lucene highlighting

2007-11-27 Thread Scott Smith
I've been looking at the highlighter examples. All of them seem to deal with fragments. I need to highlight an entire document as it is displayed (i.e., highlight all of the keywords in it). Can someone point me to some examples of this or does the highlighter code not do this? Thanks Sco
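For readers with the same question, here is a minimal sketch of whole-document highlighting in plain Java. It does not use the Lucene highlighter at all: `WholeTextHighlighter` and its `highlight` method are hypothetical names, and the whole-word, case-insensitive matching and the `<b>` tags are assumptions. As far as I know, the contrib highlighter also ships a `NullFragmenter` meant for highlighting an entire field without fragmenting it, which may be the more direct route inside Lucene.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: wraps every whole-word,
// case-insensitive occurrence of the given keywords in <b>...</b>.
class WholeTextHighlighter {

    static String highlight(String text, List<String> keywords) {
        if (keywords.isEmpty()) {
            return text;
        }
        // Build one alternation such as \b(?:\Qlucene\E|\Qindex\E)\b.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);

        Matcher matcher = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // Keep the original casing of the matched text inside the tags.
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement("<b>" + matcher.group() + "</b>"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "Lucene builds an index. Searching the index is fast.";
        // prints: <b>Lucene</b> builds an <b>index</b>. Searching the <b>index</b> is fast.
        System.out.println(highlight(doc, List.of("lucene", "index")));
    }
}
```

Unlike a real highlighter, this sketch knows nothing about analyzers or the query, so stemmed or wildcard matches would not be marked.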

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread markharw00d
lucene user wrote: Am I being clear? Now you are. I don't know what you mean by "PERSON_ANNOTATION works for Google". I suppose I meant annotations in the sense GATE and UIMA refer to annotations. They are like a highlighter pen marking a particular section of a document and adding me

Lucene or nutch for indexing web documents

2007-11-27 Thread bbrown
I was considering not using Nutch for indexing web documents. I was thinking of either extracting the full HTML document or, using some kind of web scraper or HTML parser utility, extracting only the text content from a web page and then indexing that. I know it is strange, but I feel I have

Re: problem with details given by Explanation object

2007-11-27 Thread Ng Vinny
Sorry, if you mean the java code then it's as below: import java.io.File; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.queryPar

Re: problem with details given by Explanation object

2007-11-27 Thread Ng Vinny
Hi Erick, I indexed only files in PDF format, so I cannot put them inline here in email. I did use Luke and put the same query into it, and the same thing happened. Is there any chance I can send you the two PDF files that cause the error, to see if the error can be reproduced? Best, Ng On Nov

Re: problem with details given by Explanation object

2007-11-27 Thread Erick Erickson
Attachments often do not come through, at least they aren't visible to me using Gmail. So you might want to re-send them inline. But another thing you can do is get a copy of Luke and examine your index to see if the actual contents of doc1 and doc2 are what you expect. You can even run queries

problem with details given by Explanation object

2007-11-27 Thread Ng Vinny
Hi all, I am having a problem with Lucene 2.2.0 with regard to the contents of the Explanation objects after a PhraseQuery search. I indexed two documents, doc1 and doc2, and then issued an OR Boolean query consisting of two PhraseQuery objects, pq1 and pq2. Apparently, the details of the Explanation object

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
These annotations are private to a specific user and they can change at any time. What are the challenges associated with this fact? What are the best ways to address these challenges? There are likely to be lots of small changes to these annotations. Can we delete and re-insert these annotation do

RE: Searching user-private annotations associated with indexed documents

2007-11-27 Thread Binkley, Peter
One approach would be to take advantage of Lucene's ability to handle different kinds of documents in a single index. You could put the annotations in the same index as the main articles, but with extra fields, like this: Article document: Id: article1 Type: article Text: blah blah blah Annotati
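The single-index idea can be sketched in plain Java, with a list of field maps standing in for the index. This is not Lucene code. The `Id`, `Type`, and `Text` fields come from the message; the `User` and `ArticleId` fields on annotation documents, and every class and method name, are assumptions made for illustration, since the rest of the message is cut off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a mixed index: each "document" is a map of fields.
class MixedIndexSketch {

    // Returns the ids of articles that `user` has annotated with text containing `word`.
    static List<String> articlesAnnotatedBy(List<Map<String, String>> index,
                                            String user, String word) {
        List<String> hits = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        for (Map<String, String> doc : index) {
            boolean isAnnotation = "annotation".equals(doc.get("Type"));
            boolean ownedByUser = user.equals(doc.get("User"));   // assumed field
            String text = doc.getOrDefault("Text", "").toLowerCase(Locale.ROOT);
            if (isAnnotation && ownedByUser && text.contains(needle)) {
                String articleId = doc.get("ArticleId");          // assumed field
                if (articleId != null && !hits.contains(articleId)) {
                    hits.add(articleId);
                }
            }
        }
        return hits;
    }

    static List<Map<String, String>> sampleIndex() {
        return List.of(
                Map.of("Id", "article1", "Type", "article", "Text", "blah blah blah"),
                Map.of("Id", "note1", "Type", "annotation", "User", "alice",
                        "ArticleId", "article1", "Text", "Gets the facts wrong"),
                Map.of("Id", "note2", "Type", "annotation", "User", "bob",
                        "ArticleId", "article1", "Text", "Facts look fine to me"));
    }

    public static void main(String[] args) {
        // prints: [article1]
        System.out.println(articlesAnnotatedBy(sampleIndex(), "alice", "facts"));
    }
}
```

In a real index the three conditions would presumably become required clauses of one Boolean query, and changing an annotation would mean deleting and re-adding only that small annotation document.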

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
These annotations are not positional within the underlying article. They are just comments the user associates with the entire underlying document, i.e., "This article gets the facts wrong about the real reasons the US went into Iraq." Could be a sentence or a few sentences about the entire underly

Re: CheckIndex tool issues

2007-11-27 Thread Michael McCandless
OK I opened this JIRA issue to track this: https://issues.apache.org/jira/browse/LUCENE-1069 Mike "Michael McCandless" <[EMAIL PROTECTED]> wrote: > > Woops! You are right, this is a silly bug in the CheckIndex tool. It is not > properly taking into account deletions. I will open an issue

Re: Score: Randomize form

2007-11-27 Thread Haroldo Nascimento
It works. The solution is to implement the SortComparatorSource interface. Thanks Chris, On Nov 26, 2007 10:17 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : I think you have a couple of problems here. First, you'll have to > : normalize the scores to get *any* of them to be the sa

Re: prefix query search problem if a hyphen exist in the search word

2007-11-27 Thread Erick Erickson
What analyzers are you using both at index time and query time? StandardAnalyzer will, for instance, split the words at the hyphen. I would recommend that you get a copy of Luke (google lucene luke) and examine both the contents of your index, and the query produced by using various analyzers. Als
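To make the hyphen point concrete, here is a small plain-Java sketch. The `tokenize` method is a deliberately simplified stand-in for an analyzer that lowercases and splits on non-alphanumeric characters; it is not Lucene's StandardAnalyzer. It also assumes the prefix term is compared against the indexed terms without being analyzed, which is my understanding of how prefix queries are usually handled. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HyphenPrefixDemo {

    // Simplified stand-in for index-time analysis: lowercase, then split on
    // anything that is not a letter or digit (so "Wi-Fi" becomes "wi", "fi").
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A prefix query matches if any indexed term starts with the prefix.
    // The prefix is used verbatim here (assumption: it is not analyzed).
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = tokenize("Wi-Fi router");
        System.out.println(terms);                        // prints: [wi, fi, router]
        System.out.println(prefixMatches(terms, "wi-f")); // prints: false
        System.out.println(prefixMatches(terms, "wi"));   // prints: true
    }
}
```

No indexed term contains a hyphen, so a prefix that includes one has nothing to match. Inspecting the index and the parsed query in Luke, as suggested above, shows whether this is what is happening in a real setup.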

Re: RAMDirectory vs FSDirectory

2007-11-27 Thread German Kondolf
There is a constructor in RAMDirectory that already does that. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/store/RAMDirectory.html I don't think it is worth modifying Lucene's internal code to gain an extra bit of performance... What would you do with the next version? Modify it agai

Re: RAMDirectory vs FSDirectory

2007-11-27 Thread Grant Ingersoll
RAMDirectory has a constructor that takes in another Directory and loads it into memory. No serialization necessary. Just index to an FSDirectory using Lucene's normal indexing methods (it takes care of buffering them internally) and then load the FSDirectory into a RAMDirectory. Have a l

Re: CheckIndex tool issues

2007-11-27 Thread Michael McCandless
Woops! You are right, this is a silly bug in the CheckIndex tool. It is not properly taking into account deletions. I will open an issue & fix it. Thanks for testing & reporting this, and sorry about that. Mike "Bogdan Ghidireac" <[EMAIL PROTECTED]> wrote: > Hi, > > I tried to use the Check

CheckIndex tool issues

2007-11-27 Thread Bogdan Ghidireac
Hi, I tried to use the CheckIndex tool (the latest svn code) and I was surprised to see it report that all my production indexes (around 30) are corrupt. This is highly unlikely because they have been running for about a year and I have had no exceptions during search so far. One recurring pattern I observe

Re: RAMDirectory vs FSDirectory

2007-11-27 Thread Haroldo Nascimento
You can serialize the RAMDirectory object to disk. When the application starts, it reads the .ser file and loads the object into memory. Loading the .ser file is very fast. You need to change some Lucene classes: add "implements Serializable" to them. On Nov 27, 2007 4:28 AM, Chhab

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread mark harwood
Do the annotations have positions? Do you want to do things like phrase search, e.g. "PERSON_ANNOTATION works for Google"? Or is your idea of an annotation more simply a del.icio.us-style tag associated with the whole document? Cheers Mark - Original Message From: lucene user <[

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
I'd be VERY grateful for your help, folks! Thanks! I really need some insight on this. THANKS!! On Nov 26, 2007 6:43 PM, lucene user <[EMAIL PROTECTED]> wrote: > Here are the three options that seem practical to us right now. > > (1) Do the annotation search in Postgres using LIKE or the >post