Questions about lucene index on HDFS

2008-08-21 Thread Jarvis . Guo
Hi all, First, I know that there is an FsDirectory class in Nutch-0.9 that lets us access an index on HDFS. But after testing it, I found that we can only read the index; we cannot append to or modify it. I think the reason is the one mentioned in the HDFS file-append issues. Am I right?

Re: Lucene Index Structure

2008-08-21 Thread David Lee
I see, ok. Thanks to both of you! On Thu, Aug 21, 2008 at 4:51 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > Also, the inverted index *will* store positional information (in the *.prx > files) even if term vectors are not stored. > > Mike > > > Yonik Seeley wrote: > > On Thu, Aug 21, 20

Re: Storing special characters in Lucene

2008-08-21 Thread Juan Pablo Morales
It was, after all, an XML issue: the servlets creating the content that was being indexed were not sending UTF, but the XML declaration stated the encoding WAS UTF. So it really was not a Lucene issue after all. Thanks for all the help. On Thu, Aug 21, 2008 at 6:18 PM, Juan Pablo Morales <[EMAIL PROTECT
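The garbling in this thread is the classic symptom of bytes written in one charset being decoded as another. A minimal, JDK-only sketch (the class and method names are hypothetical, and no Lucene code is involved) shows what happens to an accented Spanish word when the encoding the content is actually written in differs from the one that is declared:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of charset mismatches; not Lucene code. */
public class CharsetMismatchDemo {
    // Encode with the charset the producer really used, then decode with the charset
    // the consumer was told to use (e.g. by an XML declaration).
    static String roundTrip(String text, Charset written, Charset declared) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, declared);
    }

    public static void main(String[] args) {
        String word = "canci\u00f3n"; // "canción", escaped so the source file encoding cannot interfere
        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // UTF-8 bytes read as ISO-8859-1: the two bytes of "ó" become two separate characters.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        // ISO-8859-1 bytes read as UTF-8: the lone 0xF3 byte is malformed and replaced with U+FFFD.
        System.out.println(roundTrip(word, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

Decoding UTF-8 bytes as ISO-8859-1 turns "ó" into "Ã³", one common form of this kind of corruption; the fix is to make the bytes and the declared encoding agree before the text reaches the indexer.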

Trouble Boosting BooleanQuery's with Multiple Clauses

2008-08-21 Thread Tavi Nathanson
I was wondering if anyone could explain the following weird behavior that I'm experiencing when boosting BooleanQuery objects: When I create a TermQuery, add it as a SHOULD clause to a BooleanQuery, and boost that BooleanQuery, the boost shows up when I run IndexSearcher.explain(). However, when I add

Re: Lucene Index Structure

2008-08-21 Thread Michael McCandless
Also, the inverted index *will* store positional information (in the *.prx files) even if term vectors are not stored. Mike Yonik Seeley wrote: On Thu, Aug 21, 2008 at 7:20 PM, David Lee <[EMAIL PROTECTED]> wrote: Clarification question: If I don't store term vectors, then I: -- won't h

Re: Lucene Index Structure

2008-08-21 Thread Yonik Seeley
On Thu, Aug 21, 2008 at 7:20 PM, David Lee <[EMAIL PROTECTED]> wrote: > Clarification question: > > If I don't store term vectors, then I: > -- won't have information on the position of matching terms > -- I don't have the term frequency vector > > -- but I should still have the frequency of terms

Lucene Index Structure

2008-08-21 Thread David Lee
Clarification question: If I don't store term vectors, then I:
-- won't have information on the position of matching terms
-- don't have the term frequency vector
-- but should still have the frequency of terms per document in the .frq file, right?
So what's the difference between the term f
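To picture the distinction asked about here, consider a toy in-memory model. It is not Lucene's on-disk format, and the class and method names are hypothetical. The inverted index maps each term to the documents containing it and, per document, to the positions of that term, so the per-document frequency (the role of .frq) and the positions (the role of .prx, as Mike notes) are both available without term vectors. A term vector is the reverse, per-document view: for one document, each term and its frequency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of postings vs. term vectors; NOT Lucene's real file format. */
public class PostingsVsTermVectors {
    // Inverted index: term -> (docId -> positions of the term in that doc).
    // positions.size() plays the role of the .frq frequency; the positions play the role of .prx.
    final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Optional term vectors: docId -> (term -> freq), a forward, per-document view.
    final Map<Integer, Map<String, Integer>> termVectors = new HashMap<>();

    void add(int docId, String text, boolean storeTermVector) {
        // Naive whitespace "analyzer", for illustration only.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
            if (storeTermVector) {
                termVectors.computeIfAbsent(docId, d -> new TreeMap<>())
                           .merge(tokens[pos], 1, Integer::sum);
            }
        }
    }

    // Per-document term frequency comes straight from the postings...
    int freq(String term, int docId) {
        return postings.getOrDefault(term, Collections.emptyMap())
                       .getOrDefault(docId, Collections.emptyList())
                       .size();
    }

    // ...but listing all terms of one document from postings means scanning every term.
    Map<String, Integer> vectorFromPostings(int docId) {
        Map<String, Integer> vector = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> e : postings.entrySet()) {
            List<Integer> positions = e.getValue().get(docId);
            if (positions != null) {
                vector.put(e.getKey(), positions.size());
            }
        }
        return vector;
    }
}
```

In this model, a lookup such as freq("be", 0) is cheap from the postings alone, while rebuilding a document's full term list means visiting every term in the index; storing term vectors keeps that per-document view ready-made.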

Re: Storing special characters in Lucene

2008-08-21 Thread Juan Pablo Morales
You are right, it does work. I'll look into my example to see where the difference is. On Thu, Aug 21, 2008 at 5:30 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote: > Here's a unit test: > import junit.framework.TestCase; > import org.apache.lucene.analysis.snowball.SnowballAnalyzer; > import org.ap

Re: Storing special characters in Lucene

2008-08-21 Thread Grant Ingersoll
Here's a unit test:
import junit.framework.TestCase;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWr

RE: QueryParser Default Operator

2008-08-21 Thread Jordon Saardchit
Nvm, extremely goofy project configuration here, plus classpath issues with much older versions. Ignore me! -----Original Message----- From: Jordon Saardchit [mailto:[EMAIL PROTECTED] Sent: Thursday, August 21, 2008 11:54 AM To: java-user@lucene.apache.org Subject: QueryParser Default Operator T

QueryParser Default Operator

2008-08-21 Thread Jordon Saardchit
This may have been answered before, but is there a reason why setting the default operator on a QueryParser throws a java.lang.NoSuchFieldError???
QueryParser parser = new QueryParser( "title", new TokenAnalyzerImpl() );
parser.setDefaultOperator( QueryParser.AND_OPERATOR ); // This line throws t
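A NoSuchFieldError generally means code was compiled against one version of a class but is running against another version that lacks the field; here it turned out to be much older Lucene versions on the classpath. A minimal, JDK-only sketch (the class and method names are hypothetical) asks the JVM where a class really comes from, using the standard Class.getProtectionDomain() and ClassLoader.getResources() calls. It only inspects the system class loader; inside a servlet container or other framework, the relevant loader may be a different one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for spotting duplicate or stale jars behind a NoSuchFieldError. */
public class ClasspathCheck {
    /** Returns the jar or directory a class was loaded from, or a placeholder for JDK bootstrap classes. */
    static String loadedFrom(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "<bootstrap or unknown>"
                : src.getLocation().toString();
    }

    /** Lists every entry visible to the system class loader that contains the given resource. */
    static List<URL> findAll(String resourceName) {
        try {
            return Collections.list(ClassLoader.getSystemClassLoader().getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Lucene 2.x keeps QueryParser in the org.apache.lucene.queryParser package.
        List<URL> hits = findAll("org/apache/lucene/queryParser/QueryParser.class");
        if (hits.size() > 1) {
            System.out.println("More than one copy of QueryParser is visible; typically the first is loaded:");
        }
        for (URL u : hits) {
            System.out.println("  " + u);
        }
    }
}
```

If more than one Lucene jar shows up, removing the stale one (or fixing the order) is usually enough to make the field visible again.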

Re: Storing special characters in Lucene

2008-08-21 Thread Juan Pablo Morales
On Thu, Aug 21, 2008 at 12:47 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Hi Juan, Hi Steve > > > On 08/21/2008 at 1:16 PM, Juan Pablo Morales wrote: > > I have an index in Spanish and I use Snowball to stem and > > analyze and it works perfectly. However, I am running into > > trouble stor

RE: Storing special characters in Lucene

2008-08-21 Thread Steven A Rowe
Hi Juan, On 08/21/2008 at 1:16 PM, Juan Pablo Morales wrote: > I have an index in Spanish and I use Snowball to stem and > analyze and it works perfectly. However, I am running into > trouble storing (not indexing, only storing) words that > have special characters. > > That is, I store the spe

Storing special characters in Lucene

2008-08-21 Thread Juan Pablo Morales
I have an index in Spanish and I use Snowball to stem and analyze, and it works perfectly. However, I am running into trouble storing (not indexing, only storing) words that have special characters. That is, I store the special character, but it comes back garbled when I read it. To provide an e

Re: java.lang.NullPointerException while indexing on linux

2008-08-21 Thread Aditi Goyal
On Wed, Aug 20, 2008 at 6:12 PM, Michael McCandless <[EMAIL PROTECTED] > wrote: > > Aditi Goyal wrote: > > Thanks Mike. I found the problem. >> The problem was that I was not converting the value of the fields to utf-8 >> and hence while adding it to doc it was getting stored as None. >> So, when

Re: Case Sensitivity

2008-08-21 Thread Andre Rubin
Just to add to that: as I said before, in my case I found it more useful not to use UN_TOKENIZED. Instead, I used TOKENIZED with a custom analyzer that combines the KeywordTokenizer (the entire input as a single token) with the LowerCaseFilter. This way I get the best of both worlds. public class KeywordLow
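The idea can be modeled without Lucene: both approaches index the entire value as one term, and the only difference is whether that term is lowercased, applied the same way at index time and at query time in this model. In the toy sketch below, the class name and the two UnaryOperator "analyzers" are hypothetical stand-ins, not Lucene APIs and not the poster's class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Toy model (not Lucene code) of indexing a whole field value as a single term. */
public class KeywordLowerCaseDemo {
    // Models UN_TOKENIZED: the untouched value is the one and only term, so case must match exactly.
    static final UnaryOperator<String> UNTOKENIZED = v -> v;

    // Models KeywordTokenizer + LowerCaseFilter: still one term for the whole value, but lowercased.
    // Locale.ROOT only approximates LowerCaseFilter's per-character lowercasing.
    static final UnaryOperator<String> KEYWORD_LOWERCASE = v -> v.toLowerCase(Locale.ROOT);

    private final Map<String, Set<Integer>> index = new HashMap<>();
    private final UnaryOperator<String> analyzer;

    KeywordLowerCaseDemo(UnaryOperator<String> analyzer) {
        this.analyzer = analyzer;
    }

    void add(int docId, String value) {
        index.computeIfAbsent(analyzer.apply(value), t -> new TreeSet<>()).add(docId);
    }

    // The query value goes through the same "analyzer" before the exact-term lookup.
    Set<Integer> search(String value) {
        return index.getOrDefault(analyzer.apply(value), Collections.emptySet());
    }
}
```

With the lowercasing variant, "NEW YORK" finds a document indexed as "New York", yet a partial value such as "new" still finds nothing, because the whole input remains a single term.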