Re: factor in stopwords when searching

2008-03-22 Thread Chris Lu
Hi, Erick, I understand your rant. :) Well, the solution I settled on is this, as suggested by Jake and Grant. For those stop words, when indexing content, I will treat them as normal words. When processing the user query, there will be a normal query with stop words skipped, and another part tha

RE: Field values ...

2008-03-22 Thread Chris Hostetter
: I want to do something like:
:
: List infoList = new ArrayList ();
: foreach (Document doc in LuceneIndex)
: {
:     String id = doc.get ("Id");
:     String phone = doc.get ("Phone");
:     infoList.add (new Info (id, phone));
: }

If "Id" and "Phone" are stored value

Re: factor in stopwords when searching

2008-03-22 Thread Erick Erickson
Well, whether it's a good user experience is exactly the question. I've spent far too much time satisfying customer (or product manager) requests that add zero value to the product *in the user's eyes*. And I quote: "This is asked by some customer, who may not know what's "stop words" at all." wh

Re: factor in stopwords when searching

2008-03-22 Thread Chris Lu
This is asked by some customer, who may not know what's "stop words" at all. Jake's approach should be quite similar to what some search engine companies are doing. It'll cost some storage, but can achieve a good user experience. The benefit is kind of obvious in the real world. When users enter some

Re: Access Denied in opening IndexSearcher

2008-03-22 Thread Erick Erickson
Two things: 1> get a copy of Luke and try to navigate to your dir and open it. That'll tell you if you are looking in the right place. 2> Post the code snippets where you open your index for writing and where you open it for reading. That'll give folks something to analyze. Best Erick On Sat, M

Re: factor in stopwords when searching

2008-03-22 Thread Erick Erickson
What's your reason for trying? The whole point of stop words is that they should be considered "no ops". That is, they add nothing to the semantics of whatever is being processed. I don't understand the use case for why you want to go outside that assumption. Another way of asking this is "what t

Re: Field name size and index size

2008-03-22 Thread Michael McCandless
Summary: I think there will be no real impact if you use longer field names. Details: Index size will be just a tiny bit bigger. There is a single file per segment (*.fnm) that resolves the field names into integer IDs, then the rest of the index uses these integer IDs. So only that

Field name size and index size

2008-03-22 Thread John
Hi, Let's say my data source consists of records like so (the example is Field=Value): AA=Value1, BB=Value2, CC=Value3, DD=Value4. And let's say I have a second copy of my data, but this time it looks like so: A=Value1, B=Value2, C=Value3, D=Value4. I.e., same

Access Denied in opening IndexSearcher

2008-03-22 Thread Jeet Singh
Hi, This is my first post to this group. I'm using Lucene 2.3 on an XP machine. I have an index of 3000 pages in a dir named 'wcrawl' that I want to search through. In the code, when I'm trying to open IndexSearcher at a specific 'Directory' location, it gives a FileNotFound error: c:\raw\wcraw