Re: Handling synonyms using Lucene

2009-08-07 Thread Anshum
Hi Mitu, Though your approach would work, I'd suggest you build a custom analyzer instead. Perhaps that'd be a better approach. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw

Handling synonyms using Lucene

2009-08-07 Thread mitu2009
Hi, What is the best way to handle synonyms (phrases) using Lucene? Especially when I need to execute queries like: a OR b OR c NOT d. How about adding a new field called "synonyms" to each document while indexing? This field's value would have a list of all synonyms. It would be added to a docu

UNC speed vs DOS path speed

2009-08-07 Thread Woolf, Ross
On a Windows machine I have noticed that using a UNC path instead of a DOS path when instantiating an index writer causes performance to slow considerably, even when the UNC path points to the same location as the DOS path. Is anyone aware of this and why? Is there anything that can be done to improve

StandardAnalyzer and Windows vs. Linux "path"

2009-08-07 Thread ohaya
Hi, I've been doing development of my indexer app, which uses StandardAnalyzer on a Windows machine, and today, I deployed an initial onto a Redhat Linux (RHEL) machine. On my development machine, I have the files that are being indexed in something like: C:\lucene-devel\files\dir1\xxx.d

RE: Is there a way for me to handle a multiword synonym correctly?

2009-08-07 Thread Donna L Gresh
I have to think about this a bit, but that may work. I just have to make sure no "undesirable" side effects occur. I certainly want to be able to search for a phrase and not have it match all the individual bits, but that should already work using the mechanism I already have in place. Donna

Re: Efficient optimization of large indexes?

2009-08-07 Thread Michael McCandless
On Thu, Aug 6, 2009 at 5:30 PM, Nigel wrote:
>> Actually IndexWriter must periodically flush, which will always
>> create new segments, which will then always require merging. Ie
>> there's no way to just add everything to only one segment in one
>> shot.
>>
>
> Hmm, that makes sense now that you

RE: reading index

2009-08-07 Thread Uwe Schindler
You should *not* create a new Searcher for every request. Open the Searcher one time (e.g. in your servlet's init() method) and keep it open. Close it on web application shutdown. If your index changes in between, you should reopen it (e.g. by testing for IndexReader.isCurrent() and if not, reopening
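
The open-once, reopen-when-stale pattern described above can be sketched roughly as follows. This is a minimal illustration, not Lucene code: StaleAwareReader and SharedReaderHolder are hypothetical stand-ins, with isCurrent() modelled on the IndexReader.isCurrent() check mentioned in the message. In a real application the supplier would open the actual index, and the old reader should only be closed once no in-flight search is still using it.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an index reader/searcher; not the Lucene API. */
interface StaleAwareReader {
    boolean isCurrent(); // true while the underlying index has not changed
    void close();
}

/** Holds one shared reader and replaces it only when the index has changed. */
final class SharedReaderHolder {
    private final Supplier<StaleAwareReader> opener; // assumed factory that opens the index
    private StaleAwareReader reader;

    SharedReaderHolder(Supplier<StaleAwareReader> opener) {
        this.opener = opener;
        this.reader = opener.get(); // open once, e.g. from a servlet's init()
    }

    /** Returns the shared reader, reopening it first if it is stale. */
    synchronized StaleAwareReader get() {
        if (!reader.isCurrent()) {
            StaleAwareReader old = reader;
            reader = opener.get();
            // Simplification: a real application must wait for running searches.
            old.close();
        }
        return reader;
    }

    /** Call once on web application shutdown. */
    synchronized void shutdown() {
        reader.close();
    }
}
```

Each request would call get() on the single holder instead of constructing a new searcher, so the cost of opening the index is paid only at startup and after index changes.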

Re: reading index

2009-08-07 Thread m.harig
Thanks, this is my code snippet:

public void doSearch(){
    ..
    Query query = .
    IndexSearcher searcher = new IndexSearcher(directory);

RE: Is there a way for me to handle a multiword synonym correctly?

2009-08-07 Thread Carl Austin
I may be oversimplifying here, but in this case don't you just need to use an analyzer that breaks the word "SAP.EM.FIN.AM" on full stops and throws them out, so that it is indexed as terms "SAP" "EM" "FIN" "AM"? This is the same way it will index "SAP EM FIN AM" as long as you break on whitespace t

Re: Is there a way for me to handle a multiword synonym correctly?

2009-08-07 Thread Matthew Hall
Create a field that is specifically for this type of match. What you could then do is at indexing time manipulate your data in such a way that it can be matched in a punctuation-irrelevant way. So in this field you would convert all non-letter characters into spaces, and reduce all white sp
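
A minimal sketch of the normalization step described above, assuming it is applied as plain string preprocessing before the value goes into the dedicated field. PunctuationNormalizer and normalize are hypothetical names, not part of Lucene. Note that digits count as non-letter characters here and are dropped too, following the wording of the message.

```java
import java.util.regex.Pattern;

/** Illustrative helper; the class and method names are hypothetical. */
final class PunctuationNormalizer {
    // One or more consecutive characters that are not Unicode letters.
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{L}]+");

    /** Replaces each run of non-letter characters with one space and trims the ends. */
    static String normalize(String text) {
        return NON_LETTERS.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("SAP.EM.FIN.AM"));  // prints "SAP EM FIN AM"
        System.out.println(normalize("SAP  EM FIN AM")); // prints "SAP EM FIN AM"
    }
}
```

The same normalization would presumably need to be applied to the query text as well, so that both spellings end up as the same term sequence and can be matched as a phrase.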

Is there a way for me to handle a multiword synonym correctly?

2009-08-07 Thread Donna L Gresh
I saw some discussion on the board but I'm not sure I've got quite the same problem. As an example, I have a query that might be a technical skill: SAP EM FIN AM I would like that to match a document that has *either* SAP.EM.FIN.AM or "SAP EM FIN AM" (in that order and all together, not spread

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Hi Matt, Good catch! As I just posted, I *just* noticed that (Luke uses Keyword Analyzer) :)!!! Once I switched Luke to using Standard Analyzer, the Luke search results matched my web query results. Thanks! Jim Matthew Hall wrote: > Luke defaults to KeywordAnalyzer when you do a sea

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Andrzej, Hah! I tried as you suggested using Luke, and I found at least part of my problem. Luke was defaulting to KeywordAnalyzer. I changed that to StandardAnalyzer, and did queries for: path:x and path:xx.dat For the first, the Rewritten was:

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Ian, I just re-confirmed that StandardAnalyzer is used in both my indexer app and in the query/search web app. The actual file paths look like: C:\lucene-devel\dat\.dat or C:\lucene-devel\data\testdir\\.dat For field "path", Luke shows: lucene data c devel dat

Re: Language Detection for Analysis?

2009-08-07 Thread Grant Ingersoll
There are several free Language Detection libraries out there, as well as a few commercial ones. I think Karl Wettin has even written one as a plugin for Lucene. Nutch also has one, AIUI. I would just Google "language detection". Also see http://www.lucidimagination.com/search/?q=languag

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread Matthew Hall
Luke defaults to KeywordAnalyzer when you do a search on it. You have to specifically choose StandardAnalyzer. You are probably already doing this, but I figure it's worth a check. Matt Andrzej Bialecki wrote: oh...@cox.net wrote: Hi Phil, Well, kind of... but... Then, why, when I do the

Group Admin

2009-08-07 Thread Ganesh
Hello all, I have a UserID field for every record. The results will be filtered for every User based on this field. We have a group admin feature where an admin can view all records of a set of Users. My requirement is a group admin of 3 Users could view only 3 members' data and he sho

Re: reading index

2009-08-07 Thread Ian Lea
It's not clear to me what you mean by reading the index every time. If you mean that you open a new searcher for every search, then no, it's not good. If you mean that every search or paging request gets passed to lucene then that is standard practice and is fine. See http://wiki.apache.org/lucen

reading index

2009-08-07 Thread m.harig
Hello all, thanks to Lucene. I am using Lucene 2.4.0 for my application. My question is: can I read the index many times? I mean, I have a search application which reads the index, which is 300MB in size, and I am reading my index every time the user hits the page. Is it goo

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread Andrzej Bialecki
oh...@cox.net wrote: Hi Phil, Well, kind of... but... Then, why, when I do the search in Luke, do I get the results I cited: ==> succeeds .yyy ==> fails (no results) I guess that I've been assuming that the search in Luke is "correct" and I've been using that to "test my understa

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread Ian Lea
It is a good general assumption that Luke is correct. Can you confirm that you are using StandardAnalyzer everywhere, for indexing and searching? This sort of issue is often caused by using different analyzers. What does Luke show as the indexed terms for path? In a little index I've just creat

Re: Analysis Question

2009-08-07 Thread Ian Lea
You could write your own analyzer that worked out a boost as it analyzed the document fields and had a getBoost() method that you would call to get the value to add to the document as a separate field. If you write your own you can pass it what you like and it can do whatever you want. -- Ian.