Hi Mitu,
Though your approach would work, I'd suggest you build a custom analyzer
instead. Perhaps that'd be a better approach.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
Hi,
What is the best way to handle synonyms (phrases) using Lucene? Especially
when I need to execute queries like: a OR b OR c NOT d
How about adding a new field called "synonyms" to each document while
indexing? This field's value would have a list of all synonyms. It would be
added to a docu
On a Windows machine I have noticed that using a UNC path instead of a DOS path
when instantiating an index writer causes the performance to slow considerably,
even when the UNC is to the same location as DOS path. Is anyone aware of this
and why? Is there anything that can be done to improve
Hi,
I've been developing my indexer app, which uses StandardAnalyzer, on a
Windows machine, and today I deployed an initial version onto a Red Hat Linux (RHEL)
machine.
On my development machine, I have the files that are being indexed in something
like:
C:\lucene-devel\files\dir1\xxx.d
I have to think about this a bit, but that may work. I just have to make
sure no "undesirable" side effects occur. I certainly want to be able to
search for a phrase and not have it match all the individual bits, but
that should already work using the mechanism I already have in place.
Donna
On Thu, Aug 6, 2009 at 5:30 PM, Nigel wrote:
>> Actually IndexWriter must periodically flush, which will always
>> create new segments, which will then always require merging. Ie
>> there's no way to just add everything to only one segment in one
>> shot.
>>
>
> Hmm, that makes sense now that you
You should *not* create a new Searcher for every request. Open the Searcher
one time (e.g. in your servlet's init() method) and keep it open. Close it on
web application shutdown.
If your index changes in between, you should reopen it (e.g. by testing for
IndexReader.isCurrent() and if not, reopening
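The open-once, reopen-when-stale advice above can be sketched in plain Java without depending on a particular Lucene version. This is only an illustration: SharedSearcherHolder, its opener and its isCurrent check are hypothetical names, and in a real application they would wrap whatever searcher and reader calls your Lucene release provides.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical holder that keeps one shared searcher-like object and only
 * opens a new one when the current one no longer reflects the index.
 * The type S stands in for a searcher; it is not a Lucene class.
 */
final class SharedSearcherHolder<S> {
    private final Supplier<S> opener;      // how to open a fresh instance
    private final Predicate<S> isCurrent;  // true while the instance is up to date
    private S current;                     // the shared instance, opened lazily
    private int openCount;                 // how many times opener was invoked

    SharedSearcherHolder(Supplier<S> opener, Predicate<S> isCurrent) {
        this.opener = opener;
        this.isCurrent = isCurrent;
    }

    /** Returns the shared instance, opening or reopening it only when needed. */
    synchronized S get() {
        if (current == null || !isCurrent.test(current)) {
            // A real implementation would also close the stale instance
            // once in-flight searches against it have finished.
            current = opener.get();
            openCount++;
        }
        return current;
    }

    synchronized int openCount() {
        return openCount;
    }

    public static void main(String[] args) {
        int[] indexVersion = {1}; // stand-in for the on-disk index version
        SharedSearcherHolder<Integer> holder = new SharedSearcherHolder<>(
                () -> indexVersion[0],
                opened -> opened == indexVersion[0]);
        holder.get();
        holder.get();            // reuses the first instance
        indexVersion[0] = 2;     // simulate an index change
        holder.get();            // triggers one reopen
        System.out.println("opened " + holder.openCount() + " times");
    }
}
```

Every request calls get() instead of constructing its own searcher, so the cost of opening the index is paid once per index change rather than once per request.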
Thanks,
this is my code snippet
public void doSearch(){
..
Query query =
.
IndexSearcher searcher = new IndexSearcher(directory);
I may be oversimplifying here, but in this case don't you just need to
use an analyzer that breaks the word "SAP.EM.FIN.AM" on full stops and
throws them out, so that it is indexed as terms "SAP" "EM" "FIN" "AM"?
This is the same way it will index "SAP EM FIN AM" as long as you break
on whitespace t
Create a field that is specifically for this type of match.
What you could then do is at indexing time manipulate your data in such
a way that it can be matched in a punctuation irrelevant way.
So in this field you would convert all non-letter characters into
spaces, and reduce all white sp
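A minimal sketch of the punctuation-irrelevant normalization described above, in plain Java. It assumes the truncated sentence goes on to collapse runs of whitespace into a single space; PunctuationNormalizer and normalize are hypothetical names, not Lucene API. The same function would be applied to the field value at indexing time and to the query text.

```java
/**
 * Hypothetical helper: turns every run of non-letter characters into a
 * single space so that punctuation differences do not affect matching.
 */
final class PunctuationNormalizer {
    private PunctuationNormalizer() {
    }

    static String normalize(String text) {
        // [^\p{L}]+ matches one or more characters that are not Unicode
        // letters (punctuation, whitespace and also digits), so each such
        // run becomes one space. trim() drops a leading or trailing space.
        return text.replaceAll("[^\\p{L}]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("SAP.EM.FIN.AM"));
        System.out.println(normalize("SAP  EM FIN   AM"));
    }
}
```

Both inputs in main normalize to the same string, which is what lets a phrase query on the dedicated field match either form. Note that this sketch also discards digits; widen the character class if they matter for your data.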
I saw some discussion on the board but I'm not sure I've got quite the
same problem. As an example, I have a query that might be a technical
skill:
SAP EM FIN AM
I would like that to match a document that has *either* SAP.EM.FIN.AM or
"SAP EM FIN AM" (in that order and all together, not spread
Hi Matt,
Good catch! As I just posted, I *just* noticed that (Luke uses
KeywordAnalyzer) :)!!!
Once I switched Luke to using Standard Analyzer, the Luke search results
matched my web query results.
Thanks!
Jim
Matthew Hall wrote:
> Luke defaults to KeywordAnalyzer when you do a sea
Andrzej,
Hah!
I tried as you suggested using Luke, and I found at least part of my problem.
Luke was defaulting to KeywordAnalyzer.
I changed that to StandardAnalyzer, and did queries for:
path:x
and
path:xx.dat
For the first, the Rewritten was:
Ian,
I just re-confirmed that StandardAnalyzer is used in both my indexer app and in
the query/search web app.
The actual file paths look like:
C:\lucene-devel\dat\.dat
or
C:\lucene-devel\data\testdir\\.dat
For field "path", Luke shows:
lucene
data
c
devel
dat
There are several free Language Detection libraries out there, as well
as a few commercial ones. I think Karl Wettin has even written one as
a plugin for Lucene. Nutch also has one, AIUI. I would just Google
"language detection".
Also see http://www.lucidimagination.com/search/?q=languag
Luke defaults to KeywordAnalyzer when you do a search on it. You have
to specifically choose StandardAnalyzer. You are probably already doing
this, but I figure it's worth a check.
Matt
Andrzej Bialecki wrote:
oh...@cox.net wrote:
Hi Phil,
Well, kind of... but...
Then, why, when I do the
Hello all,
I have a field, UserID, on every record. The results will be filtered for
every user based on this field. We have a group admin feature where an admin
can view all records of a set of users. My requirement is that a group admin of 3
users can view only those 3 members' data and he sho
It's not clear to me what you mean by reading the index every time.
If you mean that you open a new searcher for every search, then no,
it's not good.
If you mean that every search or paging request gets passed to lucene
then that is standard practice and is fine.
See http://wiki.apache.org/lucen
Hello all,
Thanks for Lucene. I am using Lucene 2.4.0 for my application. My
question is: can I read the index many times? I mean, I have a
search application that reads the index, which is 300MB in size, and I am
reading the index every time the user hits the page. Is it goo
oh...@cox.net wrote:
Hi Phil,
Well, kind of... but...
Then, why, when I do the search in Luke, do I get the results I cited:
==> succeeds
.yyy ==> fails (no results)
I guess that I've been assuming that the search in Luke is "correct" and I've been using
that to "test my understa
It is a good general assumption that Luke is correct.
Can you confirm that you are using StandardAnalyzer everywhere, for
indexing and searching? This sort of issue is often caused by using
different analyzers.
What does Luke show as the indexed terms for path? In a little index
I've just creat
You could write your own analyzer that worked out a boost as it
analyzed the document fields and had a getBoost() method that you
would call to get the value to add to the document as a separate
field. If you write your own you can pass it what you like and it can
do whatever you want.
--
Ian.
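The idea of an analyzer that works out a boost while it analyzes, and exposes it through getBoost(), can be sketched in plain Java. Everything here is an assumption for illustration: BoostTrackingTokenizer is a hypothetical class rather than a Lucene Analyzer subclass, and the rule of adding 0.5 per occurrence of a favoured term is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical tokenizer that accumulates a boost as a side effect of
 * analysis. The caller reads getBoost() after tokenizing and stores the
 * value with the document, for example in a separate field.
 */
final class BoostTrackingTokenizer {
    private final Set<String> boostedTerms; // lower-case terms that raise the boost (illustrative)
    private float boost = 1.0f;             // neutral starting value

    BoostTrackingTokenizer(Set<String> boostedTerms) {
        this.boostedTerms = boostedTerms;
    }

    /** Lower-cases and splits on whitespace, updating the boost as it goes. */
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // leading whitespace yields an empty first element
            }
            tokens.add(raw);
            if (boostedTerms.contains(raw)) {
                boost += 0.5f; // invented rule: each favoured term adds 0.5
            }
        }
        return tokens;
    }

    /** Boost accumulated over all text tokenized so far. */
    float getBoost() {
        return boost;
    }
}
```

Use one instance per document so that the accumulated boost does not leak from one document into the next.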