Re: Dealing with acronyms

2006-04-26 Thread Rajesh Munavalli
> > > So I guess it's done by writing or extending an analyzer? > Yes...that's correct. --Rajesh Munavalli Blog: http://munavalli.blogspot.com

Re: Dealing with acronyms

2006-04-26 Thread Rajesh Munavalli
ential acronym. For example:
- All caps
- The acronym appears repeatedly in the rest of the text
- The acronym is found in an acronym dictionary, etc.
Hope this helps, --Rajesh Munavalli Blog: http://munavalli.blogspot.com
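A minimal sketch of combining the three cues above, assuming the text is already tokenized. `AcronymHeuristics`, the `minRepeats` threshold and the dictionary are hypothetical names for illustration, not part of Lucene. An analyzer could call something like `score` to decide which tokens to keep upper-case or expand.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Lucene class): counts how many of the three cues fire for a token.
class AcronymHeuristics {

    // Cue 1: at least two letters, and every letter is upper case ("NASA", "U.S.A.", but not "A").
    static boolean isAllCaps(String token) {
        int letters = 0;
        for (char c : token.toCharArray()) {
            if (Character.isLetter(c)) {
                if (!Character.isUpperCase(c)) {
                    return false;
                }
                letters++;
            }
        }
        return letters >= 2;
    }

    // Returns 0..3; minRepeats is an assumed threshold for "appears repeatedly".
    static int score(String token, List<String> documentTokens,
                     Set<String> acronymDictionary, int minRepeats) {
        int score = 0;
        if (isAllCaps(token)) score++;                                            // cue 1
        if (Collections.frequency(documentTokens, token) >= minRepeats) score++; // cue 2
        if (acronymDictionary.contains(token)) score++;                          // cue 3
        return score;
    }
}
```

With `minRepeats = 2`, "NASA" in a document that mentions it twice and lists it in the dictionary scores 3, while an ordinary lower-case word scores 0. How to weigh the cues (all three, or any two) is left open here.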

Re: Syntax help

2006-04-14 Thread Rajesh Munavalli
made with thinlets. I had not heard of > this...I'll see if it helps me figure out what's going on. > > --Bill > > > On 4/14/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote: > > > > It would be helpful to download Luke (http://www.getopt.org/luke/

Re: Syntax help

2006-04-14 Thread Rajesh Munavalli
It would be helpful to download Luke (http://www.getopt.org/luke/) and analyze what's getting indexed. Have you tried that? On 4/14/06, Bill Snyder <[EMAIL PROTECTED]> wrote: > > Hello, > > We are using Lucene to facilitate searching of our application's log files. > I > am noticing some inconsistenc

Lucene Sandbox - SearchBean

2006-04-07 Thread Rajesh Munavalli
Can someone tell me where I can find the source code for SearchBean (Lucene Sandbox)? Thanks, --Rajesh

Query filter and span query

2006-03-03 Thread Rajesh Munavalli
at the number of documents N would be much less than the total number of documents in the index, is it better to run the query on the reduced set of N documents, as in Option 2? Thanks, Rajesh Munavalli

Re: Term Vector

2006-03-01 Thread Rajesh Munavalli
This has been discussed previously. Here are the links http://www.gossamer-threads.com/lists/lucene/java-user/9189#9189 http://www.gossamer-threads.com/lists/lucene/java-user/32362#32362 Hope that helps, Rajesh Munavalli On 3/1/06, Srikanth Kallurkar <[EMAIL PROTECTED]> wrote: >

Re: Phrase query vs span query

2006-02-22 Thread Rajesh Munavalli
NE-413 > but this will only help to get an impression of how to match in the > ordered > and unordered cases. > It might be possible to generalize the various span algorithms there and > in the trunk to work with fewer "terms". > I will consider that option. Thanks, Rajesh Munavalli

Re: Phrase query vs span query

2006-02-21 Thread Rajesh Munavalli
ow much the score is influenced by the proximity of > the words in the query, vs the frequency of the phrases in the docs, see > my recent posting about the use of tf in Similarity -- which i think is > accurate since nobody replied and said i was wrong... > > > http://www.nabble.com/Similarity-Usage%3A-tf%28int%29-vs-tf%28float%29-p2981283.html I will take a closer look at the explanation. Thanks, Rajesh Munavalli

Phrase query vs span query

2006-02-21 Thread Rajesh Munavalli
pproximate the rankings I am expecting. In that case, which of the following queries will perform better (in terms of QUERY SPEED and RANKING): (a) a phrase query with a certain slop factor or (b) a span query? Thanks, Rajesh Munavalli

query formulation

2006-02-10 Thread Rajesh Munavalli
ld field1:t1 t2 t3 t4 AND field2:t5 t6
field2:t1 t2 t3 t4 AND field2:t6 t7
field2:t1 t2 t3 t4 AND field2:t5 t7
...
...
Rank 3: Two terms missing from either of the fields
...
Rank n: Only one term exists in both field1 and field2
Thanks, Rajesh Munavalli

Re: Related searches

2006-01-31 Thread Rajesh Munavalli
"indemnity" (actual synonyms for "car" and "insurance" retrieved from WordNet). -- Rajesh Munavalli On 1/31/06, Klaus <[EMAIL PROTECTED]> wrote: > > Hi Leon, > > have you tried the WordNet add-on? You can easily expand the query with > synonyms. >
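The expansion described above can be sketched as building a query string in which each original term is OR-ed with its synonyms. This assumes the synonym lists have already been fetched (for example from WordNet); the `SynonymExpander` class and its `Map` input are hypothetical stand-ins, not the WordNet add-on's API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: the Map stands in for whatever WordNet lookup is used.
class SynonymExpander {

    // Each original term is OR-ed with its synonyms; the per-term groups are AND-ed,
    // e.g. "(car OR automobile) AND (insurance OR indemnity)".
    static String expand(List<String> terms, Map<String, List<String>> synonyms) {
        StringJoiner query = new StringJoiner(" AND ");
        for (String term : terms) {
            StringJoiner group = new StringJoiner(" OR ", "(", ")");
            group.add(term);
            for (String synonym : synonyms.getOrDefault(term, Collections.emptyList())) {
                // Multi-word synonyms are quoted so they are searched as phrases.
                group.add(synonym.contains(" ") ? "\"" + synonym + "\"" : synonym);
            }
            query.add(group.toString());
        }
        return query.toString();
    }
}
```

Given `car -> [automobile]` and `insurance -> [indemnity]`, `expand` returns `(car OR automobile) AND (insurance OR indemnity)`, which could then be handed to a query parser. Down-weighting the expanded terms relative to the original ones is a separate design choice not shown here.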

Re: Related searches

2006-01-31 Thread Rajesh Munavalli
ly those having high TF) with query terms. The intuition is that words that co-occur are related. Google for "local global document analysis" and "word co-occurrence similarity" Rajesh Munavalli On 1/30/06, Leon Chaddock <[EMAIL PROTECTED]> wrote: > > Hi,
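A rough sketch of the co-occurrence idea, assuming the documents (for example the top-ranked hits for the query) are already tokenized. It counts document-level co-occurrence only and ignores the TF filtering mentioned above; `CoOccurrence.relatedTerms` is a hypothetical helper, and real use would also drop stop words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: ranks terms by the number of documents in which they co-occur with queryTerm.
class CoOccurrence {

    static List<String> relatedTerms(String queryTerm, List<List<String>> docs, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            Set<String> unique = new HashSet<>(doc);   // document-level co-occurrence, not TF
            if (!unique.contains(queryTerm)) {
                continue;                               // only documents that contain the query term
            }
            for (String term : unique) {
                if (!term.equals(queryTerm)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most frequent co-occurring terms first; ties broken alphabetically so the output is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}
```

The top `k` terms returned for each query term are candidates for "related searches"; whether to show them directly or use them for query expansion is left to the application.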

Re: indexing whole harddrive

2006-01-31 Thread Rajesh Munavalli
hem indexDocs(new File(file, files[i])); } } -- Rajesh Munavalli On 1/31/06, Azlan Abdul Latiff <[EMAIL PROTECTED]> wrote: > > how can I index the whole hard drive? I tried using "c:/" but it didn't > work. > > The results only return c:/ directory where
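The recursive walk that the fragment above belongs to can be sketched in plain Java. Instead of adding documents to an `IndexWriter`, this hypothetical `DriveWalker` collects file paths so it runs without Lucene. One plausible cause of a walk from "c:/" failing (an assumption, not a diagnosis of the poster's setup) is that `File.list()` returns `null` for directories that cannot be read, so the sketch checks for that instead of throwing.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch: collecting paths stands in for adding each file to an IndexWriter.
class DriveWalker {

    static void collectFiles(File file, List<File> out) {
        if (!file.canRead()) {
            return;                                  // skip entries we cannot read
        }
        if (file.isDirectory()) {
            String[] names = file.list();            // null on I/O or permission errors
            if (names == null) {
                return;
            }
            for (String name : names) {
                collectFiles(new File(file, name), out);   // recurse into subdirectories
            }
        } else {
            out.add(file);                           // real code would build and add a Document here
        }
    }
}
```

Calling `collectFiles(new File("c:/"), list)` would visit every readable file below the root. The sketch does not guard against symbolic-link or junction cycles.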

Re: Help with indexing and query strategy

2006-01-30 Thread Rajesh Munavalli
ry: (primary:"ny"^1 AND secondary:"united states of america"^SLOPE 1)
OR (primary:"ny united"^2 AND secondary:"states of america"^SLOPE 1)
OR (primary:"ny united states"^3 AND secondary:"of america"^SLOPE 1)
OR
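The pattern above (move one more token into the primary field on each clause and boost by the number of primary tokens) can be generated mechanically. This sketch uses Lucene query-parser syntax, where `^N` boosts a clause and `"..."~N` sets phrase slop; the `FieldSplitQuery` class and the slop value are hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: enumerate every split of the query tokens into a "primary" prefix
// and a "secondary" suffix, boosting the prefix phrase by its length.
class FieldSplitQuery {

    static String build(List<String> tokens, int slop) {
        StringJoiner or = new StringJoiner(" OR ");
        for (int k = 1; k < tokens.size(); k++) {     // both parts must be non-empty
            String primary = String.join(" ", tokens.subList(0, k));
            String secondary = String.join(" ", tokens.subList(k, tokens.size()));
            or.add("(primary:\"" + primary + "\"^" + k
                    + " AND secondary:\"" + secondary + "\"~" + slop + ")");
        }
        return or.toString();
    }
}
```

For the tokens `ny united states` and slop 1 it produces `(primary:"ny"^1 AND secondary:"united states"~1) OR (primary:"ny united"^2 AND secondary:"states"~1)`. The number of clauses grows linearly with the number of query tokens.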

Re: Help with indexing and query strategy

2006-01-27 Thread Rajesh Munavalli
"NY, USA" you should be able to retrieve 1, 2 and 3 even though the primary information for Doc3 is "Albany". -- Rajesh Munavalli On 1/27/06, Colin Young <[EMAIL PROTECTED]> wrote: > > The reason I only want 2 hits is because [2] is more "specific" in my

Re: Help with indexing and query strategy

2006-01-27 Thread Rajesh Munavalli
Hi Colin, Even assuming you came up with a good way of indexing, the example query "Ontario, CA" should yield 3 hits. Documents 2, 3 and 4 are all valid retrievals. Could you please justify which 2 hits you want and why? Thanks, Rajesh Munavalli Colin Young wrote: I'm havi

Specific length field in wild card queries

2006-01-23 Thread Rajesh Munavalli
I am aware that Lucene does not allow wildcard queries starting with "*". The aim of the query is to find "lucene" in field F1 and "group" in field F4, but only in those documents where (1) field F2 is not empty and (2) field F3 contains ind

Re: information theory based expanded query term boosting

2006-01-19 Thread Rajesh Munavalli
on the expanded term be? Is it on the order of 10, 100 or some logarithmic scale? Do you have any (preliminary) results on problem (2)? Thanks, Rajesh Munavalli José Ramón Pérez Agüera wrote: What articles have you read? I work in automatic query expansion and

information theory based expanded query term boosting

2006-01-19 Thread Rajesh Munavalli
empirical boost levels is that there is no cross-system comparison, and the results are highly dependent on the test bed. Thanks, Rajesh Munavalli

RE: Wordnet JWLN

2005-11-17 Thread Rajesh Munavalli
There is also a package from the Stanford NLP group for POS tagging using WordNet. They claim to have the best accuracy. Here is the link: http://www-nlp.stanford.edu/ -Original Message- From: José Ramón Pérez Agüera [mailto:[EMAIL PROTECTED] Sent: Thu 11/17/2005 9:52 AM To: java-user@lucene

RE: IO bandwidth throttling

2005-09-01 Thread Rajesh Munavalli
Try the Completely Fair Queuing (CFQ) I/O scheduler: http://lwn.net/Articles/57732/ Rajesh Munavalli > -Original Message- > From: Chris Lamprecht [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 01, 2005 4:00 PM > To: java-user@lucene.apache.org > Subject: Re: IO bandwidth throttl

RE: Case-sensitive search

2005-08-22 Thread Rajesh Munavalli
case queries we don't have to OR the queries. Rajesh Munavalli > -Original Message- > From: Erik Hatcher [mailto:[EMAIL PROTECTED] > Sent: Monday, August 22, 2005 3:33 PM > To: java-user@lucene.apache.org > Subject: Re: Case-sensitive search > > > On Aug 2

RE: Case-sensitive search

2005-08-22 Thread Rajesh Munavalli
ugh the user mistyped the case ("machine" instead of "Machine"), the query would still retrieve documents. I am not sure about the performance, though. Erik would be the right person to help us understand the performance constraints of doing so. Rajesh Munavalli > -Original Messa

RE: Case-sensitive search

2005-08-22 Thread Rajesh Munavalli
You could also treat the case-sensitive and case-insensitive forms as synonyms and index them at the same position. Because both forms then share a position, phrase queries would still match. Rajesh Munavalli > -Original Message- > From: Erik Hatcher [mailto:[EMAIL PROTECTED] > Sent: Monday, August 22, 2005 10:0
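The token sequence such an analyzer would produce can be sketched without Lucene: each word is emitted as-is, and its lower-cased form follows with a position increment of 0, so both sit at the same position. `CaseVariantTokens` and its `Tok` class are hypothetical; in Lucene this logic would live in a custom token filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the emitted token sequence; this is not Lucene's TokenStream API.
class CaseVariantTokens {

    static final class Tok {
        final String text;
        final int positionIncrement;   // 1 = next position, 0 = same position as previous token

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "/" + positionIncrement;
        }
    }

    // Emits each word as-is, then its lower-cased form at the same position if it differs.
    static List<Tok> expand(List<String> words) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            String lower = w.toLowerCase(Locale.ROOT);
            if (!lower.equals(w)) {
                out.add(new Tok(lower, 0));   // zero increment: stacked on top of w
            }
        }
        return out;
    }
}
```

For `Machine learning` the output is `Machine/1, machine/0, learning/1`, so a phrase query for either "Machine learning" or "machine learning" would line up with the same positions.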

RE: intra-word delimiters

2005-08-15 Thread Rajesh Munavalli
. Hope it helps... Rajesh Munavalli -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Mon 8/15/2005 7:47 PM To: java-user@lucene.apache.org Subject: Re: intra-word delimiters That was the plan, but step (4) really seems problematic. - term expansion this way can

index files

2005-08-12 Thread Rajesh Munavalli
"deletable" and "segments". The contents looked fine when I inspected the index using Luke, but I don't get any results when I search. What am I missing? thanks, Rajesh Munavalli

multiple fields in position increments

2005-08-02 Thread Rajesh Munavalli
In the above example, the token "Rajesh" is associated with two fields. At the time of indexing I would like to add the second token with ZERO position increment. thanks, Rajesh Munavalli

RE: n-gram indexing

2005-08-01 Thread Rajesh Munavalli
There might be other ways to do this that I am not aware of. Let me know your thoughts on this. I would really appreciate any suggestions you might have. thanks, Rajesh Munavalli -Original Message- From: [EMAIL PROTECTED] on behalf of Chris Hostetter Sent: Fri 7/2

RE: n-gram indexing

2005-07-29 Thread Rajesh Munavalli
token 0: "united", "united states", "united states of"
token 1: "states", "states of", "states of america"
token 2: "of", "of america"
token 3: "america"
Does Lucene
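The listing above can be generated by grouping n-grams (n = 1..maxN) by the position of their first word. A sketch, with `PositionalNGrams` and `maxN` as hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: n-grams keyed by the position of their first word.
class PositionalNGrams {

    static Map<Integer, List<String>> byStartPosition(List<String> words, int maxN) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();   // keeps positions in order
        for (int i = 0; i < words.size(); i++) {
            List<String> grams = new ArrayList<>();
            // Stop at maxN words or at the end of the text, whichever comes first.
            for (int n = 1; n <= maxN && i + n <= words.size(); n++) {
                grams.add(String.join(" ", words.subList(i, i + n)));
            }
            result.put(i, grams);
        }
        return result;
    }
}
```

With `maxN = 3`, the words `united states of america` reproduce the four rows above. If the first n-gram at each position were emitted with a position increment of 1 and the longer ones with an increment of 0, every n-gram would share its first word's position, and phrase queries over single words should see the same positions as without n-grams; this is a suggestion, not something verified against Lucene here.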

RE: Relations between documents

2005-07-25 Thread Rajesh Munavalli
eve all the documents that contain "cancer" in variable "abstract". Step 2) The second query will retrieve all variables containing the documents retrieved in Step 1. Rajesh Munavalli -Original Message- From: Magne Skjeret [mailto:[EMAIL PROTECTED] Sent: Monday, July

RE: n-gram indexing

2005-07-19 Thread Rajesh Munavalli
se : queries. I am not sure if there is a better way to achieve the same : effect. : : Thanks, : : Rajesh : : : -Original Message- : From: Andy Roberts [mailto:[EMAIL PROTECTED] : Sent: Monday, July 18, 2005 5:56 PM : To: java-user@lucene.apache.org : Subject: Re: n-gram indexing : : On Mo

RE: n-gram indexing

2005-07-18 Thread Rajesh Munavalli
Message- From: Andy Roberts [mailto:[EMAIL PROTECTED] Sent: Monday, July 18, 2005 5:56 PM To: java-user@lucene.apache.org Subject: Re: n-gram indexing On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote: > At what point do I add n-grams? Does the order in which I add n-grams > affec

n-gram indexing

2005-07-18 Thread Rajesh Munavalli
At what point do I add n-grams? Does the order in which I add n-grams affect exact phrase queries later? My questions are: (1) should I add all the 1-grams, followed by 2-grams, followed by 3-grams, etc., sentence by sentence, or (2) add all the 1-grams of the entire document first before starting 2-grams