>
>
> So I guess it's done by writing or extending an analyzer?
>
Yes, that's correct.
--Rajesh Munavalli
Blog: http://munavalli.blogspot.com
ential acronym. For example:
- All Caps
- The acronym appears repeatedly in the rest of the text
- Found in the acronym dictionary, etc.
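A minimal sketch of how these cues might be combined, in plain Java with no Lucene classes. The class name AcronymHeuristics, the one-point-per-cue scoring, and the "at least two occurrences" threshold are all illustrative assumptions, not part of any library:

```java
import java.util.List;
import java.util.Set;

class AcronymHeuristics {
    // Returns a score from 0 to 3: one point for each cue the candidate satisfies.
    // The scoring scheme is an assumption made for illustration only.
    static int score(String candidate, List<String> textTokens, Set<String> acronymDictionary) {
        int score = 0;

        // Cue 1: all caps (at least two characters, every one an uppercase letter).
        if (candidate.length() >= 2 && candidate.chars().allMatch(Character::isUpperCase)) {
            score++;
        }

        // Cue 2: the candidate appears repeatedly (here: two or more times) in the text.
        long occurrences = textTokens.stream().filter(candidate::equals).count();
        if (occurrences >= 2) {
            score++;
        }

        // Cue 3: the candidate is listed in the acronym dictionary.
        if (acronymDictionary.contains(candidate)) {
            score++;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "IR", "group", "uses", "IR", "tools");
        Set<String> dictionary = Set.of("IR");
        for (String token : Set.copyOf(tokens)) {
            System.out.println(token + " -> " + score(token, tokens, dictionary));
        }
    }
}
```

A token would then be treated as an acronym when its score passes whatever cutoff suits your data; the cutoff would have to be tuned on real documents.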
Hope this helps,
--Rajesh Munavalli
Blog: http://munavalli.blogspot.com
made with thinlets. I had not heard of
> this...I'll see if it helps me figure out what's going on.
>
> --Bill
>
>
> On 4/14/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
> >
> > It would be helpful to download Luke (http://www.getopt.org/luke/
It would be helpful to download Luke (http://www.getopt.org/luke/) and
analyze what's getting indexed. Have you tried that?
On 4/14/06, Bill Snyder <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> We are using Lucene to facilitate searching of our application log files.
> I
> am noticing some inconsistenc
Can someone tell me where I can find the source code for SearchBean (Lucene
Sandbox)?
Thanks,
--Rajesh
at the number of
documents N would be much smaller than the total number of documents in the
index, is it better to run the queries on the reduced set of N documents, as in
Option 2?
Thanks,
Rajesh Munavalli
This has been discussed previously. Here are the links:
http://www.gossamer-threads.com/lists/lucene/java-user/9189#9189
http://www.gossamer-threads.com/lists/lucene/java-user/32362#32362
Hope that helps,
Rajesh Munavalli
On 3/1/06, Srikanth Kallurkar <[EMAIL PROTECTED]> wrote:
>
NE-413
> but this will only help to get an impression of how to match in the
> ordered
> and unordered cases.
> It might be possible to generalize the various span algorithms there and
> in the trunk to work with fewer "terms".
>
I will consider that option.
Thanks,
Rajesh Munavalli
ow much the score is influenced by the proximity of
> the words in the query, vs the frequency of the phrases in the docs, see
> my recent posting about the use of tf in Similarity -- which I think is
> accurate since nobody replied and said I was wrong...
>
>
> http://www.nabble.com/Similarity-Usage%3A-tf%28int%29-vs-tf%28float%29-p2981283.html
I will take a closer look at the explanation.
Thanks,
Rajesh Munavalli
pproximate the rankings I am expecting. In that case which of the
following queries will perform better (in terms of QUERY SPEED and RANKING)?
(a) a phrase query with a certain slop factor
(b) a span query
Thanks,
Rajesh Munavalli
ld
field1:t1 t2 t3 t4 AND field2:t5 t6
field2:t1 t2 t3 t4 AND field2:t6 t7
field2:t1 t2 t3 t4 AND field2:t5 t7
...
...
Rank 3: Two terms missing from either of the fields
...
Rank n: Only one term exists in both field1 and field2
Thanks,
Rajesh Munavalli
"indemnity" (actual
synonyms for "car" and "insurance" retrieved from WordNet).
--
Rajesh Munavalli
On 1/31/06, Klaus <[EMAIL PROTECTED]> wrote:
>
> Hi Leon,
>
> Have you tried the WordNet add-on? You can easily expand the query with
> synonyms.
>
ly those having high TF)
with query terms. The intuition is that co-occurring words are related.
Google for "local global document analysis" and "word co-occurrence
similarity"
Rajesh Munavalli
On 1/30/06, Leon Chaddock <[EMAIL PROTECTED]> wrote:
>
> Hi,
hem
indexDocs(new File(file, files[i]));
}
}
--
Rajesh Munavalli
On 1/31/06, Azlan Abdul Latiff <[EMAIL PROTECTED]> wrote:
>
> How can I index the whole hard drive? I tried using "c:/" but it didn't
> work.
>
> The results only return c:/ directory where
ry:
(primary:"ny"^1 AND secondary:"united states of america"^SLOPE 1) OR
(primary:"ny united"^2 AND secondary:"states of america"^SLOPE 1) OR
(primary:ny "united states"^3 AND secondary:"of america"^SLOPE 1) OR
"NY, USA" you should be able to retrieve 1, 2 and 3 even though the
primary information for Doc3 is "Albany".
--
Rajesh Munavalli
On 1/27/06, Colin Young <[EMAIL PROTECTED]> wrote:
>
> The reason I only want 2 hits is because [2] is more "specific" in my
Hi Colin,
Even assuming you came up with a good way of indexing, the
example query "Ontario, CA" should yield 3 hits: 2, 3 and 4 are all
valid retrievals. Could you please justify which 2 hits you want and why?
Thanks,
Rajesh Munavalli
Colin Young wrote:
I'm havi
I am aware that Lucene does not allow wildcard queries starting with
"*". The aim of the query is to find "lucene" in field F1 and "group" in
field F4, but it should match only those documents where
(1) Field F2 should not be empty.
(2) Field F3 should contain ind
on
the expanded term be? Is it on the order of 10, 100, or on some logarithmic
scale?
Do you have any (preliminary) results on problem (2)?
Thanks,
Rajesh Munavalli
José Ramón Pérez Agüera wrote:
What articles have you read? I work in automatic query expansion and
empirical boost levels is that there is no cross-system comparison
and is highly dependent on the test bed.
Thanks,
Rajesh Munavalli
There is also a package from the Stanford NLP group for POS tagging using WordNet.
They claim to have the best accuracy. Here is the link:
http://www-nlp.stanford.edu/
-Original Message-
From: José Ramón Pérez Agüera [mailto:[EMAIL PROTECTED]
Sent: Thu 11/17/2005 9:52 AM
To: java-user@lucene
Try the CFQ (Completely Fair Queuing) I/O scheduler:
http://lwn.net/Articles/57732/
Rajesh Munavalli
> -Original Message-
> From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 01, 2005 4:00 PM
> To: java-user@lucene.apache.org
> Subject: Re: IO bandwidth throttl
case queries we don't have to OR the queries.
Rajesh Munavalli
> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 22, 2005 3:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: Case-sensitive search
>
>
> On Aug 2
ugh the user mistyped the case ("machine" instead of "Machine"),
the query would still retrieve documents. I am not sure about the performance,
though. Erik would be the right person to help us understand the performance
constraints of doing so.
Rajesh Munavalli
> -Original Messa
You could also treat the case-sensitive and case-insensitive forms as synonyms
and index them at the same position. This would be helpful in phrase
queries.
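To make the idea concrete, here is a rough sketch in plain Java rather than against the Lucene analysis API. It assumes "same position" means a position increment of zero for the added variant; the class and field names (CaseVariantPositions, PositionedToken) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CaseVariantPositions {
    // A term plus its position increment relative to the previous token.
    // An increment of 0 means "same position as the previous token".
    static final class PositionedToken {
        final String term;
        final int positionIncrement;

        PositionedToken(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits every original token with increment 1 and, when the lowercased
    // form differs, the lowercased form as a synonym with increment 0.
    static List<PositionedToken> expand(List<String> tokens) {
        List<PositionedToken> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(new PositionedToken(token, 1));
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.equals(token)) {
                out.add(new PositionedToken(lower, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (PositionedToken t : expand(List.of("Machine", "learning"))) {
            System.out.println(t.term + " (+" + t.positionIncrement + ")");
        }
    }
}
```

Because both case forms share a position, a phrase query should line up with the following token whichever form the user typed. In a real analyzer this logic would live in a token filter.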
Rajesh Munavalli
> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 22, 2005 10:0
.
Hope it helps...
Rajesh Munavalli
-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Mon 8/15/2005 7:47 PM
To: java-user@lucene.apache.org
Subject: Re: intra-word delimiters
That was the plan, but step (4) really seems problematic.
- term expansion this way can
"deletable" and "segments". The contents looked fine when I inspected the
index using Luke, but I don't get any results when I search. What
am I missing?
thanks,
Rajesh Munavalli
In the above example, the token "Rajesh" is associated with two fields.
At indexing time I would like to add the second token with a position
increment of ZERO.
thanks,
Rajesh Munavalli
There might be other ways to do this which I am not aware of. Let me know
your thoughts on this. I would really appreciate any suggestions you might
have.
thanks,
Rajesh Munavalli
-Original Message-
From: [EMAIL PROTECTED] on behalf of Chris Hostetter
Sent: Fri 7/2
token
0 "united", "united states", "united states of"
1 "states", "states of", "states of america"
2 "of", "of america"
3 "america"
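The layout above can be reproduced with a short plain-Java sketch that groups word n-grams by the position of their first word. It does not use Lucene; the class name ShinglePositions and the maxN parameter are hypothetical, and maxN = 3 is inferred from the table:

```java
import java.util.ArrayList;
import java.util.List;

class ShinglePositions {
    // For each token position, collects the n-grams (n = 1..maxN) that start there.
    // In an index, all n-grams in one inner list would share a single position.
    static List<List<String>> shinglesByPosition(List<String> tokens, int maxN) {
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            List<String> atPosition = new ArrayList<>();
            for (int n = 1; n <= maxN && start + n <= tokens.size(); n++) {
                atPosition.add(String.join(" ", tokens.subList(start, start + n)));
            }
            result.add(atPosition);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("united", "states", "of", "america");
        List<List<String>> shingles = shinglesByPosition(tokens, 3);
        for (int position = 0; position < shingles.size(); position++) {
            System.out.println(position + " " + shingles.get(position));
        }
    }
}
```

Emitting the n-grams position by position, rather than all 1-grams first and then all 2-grams, keeps each n-gram anchored at the position of its first word.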
Does Lucene
eve all the documents
which contain "cancer" in variable "abstract"
Step 2) The second query will retrieve all variables containing the
documents retrieved in Step 1
Rajesh Munavalli
-Original Message-
From: Magne Skjeret [mailto:[EMAIL PROTECTED]
Sent: Monday, July
se
: queries. I am not sure if there is a better way to achieve the same
: effect.
:
: Thanks,
:
: Rajesh
:
:
: -Original Message-
: From: Andy Roberts [mailto:[EMAIL PROTECTED]
: Sent: Monday, July 18, 2005 5:56 PM
: To: java-user@lucene.apache.org
: Subject: Re: n-gram indexing
:
: On Mo
Message-
From: Andy Roberts [mailto:[EMAIL PROTECTED]
Sent: Monday, July 18, 2005 5:56 PM
To: java-user@lucene.apache.org
Subject: Re: n-gram indexing
On Monday 18 Jul 2005 21:27, Rajesh Munavalli wrote:
> At what point do I add n-grams? Does the order in which I add n-grams
> affec
At what point do I add n-grams? Does the order in which I add n-grams
affect exact phrase queries later? My questions are:
(1) Should I add all the 1-grams, followed by the 2-grams, followed by the
3-grams, etc., sentence by sentence, OR
(2) add all the 1-grams of the entire document first before starting the 2-grams