You just have to make sure that what you are searching for is indexed (and
especially in the same format and case).
Use Luke (http://www.getopt.org/luke/) to browse through your index.
This should give you some insight into what you have indexed and what you are
searching for.
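To illustrate the format/case point, here is a minimal plain-Java sketch. It does not use Lucene at all: the class `CaseMatchSketch` and its methods are hypothetical names, and the "index" is just a set of lowercased tokens, assumed to mimic what a lowercasing analyzer stores.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this does not use Lucene.
final class CaseMatchSketch {
    private final Set<String> terms = new HashSet<>();

    // "Index" text the way a lowercasing analyzer would: split on
    // non-alphanumerics and store each token in lower case.
    void index(String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    // Exact term lookup with no analysis applied to the query term.
    boolean containsRaw(String term) {
        return terms.contains(term);
    }

    // Lookup after normalizing the query term the same way as at index time.
    boolean containsNormalized(String term) {
        return terms.contains(term.toLowerCase(Locale.ROOT));
    }
}
```

With "Next Monday" indexed, a raw lookup for "Monday" misses while "monday" hits, which is the kind of mismatch Luke makes visible.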
Regards,
kapilChhabra
-Original Me
Thanks Joe
I'm using this function as my analyzer
public static Analyzer getDefaultAnalyzer() {
    PerFieldAnalyzerWrapper perFieldAnalyzer =
        new PerFieldAnalyzerWrapper(new StopAnalyzer());
    perFieldAnalyzer.addAnalyzer("contents", new StopAnalyzer());
perFi
Yes, this is easily doable through the "Payload" facility. During indexing
(mainly tokenization), you need to attach this extra information to each
token. You can then use BoostingTermQuery to factor the payload value
into the score. You also need to implement Similarity for this
(
I know how to extract English text with POI, PDFBox, and so on. Now I want to
start indexing non-English languages such as French and Spanish. Which
extraction libraries are available to me?
I want to do:
Excel
Word
PowerPoint
PDF
HTML
RTF
Thanks!
Michael
--
You are probably using the StandardAnalyzer which removes stop words such as
"and".
--
Joe Attardi
[EMAIL PROTECTED]
http://thinksincode.blogspot.com/
On 8/1/07, masz-wow <[EMAIL PROTECTED]> wrote:
>
>
> I understand that only documents that have been indexed can be
> searched.
> I already
I understand that only documents that have been indexed can be searched.
I have already managed to index the document and search its content.
The problem is: why are there a few words that cannot be searched?
E.g.: a document contains this sentence
"So on the next Monday
Hi all,
I was wondering if it is possible to boost by the search terms'
position in the document.
For example:
search terms that appear in the first 100 words, the first 10% of words, or the
first two paragraphs would be given a higher score.
Is it achievable through using the new Payload function in luce
Hi,
Thanks for the link provided. Actually, I went through those articles when I
was developing the index and search functions for my application. I haven't tried
a profiler yet, but I monitored the CPU usage and noticed that whenever indexing or
searching is running, the CPU usage rises to 100%. Below I will try to
Hi Guys,
For some reason, I said I was using "PrefixQuery" for exact queries.
What I meant to say is PhraseQuery... but the editor between my brain and
fingers had gone home.
The TermQuery idea may be the simplest solution, because I store the name
un-tokenized for sorting purposes.
Otherwise;
Can anyone explain to me why commit() on IndexReader is a protected method?
I want to do periodic deletes from my main index. I don't want to reopen
the index (all that is changing is that things are being deleted), so I
don't want to call close(), but I can't call commit() from outside the
class
You can boost any clause of a query:
http://lucene.apache.org/java/docs/queryparsersyntax.html
title:foo^5 header:foo^2 body:foo
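As a rough sketch of how such a boosted query string could be assembled in code: the helper below is hypothetical (the class `BoostedQuerySketch` and its `build` method are not part of Lucene), it only produces the text form shown above, and it assumes a single plain term that needs no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it only builds the query text, it does not parse or run it.
final class BoostedQuerySketch {
    // Emits one "field:term^boost" clause per entry, in map iteration order.
    // A boost of 1 is left implicit, as in the example above.
    static String build(String term, Map<String, Integer> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term);
            if (e.getValue() != 1) {
                sb.append('^').append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 5);
        boosts.put("header", 2);
        boosts.put("body", 1);
        System.out.println(build("foo", boosts));
    }
}
```

The resulting string would then be handed to the query parser as usual.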
On 31-Jul-07, at 1:00 PM, Askar Zaidi wrote:
I'll have to use StringBuffer and get the Explanation in it as a
String.
Then parse StringBuffer to get the scores of
Guys,
Here's someone who did this hack:
http://blog.mindbridge.com/?p=55
Cheers,
AZ
On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
>
> I'll have to use StringBuffer and get the Explanation in it as a String.
> Then parse StringBuffer to get the scores of each field, then add them and
> then
I'll have to use a StringBuffer and get the Explanation in it as a String.
Then parse the StringBuffer to get the scores of each field, then add them and
then boost the scores. That seems to be a non-trivial task. Is there any
other way around it?
Considering Boosting, can I boost the score of a field
Using the Explanation method can help me get the exact score of a field. I
am concerned with how I can access it; this is what I am doing:
for(int i=0;i wrote:
>
> Boost the other three fields at search time. Boosting during
> index time expresses "this document's title is worth more than
> oth
Boost the other three fields at search time. Boosting during
index time expresses "this document's title is worth more than
other documents' titles". Boosting during search time expresses
"I care about matches on this clause more than I do on other
clauses".
Will it help? How should I know? It's *
Boosting during indexing or boosting during search?
I have 4 fields:
{tags},{title},{summary},{contents}
Typically a phrase occurs far more often in contents than in the
other fields. If I get the score of the contents field, I can pass it through
an adjuster function which will bring the s
Wouldn't boosting handle this for you?
On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
>
> To be more specific:
>
> I want to retrieve the scores of individual fields inside a document so
> that
> I can manipulate the score of one field. This is the requirement of my
> application. After the ma
Hello Shailendra,
AFAICS you are reasoning from a static doc-id POV, while documents do not have
a static doc-id in Lucene. When you have a frequently updated index, you'll end
up invalidating cached BitSets (which as the number of categories and number
of documents grow can absorb quite amoun
I am not sure what the use case for something like this would be, but
here is a pointer:
Using IndexSearcher you can get the "Explanation" for the given query and
document-id. Complex Explanation has multiple sub-explanations and so forth.
Simple Explanation would contain the weight of th
To be more specific:
I want to retrieve the scores of individual fields inside a document so that
I can manipulate the score of one field. This is the requirement of my
application. After the manipulation I can add these scores and then show the
total.
thanks,
AZ
On 7/31/07, Askar Zaidi <[EMAIL
Hi,
Does anyone know how to retrieve the score of an individual field instead of
doing:
hits.score(i); This gets me the score of the entire document. I'd like
to get the score of a single field by specifying the field name.
thanks,
AZ
On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
>
>
A better way is the following:
Cache the list of doc-ids for each category. You can cache this in a
BitSet: the bit at index "doc-id" is on if the category is present in
document "doc-id", else it is off.
For the user query, you need to compute a BitSet in a similar way. This
can be done in a Hit
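A minimal sketch of the per-category BitSet cache described above, using only java.util.BitSet. The class and method names are hypothetical, and the use case (counting query hits per category by intersecting bit sets) is an assumption, since the message is cut off before it says what the two bit sets are combined for. Because Lucene doc-ids can change when the index is updated, a cache like this would presumably have to be rebuilt whenever the reader is reopened.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache: one BitSet per category, indexed by doc-id.
final class CategoryBitSetSketch {
    private final Map<String, BitSet> byCategory = new HashMap<>();

    // Turn on bit "docId" for the given category.
    void add(String category, int docId) {
        byCategory.computeIfAbsent(category, k -> new BitSet()).set(docId);
    }

    // Number of query hits that fall in the category: AND the query's
    // BitSet with the cached one and count the bits that remain set.
    int countInCategory(String category, BitSet queryHits) {
        BitSet cached = byCategory.get(category);
        if (cached == null) {
            return 0;
        }
        BitSet copy = (BitSet) queryHits.clone(); // leave the caller's set untouched
        copy.and(cached);
        return copy.cardinality();
    }
}
```

The query-side BitSet is assumed to be filled elsewhere, one bit per matching doc-id.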
Hello all,
First a little background - we are developing a clustered application
that will in part leverage Lucene to provide index and search
capabilities. We have already spent time investigating various index
storage implementations (database vs. filesystem) and we've decided for
performan
Hey guys,
I was wondering if there is a way to retrieve the score of a field in a
document?
If my document looks like this:
{itemID},{field 1},{field 2}
I'd like to get the scores of the individual fields 1 and 2 rather than the
score of the entire document.
Is it possible?
thanks,
AZ
You're going to have to delve into the details of what the various analyzers
do. And perhaps write your own.
The syntax "something and"*, with the asterisk outside the quotes, isn't
supported as far as I know.
Adding quotes changes the syntax, so "some word*" is a phrase query,
which probab
The code that is making use of that makeStopFilter is not written by me. It
has read-only permission. So, I can't make any changes to it.
On 7/31/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Why not fix your code to be 2.1 compliant instead? For instance,
> StopFilter has a constructor that t
Why not fix your code to be 2.1 compliant instead? For instance,
StopFilter has a constructor that takes Set and a constructor
that takes an array of String for stopwords.
Otherwise, please tell us more about what you are doing with
MakeStopTable and why making your code 2.1 compliant isn't an op
> is this just one single example of different words that should
> return the same results? You might consider implementing a synonym
> analyzer otherwise.
No, the query should match all of them.
The query:
NAME:De Agos* AND FIRST:Maria
should return 2 documents:
NAME: De agostino
FIRST: M
Hello,
is this just one single example of different words that should return the same
results? You might consider implementing a synonym analyzer otherwise.
In your case, storing NAME as UN_TOKENIZED should enable your NAME:"De Agos"*
search
Regards Ard
>
> Hi,
> I would like to make a searc
Hi,
I would like to make a search query that should match the following
documents:
NAME: De agostino
FIRST: Maria
NAME: De agostato
FIRST: Maria
How to design the query? The following:
NAME:De Agos* AND FIRST:Maria
Doesn't work since there is a space in the name. And:
NAME:"De
31 jul 2007 kl. 12.00 skrev karl wettin:
31 jul 2007 kl. 10.23 skrev Vijay Santhanam:
How do I search for a specific number of tokens in a field?
I think you are looking for SpanFirstQuery.
Also, this is a similar thread with alternative solutions:
http://www.nabble.com/Search-for-do
31 jul 2007 kl. 10.23 skrev Vijay Santhanam:
How do I search for a specific number of tokens in a field?
I think you are looking for SpanFirstQuery.
--
karl
31 jul 2007 kl. 08.37 skrev SK R:
https://issues.apache.org/jira/browse/LUCENE-966 . But they are in txt
format and how can i get and test that improved analyzer?(please provide
the steps)
Those are patches created with "svn diff". You use "patch" to apply them
to the source code.
http:
Hi Vijay,
with a frequent usage pattern of searching (exactly) for a whole field's
value (e.g. the whole name), it may be worth storing that field (name:)
twice:
1) as field name_tokenized: with Field.Index.TOKENIZED for normal
"contains" queries and
2) as field name_untokenized: with Field.Index.
31 jul 2007 kl. 05.25 skrev Chew Yee Chuang:
But I just noticed that when Lucene is searching or indexing, the CPU
usage on my machine rises to 100%. Because of this, some of my other
backend processes eventually slow down. I just want to know: has anyone
faced this problem before? an
Hi Guys,
Currently I construct a PrefixQuery to do exact searches through an index of
documents that represent Compact Discs, something like www.discogs.com.
On the search page, we offer a suggestion list as the user enters text, like
Google Suggest.
When a user selects an item out of this list, we ma