RE: Problem Search using lucene

2007-07-31 Thread Chhabra, Kapil
You just have to make sure that what you are searching for is indexed (especially in the same format/case). Use Luke (http://www.getopt.org/luke/) to browse through your index. This might give you insight into what you have indexed and what you are searching for. Regards, kapilChhabra -Original Me

Re: Problem Search using lucene

2007-07-31 Thread masz-wow
Thanks, Joe. I'm using this function as my analyzer: public static Analyzer getDefaultAnalyzer() { PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(new StopAnalyzer()); perFieldAnalyzer.addAnalyzer("contents", new StopAnalyzer()); perFi

Re: Can I do boosting based on term positions?

2007-07-31 Thread Shailendra Sharma
Yes, it is easily doable through the "Payload" facility. During the indexing process (mainly tokenization), you need to push this extra information into each token. You can then use BoostingTermQuery to include the payload value in the score. You also need to implement Similarity for this (
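The idea of attaching position-dependent information to each token at tokenization time can be sketched in plain Java, without any Lucene classes. Everything here is an assumption made for illustration: the class `PositionWeights`, the `WeightedToken` type, whitespace tokenization, and the cutoff and weight parameters. The weight is the kind of value the message above suggests storing as a payload; how it enters the score is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's payload API.
final class PositionWeights {
    // A token plus the weight an indexer could store alongside it.
    static final class WeightedToken {
        final String term;
        final float weight;

        WeightedToken(String term, float weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    // Tokens at positions below `cutoff` get `earlyWeight`, all others get 1.0f.
    static List<WeightedToken> tokenize(String text, int cutoff, float earlyWeight) {
        List<WeightedToken> out = new ArrayList<>();
        int position = 0;
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            out.add(new WeightedToken(term, position < cutoff ? earlyWeight : 1.0f));
            position++;
        }
        return out;
    }
}
```

Calling `PositionWeights.tokenize(text, 100, 2.0f)` would, under these assumptions, mark the first 100 words as counting double.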

extracting non-english text from word, pdf, etc....??

2007-07-31 Thread Michael Prichard
I know how to extract English text with POI, PDFBox and so on. Now I want to start indexing non-English languages such as French and Spanish. Which extraction libs are available for me? I want to handle: Excel, Word, PowerPoint, PDF, HTML, RTF. Thanks! Michael

Re: Problem Search using lucene

2007-07-31 Thread Joe Attardi
You are probably using the StandardAnalyzer which removes stop words such as "and". -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 8/1/07, masz-wow <[EMAIL PROTECTED]> wrote: > > > I understand that only document that has been indexed will be able to > search. > I already
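To illustrate why a word such as "and" can never be found: an analyzer that drops stop words at index time never writes them into the index, so no query can match them. The plain-Java sketch below imitates that behaviour without using Lucene. The class name `StopWordDemo`, the short stop word list and the split-on-non-letters tokenization are assumptions for illustration, not the analyzer's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a stop-word-removing analyzer; not Lucene code.
final class StopWordDemo {
    // Illustrative subset only; a real analyzer ships its own list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "in", "of", "on", "the", "to"));

    // Lowercases the text, splits on non-letters and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [so, next, monday]
        System.out.println(analyze("So on the next Monday"));
    }
}
```

Running the same `analyze` step over the query text shows which of the typed words actually survive to be looked up in the index.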

Problem Search using lucene

2007-07-31 Thread masz-wow
I understand that only documents that have been indexed can be searched. I have already managed to index the document and also search its content. The problem is: why are there a few words that cannot be searched? E.g.: A document contains this sentence "So on the next Monday

Can I do boosting based on term positions?

2007-07-31 Thread Cedric Ho
Hi all, I was wondering if it is possible to do boosting by the search terms' position in the document. For example: search terms that appear in the first 100 words, the first 10% of words, or the first two paragraphs would be given a higher score. Is it achievable through using the new Payload function in luce

RE: High CPU usage during index and search

2007-07-31 Thread Chew Yee Chuang
Hi, Thanks for the link provided; actually I went through those articles when I was developing the index and search functions for my application. I haven't tried a profiler yet, but I monitored the CPU usage and noticed that whenever indexing or searching is running, the CPU usage rises to 100%. Below I will try to

Exact searches with PhraseQuery

2007-07-31 Thread Vijay Santhanam
Hi Guys, For some reason, I said I was using "PrefixQuery" for exact queries. What I meant to say is PhraseQuery... but the editor between my brain and fingers had gone home. The TermQuery idea may be the simplest solution, because I store the name un-tokenized for sorting purposes. Otherwise;

calling commit() on IndexReader

2007-07-31 Thread Tim Sturge
Can anyone explain to me why commit() on IndexReader is a protected method? I want to do periodic deletes from my main index. I don't want to reopen the index (the only change is that things are being deleted), so I don't want to call close(), but I can't call commit() from outside the class

Re: Lucene Field score value

2007-07-31 Thread Mike Klaas
You can boost any clause of a query: http://lucene.apache.org/java/docs/queryparsersyntax.html title:foo^5 header:foo^2 body:foo On 31-Jul-07, at 1:00 PM, Askar Zaidi wrote: I'll have to use StringBuffer and get the Explanation in it as a String. Then parse StringBuffer to get the scores of
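A query string of this shape can also be assembled programmatically. The sketch below uses plain Java string handling only; the class `BoostedQuery` and its `build` method are hypothetical helpers, not part of Lucene, and the term is assumed to need no escaping of query-syntax characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a boosted query string; not a Lucene class.
final class BoostedQuery {
    // Produces "field:term^boost" clauses separated by spaces.
    // A boost of 1 is the default, so it is left out of the string.
    static String build(String term, Map<String, Integer> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term);
            if (e.getValue() != 1) {
                sb.append('^').append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 5);
        boosts.put("header", 2);
        boosts.put("body", 1);
        // prints title:foo^5 header:foo^2 body:foo
        System.out.println(build("foo", boosts));
    }
}
```

A `LinkedHashMap` is used so that the clauses come out in insertion order; the resulting string would then be handed to the query parser.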

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
Guys, here's someone who did this hack: http://blog.mindbridge.com/?p=55 Cheers, AZ On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > I'll have to use StringBuffer and get the Explanation in it as a String. > Then parse StringBuffer to get the scores of each field, then add them and > then

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
I'll have to use StringBuffer and get the Explanation in it as a String. Then parse StringBuffer to get the scores of each field, then add them and then boost the scores. That seems to be a non-trivial task. Is there any other way around it? Regarding boosting, can I boost the score of a field

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
Using the Explanation method can help me get the exact score of a field. I am concerned with how I can access it; this is what I am doing: for(int i=0;i wrote: > > Boost the other three fields at search time. Boosting during > index time expresses "this document's title is worth more than > oth

Re: Lucene Field score value

2007-07-31 Thread Erick Erickson
Boost the other three fields at search time. Boosting during index time expresses "this document's title is worth more than other documents' titles". Boosting during search time expresses "I care about matches on this clause more than I do on other clauses". Will it help? How should I know? It's *

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
Boosting during indexing or boosting during search? I have 4 fields: {tags},{title},{summary},{contents} Typically a phrase occurs far more often in contents than in the other fields. If I get the score of the contents field, I can pass it through an adjuster function which will bring the s

Re: Lucene Field score value

2007-07-31 Thread Erick Erickson
Wouldn't boosting handle this for you? On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > To be more specific: > > I want to retrieve the scores of individual fields inside a document so > that > I can manipulate the score of one field. This is the requirement of my > application. After the ma

RE: How to show category count with results?

2007-07-31 Thread Ard Schrijvers
Hello Shailendra, AFAICS you are reasoning from a static doc-id POV, while documents do not have a static doc-id in Lucene. When you have a frequently updated index, you'll end up invalidating cached BitSets (which as the number of categories and number of documents grow can absorb quite amoun

Re: Lucene Field score value

2007-07-31 Thread Shailendra Sharma
Though I am not sure what the use case for something like the below would be, here is a pointer: Using IndexSearcher you can get the "Explanation" for the given query and document-id. A complex Explanation has multiple sub-explanations and so forth. A simple Explanation would contain the weight of th

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
To be more specific: I want to retrieve the scores of individual fields inside a document so that I can manipulate the score of one field. This is the requirement of my application. After the manipulation I can add these scores and then show the total. thanks, AZ On 7/31/07, Askar Zaidi <[EMAIL

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
Hi, Does anyone know how to retrieve the score of an individual field instead of doing: hits = score(i); This will get me the entire score of the document. I'd like to get the score of a single field by specifying the field name. thanks, AZ On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > >

Re: How to show category count with results?

2007-07-31 Thread Shailendra Sharma
A better way is the following: Cache the list of doc-ids for each category. You can cache this in a BitSet: a bit at index "doc-id" is on if the category is present in document "doc-id", else it is off. For the user query, you need to calculate the BitSet in a similar way. This can be done in a Hit
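The cached-BitSet idea can be sketched with `java.util.BitSet` alone. The message is cut off before it says how the two sets are combined, so the intersection step here is an assumption: the count for a category is taken as the number of bits set in both the category's BitSet and the query's BitSet. The class `CategoryCounter` and its `count` method are hypothetical names.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-category counts via BitSet intersection.
final class CategoryCounter {
    // categoryDocs: category name -> bits set for every doc-id in that category.
    // queryDocs: bits set for every doc-id that matches the user query.
    static Map<String, Integer> count(Map<String, BitSet> categoryDocs, BitSet queryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryDocs.entrySet()) {
            // Work on a copy so the cached category BitSet stays intact.
            BitSet both = (BitSet) e.getValue().clone();
            both.and(queryDocs);
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }
}
```

Because the sets are keyed by doc-id, any cache built this way has to be rebuilt whenever doc-ids change, for example after the index is updated.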

Clustered Indexing on common network filesystem

2007-07-31 Thread Zach Bailey
Hello all, First a little background - we are developing a clustered application that will in part leverage Lucene to provide index and search capabilities. We have already spent time investigating various index storage implementations (database vs. filesystem) and we've decided for performan

Lucene Field score value

2007-07-31 Thread Askar Zaidi
Hey guys, I was wondering if there is a way to retrieve the score of a field in a document? If my document looks like this: {itemID},{field 1},{field 2} I'd like to get the score of individual fields 1 and 2 rather than the score of the entire document. Is it possible? thanks, AZ

Re: Search query with wildcard and spaces

2007-07-31 Thread Erick Erickson
You're going to have to delve into the details of what the various analyzers do. And perhaps write your own. The syntax "something and"*, with the asterisk outside the quotes isn't supported syntax as far as I know. Adding quotes changes the syntax, so "some word*" is a phrase query, which probab

Re: Problem in Lucene

2007-07-31 Thread Srinivasarao Vundavalli
The code that is making use of that makeStopFilter is not written by me. It has read-only permission. So, I can't make any changes to it. On 7/31/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Why not fix your code to be 2.1 compliant instead? For instance, > StopFilter has a constructor that t

Re: Problem in Lucene

2007-07-31 Thread Erick Erickson
Why not fix your code to be 2.1 compliant instead? For instance, StopFilter has a constructor that takes Set and a constructor that takes an array of String for stopwords. Otherwise, please tell us more about what you are doing with MakeStopTable and why making your code 2.1 compliant isn't an op

RE: Search query with wildcard and spaces

2007-07-31 Thread jean-eric . cuendet
> is this just one single example of different words that should > return the same results? You might consider implementing a synonym > analyzer otherwise. No, the query should match all of them. The query: NAME:De Agos* AND FIRST:Maria should return 2 documents: NAME: De agostino FIRST: M

RE: Search query with wildcard and spaces

2007-07-31 Thread Ard Schrijvers
Hello, is this just one single example of different words that should return the same results? You might consider implementing a synonym analyzer otherwise. In your case, storing NAME as UN_TOKENIZED should enable your NAME:"De Agos"* search Regards Ard > > Hi, > I would like to make a searc
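Why an untokenized field helps with a prefix that contains a space can be shown with a small plain-Java model. A prefix query is modelled here as "some single indexed term starts with the prefix"; the class `PrefixDemo`, its methods and the whitespace tokenization are assumptions for illustration and do not reproduce Lucene's analyzers or query parser. Matching in this sketch is case-sensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of tokenized vs. untokenized terms; not Lucene code.
final class PrefixDemo {
    // Tokenized: one term per whitespace-separated word.
    static List<String> tokenized(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Untokenized: the whole field value is a single term.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    // A prefix matches if any single term starts with it.
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the value "De agostino", the prefix "De agos" matches the single untokenized term but neither of the tokenized terms "De" and "agostino", since no individual word contains the space.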

Search query with wildcard and spaces

2007-07-31 Thread jean-eric . cuendet
Hi, I would like to make a search query that should match the following documents: NAME: De agostino FIRST: Maria NAME: De agostato FIRST: Maria How to design the query? The following: NAME:De Agos* AND FIRST:Maria Doesn't work since there is a space in the name. And: NAME:"De

Re: Exact field searches

2007-07-31 Thread karl wettin
31 Jul 2007 at 12:00, karl wettin wrote: 31 Jul 2007 at 10:23, Vijay Santhanam wrote: How do I search for a specific number of tokens in a field? I think you are looking for SpanFirstQuery. Also, this is a similar thread with alternative solutions: http://www.nabble.com/Search-for-do

Re: Exact field searches

2007-07-31 Thread karl wettin
31 Jul 2007 at 10:23, Vijay Santhanam wrote: How do I search for a specific number of tokens in a field? I think you are looking for SpanFirstQuery. -- karl

Re: How to get FastAnalyzer?

2007-07-31 Thread karl wettin
31 Jul 2007 at 08:37, SK R wrote: https://issues.apache.org/jira/browse/LUCENE-966 . But they are in txt format; how can I get and test that improved analyzer? (Please provide the steps.) Those are patches created with "svn diff". You use "patch" to apply them to the source code. http:

RE: Exact field searches

2007-07-31 Thread Steinert, Fabian
Hi Vijay, with a frequent usage pattern of searching (exactly) for a whole field's value (e.g. the whole name) it may be worth storing that field (name:) twice: 1) as field name_tokenized: with Field.Index.TOKENIZED for normal "contains" queries and 2) as field name_untokenized: with Field.Index.

Re: High CPU usage during index and search

2007-07-31 Thread karl wettin
31 Jul 2007 at 05:25, Chew Yee Chuang wrote: I just noticed that when Lucene is performing a search or indexing, the CPU usage on my machine rises to 100%; because of this, some of my other backend processes eventually slow down. I just want to know whether anyone has faced this problem before? an

Exact field searches

2007-07-31 Thread Vijay Santhanam
Hi Guys, Currently I construct a PrefixQuery to do exact searches through an index of documents that represent Compact Discs, something like www.discogs.com. On the search page, we offer a suggestion list as the user enters text, like Google Suggest. When a user selects an item out of this list, we ma