Re: Keep hits in results

2006-09-05 Thread jacky
Doron, thanks! But the Lucene API docs say: "For performance reasons it is recommended to open only one IndexSearcher and use it for all of your searches." http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html That is, a searcher will remain open unless you update the index. So
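The "open one IndexSearcher and reuse it" advice can be sketched as a small holder that hands every caller the same shared instance and only opens a fresh one when the index has changed. This is a minimal sketch, not Lucene code: the `opener` function and the `long` index version passed to `get` are hypothetical stand-ins for whatever opens your IndexSearcher and detects index updates in your application.

```java
import java.util.function.LongFunction;

// Hypothetical holder: one shared "searcher" per index version.
final class SharedSearcherHolder<S> {
    // Hypothetical hook: opens a new searcher for the given index version.
    private final LongFunction<S> opener;
    private S current;
    private long currentVersion = -1;

    SharedSearcherHolder(LongFunction<S> opener) {
        this.opener = opener;
    }

    // Returns the shared instance; opens a new one only when the version changed.
    // synchronized so concurrent callers never open two instances for one version.
    synchronized S get(long indexVersion) {
        if (current == null || indexVersion != currentVersion) {
            // The previous instance is NOT closed: searches still running on it may need it.
            current = opener.apply(indexVersion);
            currentVersion = indexVersion;
        }
        return current;
    }
}
```

Old instances are deliberately not closed in this sketch, since searches started on them may still be running; when to close them is left to the application.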

Re: Doc add limit, im experiencing it too

2006-09-05 Thread Michael Imbeault
Yeah, sorry about that, I hit the wrong one :( Posting at 3am is never a good thing! To bed! Doron Cohen wrote: I believe this should go to solr-user@lucene.apache.org? Michael Imbeault <[EMAIL PROTECTED]> wrote on 05/09/2006 23:26:55: -- Michael Imbeault CHUL Research Center (CHUQ) 2

Re: which way to index pdf,word,excel

2006-09-05 Thread James liu
thanks, Cohen and lin. 2006/9/6, Doron Cohen <[EMAIL PROTECTED]>: I think that Nutch would crawl and search all these 3 types. Not sure that Nutch would provide the framework you seem to be looking for, but perhaps it is worth taking a look - http://lucene.apache.org/nutch/ "James liu" <[EMAIL PROT

Re: Doc add limit, im experiencing it too

2006-09-05 Thread Doron Cohen
I believe this should go to solr-user@lucene.apache.org? Michael Imbeault <[EMAIL PROTECTED]> wrote on 05/09/2006 23:26:55: > Old issue (see > http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html), > but I'm experiencing the same exact thing on Windows XP, latest Tomcat. > I

Re: which way to index pdf,word,excel

2006-09-05 Thread Doron Cohen
I think that Nutch would crawl and search all these 3 types. Not sure that Nutch would provide the framework you seem to be looking for, but perhaps it is worth taking a look - http://lucene.apache.org/nutch/ "James liu" <[EMAIL PROTECTED]> wrote on 05/09/2006 23:10:16: > i wanna find frame which can

Doc add limit, im experiencing it too

2006-09-05 Thread Michael Imbeault
Old issue (see http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html), but I'm experiencing the same exact thing on Windows XP, latest Tomcat. I noticed that the Tomcat process gobbles memory (10 megs a second maybe) and then jams at 125 megs. Can't find a fix yet. I'm using a p

Re: which way to index pdf,word,excel

2006-09-05 Thread James liu
I wanna find a framework that can index XML, Word, Excel and PDF, not just one of them. I just wanna know if anyone knows a framework like that. 2006/9/6, yueyu lin <[EMAIL PROTECTED]>: First, Lucene is just an index toolkit; you have to USE it to implement your application. If you want to index something, you must

Re: Keep hits in results

2006-09-05 Thread Doron Cohen
Hits is not really a simple container - it references a certain searcher - that same searcher that was used to find these hits. When a request for a result document is made, the Hits object delegates this request to the searcher. So in order to "page through" the results using an existing Hits obje

Re: which way to index pdf,word,excel

2006-09-05 Thread yueyu lin
First, Lucene is just an index toolkit; you have to USE it to implement your application. If you want to index something, you must know how to extract information from it and which keys need to be set. Then you can do what you want. On 9/5/06, James liu <[EMAIL PROTECTE

Re: which way to index pdf,word,excel

2006-09-05 Thread James liu
I wanna find a framework that can index XML, Word, Excel and PDF, not just one of them. 2006/9/6, Doron Cohen <[EMAIL PROTECTED]>: Lucene FAQ - http://wiki.apache.org/jakarta-lucene/LuceneFAQ - has a few entries just for this: How can I index HTML documents? How can I index XML documents? How can I index Open

Keep hits in results

2006-09-05 Thread jacky
Hi, the following is quoted from "Lucene in Action": "There are a couple of implementation approaches: 1. Keep the original Hits and IndexSearcher instances available while the user is navigating the search results. 2. Requery each time the user navigates to a new page. It turns out th
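For the second approach (requery on every page), the only state that has to survive between requests is the page number; which hits to show is plain arithmetic over the hit count of the re-executed query. A minimal sketch, with the class and parameter names chosen for illustration:

```java
// Hypothetical helper for the "requery on every page" approach.
final class PageWindow {
    final int start; // index of the first hit on the page (inclusive)
    final int end;   // index just past the last hit on the page (exclusive)

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // totalHits: hit count of the re-executed query; page: zero-based; pageSize: hits per page.
    static PageWindow of(int totalHits, int page, int pageSize) {
        if (totalHits < 0 || page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        long first = (long) page * pageSize; // long arithmetic avoids int overflow
        int start = (int) Math.min(first, totalHits);
        int end = (int) Math.min(first + pageSize, totalHits);
        return new PageWindow(start, end);
    }
}
```

With a Lucene 2.x `Hits` object from the re-run query, the page would then be rendered by looping `i` from `start` to `end` and reading `hits.doc(i)`; a page past the end simply comes back empty.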

Re: which way to index pdf,word,excel

2006-09-05 Thread Doron Cohen
Lucene FAQ - http://wiki.apache.org/jakarta-lucene/LuceneFAQ - has a few entries just for this: How can I index HTML documents? How can I index XML documents? How can I index OpenOffice.org files? How can I index MS-Word documents? How can I index MS-Excel documents? How can I index MS

which way to index pdf,word,excel

2006-09-05 Thread James liu
I found many problems with LIUS, so I wanna give up on it and find something new. Who can recommend one?

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Chris Hostetter
: Sorry for the confusion and thanks for taking the time to educate me. So, if : I am just indexing literal values, what is the best way to do that (what : analyzer)? Sounds like this approach, even though it works, is not the : preferred method. if you truly want just the literal values then

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Philip Brown
Sorry for the confusion and thanks for taking the time to educate me. So, if I am just indexing literal values, what is the best way to do that (what analyzer)? Sounds like this approach, even though it works, is not the preferred method. analyzer = new PerFieldAnalyzerW

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Chris Hostetter
1) consider using JUnit tests .. it makes it a lot easier for other people to understand your expectations, and if it winds up demonstrating a genuine bug in Lucene, it's easy to add to the test tree. 2) as i said before, your fields must be TOKENIZED, or your analyzer is irrelevant at index time.

Re: parser question

2006-09-05 Thread Mark Miller
QueryParser.setDefaultOperator(Operator op) Chris Salem wrote: With all the parsers I have tried, a space in a query (such as a search for "sales manager") is interpreted as an OR. Is there a way to change it so that a space is interpreted as an AND? Chris Salem 440.946.5214 x5458
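Mark's answer is the direct fix: configure the parser with `setDefaultOperator` (in Lucene 2.x, `parser.setDefaultOperator(QueryParser.AND_OPERATOR)`). If the parser cannot be configured, a cruder alternative is to rewrite the user's input so every bare term carries QueryParser's `+` (required) prefix. The sketch below is only that naive rewrite: it assumes plain space-separated terms and does not handle quoted phrases, parentheses or AND/OR keywords.

```java
import java.util.StringJoiner;

// Naive, hypothetical rewrite: make every bare term required with QueryParser's '+' prefix.
// Assumes plain space-separated terms; quoted phrases, parentheses and AND/OR are NOT handled.
final class RequireAllTerms {
    static String rewrite(String userInput) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : userInput.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input splits into a single empty token
            }
            // Keep terms the user already marked as required (+) or prohibited (-).
            boolean marked = token.startsWith("+") || token.startsWith("-");
            out.add(marked ? token : "+" + token);
        }
        return out.toString();
    }
}
```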

parser question

2006-09-05 Thread Chris Salem
With all the parsers I have tried, a space in a query (such as a search for "sales manager") is interpreted as an OR. Is there a way to change it so that a space is interpreted as an AND? Chris Salem 440.946.5214 x5458 [EMAIL PROTECTED]

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Philip Brown
Here's a little sample program (borrowed some code from Erick Erickson :)). Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in the output. Is this what you'd expect? - Philip package com.test; import java.io.IOException; import java.util.HashSet; import java.util.regex.

jvm crashes on FieldCache.DEFAULT.getStrings(reader, field);

2006-09-05 Thread Doron Cohen
[discussion moved here from dev-list] Could it be an out-of-mem error? Can you run it with a debugger, to see what really happens? JVMs usually create a javacore file, and in case of an out-of-mem also a heapdump file - these give more info on the problem. In case this file was not created in thi

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Mark Miller
Some info to help you on your journey :) 1. If you add a field as untokenized then it will not be analyzed when added to the index. However, QueryParser will not know that this happened and will tokenize queries on that field. 2. The solution that Hoss has explained to you is to leave the defa

Re: WildcardFilter

2006-09-05 Thread eks dev
I would rather use this BitSet bits = new BitSet(reader.maxDocs()); //Not sure of exact method, lucene is not on this PC... instead of = new BitSet(reader.maxDocs()) - Original Message From: Mark Miller <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, 5 September, 200

Re: WildcardFilter

2006-09-05 Thread Chris Hostetter
: Could someone with some experience spot-check this WildcardFilter...it seems : to work fine in simple testing, but I'd like to know if there are any : glaring deficiencies. Have not had much to do with filters before. It looks fine to me. -Hoss

Re: Scoring based on fields and categorization

2006-09-05 Thread Chris Hostetter
: the contents, but also two numerical values in other document fields. For : example, let’s assume that the normal score for Document A is 0.33 (as : calculated by Lucene). What I need is that its true score is 0.33 * (value : of field A) * (value of field B). What is the best way to accomplish

WildcardFilter

2006-09-05 Thread Mark Miller
Could someone with some experience spot-check this WildcardFilter...it seems to work fine in simple testing, but I'd like to know if there are any glaring deficiencies. Have not had much to do with filters before. public class WildcardFilter extends Filter { private Term term; public Wild
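Mark's WildcardFilter is cut off above, so it cannot be checked here. As a stand-alone illustration of what such a filter computes (in Lucene 2.x a `Filter` returns a `java.util.BitSet` from `bits(IndexReader)`), the sketch below sets one bit per document that contains any term matching a `*`/`?` pattern. The in-memory term-to-documents map is a hypothetical stand-in for enumerating the index's terms and their documents; the bit set is sized by a `maxDoc` value, as `IndexReader.maxDoc()` would supply.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.regex.Pattern;

// Stand-alone sketch of a wildcard filter's result: one bit per matching document.
// The postings map (term -> doc ids) is a hypothetical stand-in for the index.
final class WildcardBits {
    // Converts a wildcard pattern ('*' = any run of chars, '?' = one char) into a regex,
    // quoting literal parts so characters like '.' are matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    re.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                re.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            re.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(re.toString());
    }

    // Sets bit d for every document d containing at least one term that matches.
    static BitSet bits(Map<String, int[]> postings, String wildcard, int maxDoc) {
        BitSet bits = new BitSet(maxDoc); // one bit per document number
        Pattern p = toRegex(wildcard);
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                for (int doc : e.getValue()) {
                    bits.set(doc);
                }
            }
        }
        return bits;
    }
}
```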

Re: Phrase search using quotes -- special Tokenizer

2006-09-05 Thread Chris Hostetter
: So, if I do as you suggest below (using PerFieldAnalyzerWrapper with : StandardAnalyzer) then I still need to enclose in quotes the phrases : (keywords with spaces) when I issue the search, and they are only returned Yes, quotes will be necessary to tell the QueryParser "this is one chunk of t

Re: obtaining the number of documents stored in a .cfs file

2006-09-05 Thread Andrzej Bialecki
Stanislav Jordanov wrote: Suppose I have a bunch of valid .cfs files while the segments/segments.new file is missing or invalid. The task is to 'recover' the present .cfs files into a valid index. I think it will be necessary and sufficient to create a segments file that references the .cfs file

Re: Filter inside SpanQuery

2006-09-05 Thread Paul Elschot
On Tuesday 05 September 2006 15:59, Mark Miller wrote: > Okay, more realistically, anyone have any experience with Randy Puttnick's > modification of wildcardquery and fuzzyquery? Any ideas on getting something > like those in a SpanQuery? You can use the IndexSearcher method that searches a query

Re: Highlighting "really" found terms

2006-09-05 Thread Karel Tejnora
Not for now, but I'd like to contribute span support soon. Karel An alternative highlighter implementation was recently contributed here: http://issues.apache.org/jira/browse/LUCENE-644?page=all I've not had the time to study this alternative in detail (I hope to soon) so I can't say if it wi

Re: Highlighting "really" found terms

2006-09-05 Thread mark harwood
See here for a thread reviewing the challenges and possible solutions associated with this problem: http://www.mail-archive.com/java-user@lucene.apache.org/msg02543.html An alternative highlighter implementation was recently contributed here: http://issues.apache.org/jira/browse/LUCENE-644?

RE: Filter inside SpanQuery

2006-09-05 Thread Mark Miller
Okay, more realistically, anyone have any experience with Randy Puttnick's modification of wildcardquery and fuzzyquery? Any ideas on getting something like those in a SpanQuery? - Mark

Filter inside SpanQuery

2006-09-05 Thread Mark Miller
Anybody experimented with a filter in a spanquery? Pipedream? thanks, Mark

Highlighting "really" found terms

2006-09-05 Thread Pierre Van Ingelandt
Hello, After a search, I need to highlight only the terms that do "really" correspond to the query. For instance : 1/ I search docs with toto and titi in the SAME sentence (using SpanNotQuery(spanNearQuery({"toto","titi"},9)),".") ) 2/ Then I try to highlight "toto" and "titi" found (I use the

obtaining the number of documents stored in a .cfs file

2006-09-05 Thread Stanislav Jordanov
Suppose I have a bunch of valid .cfs files while the segments/segments.new file is missing or invalid. The task is to 'recover' the present .cfs files into a valid index. I think it will be necessary and sufficient to create a segments file that references the .cfs files. The only problem I've en

Re: IndexSearcher executed concurrently

2006-09-05 Thread jacky
Oh, that is great! I didn't notice this javadoc. Maybe I need to update my Lucene lib. I had thought that while one user runs his query, other users' queries might affect its results, since a single IndexSearcher is shared. Forget these mails. Thanks a lot. On 9/5/06, karl wettin <[EMAIL PROTECTED]>

RE: Scoring based on fields and categorization

2006-09-05 Thread karl wettin
On Tue, 2006-09-05 at 13:32 +0100, Gonçalo Gaiolas wrote: > should this boosting occur during index time or at query time? I'm a > bit confused as to where I should apply this boost in order to affect > the results of a search query. You boost at index time. -

RE: Scoring based on fields and categorization

2006-09-05 Thread Gonçalo Gaiolas
Hi Karl, Thanks for the super quick response! One question - should this boosting occur during index time or at query time? I'm a bit confused as to where I should apply this boost in order to affect the results of a search query. Once again thanks a lot! Gonçalo -Original Message- Fro

Re: IndexSearcher executed concurrently

2006-09-05 Thread karl wettin
On Tue, 2006-09-05 at 17:57 +0800, jacky wrote: > 1. I wonder if concurrent users can get the right results with > different queries, since the class has only one IndexSearcher instance. > > 2. As we know, a new IndexSearcher can be created when a user requests > his query. If first method gets the r

Re: Scoring based on fields and categorization

2006-09-05 Thread karl wettin
On Tue, 2006-09-05 at 11:54 +0100, Gonçalo Gaiolas wrote: > - Scoring should take in consideration not only the relevance of > the contents, but also two numerical values in other document fields. For > example, let’s assume that the normal score for Document A is 0.33 (as > calculated by

Re: Where do I get org.apache.commons.collections package sources?

2006-09-05 Thread karl wettin
On Tue, 2006-09-05 at 02:38 -0700, Venkateshprasanna wrote: > I saw these classes and want to use them for my implementation as well. > But I am not getting the source code for the specified package: > org.apache.commons.collections http://jakarta.apache.org/commons/collections/ ---

Scoring based on fields and categorization

2006-09-05 Thread Gonçalo Gaiolas
Hi there, I need to make two changes to Lucene : - Scoring should take in consideration not only the relevance of the contents, but also two numerical values in other document fields. For example, let’s assume that the normal score for Document A is 0.33 (as calculated by Lucene).
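Karl's suggestion in this thread is to boost at index time. As a different, swapped-in technique, the sketch below re-ranks hits after the search by multiplying each hit's score by the two field values (score * A * B, as asked), assuming those values can be read for every hit, for example from stored fields. The `ScoredDoc` class is hypothetical, and re-ranking only reorders the hits you actually fetch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit: the engine's relevance score plus two numeric field values.
final class ScoredDoc {
    final String id;
    final float score;  // relevance score as computed by the search engine
    final float fieldA; // numeric value of field A (assumed readable per hit)
    final float fieldB; // numeric value of field B (assumed readable per hit)

    ScoredDoc(String id, float score, float fieldA, float fieldB) {
        this.id = id;
        this.score = score;
        this.fieldA = fieldA;
        this.fieldB = fieldB;
    }

    // The combined score from the question: score * A * B.
    float combined() {
        return score * fieldA * fieldB;
    }

    // Returns a copy of the hits sorted by combined score, highest first.
    static List<ScoredDoc> rerank(List<ScoredDoc> hits) {
        List<ScoredDoc> copy = new ArrayList<>(hits);
        copy.sort(Comparator.comparingDouble(ScoredDoc::combined).reversed());
        return copy;
    }
}
```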

IndexSearcher executed concurrently

2006-09-05 Thread jacky
Hi, the source code at the end is the class I use to search. 1. I wonder if concurrent users can get the right results with different queries, since the class has only one IndexSearcher instance. 2. As we know, a new IndexSearcher can be created when a user requests his query. If first metho

Re: QueryParser returns all documents

2006-09-05 Thread Laurent Hoss
Why not add a single Field to each Document, like d.add(new Field("doctype","document", Field.Store.YES, Field.Index.TOKENIZED)); Then searching for "doctype:document" returns all documents -Laurent lude wrote: Why would you want to do this? This is a 'feature-request' of our searcheng

Re: QueryParser returns all documents

2006-09-05 Thread Ronnie Kolehmainen
You could define your own query syntax (for example an empty string) for a query matching all docs, examine the query string before passing it to QueryParser, and instead create a MatchAllDocsQuery when you have a match. http://lucene.apache.org/java/docs/api/org/apache/lucene/search/MatchAll
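Ronnie's suggestion (check the raw query string first, and only hand it to QueryParser if it is not your "match all" syntax) could be sketched as below. Treating null, blank input or a lone `*` as "match all" is an assumption, as are the class and enum names.

```java
// Hypothetical pre-check run before handing user input to QueryParser.
final class QueryRouting {
    enum Kind { MATCH_ALL, PARSE }

    // The "match all" syntax chosen here (null, blank, or a lone "*") is an assumption.
    static Kind route(String userInput) {
        if (userInput == null) {
            return Kind.MATCH_ALL;
        }
        String q = userInput.trim();
        return (q.isEmpty() || q.equals("*")) ? Kind.MATCH_ALL : Kind.PARSE;
    }
}
```

A `MATCH_ALL` result would then become `new MatchAllDocsQuery()` and a `PARSE` result `parser.parse(userInput)`.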

Re: QueryParser returns all documents

2006-09-05 Thread lude
Why would you want to do this? This is a 'feature-request' of our search engine. The user should have the possibility to query for all(!) documents. This would allow him to see a list of all available documents. Is there a simple way to define a query that returns all documents of an index? Thanks l

Where do I get org.apache.commons.collections package sources?

2006-09-05 Thread Venkateshprasanna
I saw these classes and want to use them for my implementation as well. But I cannot find the source code for the specified package: org.apache.commons.collections Is there any other way of implementing the same? Why do only classes from that package have to be used? Regards, Venkateshprasanna