hyphen not being removed by standard filter

2006-02-22 Thread Mufaddal Khumri
Hi, I might be missing something. I have a custom analyzer the gist of which is: public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result = new StandardTokenizer(reader); result = new StandardFilter(result);

Re: TREC,INEX and Lucene

2006-02-22 Thread Dave Kor
Malcolm, I used Lucene in TREC last year in my QA list module, as have many of my contemporaries. On 2/22/06, Malcolm Clark <[EMAIL PROTECTED]> wrote: > Hi all, > I am planning on participating in the INEX and hopefully passively on a > couple of TREC tracks mainly using the Lucene API. > Is anyon

Re: Searching/sorting strategy for many properties for semantic web app

2006-02-22 Thread David Pratt
Hi Erik. Many thanks for your reply. I'll likely see if I can find a list to pose a couple of questions their way. I am having fun with Lucene since it is new to me and I am impressed with the speed I am getting. I am reading anything I can get hold of and trying different code experiments. So

Re: How can I get a term's frequency?

2006-02-22 Thread Daniel Noll
sog wrote: > en, but IndexReader.getTermFreqVector is an abstract method, I do not > know how to implement it in an efficient way. Does anyone have good advice? You probably don't need to implement it; it's been implemented already. Just call the method. > I can do it in this way: > > QueryTermVecto

RE: IndexSearcher

2006-02-22 Thread Gus Kormeier
Thanks Hoss, I did figure out that I was putting about 400 stored fields per document into my new index; more than my prior indexes. Reducing the number of stored fields seems to have helped significantly. I do call writer.optimize() after loading in documents, but not sure how I would se

Re: IndexSearcher

2006-02-22 Thread Chris Hostetter
: I have one index where the instantiation is very fast, to the point where I : don't need to do any pooling. A new index I have created, takes a very long : time to create the IndexSearcher object. With a 30mb index, it can take : about 30 seconds just to instantiate an IndexSearcher(). It almo

Lucene 1.9 RC1 release available

2006-02-22 Thread Doug Cutting
Release 1.9 RC1 of Lucene is now available from: http://www.apache.org/dyn/closer.cgi/lucene/java/ This release candidate has many improvements since release 1.4.3, including new features, performance improvements, bug fixes, etc. For details, see: http://svn.apache.org/viewcvs.cgi/*checkout*/

Re: Throughput doesn't increase when using more concurrent threads

2006-02-22 Thread Otis Gospodnetic
Hi, Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) --- I would also play with disk IO schedulers, if you can. CentOS is based on RedHat, I believe, and RedHat (ext3, really) now has about 4 different IO schedulers that, according to ar

Re: search a subdirectory (New to Lucene)

2006-02-22 Thread Erik Hatcher
I presume by saying "subdirectory" you're referring to filesystem directories and you're indexing a directory tree of files. If you index the path (perhaps relative from the root is best) as a keyword field (untokenized, but indexed) you could perform filtering on a /path/subpath sort of
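
The prefix-matching idea in the message above can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene code: the class `PathPrefixFilter` and its methods are hypothetical, and each document's relative path is assumed to be kept as one untokenized string. In Lucene itself, a prefix-style query such as `PrefixQuery` over the keyword field would play the role of `under`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates prefix matching on
// untokenized relative paths.
class PathPrefixFilter {

    // Appends a trailing slash so that "docs/api" does not also match
    // sibling directories such as "docs/api-old".
    static String asDirectoryPrefix(String directory) {
        return directory.endsWith("/") ? directory : directory + "/";
    }

    // Returns the paths that live under the given directory or any of its
    // subdirectories, in their original order.
    static List<String> under(String directory, List<String> indexedPaths) {
        String prefix = asDirectoryPrefix(directory);
        List<String> matches = new ArrayList<>();
        for (String path : indexedPaths) {
            if (path.startsWith(prefix)) {
                matches.add(path);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "docs/api/index.html",
                "docs/api/search/Query.html",
                "docs/api-old/index.html",
                "src/Main.java");
        System.out.println(under("docs/api", paths));
    }
}
```

Storing the path relative to the root, as suggested, keeps the prefixes stable if the indexed tree is moved to another location.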

Re: Searching/sorting strategy for many properties for semantic web app

2006-02-22 Thread Erik Hatcher
One very nice implementation to take a look at is the Simile project at MIT. The Piggy Bank and Longwell projects use Lucene to index RDF and integrate full-text and structural queries nicely together. http://simile.mit.edu Erik On Feb 21, 2006, at 10:20 PM, David Pratt wrote:

Re: Throughput doesn't increase when using more concurrent threads

2006-02-22 Thread Yonik Seeley
Hmmm, not sure what that could be. You could try using the default FSDir instead of MMapDir to see if the differences are there. Some things that could be different: - thread scheduling (shouldn't make too much of a difference though) - synchronization workings - page replacement policy... how to

Re: Throughput doesn't increase when using more concurrent threads

2006-02-22 Thread Peter Keegan
I am doing a performance comparison of Lucene on Linux vs Windows. I have 2 identically configured servers (8-CPUs (real) x 3GHz Xeon processors, 64GB RAM). One is running CentOS 4 Linux, the other is running Windows server 2003 Enterprise Edition x64. Both have 64-bit JVMs from Sun. The Lucene se

RE: IndexSearcher

2006-02-22 Thread John Powers
I guess what I meant was to have all your servlets use the same instance. They could get it from the class or from a parent of all your servlets. Then you can let the IndexSearcher take care of all the search requests. -Original Message- From: Gus Kormeier [mailto:[EMAIL PROTECTED] Sen
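
A minimal plain-Java sketch of the "one shared instance" suggestion above, assuming the searcher is safe to share between request threads. The holder class `SharedInstance` is hypothetical and not part of Lucene; the `Supplier` stands in for the slow `IndexSearcher` construction discussed in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder, not part of Lucene: opens an expensive resource once
// and hands the same instance to every caller.
class SharedInstance<T> {
    private final Supplier<T> opener;
    private T instance; // guarded by "this"

    SharedInstance(Supplier<T> opener) {
        this.opener = opener;
    }

    // Synchronized so that concurrent first requests cannot open the
    // resource twice.
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        // The supplier stands in for the slow searcher construction.
        SharedInstance<Object> searcher = new SharedInstance<>(() -> {
            opens.incrementAndGet();
            return new Object();
        });
        Object first = searcher.get();
        Object second = searcher.get();
        System.out.println("same instance: " + (first == second)
                + ", opens: " + opens.get());
    }
}
```

A servlet could keep one such holder in a static field or in a common parent class, so the cost of opening the index is paid once rather than per request.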

Re: Boolean Precedence

2006-02-22 Thread Eric Jain
Erik Hatcher wrote: I worked on it to a point, but I don't recall what open issues there were when I left it though they were fiddly. The test case may point you in the right direction:

RE: IndexSearcher

2006-02-22 Thread Gus Kormeier
It's in a servlet, so one workaround I have been going with is to just open it at init(). That gives me some threading concerns. And I didn't have to do that in the past, -Gus -Original Message- From: John Powers [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 22, 2006 9:35 AM To: j

search a subdirectory (New to Lucene)

2006-02-22 Thread John Hamilton
I'm new to Lucene and was wondering what is the best way to perform a search on a subdirectory or subdirectories within the index? My thought at this point is to build a query to first search for files in the required directory(ies) and then use that query to make a QueryFilter and use that Que

RE: ArrayIndexOutOfBoundsException being thrown ...

2006-02-22 Thread Mufaddal Khumri
I switched back to lucene-1.4.3.jar and I don't get the exception any more. Is this a bug in the new jar? -Mufaddal. -Original Message- From: Mufaddal Khumri [mailto:[EMAIL PROTECTED] Sent: Wed 2/22/2006 10:20 AM To: java-user@lucene.apache.org Subject: ArrayIndexOutOfBoundsException bei

RE: IndexSearcher

2006-02-22 Thread John Powers
This doesn't really address your question, but... Once you have the single IndexSearcher, do you need any others? Could your app just use the single instance? -Original Message- From: Gus Kormeier [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 22, 2006 11:28 AM To: java-user@luc

IndexSearcher

2006-02-22 Thread Gus Kormeier
Maybe too general a question, but is there anything about creating an IndexSearcher(directory) object that would make the instantiation really slow? I have one index where the instantiation is very fast, to the point where I don't need to do any pooling. A new index I have created takes a very

ArrayIndexOutOfBoundsException being thrown ...

2006-02-22 Thread Mufaddal Khumri
Getting an ArrayIndexOutOfBoundsException ... Line 31 in IndexSearcherManager.java: ... public static IndexSearcher getIndexSearcher(String indexPath) { logger.debug("indexPath = " + indexPath); searcher =

Re: Phrase query vs span query

2006-02-22 Thread Paul Elschot
On Wednesday 22 February 2006 16:39, Rajesh Munavalli wrote: > > I wasn't aware of the capability to nest spanquery. Is there a link where I > could read more about this? from the practice-what-you-preach dept. http://www.lucenebook.com/search?query=span+nest The subclasses of SpanQuery: http:/

Re: Phrase query vs span query

2006-02-22 Thread Rajesh Munavalli
On 2/22/06, Paul Elschot <[EMAIL PROTECTED]> wrote: > > > > > Typical Query: > > - > > Consists of 15 to 30 query terms. In other words, these query terms > > represent a conceptual section. > > Would you need synonyms of these terms, too? Yes. > > (2) After considering the

Re: Lucene, Cannot rename segments.new to segments

2006-02-22 Thread Yonik Seeley
Hi Patrick, we use JIRA now: http://issues.apache.org/jira/browse/LUCENE-425 That issue might be related to this bug, recently fixed: https://issues.apache.org/jira/browse/LUCENE-481 If your problem is reproducible enough, could you try your application with 1.9-rc1? http://www.apache.org/dyn/clo

TREC,INEX and Lucene

2006-02-22 Thread Malcolm Clark
Hi all, I am planning on participating in the INEX and hopefully passively on a couple of TREC tracks mainly using the Lucene API. Is anyone else on this list planning on using Lucene during participation? I am particularly interested in the SPAM, Blog and ADHOC tracks. Malcolm Clark ---

Re: :Lucene 1.9 RC1 is not working properly with older version of Code 1.43:

2006-02-22 Thread Yonik Seeley
Hi Ravi, Could you try 1.9RC1 without changing your code to remove the deprecated calls first? If that works, try changing one type of deprecated call at a time until the culprit is found. It may either be a bug in API usage in your code, or a bug in Lucene. -Yonik On 2/22/06, Ravi <[EMAIL PROTE

Lucene, Cannot rename segments.new to segments

2006-02-22 Thread Patrick Kimber
I am getting intermittent errors with Lucene. Here are two examples: java.io.IOException: Cannot rename E:\lucene\segments.new to E:\lucene\segments java.io.IOException: Cannot rename E:\lucene\_8ya.tmp to E:\lucene\_8ya.del This issue has an open BugZilla entry: http://issues.apache.org/bugzilla

:Lucene 1.9 RC1 is not working properly with older version of Code 1.43:

2006-02-22 Thread Ravi
Hi, I got the latest source code of Lucene 1.9 RC1 and modified my code according to that by removing the deprecated methods. But since I updated to this version, the search is not working at all. If I try with Luke it works fine, but if I try with my program it is not returning any erro

Re: webserverless search with lucene on offline HTML doc

2006-02-22 Thread Fabio Insaccanebbia
The signed applet is surely a simpler and more elegant solution. In some projects, however, this may not be a viable option: the "System properties problem" you have pointed out (and I had missed :-) is hopefully going to be solved in 1.9 (http://issues.apache.org/jira/browse/LUCENE-369) Fabio

Re: How can I get a term's frequency?

2006-02-22 Thread sog
en, let me describe my question more clearly: I search with a group of query terms, and I can get a document from the search result: Query(term1, term2, term3)-->search index-->Hits(doc1, doc2, doc3, ..) How can I get term1's frequency in doc1? Hits(docs1((term1,freq),(term2,freq),(term3,freq)),

Re: Index missing documents

2006-02-22 Thread Michael van Rooyen
I'm using Lucene 1.4.3, and maxBufferedDocs only appears to be in the new (unreleased?) version of IndexWriter in CVS. Looking at the code though, setMaxBufferedDocs(n) just translates to minMergeDocs = n. My index was constructed using the default minMergeDocs = 10, so somehow this doesn't s

Re: Phrase query vs span query

2006-02-22 Thread Paul Elschot
On Wednesday 22 February 2006 00:45, Rajesh Munavalli wrote: > I am trying to adopt Lucene for a special IR system. The following scenario > is an approximation of what I am trying to do. Please bear with me if some > things don't make sense. I need some suggestions on formulating queries for > th

Re: How can I get a term's frequency?

2006-02-22 Thread sog
en, but IndexReader.getTermFreqVector is an abstract method, I do not know how to implement it in an efficient way. Does anyone have good advice? I search with a group of query terms, and I can get a document from the search result: Query(term1, term2, term3)-->search index-->Hits(doc1, doc2, doc3, ...

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-22 Thread Nadav Har'El
Chris Hostetter <[EMAIL PROTECTED]> wrote on 22/02/2006 03:24:58 AM: > > : It would have been nice if someone wrote something like indexModifier, > : but with a cache, similar to what Yonik suggested above: deletions will > : not be done immediately, but rather cached and later done in batches. > :
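
The cached-deletions idea quoted above (deletions are collected and later applied in batches) might look roughly like the following plain-Java sketch. It is an assumption-laden illustration, not the IndexModifier API: the class `BatchedDeletes`, the batch-size threshold, and the `Consumer` callback that applies a batch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer, not part of Lucene: collects delete requests and
// applies them in batches instead of one at a time.
class BatchedDeletes {
    private final int batchSize;
    private final Consumer<List<String>> applyBatch;
    private final List<String> pending = new ArrayList<>();

    BatchedDeletes(int batchSize, Consumer<List<String>> applyBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.applyBatch = applyBatch;
    }

    // Records a deletion; the batch is applied once enough have accumulated.
    synchronized void delete(String documentId) {
        pending.add(documentId);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Applies whatever is pending, for example before opening a new reader.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand over a copy so the callback cannot see later additions.
        applyBatch.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        BatchedDeletes deletes = new BatchedDeletes(2,
                batch -> System.out.println("applying " + batch));
        deletes.delete("doc-1");
        deletes.delete("doc-2"); // reaches the threshold and triggers a batch
        deletes.delete("doc-3");
        deletes.flush();         // applies the remaining deletion
    }
}
```

The trade-off in such a design is that a deleted document stays visible to searches until its batch is flushed.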