Re: wildcarded phrase queries

2005-04-05 Thread Chuck Williams
Erik Hatcher writes (4/5/2005 5:57 PM): I have a need to implement wildcarded phrase queries, such as this: "apach? luc*" which would match "apache lucene", for example. This needs to also support ordered and unordered proximity like SpanNearQuery does: "apach? luc*"~10 I presume I'm goi

wildcarded phrase queries

2005-04-05 Thread Erik Hatcher
I have a need to implement wildcarded phrase queries, such as this: "apach? luc*" which would match "apache lucene", for example. This needs to also support ordered and unordered proximity like SpanNearQuery does: "apach? luc*"~10 I presume I'm going to have to key off of SpanQue

php/lucene integration: SFI working papers visualization

2005-04-05 Thread Owen Densmore
Hi folks. As promised, here is the first beta access to the php/lucene work we were discussing earlier. The url to the php front-end to the SFI working papers Lucene search is: http://webdev.santafe.edu/research/publications/redfish/wpSearch.php This provides a fairly simple search dialog, re

Fwd: Wiki formatting changes

2005-04-05 Thread Erik Hatcher
I suppose this should be addressed to Leo... anything we can do about the issue mentioned below regarding wiki formatting? Thanks, Erik Begin forwarded message: From: Chris Hostetter <[EMAIL PROTECTED]> Date: April 5, 2005 5:56:28 PM EDT To: java-user@lucene.apache.org Subject: Wiki form

Re: Strategies for updating indexes.

2005-04-05 Thread Paul Smith
Otis Gospodnetic wrote: If you take this approach, keep in mind that you will also need to handle regular application shutdowns, and also try to catch some crashes/errors, in order to flush your in-memory queue of items scheduled for indexing, and write them to disk. Feel free to post the code, if

Wiki formatting changes

2005-04-05 Thread Chris Hostetter
The wiki appears to have undergone some style changes recently. The layout is a lot different now (and in my opinion: cleaner), but a side effect seems to be that some page formatting which used to work no longer does. Specifically, subSection headings that have leading whitespace, ie... == Utilit

Re[2]: exact match

2005-04-05 Thread Chris Hostetter
: >> I have documents with a tokenized, indexed and stored field. This field usually contains one or two words. I need to be able to search for exact matches for two words. For example, a search for "John" should return documents with the field containing "John" only, not "John Doe" or "John Foo".

Re: filter search

2005-04-05 Thread Chris Hostetter
: is it possible to filter the hits returned from a certain query? For example, if I have a search like this: Query searchQuery = queryParser.parse( query ); Hits results = m_searcher.search( searchQuery ); is there a way to use the results and find out how many of the return

Re: QueryParser: open ended range queries

2005-04-05 Thread Yonik Seeley
For numeric fields, this will never happen. For text fields, I could either 1) just use the first token generated (yuck) 2) don't run it through the analyzer (v1.0) 3) run it through an analyzer specific to range and prefix queries (post v1.0) Since I know the schema, I can pick and choose di

Re: QueryParser: open ended range queries

2005-04-05 Thread Erik Hatcher
On Apr 5, 2005, at 2:49 PM, Yonik Seeley wrote: Just curious. I plan on overriding the current getRangeQuery() anyway since it currently doesn't run the endpoints through the analyzer. What will you do when multiple tokens are returned from the analyzer? Erik --

QueryParser: open ended range queries

2005-04-05 Thread Yonik Seeley
Was there any later thread on the QueryParser supporting open ended range queries after this: http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg07973.html Just curious. I plan on overriding the current getRangeQuery() anyway since it currently doesn't run the endpoints through the ana

Re: Can not delete cfs file

2005-04-05 Thread Gusenbauer Stefan
Gusenbauer Stefan wrote: > Erik Hatcher wrote: >> On Apr 3, 2005, at 3:33 PM, Gusenbauer Stefan wrote: >>> Sorry for being late! Only the test code wouldn't be very useful for understanding because there are a lot of dependencies in the other code. I can explain what I

Re: PHP-Lucene Integration

2005-04-05 Thread Andi Vajda
As an alternative, you could also take the approach taken for PyLucene: compile the Java code with GCJ and generate bindings for Python with SWIG. SWIG supports a number of languages in addition to Python such as Ruby, PHP, Perl, and a bunch more. For more information, see: http://pylucene.osa

Re: PHP-Lucene Integration

2005-04-05 Thread Giovanni Novelli
As Lucene's native language is Java, it would be more natural to access its functionality through JSP; still, the idea of accessing Lucene functionality seems interesting, as PHP is perhaps the most widely deployed server-side scripting language. I think that the way to provide access to Lucene AP

Re: scalability w/ number of fields

2005-04-05 Thread Yonik Seeley
Optimize performance update (with tons of indexed fields): We had a timing bug... ignore the hour I first reported. Here are the current numbers: indexed_fields=6791 index_size=3.9GB optimize_time=21min indexed_fields=3216 index_size=2.0GB optimize_time=9min indexed_fields=2080 index_size=1

Re: Strategies for updating indexes.

2005-04-05 Thread Otis Gospodnetic
If you take this approach, keep in mind that you will also need to handle regular application shutdowns, and also try to catch some crashes/errors, in order to flush your in-memory queue of items scheduled for indexing, and write them to disk. Feel free to post the code, if you want and can, so pe

RE: Time taken in Indexing when the index is already huge

2005-04-05 Thread Will Allen
I would recommend not optimizing your index that often. Another solution is to use the MultiSearcher and keep one fully optimized primary index, and an unoptimized secondary index that you add to. Then search against both. During off-peak hours you could merge the secondary index onto your pr

Re: scalability w/ number of fields

2005-04-05 Thread Bill Au
The compound index structure is meant for indexes with a large number of fields. I was watching the files in the index directory of my compound index while it was being optimized. The IndexWriter that I used was set to use compound file. It looks to me that Lucene first combined all existing segme

Re: Re[2]: exact match

2005-04-05 Thread Erik Hatcher
On Apr 5, 2005, at 5:44 AM, Yura Smolsky wrote: EH> On Apr 4, 2005, at 4:34 PM, Yura Smolsky wrote: Hello, java-user. I have documents with a tokenized, indexed and stored field. This field usually contains one or two words. I need to be able to search for exact matches for two words. For example, a search for "Joh

Re[2]: exact match

2005-04-05 Thread Yura Smolsky
Hello, Erik. EH> On Apr 4, 2005, at 4:34 PM, Yura Smolsky wrote: >> Hello, java-user. I have documents with a tokenized, indexed and stored field. This field usually contains one or two words. I need to be able to search for exact matches for two words. For example, a search for "John" should retur

RE: Strategies for updating indexes.

2005-04-05 Thread Nestel, Frank IZ/HZA-IOL
Hi, we are using a very cautious method for batch updating. We have long-running (hours) updates on our index, but complete reindexing would take even longer (days). But I guess our strategy could be scaled down to hours or even less. So what we do is keep two instances of the index. There is

RE: Strategies for updating indexes.

2005-04-05 Thread Lee Turner
Hi Thank you for replying so quickly. I am very pleased as I have just started down the road of implementing a solution which is very nearly exactly like the one you describe below. It is good to know that I am not heading down a dead end. I hadn't thought about the re-indexing thread pausin

Re: Strategies for updating indexes.

2005-04-05 Thread Jens Kraemer
Hi, please see comments below. On Tue, Apr 05, 2005 at 08:38:04AM +0100, Lee Turner wrote: > Hi, I was wondering whether anyone has any experience of multithreaded updates to indexes. In the web app I am working on there are additions, updates and deletes that need to happen to the index t

Strategies for updating indexes.

2005-04-05 Thread Lee Turner
Hi, I was wondering whether anyone has any experience of multithreaded updates to indexes. In the web app I am working on there are additions, updates and deletes that need to happen to the index throughout the runtime of the application. Also, the application is run in a cluster with each app