RE: Escaping special characters

2005-04-06 Thread Mufaddal Khumri
I have indexed the product names using the StandardAnalyzer and I store the product names as Field.Text. I suspect the StandardAnalyzer uses my hyphens as a token separator. What do I do to get around this problem? Can I tell StandardAnalyzer not to break on hyphens or should I be using some other

Re: Escaping special characters

2005-04-06 Thread Chuck Williams
Mufaddal Khumri writes (4/6/2005 11:21 PM): Hi, Am new to Lucene. I found the following page: http://lucene.apache.org/java/docs/queryparsersyntax.html. At the bottom of the page there is a section saying that, in order to escape special characters, one would use "\". I have an Indexer that indexes produc

Escaping special characters

2005-04-06 Thread Mufaddal Khumri
Hi, Am new to Lucene. I found the following page: http://lucene.apache.org/java/docs/queryparsersyntax.html. At the bottom of the page there is a section saying that, in order to escape special characters, one would use "\". I have an Indexer that indexes product names. Some product names have "-" c
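As a rough illustration of the backslash escaping described on that syntax page, here is a minimal sketch in plain Java. The class QueryEscaper and its escape method are hypothetical names made up for this example, not part of Lucene, and the set of special characters is an assumption taken from the query parser syntax page (with "&" and "|" escaped individually, which is slightly more than strictly needed).

```java
// Hypothetical helper, not a Lucene class: backslash-escapes characters
// that the query parser syntax page lists as special.
final class QueryEscaper {
    // Assumed special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix each special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: wi\-fi \(802.11b\)
        System.out.println(escape("wi-fi (802.11b)"));
    }
}
```

Escaping only changes how the query string is parsed. If the analyzer used at index time already split the product name at the hyphen, as the follow-up in this thread suspects, an escaped query may still not match the indexed tokens.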

Re: Search performance under high load

2005-04-06 Thread David Spencer
Daniel Herlitz wrote: Hi everybody, We have been using Lucene for about one year now with great success. Recently, though, the index has grown noticeably and so has the number of searches. I was wondering if anyone would like to comment on these figures and say if it works for them? Index size: ~

New Lucene-powered site, plus a Lucene job

2005-04-06 Thread David Noll
Lucene folks, Just a quick note to announce the launch of a new Lucene-powered site, the GATT Digital Library at Stanford. The URL is http://gatt.stanford.edu. Over the past seven years, the Stanford Libraries have digitized a large collection of documents produced by the GATT, the predecessor to

Re: Search performance under high load

2005-04-06 Thread Chris Hostetter
: Queries: The query strings are of highly differing complexity, from : simple x:y to long queries involving conjunctions, disjunctions and : wildcard queries. : : 90% of the queries run brilliantly. Problem is that 10% of the queries : (simple or not) take a long time, on average more than 10 se

Re: HTML pages highlighter

2005-04-06 Thread Erik Hatcher
What file do those line numbers correspond to? I'm lost. Did the Lucene in Action highlighting code work for you? Erik On Apr 6, 2005, at 6:16 PM, Yagnesh Shah wrote: Hi! Erik, Yes basic seems to be working. a) My problem is there is a chance that the query is not present in the stored content

Search performance under high load

2005-04-06 Thread Daniel Herlitz
Hi everybody, We have been using Lucene for about one year now with great success. Recently, though, the index has grown noticeably and so has the number of searches. I was wondering if anyone would like to comment on these figures and say if it works for them? Index size: ~2.5 GB, on disk Number

RE: HTML pages highlighter

2005-04-06 Thread Yagnesh Shah
Hi! Erik, Yes basic seems to be working. a) My problem is there is a chance that the query is not present in the stored content of a file, so sometimes I am getting empty strings at line #106 and I have to put a special check at line #109 and line #126. I guess this is not a problem. What do you think

Re: Sorting date stored in milliseconds time

2005-04-06 Thread Scott Farquhar
On Wed, Apr 06, 2005 at 01:02:35PM +0200, [EMAIL PROTECTED] wrote: > I'm forced to keep the date to the millisecond. The reason is simple: I get at > least a couple of new messages per second; if all of them are stamped with the > same time, the retrieval order is undefined, i.e. once I get it, let's > say,

Re: Wiki formatting changes

2005-04-06 Thread Chris Hostetter
: The only problematic Wiki page I found was : http://wiki.apache.org/jakarta-lucene/HowTo and I just fixed that. I've seen a few more, but i'll clean them up as i see them. i guess the problem isn't as widespread as it initially seemed (i just got unlucky in the first few pages i looked at) but

RE: FilteredQuery and Boolean AND

2005-04-06 Thread Chris Hostetter
: shouldn't. If the FilteredQuerys worked properly, they could be put in : a BooleanQuery and then a BooleanClause. That's why I was doing that Peter: Comments in bug#34279 indicate that this problem may have been inadvertently fixed in the latest development version of the code base -- or at

Re: scalability w/ number of fields

2005-04-06 Thread Yonik Seeley
Thanks Doug, your previous comment led us to consider compound field types of the form compound:"name=value". Open ended range queries also need some manipulation for this scheme to work. > Yes, this is an ugly hack, but it can make a huge performance > difference. The problem is that Lucene st
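The compound:"name=value" scheme mentioned above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class CompoundField, its methods, and the use of U+FFFF as an upper bound for open-ended ranges are all made up for this example and are not taken from the thread or from Lucene. It also assumes logical field names never contain "=".

```java
// Hypothetical sketch: fold many logical fields into one indexed field
// named "compound" by storing terms of the form name=value.
final class CompoundField {
    static final String FIELD = "compound"; // the single indexed field
    static final char SEP = '=';

    // Builds the term text for one logical name/value pair.
    static String term(String name, String value) {
        if (name.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("name must not contain '='");
        }
        return name + SEP + value;
    }

    // An open-ended range "name >= lower" must not run into terms of other
    // logical fields, so it gets an artificial upper bound that stays
    // inside the "name=" prefix (assumption: values never contain U+FFFF).
    static String[] openEndedRange(String name, String lower) {
        return new String[] { term(name, lower), term(name, "\uffff") };
    }

    public static void main(String[] args) {
        System.out.println(FIELD + ":\"" + term("color", "red") + "\"");
        System.out.println("lower bound: " + openEndedRange("price", "0100")[0]);
    }
}
```

The artificial upper bound is one way to read the remark that open-ended range queries need some manipulation: without it, a range starting at price=0100 would also sweep in every term that sorts after the price= prefix.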

Re: Wiki formatting changes

2005-04-06 Thread Otis Gospodnetic
The only problematic Wiki page I found was http://wiki.apache.org/jakarta-lucene/HowTo and I just fixed that. Otis --- Leo Simons <[EMAIL PROTECTED]> wrote: > On 06-04-2005 02:44, "Erik Hatcher" <[EMAIL PROTECTED]> > wrote: > > I suppose this should be addressed to Leo... > > Oh no, please do di

Re: scalability w/ number of fields

2005-04-06 Thread Doug Cutting
Yonik Seeley wrote: They are all indexed (and they all need to be under the current design). As I mentioned before, Lucene will not perform well with a large number of indexed fields. If these are not tokenized fields, then a simple way to reduce the number of indexed fields is to move the field

Re: Sorting date stored in milliseconds time

2005-04-06 Thread iouli . golovatyi
I'm forced to keep the date to the millisecond. The reason is simple: I get at least a couple of new messages per second; if all of them are stamped with the same time, the retrieval order is undefined, i.e. once I get it, let's say, as the last reference on the first page, other time - as the first one on
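The snippet above is cut off before any solution is discussed, so the following is only a general sketch of one common way to make millisecond timestamps sort correctly when they are stored as strings: zero-pad them to a fixed width so that lexicographic order matches chronological order. The class SortableTimestamp and its methods are hypothetical names for this example, and the width of 19 digits is an assumption chosen to fit any non-negative long.

```java
import java.util.Locale;

// Hypothetical sketch: encode millisecond timestamps as fixed-width
// strings so that string order equals numeric order.
final class SortableTimestamp {
    static final int WIDTH = 19; // Long.MAX_VALUE has 19 decimal digits

    static String encode(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("negative timestamp: " + millis);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", millis);
    }

    static long decode(String encoded) {
        return Long.parseLong(encoded); // leading zeros are accepted
    }

    public static void main(String[] args) {
        String earlier = encode(1112786555000L);
        String later = encode(1112786555001L);
        // A one-millisecond difference is preserved in string order.
        System.out.println(earlier.compareTo(later) < 0);
    }
}
```

Whether this fits the poster's setup depends on how the date field is indexed and sorted, which the truncated message does not say.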

Re: Strategies for updating indexes.

2005-04-06 Thread Jens Kraemer
On Tue, Apr 05, 2005 at 08:16:35AM -0700, Otis Gospodnetic wrote: > If you take this approach, keep in mind that you will also need to > handle regular application shutdowns, and also try to catch some > crashes/errors, in order to flush your in-memory queue of items > scheduled for indexing, and w

Re[4]: exact match

2005-04-06 Thread Yura Smolsky
Hello, Erik. I have a very large index (200 GB) with a large number of documents. I have a field "author", which stores a name; this field is tokenized, indexed, and stored. It contains values such as the following: "John" "John Doe" "Bill" "Bill Gates" I do not want to reindex all documents again.

Re: Wiki formatting changes

2005-04-06 Thread Leo Simons
On 06-04-2005 02:44, "Erik Hatcher" <[EMAIL PROTECTED]> wrote: > I suppose this should be addressed to Leo... Oh no, please do direct this stuff at infra@; there's more people than just me working on this. I might not be around :-D > anything we can do about > the issue mentioned below regarding

Re: lucene 1.4 in maven repository

2005-04-06 Thread Erik Hatcher
Please forgive my completely out of date reply here. I was looking at my mess of an inbox and just happened to spot this (one of a zillion) unread message and didn't notice the date of it. Yes, my inbox is a disaster! Maybe I should be using ZOE :) Erik On Apr 6, 2005, at 4:58 AM, Er

Re: lucene 1.4 + needs spaces problem

2005-04-06 Thread Erik Hatcher
On Apr 5, 2005, at 10:50 PM, Jason Eacott wrote: Hi, I recently upgraded from lucene 1.3 final to 1.4 and discovered some things which no longer seem to work right. anyway - if I run a query something like [EMAIL PROTECTED]: [EMAIL PROTECTED]:"2004"[EMAIL PROTECTED]:"February"+properties @com

Re: lucene 1.4 in maven repository

2005-04-06 Thread Erik Hatcher
On Aug 25, 2004, at 10:18 AM, Zilverline info wrote: Hi, Can anyone tell me why there is no lucene 1.4 jar in the maven repository @ http://www.ibiblio.org/maven/lucene/jars/ ? Who makes them available? It would be very convenient to be able to get the latest version from there (or anywhere else

Re: Fwd: Wiki formatting changes

2005-04-06 Thread Upayavira
It is either manual change, or a little script to sort the problem out. At the moment, I can't help with the latter. If you can handle doing it manually whenever you see it, that will be the most straight-forward approach. As to the nature of the change - we upgraded from Moin 1.1(pre) to 1.3.

lucene 1.4 + needs spaces problem

2005-04-06 Thread Jason Eacott
Hi, I recently upgraded from lucene 1.3 final to 1.4 and discovered some things which no longer seem to work right. I am using Analyzer analyzer = new StandardAnalyzer(); QueryParser parser = new QueryParser( "terms", analyzer); parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND); q

Re: Deeply nested boolean query performance

2005-04-06 Thread Erik Hatcher
Thanks Morus. I will very soon be running experiments against a very large dataset using the trunk of Lucene and will report some statistics then. Erik On Apr 6, 2005, at 3:26 AM, Morus Walter wrote: Hi Erik, Thanks for your very thorough response. It is very helpful. For all my projects

Re: How to index and search PDF documents.

2005-04-06 Thread Erik Hatcher
On Apr 6, 2005, at 3:07 AM, <[EMAIL PROTECTED]> wrote: "THAT is HOW CAN I INDEX and SEARCH .pdf, .ppt, .xml, .doc etc DOCUMENTS WITH LUCENE." I WILL BE REALLY THANKFUL IF U SOLVE MY PROBLEM. Get a copy of Lucene in Action. Otis wrote a great chapter on how to handle various document formats w

Re: wildcarded phrase queries

2005-04-06 Thread Paul Elschot
On Wednesday 06 April 2005 08:19, Chuck Williams wrote: > Erik Hatcher writes (4/5/2005 5:57 PM): > > > I have a need to implement wildcarded phrase queries, such as this: > > > > "apach? luc*" > > > > which would match "apache lucene", for example. This needs to also > > support ordered and

Re: Deeply nested boolean query performance

2005-04-06 Thread Morus Walter
Hi Erik, > > Thanks for your very thorough response. It is very helpful. > > For all my projects, I'm using the latest Subversion codebase and > staying current with any changes there, so that is very good news. > For lucene-1.4.final I find that some query on a real life index of the form a

RE: How to index and search PDF documents.

2005-04-06 Thread Chandrashekhar
Hi Himani, Your search result should have reference to your documents (like you can content/document id) and then add rendering logic to render such contents after you click on some link. Regards, Chandra -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday

How to index and search PDF documents.

2005-04-06 Thread himani.tandon
Hello sir Thank u for replying. But my query is not regarding updating indexes, optimizing indexes and such others. Sorry sir, maybe my earlier question was not that clear. Here I elaborate my problem clearly. I have to develop a search engine for my s website. We have so