Re: question with spellchecker

2006-06-06 Thread eks dev
try your query like ((ducted^1000 duct~2) +tape) Or maybe (duct* +tape) or even better you could try to do some stemming (Porter stemmer should get rid of these ed-suffixes) and some of the above if this does not help, have a look at lingpipe spellChecker class as this looks like exactly what yo

Re: Avoiding ParseExceptions

2006-06-06 Thread Eric Jain
Chris Nokleberg wrote: I am using the QueryParser with a StandardAnalyzer. I would like to avoid or auto-correct anything that would lead to a ParseException. For example, I don't think you can get a parse exception from Google--even if you omit a closing quote it looks like it just closes it for

Re: Avoiding ParseExceptions

2006-06-06 Thread Chris Nokleberg
On Tue, 06 Jun 2006 14:57:06 -0700, Chris Hostetter wrote: > I took an approach similar to that, by escaping all of the "special" > characters except '+', '-', and '"', and then stripping out all quotes if > there was a non even amount ... this gave me a simplified version of the > Lucene syntax th

Re: Lucene in Action

2006-06-06 Thread Erik Hatcher
On Jun 6, 2006, at 4:34 PM, Erick Erickson wrote: Great Googly-Moogly Otis! .. how many blogs do you have? Are you as old as I am or do you just like old Rock-n-Roll? If you're getting this from the Apostrophe/Overnight Sensation album (Frank Zappa), Well, to be petty, it's from Nanook Ru

question with spellchecker

2006-06-06 Thread Van Nguyen
I'm implementing a spellchecker in my search and have a question. After creating the index and spellchecker index, I pass in the word "ducted tape" to search (I am expecting "duct tape" back). I've played around with boosting the prefixes and suffixes, setting the accuracy, passing in an Inde

Re: Lucene in Action

2006-06-06 Thread Erick Erickson
There are about a zillion things that just go whoosh, right over my head in my TV-less state - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Lucene and learning search

2006-06-06 Thread Chris Hostetter
This isn't really a "learning search" issue as much as an issue of session tracking and finding patterns. Eliminate search from the discussion and the same questions could be applied to generic product/document viewing... "users who looked at products AAA and BBB also looked at product CCC" T
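
The "users who looked at AAA and BBB also looked at CCC" pattern can be sketched as a simple count over tracked sessions. This is only an illustration, not anything posted in the thread: the class name `AlsoViewed`, the method name, and the representation of a session as a set of item ids are all assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: counts which other items show up in sessions
// that contain every item of the seed set (e.g. AAA and BBB).
final class AlsoViewed {
    static Map<String, Integer> alsoViewed(List<Set<String>> sessions, Set<String> seed) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> session : sessions) {
            // Only sessions that looked at all seed items are relevant.
            if (!session.containsAll(seed)) {
                continue;
            }
            for (String item : session) {
                if (!seed.contains(item)) {
                    counts.merge(item, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Sorting the resulting map by count would give the "also looked at" list; how sessions are recorded in the first place is left open here.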

Lucene and learning search

2006-06-06 Thread michael turner
Hi Everyone! Working on a project that requires a Search query similar to what is seen on "amazon.com" in that after searching for and displaying an item, the system shows: "Users that have searched for "A" AND "B" have also searched for "". Where "B" and "" are other r

Re: Lucene in Action

2006-06-06 Thread Chris Hostetter
: > Great Googly-Moogly Otis! .. how many blogs do you have? : Are you as old as I am or do you just like old Rock-n-Roll? If you're : getting this from the Apostrophe/Overnight Sensation album (Frank Zappa), : you can also listen to my all-time favorite song "I'm the slime". Sorry, I was refrenc

Re: Browse Functionality

2006-06-06 Thread Chris Hostetter
Generally the best approach for restricting what results a person sees is with a Filter ... if you do this, then you can get a BitSet from the Filter which tells you everything they are allowed to see, if you then also build a BitSet for each of the Terms you want to "browse" by (again: a Filter ca
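
A minimal sketch of the BitSet idea, using only `java.util.BitSet` and no Lucene classes. The message is cut off before it says what to do with the two kinds of BitSet, so intersecting them and counting the set bits is an assumption here, as are the class and method names.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: for each browse term, count the documents that
// match the term AND are visible to the current user.
final class BrowseCounts {
    static Map<String, Integer> count(BitSet allowed, Map<String, BitSet> termDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : termDocs.entrySet()) {
            // Clone first: BitSet.and() modifies the receiver in place.
            BitSet visible = (BitSet) entry.getValue().clone();
            visible.and(allowed);
            counts.put(entry.getKey(), visible.cardinality());
        }
        return counts;
    }
}
```

In this sketch the `allowed` bits would come from the access-restriction Filter and each per-term BitSet from a Filter built for that term; both sources are assumptions about how the pieces would be wired together.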

Re: Avoiding ParseExceptions

2006-06-06 Thread Chris Hostetter
It really depends on what syntax you want to support ... if you just want basic term matching and don't want to let the user specify field names, or boosts or phrases, or ranges, or wildcards -- then just escape the entire string, that should make it impossible to get a parse exception. I took an a
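
The approach described elsewhere in this thread (escape every special character except '+', '-' and '"', then strip all quotes if their count is odd) might look roughly like the sketch below. It is plain Java with no Lucene dependency; the class name and the exact list of special characters are assumptions, and the two-character operators `&&` and `||` are not handled.

```java
// Hypothetical pre-processor for user-entered query text.
final class QuerySanitizer {
    // Assumed set of single-character query syntax to escape.
    // '+', '-' and '"' are deliberately left alone.
    private static final String SPECIALS = "\\!():^[]{}~*?";

    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        int quotes = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                quotes++;
            }
            if (SPECIALS.indexOf(c) >= 0) {
                out.append('\\'); // prefix special characters with a backslash
            }
            out.append(c);
        }
        String escaped = out.toString();
        // An odd number of quotes means an unbalanced phrase: drop them all.
        return (quotes % 2 == 0) ? escaped : escaped.replace("\"", "");
    }
}
```

The sanitized string would then be handed to the query parser; keeping a catch for ParseException around that call is still a reasonable safety net, since this sketch does not claim to cover every case.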

Re: Avoiding ParseExceptions

2006-06-06 Thread Erick Erickson
That way madness lies.. I suspect that you'll find that there are a few rules you can apply that will allow you to "fix" a lot of queries, but... is that really what you want to do? For instance, a user types "a and or not b" Whatever you do, it isn't what the *next* user who types somethin

RE: Avoiding ParseExceptions

2006-06-06 Thread Mordo, Aviran (EXP N-NANNATEK)
Basically you need to pre-process the query and rewrite it in a way you think it should be. Then catch the parse exception if you failed to rewrite the query and display an error message on the screen (something like - This kind of query is not supported, please rephrase your query). HTH Aviran h

Re: Lucene in Action

2006-06-06 Thread Erick Erickson
Great Googly-Moogly Otis! .. how many blogs do you have? Are you as old as I am or do you just like old Rock-n-Roll? If you're getting this from the Apostrophe/Overnight Sensation album (Frank Zappa), you can also listen to my all-time favorite song "I'm the slime". Erick P.S. Haven't owned

Avoiding ParseExceptions

2006-06-06 Thread Chris Nokleberg
Hi all, I am using the QueryParser with a StandardAnalyzer. I would like to avoid or auto-correct anything that would lead to a ParseException. For example, I don't think you can get a parse exception from Google--even if you omit a closing quote it looks like it just closes it for you (please cor

Re: Lucene in Action

2006-06-06 Thread Beady Geraghty
I find it very useful. I hope you will too. On 6/6/06, digby <[EMAIL PROTECTED]> wrote: Does everyone recommend getting this book? I'm just starting out with Lucene and like to have a book beside me as well as the web / this mailing list, but the book looks quite old now, has a 1-2 month del

Re: Lucene in Action

2006-06-06 Thread Chris Hostetter
: v2.0 might be a little while in coming (checkout Otis' blog : http://www.jroller.com/page/otis?catname=%2FLucene). Great Googly-Moogly Otis! .. how many blogs do you have? http://lucenebook.com/blog/ http://www.jroller.com/page/otis http://blog.simpy.com/ -Hoss ---

RE: Compound / non-compound index files and SIGKILL

2006-06-06 Thread Chris Hostetter
1) have you tried forcing a threaddump of the JVM when it hangs to see what it's doing? (i don't remember which signal it is off the top of my head, but even if it's not responding to SIGTERM it might respond to that) : SIGTERM. I guess I'd feel more confident about using SIGKILL, if I knew that

Re: PHP and Lucene integration

2006-06-06 Thread Paul Borgermans
Hi I'm currently doing just that: using the php-java bridge. Here the goal is to integrate Java-Lucene with a php4 based CMS (eZ publish), so the Zend framework is not an answer (and premature imho). The code we've written is a bit CMS specific, but you should be able to do the same quite fast

Re: PHP and Lucene integration

2006-06-06 Thread Peter A. Daly
On 6/6/06, Alexander MASHTAKOV <[EMAIL PROTECTED]> wrote: The other thing - performance. In order to run faster - it's necessary to have opened index, rather than open and close it for each request. Index updates have to be serialized somehow and after the update, it has to be re-opened again.

Re: PHP and Lucene integration

2006-06-06 Thread Alexander MASHTAKOV
Has anyone tried to solve this task ? --- Alexander MASHTAKOV <[EMAIL PROTECTED]> wrote: > Hi, > > Thank you for reply. > I've also had a look at Zend framework. But, at > this moment they do not support unicode, > which is a mandatory requirement in my case. > > The other thing - performanc

Re: PHP and Lucene integration

2006-06-06 Thread Alexander MASHTAKOV
Hi, Thank you for reply. I've also had a look at Zend framework. But, at this moment they do not support unicode, which is a mandatory requirement in my case. The other thing - performance. In order to run faster - it's necessary to have opened index, rather than open and close it for each req

Re: searching in more than fields on document

2006-06-06 Thread digby
All sorted now. Of course, if I can loop through the properties of a bean to add them as fields to the document, then I can certainly do the same at query time to build the MultiFieldQueryParser. All done and working great. Thanks for all your comments. digby wrote: Basically, I've got a smal

Re: PHP and Lucene integration

2006-06-06 Thread Vinay Yadav
Hi, Zend Search Framework can help you. Take a look at http://framework.zend.com/manual/en/zend.search.html - Zend_Search_Lucene is a general purpose text search engine written entirely in PHP 5. Since it stores its index on the files

Re: PHP and Lucene integration

2006-06-06 Thread Peter A. Daly
Other replies mention SOLR. I'm fairly new to SOLR, but have used Lucene quite a bit. Based on your situation, it certainly sounds like SOLR is worth looking into. I was able to convert a portion of one of my sites from being SQL powered to SOLR powered in about a days work, which includes lear

Re: searching in more than fields on document

2006-06-06 Thread digby
Basically, I've got a small app which allows me to update fields in bunch of mysql tables using Hibernate. As I save each bean, I want to add it to the lucene index as well. However, I want the app to be as generic as possible and at the moment it doesn't care what the bean is, as long as ther

RE: PHP and Lucene integration

2006-06-06 Thread Rob Staveley (Tom)
For querying, we have PHP talking to our Java application through sockets and XML. Queries are set up in PHP, creating an XML document which corresponds to a subset of the subclasses of http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Query.html. If we'd had the PHP skill set at the

Re: Lucene in Action

2006-06-06 Thread digby
LOL. I gotta get a Lucene License Plate Frame though. Rob Staveley (Tom) wrote: It is better value than the tee shirt http://www.cafepress.com/lucene/

Re: PHP and Lucene integration

2006-06-06 Thread Mike Richmond
I am also working on interfacing Lucene with PHP. Here are a couple options that I have found useful: Call Java directly from PHP: http://php-java-bridge.sourceforge.net/ Solr - Interacts w/ Lucene via XML requests http://incubator.apache.org/solr/index.html There is mention of a PHP interface

PHP and Lucene integration

2006-06-06 Thread Alexander MASHTAKOV
Hi Folks, I'm working on project that is going to have free-text search mechanism. The project is completely based on open source technologies, such as MySQL and PHP. I'm reading about Lucene and think that this is probably the first candidate. BTW, the (obvious) question is: "How to integrate P

Re: Compound / non-compound index files and SIGKILL

2006-06-06 Thread Volodymyr Bychkoviak
In my application I was queuing IDs of appropriate record in Database not whole document. Document was created right before adding it to index. All this work was done in separated thread, so other threads responded very quickly. It depends on your application and at what speed your new data co
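
Queuing record IDs and doing the indexing work in a separate thread could be sketched as follows. This is an illustration only: the class name `IndexQueue`, the sentinel value and the `LongConsumer` callback (standing in for "load the record, build the Document, add it to the index") are assumptions, not code from this thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.LongConsumer;

// Hypothetical queue of database record IDs, drained by one worker thread.
final class IndexQueue {
    private static final long STOP = Long.MIN_VALUE; // sentinel that ends the worker
    private final BlockingQueue<Long> ids = new LinkedBlockingQueue<>();
    private final Thread worker;

    IndexQueue(LongConsumer indexRecord) {
        worker = new Thread(() -> {
            try {
                // take() blocks until an ID is available; only IDs are held in memory.
                for (long id = ids.take(); id != STOP; id = ids.take()) {
                    indexRecord.accept(id);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "index-worker");
        worker.start();
    }

    // Called by request-handling threads; returns immediately.
    void enqueue(long id) {
        ids.add(id);
    }

    // Processes everything queued so far, then stops the worker.
    void shutdown() {
        ids.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because a single worker drains the queue, calls into the index are serialized without the request threads having to wait on a lock.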

Re: Lucene in Action

2006-06-06 Thread Marc Dauncey
You could always purchase the PDF from www.manning.com. This book is essential in my view. Its also one of the clearest most engaging IT books I've ever read. - Original Message From: digby <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, 6 June, 2006 11:55:26 AM Subje

RE: spring & lucene

2006-06-06 Thread Omar Didi
have a look at spring module 0.3. it has a lucene module which contains many interesting classes LuceneIndexTemplate, LuceneSearchTemplate, and all kind of factories following spring concepts. here is the url to the documentation: http://www.springframework.org/node/270 -Original Message---

Re: searching in more than fields on document

2006-06-06 Thread Michael D. Curtin
Not sure if I understand exactly what you want to do, but would the ":" syntax that QueryParser understands work for you? That is, you could send query text like f1:foo f2:foo f3:foo to search for "foo" in any of the 3 fields. If you need boolean capabilities you can use parentheses, li
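
When the field names are only known at run time, query text of the form `f1:foo f2:foo f3:foo` can be assembled with a small helper. A sketch, assuming the term needs no escaping and the field list is supplied by the caller; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that repeats one term across several fields.
final class FieldedQuery {
    static String acrossFields(List<String> fields, String term) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String field : fields) {
            joiner.add(field + ":" + term); // the "field:term" syntax described above
        }
        return joiner.toString();
    }
}
```

The resulting string is meant to be passed to the query parser like any other query text.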

RE: Lucene in Action

2006-06-06 Thread Rob Staveley (Tom)
It is better value than the tee shirt http://www.cafepress.com/lucene/

Re: Lucene in Action

2006-06-06 Thread eks dev
Grab it now, it is worth all this money. - Original Message From: digby <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, 6 June, 2006 11:59:53 AM Subject: Lucene in Action Does everyone recommend getting this book? I'm just starting out with Lucene and like to have a b

Re[2]: Lucene in Action

2006-06-06 Thread Sven Duzont
Hi, Or simply grab it online (paper or pdf eBook ) here : http://www.manning.com/hatcher2/ --- Sven Le mardi 6 juin 2006 à 13:05:45, vous écriviez : MC> Try here.. MC> http://www.abebooks.co.uk MC> Maybe they have one cheaper. MC> Malcolm

RE: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread Rob Staveley (Tom)
Thanks, Karl. It would be good if maxBufferedDocs could respond dynamically to available heap. It seems a shame to set <10 for the sake of sporadic large documents. Failing that, it would be nice if we could explicitly pre-flush buffers when we encounter a big field. I'm increasingly thinking that

RE: Compound / non-compound index files and SIGKILL

2006-06-06 Thread Rob Staveley (Tom)
This is a good idea. I had been worried about the additional heap requirement maintaining a queue, without being able to serialize/deserialize Documents (i.e. a build up of Lucene Documents in RAM). I have been marshalling addDocument() calls using a synchronized object; the same threads have been

Re: Lucene in Action

2006-06-06 Thread Ian Lea
For what it's worth, Blackwell's uses Lucene for biblio searching. Developed with the help of Lucene In Action. -- Ian. On 6/6/06, digby <[EMAIL PROTECTED]> wrote: Thanks everyone, although now I'm not sure what to! Blackwells quicker but more expensive, but is a new edition due...??? Think

Re: Lucene in Action

2006-06-06 Thread Malcolm Clark
Try here.. http://www.abebooks.co.uk Maybe they have one cheaper. Malcolm - Original Message - From: "digby" <[EMAIL PROTECTED]> To: Sent: Tuesday, June 06, 2006 11:55 AM Subject: Re: Lucene in Action Thanks everyone, although now I'm not sure what to! Blackwells quicker but

Re: Lucene in Action

2006-06-06 Thread digby
Thanks everyone, although now I'm not sure what to! Blackwells quicker but more expensive, but is a new edition due...??? Think I'll blow the moths off my wallet and get on with it... [EMAIL PROTECTED] wrote: It's an invaluable book if you're new to Lucene. There have been some changes to

RE: Lucene in Action

2006-06-06 Thread Kinnar Kumar Sen, Noida
IT'S A REALLY GOOD BOOK TO START OFF WITH Regards and Thanks Kinnar Kumar Sen HCL Technologies Ltd. Sec-60, Noida-201301 Ph: - 09313297423 TO SUCEED BE DIFFERENT BE DARING AND BE THERE FIRST -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesda

Re: Lucene in Action

2006-06-06 Thread Paul . Illingworth
It's an invaluable book if you're new to Lucene. There have been some changes to the Lucene API since the book was published but you shouldn't let this put you off - they're relatively minor. I think Lucene In Action v2.0 might be a little while in coming (checkout Otis' blog http://www.jrolle

Re: Lucene in Action

2006-06-06 Thread [EMAIL PROTECTED]
its a nice book!! On 6/6/06, Irving, Dave <[EMAIL PROTECTED]> wrote: It really helped me out loads and I would recommend it to anyone. I gave up trying to obtain it from amazon - but got it in 2 days from Blackwell Online (http://bookshop.blackwell.co.uk) > -Original Message- > From

RE: Lucene in Action

2006-06-06 Thread Irving, Dave
It really helped me out loads and I would recommend it to anyone. I gave up trying to obtain it from amazon - but got it in 2 days from Blackwell Online (http://bookshop.blackwell.co.uk) > -Original Message- > From: news [mailto:[EMAIL PROTECTED] On Behalf Of digby > Sent: 06 June 2006 11

Re: Lucene in Action

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 10:59 +0100, digby wrote: > > Does everyone recommend getting this book? If you want to learn Lucene then this is definitely a book to get. > I'm just starting out with Lucene and like to have a book beside me as > well as the web / this mailing list, but the book looks qui

Re: searching in more than fields on document

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 10:47 +0100, digby wrote: > I was wondering this exact question, but MultiFieldQueryParser still > requires you to specify the field names. In my application I don't know > the field names (they're automatically generated from beans using > BeanUtils.getProperties()), so I'

Lucene in Action

2006-06-06 Thread digby
Does everyone recommend getting this book? I'm just starting out with Lucene and like to have a book beside me as well as the web / this mailing list, but the book looks quite old now, has a 1-2 month delivery wait time here in the UK and is quite expensive. Is it worth waiting for a new editio

Re: searching in more than fields on document

2006-06-06 Thread digby
I was wondering this exact question, but MultiFieldQueryParser still requires you to specify the field names. In my application I don't know the field names (they're automatically generated from beans using BeanUtils.getProperties()), so I've resorted to concatenating all the fields into a sing

Re: Compound / non-compound index files and SIGKILL

2006-06-06 Thread Volodymyr Bychkoviak
If your content handlers should respond quickly then you should move indexing process to separate thread and maintain items in queue. Rob Staveley (Tom) wrote: This is a real eye-opener, Volodymyr. Many thanks. I guess that means that my orphan-producing hangs must be addDocument() calls, and n

RE: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 10:43 +0100, Rob Staveley (Tom) wrote: > You are right there are going to be a lot of tokens. The entire body of a > text document is getting indexed in an unstored field, but I don't see how I > can flush a partially loaded field. Check these out: http://lucene.apache.org/

RE: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread Rob Staveley (Tom)
You are right there are going to be a lot of tokens. The entire body of a text document is getting indexed in an unstored field, but I don't see how I can flush a partially loaded field. -Original Message- From: karl wettin [mailto:[EMAIL PROTECTED] Sent: 06 June 2006 10:33 To: java-user

RE: searching in more than fields on document

2006-06-06 Thread Kiran Joisher
You can use MultiFieldQueryParser Something like this Query query = MultiFieldQueryParser.parse(new String[]{queryString, queryString, queryString}, new String[]{ASSET_TITLE, ASSET_ARTICLE, ASSET_DIRECTOR_NAMES }, new BooleanClause.Occur[] {BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD,

RE: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread Rob Staveley (Tom)
I answered too quickly too :-) The QA folk seem to reckon that a 132MB plain text file with no white space is where it falls over. There are some accountancy e-mails with attachments of ~170Mb like this, which we need to be able to field. How would I go about flushing the IndexWriter? -Orig

RE: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 10:22 +0100, Rob Staveley (Tom) wrote: > > Thanks for the response, Karl. I am using FSDirectory. > -X:AggressiveHeap might reduce the number of times I get bitten by the > problem, but I'm really looking for a streaming/serialised approach [I > think!], which allows me to ha

RE: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread Rob Staveley (Tom)
Thanks for the response, Karl. I am using FSDirectory. -X:AggressiveHeap might reduce the number of times I get bitten by the problem, but I'm really looking for a streaming/serialised approach [I think!], which allows me to handle objects which are larger than available memory. Using the java.io.R

Re: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 10:11 +0100, Rob Staveley (Tom) wrote: > Sometimes I need to index large documents. I've got just about as much > heap > as my application is allowed (-Xmx512m) and I'm using the unstored > org.apache.lucene.document.Field constructed with a java.io.Reader, > but I'm > still s

Re: Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 10:11 +0100, Rob Staveley (Tom) wrote: > Sometimes I need to index large documents. I've got just about as much heap > as my application is allowed (-Xmx512m) and I'm using the unstored > org.apache.lucene.document.Field constructed with a java.io.Reader, but I'm > still suffe

Re: searching in more than fields on document

2006-06-06 Thread karl wettin
On Tue, 2006-06-06 at 14:38 +0530, Amaresh Kumar Yadav wrote: > My document has six field and i want to search on three fields. > > Presently I am able to search on only TITLE field.. > > query = QueryParser.parse(queryString, "TITLE", analyzer); You want to use the MultiFieldQueryParser. -

Avoiding java.lang.OutOfMemoryError in an unstored field

2006-06-06 Thread Rob Staveley (Tom)
Sometimes I need to index large documents. I've got just about as much heap as my application is allowed (-Xmx512m) and I'm using the unstored org.apache.lucene.document.Field constructed with a java.io.Reader, but I'm still suffering from java.lang.OutOfMemoryError when I index some large document

searching in more than fields on document

2006-06-06 Thread Amaresh Kumar Yadav
Hi All, Will u please give me some clue for searching on more than one field of document. My document has six field and i want to search on three fields. Presently I am able to search on only TITLE field.. query = QueryParser.parse(queryString, "TITLE", analyzer); Regards.. Amares

RE: spring & lucene

2006-06-06 Thread Mike Streeton
We wrote ours for NetSearch to handle this specific issue. I suggest you create a holder class to hold the IndexReader and IndexSearcher, this can close them in the finalizer. Clients keep the holder until they are finished and then discard it. When it is completely de-referenced it will be closed.

Re: spring & lucene

2006-06-06 Thread Sami Dalouche
Hi, when working with Spring, the best is to use Compass : http://www.opensymphony.com/compass/ (if you can). Regards, Sami Dalouche On Tue, 2006-06-06 at 00:27 -0400, Rajiv Roopan wrote: > Hello, > I'm using the spring framework to define my indexsearcher and > indexwriter. They are defined