Re: QueryParser bug?

2007-02-22 Thread Chris Hostetter
: than just on/off), but the original QP shows the problem with : setAllowLeadingWildcard(true). The compiled JavaCC code will always create a : PrefixQuery if the last character is *, regardless of any other wildcard : characters before it. Therefore the query is based on the Term: Yep, defini

TextMining.org Word extractor

2007-02-22 Thread Antony Bowesman
I'm extracting text from Word using TextMining.org extractors - it works better than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do. However, I'm trying to find out about licence issues with the TM jar. The TM website seems to be permanently hacked these days. Anyon

Re: Lucene 1.4.3 : IndexWriter.addDocument(doc) fails when run on OS requiring permissions

2007-02-22 Thread Ridzwan Aminuddin
Hi Guys. Ok thanks for the replies. You guys are right that it is to do with the system and not with Lucene. However, what I'm trying to do is pinpoint the exact place that causes the system to fail, and then from there try to remedy the problem. The odd thing is that th

Re: search on colon ":" ending words

2007-02-22 Thread Felix Litman
OK. Thank you. We'll have to consider using this approach. I guess the drawback here is that ":" will no longer work as a field operator. :-( We were also considering the following approach: String newquery = query.replace(": ", " "); It seems this way a co
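The replacement idea above can be sketched in plain Java. This is only an illustration, not code from the thread: the class and method names are hypothetical, and it assumes that only a colon ending a word (followed by whitespace or the end of the query) should be dropped, so that field:term syntax keeps working.

```java
final class ColonStripper {

    // Hypothetical helper: removes a ':' only when it ends a word, i.e. when it
    // is followed by whitespace or by the end of the query string.
    // A colon inside "field:term" is left untouched.
    static String stripTrailingColons(String query) {
        return query.replaceAll(":(?=\\s|$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingColons("title:lucene note: colon"));
        System.out.println(stripTrailingColons("warning:"));
    }
}
```

The cleaned string would then be handed to the query parser as usual. A query such as "title:" on its own loses its colon too, which may or may not be what is wanted.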

Re: search on colon ":" ending words

2007-02-22 Thread Antony Bowesman
Felix Litman wrote: Yes. thank you. How did you make that modification not to treat ":" as a field-name terminator? Is it using this Or some other way? I removed the : handling stuff from QueryParser.jj in the method: Query Clause(String field) : I removed this section --- [ LOOKAHE

RE: Optimizing Index

2007-02-22 Thread Damien McCarthy
What file system is the hard disc? If it is FAT32 one of your indexing files is probably getting bigger than 4 GB - the maximum file size in FAT32 Damien -Original Message- From: maureen tanuwidjaja [mailto:[EMAIL PROTECTED] Sent: 23 February 2007 02:07 To: java-user@lucene.apache.or

Re: Optimizing Index

2007-02-22 Thread maureen tanuwidjaja
Yes, I do have around 75 GB of free space on that HDD... I do not invoke any index reader... hence the program only calls IndexWriter to optimize the index, and that's it. I am also perplexed why it reports that it does not have enough disk space to do the optimization... Michael McCandless <[EMAIL

Re: Positions in SpanFirst

2007-02-22 Thread Antony Bowesman
Chris Hostetter wrote: : So I don't see why using a SpanNear that respects order and a large : IncrementGap won't solve your problem.. Although it would return "odd" i think the use case he's worried about is that he needs to be able to find matches just on the "start" of a person's name, ie.

Re: QueryParser bug?

2007-02-22 Thread Antony Bowesman
Chris Hostetter wrote: i'm not very familiar with this issue, but are you using setAllowLeadingWildcard(true) ? ... if not it definitely won't work. That's not the issue. (I've modified QP to allow "minWildcardPrefix" rather than just on/off), but the original QP shows the problem with setAl

Re: Using Lucene - Design Question

2007-02-22 Thread Peter W.
Hello, If you have experience using XML and doing web services requests Solr is what you need. It's production quality code and evolving quickly. It has a remarkable amount of extra functionality. For CORBA type programmers, go with terracotta. It looks to go a step further beyond sharing object

RE: Running Lucene as a stateless session bean

2007-02-22 Thread Walker, Keith 1
Thanks for the suggestions. I'm using the Lucene packaged with Gate, which is lucene-1.3-final.jar (ancient I suppose). I am now seeing the threading problems with GATE, and although I was hoping to stay with Gate in case we need some of its capabilities, although with the current design we cou

Re: pagination

2007-02-22 Thread Peter W.
Hello, This snippet may help to understand TopDocs: http://mail-archives.apache.org/mod_mbox/lucene-general/200508.mbox/% [EMAIL PROTECTED] Also, paging through Lucene results is a 'do-it-yourself' exercise using hits.length() until someone contributes a good implementation. Oversimplifying, i
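The 'do-it-yourself' paging mentioned above mostly comes down to clamping a page window against the total hit count. A minimal sketch, assuming zero-based page numbers; the class and field names are hypothetical and nothing here depends on Lucene itself:

```java
final class PageWindow {
    final int start; // index of the first hit on the page (inclusive)
    final int end;   // index one past the last hit on the page (exclusive)

    // totalHits would come from hits.length(); pageNumber is zero-based.
    PageWindow(int totalHits, int pageNumber, int pageSize) {
        int from = Math.max(0, pageNumber) * pageSize;
        this.start = Math.min(from, totalHits);
        this.end = Math.min(this.start + pageSize, totalHits);
    }

    public static void main(String[] args) {
        PageWindow w = new PageWindow(25, 2, 10);
        // With a Lucene Hits object one would loop:
        //   for (int i = w.start; i < w.end; i++) { Document d = hits.doc(i); ... }
        System.out.println(w.start + ".." + w.end);
    }
}
```

A page past the end of the results yields an empty window (start equals end) rather than an exception.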

Re: Efficient count of documents by type?

2007-02-22 Thread Erick Erickson
You might have some luck searching the mailing list for "faceted search", as I remember there's been quite a discussion on that topic and I *think* it applies... Even if you use a HitCollector, you still have to categorize your document, and all you have is the doc id to work with. But I think yo

Efficient count of documents by type?

2007-02-22 Thread Phillip Rhodes
I have a query that can return documents that represent different types of things (e.g. books, movies, coupons, etc) There is an "object_type" keyword on each document, so I can tell that a document is a coupon or a book etc... The problem is that I need to display a count of each item type tha
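The counting step itself can be sketched independently of Lucene. This assumes, purely for illustration, that the "object_type" value of every document has already been loaded into an array indexed by document id, and that the matching document ids have been collected (for example by a HitCollector); the class, method, and parameter names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class TypeCounter {

    // typeByDocId[docId] holds the object_type of that document (assumed preloaded).
    // matchingDocIds holds the ids of the documents that matched the query.
    static Map<String, Integer> countByType(String[] typeByDocId, int[] matchingDocIds) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int docId : matchingDocIds) {
            counts.merge(typeByDocId[docId], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] types = {"book", "movie", "book", "coupon"};
        int[] matches = {0, 2, 3};
        System.out.println(countByType(types, matches));
    }
}
```

Looking up the type from a preloaded array avoids fetching each stored document just to read one field, which is the expensive part when the result set is large.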

RE: Open & Close Reader

2007-02-22 Thread Chris Hostetter
: Actually I don't see how it could not be multi-threaded, : since it seems normal to me that I run it in a web application which is : multi-threaded for each user request ? every application in the world is not a web application. if you are dealing with multiple threads, you will need to o somet

Re: Lucene 1.4.3 : IndexWriter.addDocument(doc) fails when run on OS requiring permissions

2007-02-22 Thread Chris Hostetter
This sounds like it has absolutely nothing to do with Lucene, and everything to do with good security permissions -- your Zope/python front end is most likely running as a user that does not have write permissions to the directory where your index lives. you'll need to remedy that. you can write

Re: QueryParser bug?

2007-02-22 Thread Chris Hostetter
i'm not very familiar with this issue, but are you using setAllowLeadingWildcard(true) ? ... if not it definitely won't work. : Date: Thu, 22 Feb 2007 15:36:43 +1100 : From: Antony Bowesman <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Q

Re: Scoring while sorting

2007-02-22 Thread Chris Hostetter
: > What is the point to calculate score if the result set is going to be sorted : > by some field? : No point, I believe, unless your sort includes relevance score. I ...which is non trivial information to deduce, since a SortField can contain a SortComparatorSource which uses a ScoreDocCompar

Re: how to define a pool for Searcher?

2007-02-22 Thread Mark Miller
I would not do this from scratch...if you are interested in Solr go that route else I would build off http://issues.apache.org/jira/browse/LUCENE-390 - Mark Mohammad Norouzi wrote: Hi all, I am going to build a Searcher pooling. if any one has experience on this, I would be glad to hear his/h

Re: "did you mean" for multi-word queries implementation

2007-02-22 Thread karl wettin
On 22 Feb 2007, at 19:22, Otis Gospodnetic wrote: I believe it's a SpellChecker implementation deficiency, and Karl will probably suggest looking at LUCENE-626 as an alternative. And I'll ask you to please report back how much better than the contrib SpellChecker Karl's solution is. :) The

Re: "did you mean" for multi-word queries implementation

2007-02-22 Thread Otis Gospodnetic
I believe it's a SpellChecker implementation deficiency, and Karl will probably suggest looking at LUCENE-626 as an alternative. And I'll ask you to please report back how much better than the contrib SpellChecker Karl's solution is. Otis

"did you mean" for multi-word queries implementation

2007-02-22 Thread Felix Litman
Did any one have success implementing "did you mean" feature for multi-word queries as described in Tom White's excellent "Did you Mean Lucene?" article? http://today.java.net/pub/a/today/2005/08/09/didyoumean.html ...and more specifically, using the CompositeDidYouMeanParser implementation as

Re: search on colon ":" ending words

2007-02-22 Thread Felix Litman
Yes, thank you. How did you make that modification not to treat ":" as a field-name terminator? Is it using this, or some other way? String newquery = query.replace(":", " "); Thank you, Felix Antony Bowesman <[EMAIL PROTECTED]> wrote: Not sure if you're still after a solution, but I ha

Re: Registering a local dtd file for use with Digester

2007-02-22 Thread Steven Rowe
Hi Mike, > I have a collection of XML files that I would like to parse using Digester > in order to index them for Lucene. A DTD file has been supplied for the XML > files, but none of those files has a line associating them > with the DTD file. Can the Digester's register function be used to tel

Re: Scoring while sorting

2007-02-22 Thread Otis Gospodnetic
- Original Message ---From: dmitri <[EMAIL PROTECTED]> > What is the point to calculate score if the result set is going to be sorted > by some field? No point, I believe, unless your sort includes relevance score. I believe there is a Lucene patch that involves a Matcher (a new concept fo

Re: a question about indexing database tables

2007-02-22 Thread Erick Erickson
OK, I was off on a tangent. We've had several discussions where people were effectively trying to replace a RDBMS with Lucene and finding out that RDBMSs are very good at what they do ... But in general, I'd probably approach it by doing the RDBMS work first and indexing the result. I think th

RE: Open & Close Reader

2007-02-22 Thread DECAFFMEYER MATHIEU
Actually I don't see how it could not be multi-threaded, since it seems normal to me that I run it in a web application which is multi-threaded for each user request ? Erick, could u please explain to me your comment ? Thank u. __ Matt -Original Mess

RE: Returning only a small set of results

2007-02-22 Thread Kainth, Sachin
Thanks Erick you've helped a lot and so has everyone else. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 22 February 2007 13:00 To: java-user@lucene.apache.org Subject: Re: Returning only a small set of results See TopDocs, HitCollector, etc. You'll have to dig

Re: a question about indexing database tables

2007-02-22 Thread Mohammad Norouzi
Thanks Erick, but we have to, because we need to execute very big queries that create network traffic and are very slow. With Lucene we do it in a few milliseconds. We have now indexed the information we need by joining tables. It works fine; besides, it returns the exact result as we can get

Re: Multy Language documents indexing

2007-02-22 Thread Erick Erickson
I know this has been discussed several times, but sure don't remember the answers. Search the mail archive for "multiple languages" and you'll find some good suggestions. But as I remember, it's not a trivial issue. But I don't see why the "three different documents" approach wouldn't work. You c

Re: Open & Close Reader

2007-02-22 Thread Erick Erickson
Well, it's your logic that takes the request from the user and executes the search. So it's your logic that has to take care of any coordination between threads that use the same reader. This is a standard multi-threading resource-sharing issue. If your application is not multi-threaded, I don't

Multy Language documents indexing

2007-02-22 Thread Ivan Vasilev
Hi All, Our application that uses Lucene for indexing will be used to index documents, each of which contains parts written in different languages. For example, a document could contain English, Chinese and Brazilian text. So how should such a document be indexed? Is there some best practice to do

Re: Returning only a small set of results

2007-02-22 Thread Erick Erickson
See TopDocs, HitCollector, etc. You'll have to dig through the documentation and try a few experiments to make sense of it all, one sentence explanations aren't much help. But think of Hits as a convenience class for getting the best-scoring 100 documents and use the other classes if you want to

Re: a question about indexing database tables

2007-02-22 Thread Erick Erickson
don't do either one Search this mail archive for discussions of databases, there are several long threads discussing this along with various options on how to make this work. See particularly a mail entitled *Oracle/Lucene integration -status- *and any discussions participated in by Marcelo O

Re: Lucene 1.4.3 : IndexWriter.addDocument(doc) fails when run on OS requiring permissions

2007-02-22 Thread Michael D. Curtin
Is your disk almost full? Under Linux, when you reach about 90% used on a file system, only the superuser can allocate more space (e.g. create files, add data to files, etc.). --MDC

RE: Open & Close Reader

2007-02-22 Thread DECAFFMEYER MATHIEU
My question is what happens when a re-opening of the reader occurs and at the same time a user does a query on the index? And are there solutions for this. __ Matt -Original Message- From: Michael McCandless [mailto:[EMAIL PROTECTED] Sent: Thursda
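One common way to coordinate a periodic re-open with concurrent queries is to publish the current reader or searcher through an atomic reference, so each query sees either the old instance or the new one and never a half-initialized one. The sketch below is generic and not taken from the thread: the holder class and its method names are hypothetical, and T stands in for whatever reader or searcher type the application uses.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ReaderHolder<T> {
    private final AtomicReference<T> current;

    ReaderHolder(T initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Query threads call this once per request and use the returned
    // instance for the whole request.
    T get() {
        return current.get();
    }

    // The refresh thread opens a new instance first, then swaps it in.
    // The previous instance is returned so the caller can close it once
    // in-flight requests that still use it have finished.
    T swap(T fresh) {
        return current.getAndSet(fresh);
    }

    public static void main(String[] args) {
        ReaderHolder<String> holder = new ReaderHolder<>("reader-v1");
        String old = holder.swap("reader-v2");
        System.out.println(old + " -> " + holder.get());
    }
}
```

The sketch deliberately leaves out deciding when the old instance can safely be closed; closing it immediately could break a search that is still running against it, so some form of reference counting or a grace period is usually needed.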

Re: Open & Close Reader

2007-02-22 Thread Michael McCandless
<[EMAIL PROTECTED]> wrote: > I need to merge indexes, > if I want the user to see the changes (the merged indexes), I heard I > need to close the index reader and re-open it again. Yes. More generally, whenever there have been changes to an index that you want your readers/searchers to see, you

Re: Optimizing Index

2007-02-22 Thread Michael McCandless
"maureen tanuwidjaja" wrote: > I had an exsisting index file with the size 20.6 GB...I havent done any > optimization in this index yet.Now I had a HDD of 100 GB,but apparently > when I create program to optimize(which simply calls writer.optimize() > to this indexfile),it gives the error

Re: Searching eats lots of memory?

2007-02-22 Thread karl wettin
On 22 Feb 2007, at 05:21, maureen tanuwidjaja wrote: I also would like to know whether searching in the index file eats lots of memory... I always run out of memory when searching, i.e. it gives the exception java heap space (although I have put -Xmx768 in the VM argument)... Is there any way

Re: autocomplete with multiple terms

2007-02-22 Thread karl wettin
On 22 Feb 2007, at 10:09, Martin Braun wrote: the only thing I have found in the list before concerning this subject is http://issues.apache.org/jira/browse/LUCENE-625, but I'm not sure if it does the things I want. I am not sure if we get enough queries for a search over an index base on th

Re: ANN: Luke 0.7 released

2007-02-22 Thread Supriya Kumar Shyamal
It's really great to have the tool compatible with Lucene 2.1. It saves a lot of energy. Thanks once again. supriya Andrzej Bialecki wrote: Hi all, I'm happy to announce that a new version of Luke - the Lucene Index Toolbox - is now available. As usual, you can get it from: http://www.ge

RE: Returning only a small set of results

2007-02-22 Thread Kainth, Sachin
What can you use in place of Hits and how do they differ? -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 21 February 2007 22:43 To: java-user@lucene.apache.org Subject: Re: Returning only a small set of results : A question about efficiency and the internal wor

Registering a local dtd file for use with Digester

2007-02-22 Thread Mike O'Leary
I have a collection of XML files that I would like to parse using Digester in order to index them for Lucene. A DTD file has been supplied for the XML files, but none of those files has a line associating them with the DTD file. Can the Digester's register function be used to tell it to use that D

autocomplete with multiple terms

2007-02-22 Thread Martin Braun
Hello All, I am implementing a query auto-complete function à la google. Right now I am using a TermEnum enumerator on a specific field and list the Terms found. That works well for searches with only one term, but when the user is typing two or three words the function will autocomplete each Term

Re: Lucene 1.4.3 : IndexWriter.addDocument(doc) fails when run on OS requiring permissions

2007-02-22 Thread Doron Cohen
This is a very common use case and Lucene is most likely not the problem cause. My guess is that (1) the first attempt to write anything to disk failed. (2) opening the IndexWriter succeeded because (a) the index exists already (from previous successful run) and (b) locks are maintained in /tmp or

Open & Close Reader

2007-02-22 Thread DECAFFMEYER MATHIEU
Hi, I need to merge indexes. If I want the user to see the changes (the merged indexes), I heard I need to close the index reader and re-open it again. But I will need to do this every x minutes for some reasons, so I wondered what could happen if a user does a query just when a re-open of the read

Lucene 1.4.3 : IndexWriter.addDocument(doc) fails when run on OS requiring permissions

2007-02-22 Thread Ridzwan Aminuddin
Hi! I'm writing a java program that uses Lucene 1.4.3 to index and create a vector file of words found in Text Files. The purpose is for text mining. I created a Java .Jar file from my program and my python script calls the Java Jar executable. This is all triggered by my DTML code. I'm runnin

a question about indexing database tables

2007-02-22 Thread Mohammad Norouzi
Hello. In our application we have to index the database tables. There are two ways to do this: 1. index each table in a separate directory and then keep all relations in order to get the right result. In this method, we should use filters to overcome the problem of searching on another search result. 2.

how to define a pool for Searcher?

2007-02-22 Thread Mohammad Norouzi
Hi all, I am going to build a Searcher pool. If anyone has experience with this, I would be glad to hear their recommendations and suggestions. I want to know what issues I should be aware of, considering I am going to use this in a web application with many user sessions. Thank you very much in a