Re: confused about an entry in the FAQ

2008-05-13 Thread Stephane Nicoll
ping. Sorry for the long email but I prefer to provide all information first. On Mon, May 12, 2008 at 12:13 PM, Stephane Nicoll <[EMAIL PROTECTED]> wrote: > I tried all this and I am confused about the result. I am trying to > implement a hybrid query handler where I fetch the IDs from a > data

Re: Exact match query on a field in index which has been indexed using StandardAnalyzer

2008-05-13 Thread Gauri Shankar
Thanks Erick... My index size is ~2 GB. Is it a good idea to keep another duplicate field as UN_TOKENIZED and search using KeywordAnalyzer? A few points: 1. When I say exact match, I mean an exact phrase match only. That implies the query should not match a document with the field value "Memb

Re: Search for long titles - wildcard queries

2008-05-13 Thread Daniel Noll
On Saturday 10 May 2008 20:32:42 legrand thomas wrote: > I think I cannot use the WildcardQuery because the term shouldn't start > with "*" or "?". Should I use a QueryParser? How can I do it? WildcardQuery does permit a wildcard at the front; it's just much slower. Also, QueryParser allows w

"Off By One": CorruptIndexException

2008-05-13 Thread Stu Hood
Hey gang, I think we've been suffering from the following bug, and I have a question about the JVM fix. http://markmail.org/message/di3vdyfq5odfbai6 We're running 1.6.0_05 and Lucene 2.3.2. Supposedly downgrading to 1.6.0_02 will fix the issue, but I'd much rather upgrade if possible. 1.6.0_0

Re: Find last term

2008-05-13 Thread Jason Rutherglen
Last term, field, TermEnum On Tue, May 13, 2008 at 12:34 PM, Erick Erickson <[EMAIL PROTECTED]> wrote: > Find the last term of what? Document? Field in an index? Query? > > Best > Erick > > On Tue, May 13, 2008 at 12:28 PM, Jason Rutherglen < > [EMAIL PROTECTED]> wrote: > > > It is easy to find t

Re: Find last term

2008-05-13 Thread Erick Erickson
Find the last term of what? Document? Field in an index? Query? Best Erick On Tue, May 13, 2008 at 12:28 PM, Jason Rutherglen < [EMAIL PROTECTED]> wrote: > It is easy to find the first term using TermEnum. Is there a way to find > the last term without using StringIndex and binary search? Are t

Find last term

2008-05-13 Thread Jason Rutherglen
It is easy to find the first term using TermEnum. Is there a way to find the last term without using StringIndex and binary search? Are there plans to offer this functionality?
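
As a rough illustration of the brute-force option: TermEnum in Lucene 2.x only moves forward, so one way to reach the last term is to walk the whole enumeration and keep the most recent value. The sketch below uses a plain java.util.Iterator<String> as a stand-in for TermEnum (an assumption, so that the example runs without Lucene on the classpath). The class and method names are hypothetical, and the cost is linear in the number of terms.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LastTermSketch {

    // Walks a forward-only, sorted enumeration and returns its final element,
    // or null when the enumeration is empty. The Iterator stands in for a
    // Lucene TermEnum, which is an assumption of this sketch.
    public static String lastTerm(Iterator<String> sortedTerms) {
        String last = null;
        while (sortedTerms.hasNext()) {
            last = sortedTerms.next();
        }
        return last;
    }

    public static void main(String[] args) {
        // Terms are listed in sorted order, as a term enumeration would return them.
        List<String> terms = Arrays.asList("apple", "member", "staff", "technical");
        System.out.println(lastTerm(terms.iterator()));
    }
}
```

With a real TermEnum the same loop shape would presumably apply, with the extra step of stopping once the enumeration moves past the field of interest.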

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Jay O'Leary
If it's Windows only, you can roll your own with IFilters ( http://www.ifilter.org/). On Tue, May 13, 2008 at 10:23 AM, Lukas Vlcek <[EMAIL PROTECTED]> wrote: > Does it make sense to consider using OpenOffice to convert from MS formats > to PDF or HTML before indexing? Would this yield me a lower

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Lukas Vlcek
Does it make sense to consider using OpenOffice to convert from MS formats to PDF or HTML before indexing? Would this yield me a lower fail rate as opposed to pure POI approach? I don't care about formatting for now; I care about content in the first place. Formatting would be important only in the case t

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Andrzej Bialecki
Grant Ingersoll wrote: I've used POI, as well as commercial providers. As always, it depends :-) I wasn't particularly impressed with the commercial providers given the amount of money they wanted for it. PDF was particularly tricky, but you weren't asking about that. At least w/ POI, you

Re: Exact match query on a field in index which has been indexed using StandardAnalyzer

2008-05-13 Thread Erick Erickson
First, why do you OR together the different cases? Assuming you're pushing your query through StandardAnalyzer, it'll lowercase for you (just as it did during indexing). But to your question. Would you expect your query to match a document with the field value "Member of Technical Staff for Accoun

Exact match query on a field in index which has been indexed using StandardAnalyzer

2008-05-13 Thread Gauri Shankar
Hi, I have a field in the index which has been indexed using StandardAnalyzer and as TOKENIZED. Now I would like to write a query which returns the hit if there is an exact match on the field value. Say, if the field value is: Member of Technical Staff, then "member of technical staff" OR "Member of Techn

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Grant Ingersoll
I've used POI, as well as commercial providers. As always, it depends :-) I wasn't particularly impressed with the commercial providers given the amount of money they wanted for it. PDF was particularly tricky, but you weren't asking about that. At least w/ POI, you have the opportuni

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Robert . Hastings
We are using Aspose: www.aspose.com. We are still in pre-release; it works fine for all of the MS products. It's commercial, but is a good deal as long as you don't have too many developers working on it, since the licensing is per seat. We had a little trouble with their PDF product. The o

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread mark harwood
On the commercial front, Oracle's "Outside In" (previously Stellent) is the one that gets used in a lot of search engines. Being a C-based product though, integration isn't quite as nice/easy as pure Java solutions. - Original Message From: Bowesman Antony <[EMAIL PROTECTED]> To: java

Re: Numerical Range Query

2008-05-13 Thread Bowesman Antony
An alternative to Lucene's NumberTools is Solr's NumberUtils, which is more space efficient for indexing numbers, but not as pretty to look at: http://lucene.apache.org/solr/api/org/apache/solr/util/NumberUtils.html Dan Hardiker wrote: > Hi, > > I've got an application which stores ratings fo
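
For context on why such helpers exist: terms in the index compare as strings, so numbers have to be encoded so that string order matches numeric order before a range query behaves as expected. The sketch below shows the idea with simple zero-padding of non-negative longs. It is an illustrative stand-in, not the actual encoding used by NumberTools or NumberUtils, and the class and method names are hypothetical.

```java
public class PaddedNumberSketch {

    // Encodes a non-negative long as a fixed-width, zero-padded decimal string
    // (19 digits, the length of Long.MAX_VALUE) so that lexicographic order
    // agrees with numeric order. Negative values are rejected to keep this
    // sketch simple; real encoders handle them.
    public static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format("%019d", value);
    }

    public static void main(String[] args) {
        // Unpadded strings sort "10" before "9"; padded ones sort numerically.
        System.out.println("9".compareTo("10") > 0);
        System.out.println(encode(9L).compareTo(encode(10L)) < 0);
    }
}
```

The trade-off hinted at in the message is visible here: fixed-width decimal is easy to read but takes more space per term than a denser encoding would.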

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-13 Thread Bowesman Antony
We are using POI 3.0.2 FINAL. As you found, it is not very reliable for many Word files. It does not support Word 2, fast-saved files, or files which are not padded to 256 bytes. PPT and Excel are quite bad; a large % of our PPT files throw exceptions. Not tried 3.1 as it's just gone BETA 1, but I

Re: Search and retrieve the line data from the File

2008-05-13 Thread Madan Narra
Hi All, Can anyone please explain how the task described below can be accomplished? Thanks, Madan N On Mon, May 12, 2008 at 3:21 PM, Madan Narra <[EMAIL PROTECTED]> wrote: > > > > > Hi All, > > I am very new to Lucene and want to extend my skills with this tool. > > But I am in need of a