Re: is lucene fit for doing this?

2008-01-29 Thread Mr Shore
Hi Michael, thanks for your advice. I think I'll give it a try. Mr Shore 2008/1/30, Michael Prichard <[EMAIL PROTECTED]>: > > I would say Lucene is capable of helping you do this. Remember > that it is a set of libraries and you have to build the functionality > you need with that. BUT with a littl

Re: is lucene fit for doing this?

2008-01-29 Thread Michael Prichard
I would say Lucene is capable of helping you do this. Remember that it is a set of libraries and you have to build the functionality you need with that. BUT with a little planning and elbow grease you will be able to use it to create a great search engine. Good luck. On Jan 29, 2008,

Query in Lucene 2.3.0

2008-01-29 Thread ajay_garg
Hi all. The latest Lucene version, 2.3.0, says that by default, flushing from memory to the file-system based index is triggered by RAM usage, with 16 MB as the default value. Fine. That works for me, as long as I am using a single thread to write into the index. However, I have been trying to

is lucene fit for doing this?

2008-01-29 Thread Mr Shore
Hi all, I'm looking for a search engine capable of processing, say, 300 gigabytes of files. Is Lucene fit for this job? In fact I'm using Python, and have just found PyLucene and Lupy, which is open source and written in Python. If the answer is no, any recommendation is greatly appreciated.

Re: Luke for Lucene 2.3?

2008-01-29 Thread Kyle Maxwell
> Go to the luke home page (google lucene luke) and it will show you how > to start it up using the latest jar. You just launch the main class with > the lucene jar of your choice and the luke jar on the class path. IME, there were a couple additional (trivial) bugs due to internal api changes, su

Re: Luke for Lucene 2.3?

2008-01-29 Thread Kyle Maxwell
On 1/29/08, vivek sar <[EMAIL PROTECTED]> wrote: > Hi, > > Has anyone tried Luke v0.7.1 with the latest Lucene build, v2.3? I'm > getting "Unknown format version: -4" error when opening Lucene 2.3 > index with Luke 0.7.1. Is there any upgraded version of Luke anywhere? > > I also read something ab

Re: Luke for Lucene 2.3?

2008-01-29 Thread Mark Miller
Go to the luke home page (google lucene luke) and it will show you how to start it up using the latest jar. You just launch the main class with the lucene jar of your choice and the luke jar on the class path. vivek sar wrote: Hi, Has anyone tried Luke v0.7.1 with the latest Lucene build, v2

Luke for Lucene 2.3?

2008-01-29 Thread vivek sar
Hi, Has anyone tried Luke v0.7.1 with the latest Lucene build, v2.3? I'm getting "Unknown format version: -4" error when opening Lucene 2.3 index with Luke 0.7.1. Is there any upgraded version of Luke anywhere? I also read something about web-based Luke, but can't find it in the contrib in 2.3,

Re: Some Help needed in search.

2008-01-29 Thread Doron Cohen
You can add a phrase clause on the writer field. That is, with a high boost of 3 and a low boost of 2, writing 'h' for 'heading' and 'w' for 'writer', try this query: h:sachin^3 d:tendulkar^3 w:sachin^2 w:tendulkar^2 w:"h:Sachin Tendulkar"^6 On Jan 29, 2008 9:23 AM, Sure <[EMAIL PROTECTED]> wrote: > > Hi Al
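A minimal sketch of how a boosted query string of this shape could be assembled, assuming one reading of the suggestion above: every search word gets a clause on the high-boost field and on the low-boost field, plus one quoted phrase clause on the low-boost field. The class and method names are hypothetical, and the code only builds a string in the query-parser syntax; it does not call Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only concatenates text.
class BoostedQuerySketch {

    /** Builds per-word boosted clauses for two fields plus one boosted phrase clause. */
    public static String build(String[] words, String highField, int highBoost,
                               String lowField, int lowBoost, int phraseBoost) {
        List<String> clauses = new ArrayList<>();
        // One clause per word on the high-boost field, e.g. h:sachin^3
        for (String word : words) {
            clauses.add(highField + ":" + word + "^" + highBoost);
        }
        // One clause per word on the low-boost field, e.g. w:sachin^2
        for (String word : words) {
            clauses.add(lowField + ":" + word + "^" + lowBoost);
        }
        // The whole input as a quoted phrase on the low-boost field
        clauses.add(lowField + ":\"" + String.join(" ", words) + "\"^" + phraseBoost);
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String query = build(new String[] {"sachin", "tendulkar"}, "h", 3, "w", 2, 6);
        // prints: h:sachin^3 h:tendulkar^3 w:sachin^2 w:tendulkar^2 w:"sachin tendulkar"^6
        System.out.println(query);
    }
}
```

The resulting string would then be handed to whatever query parser the application uses; whether the phrase boost of 6 is appropriate depends on the data.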

Re: Word / Pharse match shown in a context

2008-01-29 Thread Mark Miller
Look at the Highlighter in contrib. It creates fragments (context) and highlights search terms in them (keywords). If you want to highlight phrases correctly, check out this issue, which adds support for Spans and PhraseQuery: https://issues.apache.org/jira/browse/LUCENE-794 Mark DURGA DE

Word / Pharse match shown in a context

2008-01-29 Thread DURGA DEEP
Dear All, I've been scouring through the Lucene classes. Are there any classes which can help me achieve the following? 1) We are an e-mail service provider. We want to provide a search capability for e-mail messages via Lucene. So far we are able to index/parse the e-mail.

RE: Apostrophe filtering in StandardFilter

2008-01-29 Thread Steven A Rowe
On 01/29/2008 at 10:05 AM, Grant Ingersoll wrote: > On Jan 29, 2008, at 9:29 AM, christophe blin wrote: > > thanks for the pointer to the elision filter, but I am currently stuck > > with lucene-core-2.2.0 found in the maven2 central repository (which does not > > contain this class). I'll watch for an upgrad

Re: Apostrophe filtering in StandardFilter

2008-01-29 Thread Grant Ingersoll
On Jan 29, 2008, at 9:29 AM, christophe blin wrote: Hi, thanks for the pointer to the elision filter, but I am currently stuck with lucene-core-2.2.0 found in the maven2 central repository (which does not contain this class). I'll watch for an upgrade to 2.3 in the future. 2.3 should be available

Re: Apostrophe filtering in StandardFilter

2008-01-29 Thread Mathieu Lecarme
christophe blin wrote: Hi, thanks for the pointer to the elision filter, but I am currently stuck with lucene-core-2.2.0 found in the maven2 central repository (which does not contain this class). I'll watch for an upgrade to 2.3 in the future. You can backport it easily with copy-paste. M. --

RE: Apostrophe filtering in StandardFilter

2008-01-29 Thread christophe blin
Hi, thanks for the pointer to the elision filter, but I am currently stuck with lucene-core-2.2.0 found in the maven2 central repository (which does not contain this class). I'll watch for an upgrade to 2.3 in the future. BTW, I think there is an error in the current javadoc because the sentence "Note that

Re: CustomScoreQuery Not Returning Value in Index

2008-01-29 Thread Briggs
Repository. Heh. On Jan 29, 2008 9:01 AM, Briggs <[EMAIL PROTECTED]> wrote: > BTW, just wanted to say thanks again. It's working now. I still don't > know how the values were reversed. I believe I must have had a bug in > the code, but it wasn't visible to me. I think netbeans did an > 'instal

Re: CustomScoreQuery Not Returning Value in Index

2008-01-29 Thread Briggs
BTW, just wanted to say thanks again. It's working now. I still don't know how the values were reversed. I believe I must have had a bug in the code, but it wasn't visible to me. I think netbeans did an 'install' with maven and put some bad code in the respository directory and the test case was

RE: Apostrophe filtering in StandardFilter

2008-01-29 Thread Steven A Rowe
Hi Chris, Looks like the ElisionFilter handles the French problems you mentioned: See the code for the list of /X'/ constructions it handles:
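As a rough illustration of the kind of elision stripping discussed in this thread, here is a small standalone sketch. It is not the ElisionFilter source and does not use the Lucene API: the class name, the method, and the set of elided prefixes are all assumptions chosen for the example, and only the plain ASCII apostrophe is handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone sketch; the real filter works on a token stream.
class ElisionSketch {

    // Illustrative prefix set only, not copied from any Lucene release.
    static final Set<String> ELIDED_PREFIXES =
            new HashSet<>(Arrays.asList("l", "d", "qu", "j", "n", "s", "m", "t"));

    /** Returns the token without a leading elided prefix such as l' or qu'. */
    public static String strip(String token) {
        int apostrophe = token.indexOf('\'');
        // Require something on both sides of the apostrophe.
        if (apostrophe > 0 && apostrophe < token.length() - 1) {
            String prefix = token.substring(0, apostrophe).toLowerCase();
            if (ELIDED_PREFIXES.contains(prefix)) {
                return token.substring(apostrophe + 1);
            }
        }
        // Anything else (for example aujourd'hui) is left untouched.
        return token;
    }

    public static void main(String[] args) {
        System.out.println(strip("l'avion"));      // prints: avion
        System.out.println(strip("aujourd'hui"));  // prints: aujourd'hui
    }
}
```

Matching on a fixed prefix set, instead of splitting at every apostrophe, is what keeps words such as aujourd'hui intact.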

Re: TermVector

2008-01-29 Thread Grant Ingersoll
Have a look at the SpanQuery, specifically the SpanNearQuery. The getSpans() method will return a Spans object, which you can use to access the positions. -Grant On Jan 29, 2008, at 7:17 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote: And how can I find the offsets of something like "f

RE: TermVector

2008-01-29 Thread spring
> > And how can I find the offsets of something like "foo bar"? I think this will get tokenized into 2 terms and thus I have no chance to find it, right? > I wouldn't say no chance... TermVectorMapper would be good for this, as you can watch the terms as they are being

Apostrophe filtering in StandardFilter

2008-01-29 Thread christophe blin
Hi, I see a lot of threads about the apostrophe not being considered a separator, and I see lots of French people complaining about that (I also complain, since I am French ;) ). My question is "what is the status of http://tinyurl.com/ynskw3 ?" I think the patch given in this thread will work for en

Re: Lucene to index OCR text

2008-01-29 Thread mark harwood
You could try taking a large corpus of text (say Wikipedia) and using it to inform the likelihood of word sequences. Take the OCR output and produce fuzzy spelling variations for each word in a window of text (say 5 or 6 words) and then examine the likelihood of the different permutations
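The idea above can be sketched in a few lines, under loud assumptions: the "corpus" here is a single sentence standing in for something like Wikipedia, likelihood is approximated by raw counts of adjacent word pairs, and the spelling variations for each position are supplied by hand instead of being generated by a fuzzy matcher. All names are hypothetical and nothing here is Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: picks the most plausible word sequence for an OCR window.
class OcrSequenceSketch {

    /** Counts adjacent word pairs in the reference corpus. */
    public static Map<String, Integer> bigramCounts(String corpus) {
        Map<String, Integer> counts = new HashMap<>();
        String[] words = corpus.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    /** Scores a sequence as the sum of the corpus counts of its adjacent pairs. */
    public static int score(List<String> sequence, Map<String, Integer> counts) {
        int total = 0;
        for (int i = 0; i + 1 < sequence.size(); i++) {
            total += counts.getOrDefault(sequence.get(i) + " " + sequence.get(i + 1), 0);
        }
        return total;
    }

    /**
     * Enumerates every combination of per-position candidates and returns the
     * best-scoring one. Assumes each position has at least one candidate.
     */
    public static List<String> best(List<List<String>> candidates, Map<String, Integer> counts) {
        List<List<String>> sequences = new ArrayList<>();
        sequences.add(new ArrayList<>());
        for (List<String> options : candidates) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : sequences) {
                for (String option : options) {
                    List<String> next = new ArrayList<>(prefix);
                    next.add(option);
                    extended.add(next);
                }
            }
            sequences = extended;
        }
        List<String> winner = sequences.get(0);
        for (List<String> sequence : sequences) {
            if (score(sequence, counts) > score(winner, counts)) {
                winner = sequence;
            }
        }
        return winner;
    }
}
```

Exhaustive enumeration grows exponentially with the window size, which is one reason to keep the window to the 5 or 6 words mentioned above; a real system would likely also use smoothed probabilities instead of raw counts.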

Re: Lucene to index OCR text

2008-01-29 Thread Paul Elschot
On Tuesday 29 January 2008 03:32:08, Daniel Noll wrote: > On Friday 25 January 2008 19:26:44 Paul Elschot wrote: > > There is no way to do exact phrase matching on OCR data, because no > > correction of OCR data will be perfect. Otherwise the OCR would have made > > the correction... > > > > The