RE: Need help regarding implementation of autosuggest using jquery

2009-11-25 Thread DHIVYA M
Thanks for your suggestion. By the way, could you please give me some more details about TermEnum and its usage? I am a beginner with Lucene and would like a theoretical explanation of TermEnum. So kindly point me to a website or tutorial that provides ample information regarding

Re: Is Lucene a good choice for PB scale mailbox search?

2009-11-25 Thread fulin tang
Thanks all for the good suggestions ! But any idea of the storage? How can we make the indexes as small as possible? We know compressing is the only way, but when and where to compress is best for search? Thanks all again! 2009/11/24 Kay Kay : > fulin tang wrote: >> >> We are going to add full

RE: How to implement a GivenCharFilter using incrementToken

2009-11-25 Thread KingShooter
The example you have given is invalid, as offsets should always refer to the original position in the source stream, so should be: a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1). Deng: I'm afraid that if (case1) index "axxxb", and then I search "axb" or (case2) index "axb" and then search "ab", which

Re: SpanQuery for Terms at same position

2009-11-25 Thread Erick Erickson
Hmmm, are they unit tests? Or would you be willing to create stand-alone unit tests demonstrating this and submit it as a patch? Best er...@alwaystrollingforworkfromothers.opportunistic. On Wed, Nov 25, 2009 at 5:38 PM, Christopher Tignor wrote: > my own tests with my own data show you are correc

Re: SpanQuery for Terms at same position

2009-11-25 Thread Christopher Tignor
my own tests with my own data show you are correct and the 1-n slop works for matching terms at the same ordinal position. thanks! C>T> On Wed, Nov 25, 2009 at 4:25 PM, Paul Elschot wrote: > Op woensdag 25 november 2009 21:20:33 schreef Christopher Tignor: > > It's worth noting however that thi

Re: SpanQuery for Terms at same position

2009-11-25 Thread Paul Elschot
Op woensdag 25 november 2009 21:20:33 schreef Christopher Tignor: > It's worth noting however that this -1 slop doesn't seem to work for cases > where you want to discover instances of more than two terms at the same > position. Would be nice to be able to explicitly set this in the query > constr

Re: SpanQuery for Terms at same position

2009-11-25 Thread Christopher Tignor
It's worth noting however that this -1 slop doesn't seem to work for cases where you want to discover instances of more than two terms at the same position. Would be nice to be able to explicitly set this in the query construction. thanks, C>T> On Tue, Nov 24, 2009 at 9:17 AM, Christopher Tignor

Re: Problem with a "." for searching Lucene 2.4.0

2009-11-25 Thread Ian Lea
In addition to Erick's advice, since you are storing filename without analysis you could use a TermQuery to find it. You can use BooleanQuery to combine that with other queries, including those generated by QueryParser. -- Ian. On Wed, Nov 25, 2009 at 6:11 PM, Erick Erickson wrote: > The first

Re: NearSpansUnordered payloads

2009-11-25 Thread Jason Rutherglen
I don't mind adding the "positions" of the payloads in them. However, maybe we can be little more clear in the javadocs what's going on underneath? On Wed, Nov 25, 2009 at 5:36 AM, Mark Miller wrote: > Grant Ingersoll wrote: >> On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote: >> >> >>> I'm i

Re: Problem with a "." for searching Lucene 2.4.0

2009-11-25 Thread Erick Erickson
The first question for this is always "what analyzers do you use at index AND query time?". I'd do two things immediately. First, what does query.toString() show you the query parses to? StandardAnalyzer does some "interesting" things with periods. Also, you have a hyphen (-) in your query which i

Problem with a "." for searching Lucene 2.4.0

2009-11-25 Thread Karl Heinz Marbaise
Hi, i'm just using Lucene 2.4 and have a problem with a "." within a field. This field contains a filename and obviously a filename can contain a "." (or multiple of them)... So if i do a search "+filename:testExcel-xaz.xls" this file will not be found...If i replace the "." with "?" it works

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Max Lynch
On Wed, Nov 25, 2009 at 11:18 AM, Erick Erickson wrote: > Why do you want to kill your indexer anyway? Just because it had > been running "too long"? Or was it behaving poorly? > > But yeah, you need to change your process, you're almost guaranteeing > that you'll corrupt your index. I've learne

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Erick Erickson
Why do you want to kill your indexer anyway? Just because it had been running "too long"? Or was it behaving poorly? But yeah, you need to change your process, you're almost guaranteeing that you'll corrupt your index. Perhaps, if you really need to stop and restart you could have your indexer vol

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Max Lynch
On Wed, Nov 25, 2009 at 9:49 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Before 2.4 it was possible that a crash of the OS, or sudden power > loss to the machine, could corrupt the index. But that's been fixed > with 2.4. > > The only known sources of corruption are hardware faul

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Michael McCandless
Before 2.4 it was possible that a crash of the OS, or sudden power loss to the machine, could corrupt the index. But that's been fixed with 2.4. The only known sources of corruption are hardware faults (bad RAM, bad disk, etc.), and, accidentally allowing 2 writers to write to the same index at o

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Ian Lea
Yes, good point. Messing around with lucene locking may well be a way to get corrupt indexes. Any others? -- Ian. On Wed, Nov 25, 2009 at 3:37 PM, Max Lynch wrote: > On Wed, Nov 25, 2009 at 9:31 AM, Ian Lea wrote: > >> > What are the typical scenarios when the index will go corrupt? >> >> D

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Max Lynch
On Wed, Nov 25, 2009 at 9:31 AM, Ian Lea wrote: > > What are the typical scenarios when the index will go corrupt? > > Dodgy disks. > I also have had index corruption on two occasions. It is not a big deal for me since my data is fairly real time so the old documents aren't as important. Howev

Re: best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Ian Lea
> What are the typical scenarios when the index will go corrupt? Dodgy disks. > E.g. can a simple JVM crash during indexing cause it? No. See the javadocs for IndexWriter. > What is the best way to minimize the possibility of a corrupt index? Don't use dodgy disks. > Copy the directory

best way to ensure IndexWriter won't corrupt the index?

2009-11-25 Thread Istvan Soos
Hi, What are the typical scenarios when the index will go corrupt? E.g. can a simple JVM crash during indexing cause it? What is the best way to minimize the possibility of a corrupt index? Copying the directory before indexing, then flipping the pointers? I'm using Lucene 2.9. Thanks, I

Re: customized SpanQuery Payload usage

2009-11-25 Thread Christopher Tignor
The problem is that I need to be able to match spans resulting from a SpanNearQuery with the Term they came from so I can eliminate using Payloads from certain Terms on a query-by-query basis. I still need this term to affect the results of a NearSpanQuery as per the usual logic, I just need to

Re: NearSpansUnordered payloads

2009-11-25 Thread Mark Miller
Grant Ingersoll wrote: > On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote: > > >> I'm interested in getting the payload information from the >> matching span, however it's unclear from the javadocs why >> NearSpansUnordered is different than NearSpansOrdered in this >> regard. >> >> NearSpans

Re: NearSpansUnordered payloads

2009-11-25 Thread Grant Ingersoll
On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote: > I'm interested in getting the payload information from the > matching span, however it's unclear from the javadocs why > NearSpansUnordered is different than NearSpansOrdered in this > regard. > > NearSpansUnordered returns payloads in a has

Re: customized SpanQuery Payload usage

2009-11-25 Thread Grant Ingersoll
On Nov 24, 2009, at 9:56 AM, Christopher Tignor wrote: > Hello, > > For certain span queries I construct programmatically by piecing together my > own SpanTermQueries I would like to enforce that Payload data is not > returned for matches on those specific terms used by the constituent > SpanTerm

Re: updating spell index

2009-11-25 Thread Grant Ingersoll
On Nov 24, 2009, at 12:34 AM, m.harig wrote: > > hello all > >is there any way to update the spell index directory ? please any1 help > me out of this. You have to rebuild it, as there is no incremental indexing. -- Grant Ingersoll http://www.lucidimagination.com

Re: how to score in lucene

2009-11-25 Thread Grant Ingersoll
On Nov 20, 2009, at 5:46 AM, Wilson Wu wrote: > hi, >I have a problem with scoring a document in lucene. I know there > are some factors such as docNum,boost,idf,docFreq,lengthNorm and so > on. And I also know how to count docNum,docFreq,idf, but I really have > no idea about counting the len
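For the factors named in the question, here is a minimal sketch of how I understand the default similarity of that Lucene generation (DefaultSimilarity) to compute them. The class and method names below are hypothetical and use plain Java only, not the Lucene API. Note that at index time the length norm is also multiplied by any boosts and stored in a single byte, so the value read back from an index is only an approximation of what this computes.

```java
// Hypothetical helper, NOT part of Lucene: restates the classic formulas in plain Java.
class ClassicScoringFactors {

    // Inverse document frequency: rarer terms (small docFreq) get a larger weight.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Length normalization: a field with more terms gets a smaller norm.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Term frequency factor: grows with the square root of the raw count.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        System.out.println("lengthNorm for a 4-term field: " + lengthNorm(4));
        System.out.println("idf for docFreq=9, numDocs=10: " + idf(9, 10));
    }
}
```

So the length norm depends only on how many terms the field contains: a four-term field gets 0.5, a sixteen-term field gets 0.25.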

RE: How to implement a GivenCharFilter using incrementToken

2009-11-25 Thread Uwe Schindler
I do not understand your request completely; maybe you could tell us some more about the requirements of your implementation. The example you have given is invalid, as offsets should always refer to the original position in the source stream, so it should be: a(0,1,1) b(2,3,1) a(6,7,1) c(11,12,1). The second probl
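To illustrate the point about offsets, here is a small sketch in plain Java (not the Lucene TokenStream API). It keeps only a given set of characters and reports each kept character with start and end offsets counted in the original input. The input string "a b xxa xxxxc" is a made-up example chosen to reproduce the numbers above, and the third value is assumed to be a position increment that is always 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT Lucene code: shows offsets that point into the original text.
class KeptCharOffsets {

    // Emits one entry per kept character as "c(start,end,posInc)".
    // start/end are indexes into the ORIGINAL input, not into the filtered text.
    static List<String> tokens(String input, String keep) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (keep.indexOf(c) >= 0) {
                out.add(c + "(" + i + "," + (i + 1) + ",1)");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input; the skipped characters still advance the offsets.
        System.out.println(tokens("a b xxa xxxxc", "abc"));
    }
}
```

Because the offsets are taken from the loop index over the unfiltered input, dropped characters widen the gaps between tokens instead of being silently closed up.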

RE: Need help regarding implementation of autosuggest using jquery

2009-11-25 Thread Uwe Schindler
Hi Dhivya, you can iterate all terms in the index using a TermEnum, that can be retrieved using IndexReader.terms(Term startTerm). If you are interested in all terms from a specific field, position the TermEnum on the first possible term in this field ("") and iterate until the field name changes
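The seek-then-scan pattern described here can be sketched without Lucene at all. The snippet below models the sorted term dictionary with a TreeSet: it jumps to the first term at or after the typed prefix and walks forward until a term no longer matches. The class and method names are hypothetical; with a real index the same loop would presumably sit on top of the TermEnum returned by IndexReader.terms(Term startTerm), with an extra check that the field name has not changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for a sorted term dictionary, NOT the Lucene API.
class PrefixSuggest {

    // Starts at the first term >= prefix and stops at the first term
    // that does not start with the prefix, or once max suggestions are collected.
    static List<String> suggest(SortedSet<String> terms, String prefix, int max) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || out.size() >= max) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("lucene", "lucid", "luke", "solr"));
        System.out.println(suggest(terms, "luc", 10));
    }
}
```

Since the terms are sorted, all terms sharing a prefix are adjacent, so the scan can stop at the first mismatch instead of visiting the whole dictionary.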