date:20111027

Finding Term Positions in the original document

2011-10-27 Thread Vidya Kanigiluppai Sivasubramanian

Hi, I am using lucene 2.4.1 in my project. I need to display the search results for a particular term and, when an item on the result page is selected, display the document where the term was found, highlighting the matched terms. For this I need to know the match

Re: using lucene to find neighbouring points in an n-dimensional space

2011-10-27 Thread prasenjit mukherjee

Thanks for responding. On Fri, Oct 28, 2011 at 1:12 AM, Felipe Hummel wrote: > For the indexing part, you can 'insert' the term multiple times (term-weight > times) constructing the document String manually. That is not very typical, > you would normally feed Lucene with the original documents fo

Re: IndexWriter loops trying to merge using ConcurrentMergeScheduler

2011-10-27 Thread alfredhong

Hi, Mike, Thanks for your analysis. You are correct in that BalancedSegmentMergePolicy is used. We previously used LogByteSizeMergePolicy but might have run into some other issues that I was involved in so weren't using it. Re: TieredMergePolicy, we'll definitely check that out when we update

Re: using lucene to find neighbouring points in an n-dimensional space

2011-10-27 Thread Felipe Hummel

For the indexing part, you can 'insert' the term multiple times (term-weight times) constructing the document String manually. That is not very typical, you would normally feed Lucene with the original documents for it to parse and index. The query processing could be done similar as you said. Jus
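The "insert the term multiple times" idea above can be sketched as follows. This is only an illustration, assuming integer term weights and a whitespace-tokenizing analyzer; the function name `weighted_terms_to_text` and the terms `dim1`/`dim2` are hypothetical and do not come from the thread.

```python
def weighted_terms_to_text(weights):
    """Build a document string in which each term appears `weight` times.

    Assumes weights are non-negative integers, so that the resulting
    term frequency in the indexed field mirrors the term weight.
    """
    parts = []
    for term, weight in weights.items():
        if not isinstance(weight, int) or weight < 0:
            raise ValueError(f"weight for {term!r} must be a non-negative integer")
        # Repeat the term once per unit of weight.
        parts.extend([term] * weight)
    return " ".join(parts)


# Hypothetical point with weight 3 on one dimension and 1 on another.
doc_text = weighted_terms_to_text({"dim1": 3, "dim2": 1})
print(doc_text)
```

The resulting string would then be handed to the indexer as the field value. Real-valued weights would need to be scaled and rounded first, which loses precision.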

Re: performance question - number of documents

2011-10-27 Thread Felipe Hummel

Hi, there are two types of query processing in document retrieval: document-at-a-time and term-at-a-time. Lucene uses document-at-a-time processing. That means the posting lists (the lists of documents each word appears in) are sorted by document ID. This type of processing is usually better for l
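To illustrate why sorting posting lists by document ID matters, here is a minimal sketch of document-at-a-time processing for a two-term AND query. It is not Lucene's actual code; the function name and the sample posting lists are made up for the example.

```python
def intersect_postings(postings_a, postings_b):
    """Return doc IDs present in both posting lists.

    Both inputs must be sorted by doc ID. The two cursors advance in
    lockstep, so each document is examined once and the lists are never
    fully materialized per term (document-at-a-time style).
    """
    i = j = 0
    matches = []
    while i < len(postings_a) and j < len(postings_b):
        doc_a, doc_b = postings_a[i], postings_b[j]
        if doc_a == doc_b:
            matches.append(doc_a)
            i += 1
            j += 1
        elif doc_a < doc_b:
            i += 1  # doc_a cannot match; move the first cursor forward
        else:
            j += 1  # doc_b cannot match; move the second cursor forward
    return matches


# Hypothetical posting lists for two terms.
print(intersect_postings([1, 4, 7, 9], [2, 4, 9, 12]))
```

The merge runs in time linear in the combined length of the two lists, which is possible only because both are ordered by document ID.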

Re: Lucene 3.1 search paralelism per segment doubt

2011-10-27 Thread Simon Willnauer

On Thu, Oct 27, 2011 at 2:50 PM, Robert Muir wrote: > On Mon, Oct 10, 2011 at 7:02 AM, Marc Sturlese > wrote: >> I've read in another thread >> (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991) >> /Since Lucene 2.9, Lucene works on a per segment basis when sea

Re: index bigger than it should be?

2011-10-27 Thread Ian Lea

There's org.apache.lucene.index.CheckIndex which will report assorted stats about the index, as well as checking it for correctness. It can fix it too but you don't need that. I hope. Will take quite a while to run on a large index. What version of lucene? Does a before/after (or large/small) d

Re: IndexWriter loops trying to merge using ConcurrentMergeScheduler

2011-10-27 Thread Michael McCandless

It looks like you are using BalancedSegmentMergePolicy right? And somehow it gets stuck in a state where it keeps merging the same single segment into a new segment, which is odd. Likely this is a bug in BSMP. Do you see this same looping with eg LogByteSizeMergePolicy? Note that newer versions

Re: Lucene 3.1 search paralelism per segment doubt

2011-10-27 Thread Robert Muir

On Mon, Oct 10, 2011 at 7:02 AM, Marc Sturlese wrote: > I've read in another thread > (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991) > /Since Lucene 2.9, Lucene works on a per segment basis when searching. Since > Lucene 3.1 it can even parallelize on multipl

Re: idf calculation in Lucene ?

2011-10-27 Thread Robert Muir

On Thu, Oct 20, 2011 at 3:11 PM, David Ryan wrote: > > However, in some cases, when I search o'reilly, I see > > * 44.0865 = idf(title: o''reilli=4 o=1488 reilli=14 oreilli=4)* > > In this case, how is IDF calculated? > That's a phrase or multiphrase query. In this case it sums up the idf of
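The summing described in the reply can be sketched as below. The sketch assumes the per-term formula used by Lucene's DefaultSimilarity in the 3.x line, idf = 1 + ln(numDocs / (docFreq + 1)). The document frequencies are the ones shown in the quoted explain output; the `num_docs` value is a placeholder, since the thread does not state the index size.

```python
import math


def idf(doc_freq, num_docs):
    """Per-term idf, assuming the DefaultSimilarity formula."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def phrase_idf(doc_freqs, num_docs):
    """Sum the idf of every term in a phrase or multiphrase query."""
    return sum(idf(df, num_docs) for df in doc_freqs)


# Document frequencies from the explain output in the question.
doc_freqs = {"o''reilli": 4, "o": 1488, "reilli": 14, "oreilli": 4}

# Placeholder: the real index size is not given in the thread.
num_docs = 1_000_000

print(phrase_idf(doc_freqs.values(), num_docs))
```

Because the rare variants each contribute a large idf, the summed value ends up much higher than the idf of any single term.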

index bigger than it should be?

2011-10-27 Thread v . sevel

Hi, I have an application that has an index with 30 million docs in it. Every day, I add around 1 million docs and remove the oldest 1 million, to keep it stable at 30 million. For the most part, doc fields are indexed and stored. Each doc weighs from a few KB to 1 MB (a few MB in so


11 matches
