Distinction between AtomicReader and CompositeReader

2013-04-24 Thread Paul Taylor
I am trying to convert some Lucene 3 code to Lucene 4. I want to use termEnums.docs(ir.getLiveDocs()) to return only docs that have not been deleted for a particular term. However, getLiveDocs() is only available for AtomicReaders, and although I just have a single index it is file based and uses Di

Too many unique terms

2013-04-24 Thread Manuel LeNormand
Hi there, Looking at my index (about 1M docs) I see a lot of unique terms, more than 8M, which is a significant part of my total term count. These are very likely useless terms, binaries or other meaningless numbers that come with a few of my docs. I am totally fine with deleting them so these terms wou

Re: Too many unique terms

2013-04-24 Thread Adrien Grand
Hi Manuel, On Thu, Apr 25, 2013 at 12:29 AM, Manuel LeNormand wrote: > Hi there, > Looking at my index (about 1M docs) i see lot of unique terms, more > than 8M which is a significant part of my total term count. These are very > likely useless terms, binaries or other meaningless numbers that co

Re: Distinction between AtomicReader and CompositeReader

2013-04-24 Thread Adrien Grand
Hi Paul, On Wed, Apr 24, 2013 at 1:35 PM, Paul Taylor wrote: > Trying to convert some Lucene 3 code to Lucene 4, > > I want to use termEnums.docs(ir.getLiveDocs()) to only return docs that have > not been deleted for a particular term. However getLiveDocs() is only > available for AtomicReaders, a

Re: org.apache.lucene.classification - bug in SimpleNaiveBayesClassifier

2013-04-24 Thread Adrien Grand
Hi Alexey, On Tue, Apr 23, 2013 at 3:28 PM, Alexey Anatolevitch wrote: > I was trying it with 4.2.1 and SimpleNaiveBayesClassifier seems to have a > bug - the local copy of BytesRef referenced by foundClass is affected by > subsequent TermsEnum.iterator.next() calls as the shared BytesRef.bytes >

Faceted Search: count direct matches/member for result nodes

2013-04-24 Thread Schimke, Danny
Hi, I am new to Lucene. I've done some basics so far. Currently I have to deal with Faceted Search. Given: For example I have the following categories: Root Root/idA/ Root/idA/idB Root/idA/idB/idC Scenario: The search result delivers the following FacetResult for example: R