Re: Facet DrillDown Exclusion

2016-12-06 Thread Matt Hicks
Thanks, that did the trick! On Tue, Dec 6, 2016 at 8:58 AM Shai Erera wrote: > Hey Matt, > > You basically don't need to use DDQ in that case. You can construct a > BooleanQuery with a MUST_NOT clause to filter out the facet path. Here's a > short code snippet: > > String indexedField = config.

Re: Apply Lucene Query on Bits

2016-12-06 Thread Hendrik Dev
Thanks for your help. I think I found an additional solution: QueryBitSetProducer bsp = new QueryBitSetProducer(myquery); @Override public Bits getLiveDocs() { return bsp.getBitSet(this.getContext()); } but the link from Uwe seems quite promising as it also solves the numDocs() problem. On Mo

Re: Facet DrillDown Exclusion

2016-12-06 Thread Shai Erera
Hey Matt, You basically don't need to use DDQ in that case. You can construct a BooleanQuery with a MUST_NOT clause to filter out the facet path. Here's a short code snippet: String indexedField = config.getDimConfig("Author").indexFieldName; // Find the field of the "Author" facet Query q = new

RE: Hardcoded checksum mechanism in BlockTreeTermsReader

2016-12-06 Thread Uwe Schindler
Hi, The checksum is also written for a second reason: Java VMs often have optimization bugs (you may know the Java 7 GA disaster and Java 7u40 vector optimization bugs that Lucene discovered). The checksums will often catch those bugs, too. Uwe - Uwe Schindler Achterdiek 19, D-28357 Breme

Re: Offset bug in WordDelimiterFilter?

2016-12-06 Thread Michael McCandless
It looks like WDF strips the 's (STEM_ENGLISH_POSSESSIVE flag) but doesn't reflect that in the end offset. I'm not sure this is a bug, in that it seems OK to highlight the token minus its attached English possessive? It could be that it was originally by design? E.g. you can see it here: http://jiras

Re: Hardcoded checksum mechanism in BlockTreeTermsReader

2016-12-06 Thread Michael McCandless
I see. Bits can also be flipped by the network as they are travelling to/from the DB. The end to end checksum Lucene does now would catch that. Anyway, that BlockTree index file that is being entirely checksummed is a very small file. And, using the first pattern is not easy for it because it n

Offset bug in WordDelimiterFilter?

2016-12-06 Thread Markus Jelsma
Hello - I noticed something peculiar running Lucene/Solr 6.3.0. The plural vaccinatieprogramma's should have a startOffset of 0 and an endOffset of 21 when passed through WordDelimiterFilter and/or stemmers, but it doesn't, slightly messing up highlighted terms. wdf = new WordDelimiterFilter(ne
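
The offsets discussed here are plain character positions, so the arithmetic can be checked without Lucene. The sketch below is not WordDelimiterFilter's implementation, and the class and method names are hypothetical. It only shows that the full token covers offsets 0 to 21, and that an end offset computed with the trailing 's left out would be 19.

```java
final class OffsetSketch {
    // Start and end offset of the whole token, assuming it begins at position 0.
    static int[] fullOffsets(String token) {
        return new int[] {0, token.length()};
    }

    // Start and end offset if a trailing English possessive ('s) is excluded.
    // Hypothetical helper for illustration only.
    static int[] offsetsWithoutPossessive(String token) {
        int end = token.endsWith("'s") ? token.length() - 2 : token.length();
        return new int[] {0, end};
    }

    public static void main(String[] args) {
        String token = "vaccinatieprogramma's";
        int[] full = fullOffsets(token);
        int[] stripped = offsetsWithoutPossessive(token);
        System.out.println(full[0] + "-" + full[1]);
        System.out.println(stripped[0] + "-" + stripped[1]);
    }
}
```

A highlighter that trusts the shorter end offset would mark the word without its 's, which is one way the highlighted terms could end up looking slightly off.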

Re: Hardcoded checksum mechanism in BlockTreeTermsReader

2016-12-06 Thread Duke DAI
Thanks for your quick response, Mike. The database has its own raw page management on top of OS page management, and most likely its own page-level checksums; that's why I want to avoid checksumming at the Lucene Directory level. Certainly a checksum is good, I like the pattern (rewrite openChecksumIn

Re: Increase in ByteBufferImpl class heap size in longevity run

2016-12-06 Thread Michael McCandless
If you are actively indexing and opening new near-real-time readers, the number of segments in your index will increase, and with it the number of open input files (each corresponding to an instance of ByteBufferIndexInput.SingleBufferImpl). So it's expected that its heap usage grows, but the

Re: Hardcoded checksum mechanism in BlockTreeTermsReader

2016-12-06 Thread Michael McCandless
We have learned over time not to trust the underlying store to correctly record the bytes we wrote to it. This is why checksumming is very strongly built into Lucene at this point. If you disable checksumming, when bits do flip, you get exotic exceptions at search time that might look like Lucene
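
To illustrate the kind of end-to-end check described here, below is a minimal sketch using the JDK's java.util.zip.CRC32. It is not Lucene's code: ChecksumSketch and its methods are hypothetical names, and CRC32 is assumed as the checksum algorithm. The idea is to record a checksum when bytes are written, recompute it when they are read back, and treat any mismatch as corruption instead of letting the flipped bit surface later as an unrelated exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumSketch {
    // Compute a CRC32 checksum over the given bytes.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Returns true if the bytes still match the checksum recorded at write time.
    static boolean verify(byte[] data, long expected) {
        return crc32(data) == expected;
    }

    public static void main(String[] args) {
        byte[] written = "term dictionary bytes".getBytes(StandardCharsets.UTF_8);
        long stored = crc32(written);      // recorded alongside the data

        byte[] readBack = written.clone();
        readBack[3] ^= 0x01;               // simulate a single flipped bit

        System.out.println(verify(written, stored));   // intact copy
        System.out.println(verify(readBack, stored));  // corrupted copy
    }
}
```

The check does not care where the flip happened (disk, network, or elsewhere), only that the bytes read differ from the bytes written.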

Hardcoded checksum mechanism in BlockTreeTermsReader

2016-12-06 Thread Duke DAI
Hi all, I'm writing a custom Lucene Directory, extending o.a.l.store.Directory, backed by database files. I do not need a second checksum on IndexInput and IndexOutput. But in the BlockTreeTermsReader constructor, the following code opens a hard-coded BufferedChecksumIndexInput to checksum the raw IndexInput. I