oom on FastTaxonomyFacetsCounts

2016-12-27 Thread Sheng
This is probably not the fault of Lucene, as the OOM happened on this line: values = new int[taxoReader.getSize()]; So taxoReader.getSize() is probably too big. My question is: is there a more memory-friendly way (also without a significant performance penalty) to get FacetResult for a particular dimens
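To put a number on the allocation mentioned above, here is a minimal sketch of the arithmetic, assuming one 4-byte int per taxonomy ordinal and ignoring the JVM array header. The class and method names are hypothetical and not part of Lucene.

```java
// Hypothetical helper, not a Lucene API: rough size of an int[] counts array.
class CountsArrayEstimate {
    // One int (Integer.BYTES = 4) per taxonomy ordinal; the array header is ignored.
    static long approxBytes(int taxonomySize) {
        return (long) taxonomySize * Integer.BYTES;
    }

    public static void main(String[] args) {
        // Assumed example size: a taxonomy with 100 million ordinals.
        System.out.println(approxBytes(100_000_000) + " bytes per counts array");
    }
}
```

Since the array is sized by the whole taxonomy rather than by one dimension, every faceting request that allocates it pays this cost in full.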

Re: SortingMergePolicy moved to solr ?

2016-09-14 Thread Sheng
che.org/jira/browse/LUCENE-6766 has all the gory > details. > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, Sep 14, 2016 at 10:56 AM, Sheng > > wrote: > > Before 6.2, it is in Lucene-misc, now I can only find it in solr. I > > understand it migh

SortingMergePolicy moved to solr ?

2016-09-14 Thread Sheng
Before 6.2 it was in lucene-misc; now I can only find it in Solr. I understand it might have something to do with an issue I reported earlier, that SortingMergePolicy cannot handle point fields properly, but my expectation at the time was that this would be addressed in a later version instead of be

Re: dv field is too large

2016-07-06 Thread Sheng
ending it to Lucene. > > Best, > Erick > > On Wed, Jul 6, 2016 at 3:53 PM, Sheng > > wrote: > > You misunderstand. I have many fields, and unfortunately a few of them > are > > quite big, i.e. exceeding the 32k limit. In order to make these "big" >

Re: dv field is too large

2016-07-06 Thread Sheng
independently. But from what you've described, putting the entire > thing into a single DV field isn't useful. > > Best, > Erick > > > > On Wed, Jul 6, 2016 at 3:10 PM, Sheng > > wrote: > > To be clear, the "field" is indeed tokenized,

Re: dv field is too large

2016-07-06 Thread Sheng
less.com > > On Wed, Jul 6, 2016 at 5:55 PM, Sheng > > wrote: > > > Hi Eric, > > > > I am refactoring a legacy system. One of the most annoying things is I > have > > to keep the old feature even though it makes little sense. In this case, > we > > h

Re: dv field is too large

2016-07-06 Thread Sheng
To be clear, the "field" is indeed tokenized, and it is accompanied by a SortedDocValuesField so that it is sortable too. Am I making the wrong assumption here? On Wednesday, July 6, 2016, Sheng wrote: > Hi Eric, > > I am refactoring a legacy system. One of the most annoying

Re: dv field is too large

2016-07-06 Thread Sheng
valid reason, but > it's > not obvious what use-case you're serving from this thread so far > > Nobody has yet put forth a compelling use-case for such large fields, > perhaps > this would be one. > > Best, > Erick > > On Wed, Jul 6, 2016 at 2:24 PM

Re: dv field is too large

2016-07-06 Thread Sheng
> Mike McCandless > > http://blog.mikemccandless.com > > On Wed, Jul 6, 2016 at 10:31 AM, Sheng > > wrote: > > > Hi, > > > > I am getting an IAE indicating one of the SortedDocValueField is too > large, > > > 32k > > > > I googled a b

dv field is too large

2016-07-06 Thread Sheng
Hi, I am getting an IAE indicating that one of the SortedDocValuesField values is too large (> 32k). I googled a bit, and it seems that LUCENE-4583 addressed this issue in 4.5 and 6.0, while I am currently using Lucene 6.1. Am I missing or misunderstanding anything? Thanks,

Re: SortingMergePolicy in Lucene 6

2016-06-10 Thread Sheng
and > secondarily by "blockID" where blockID is a unique long doc value indexed > on each document in the block. That should preserve your blocks? > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, May 25, 2016 at 8:26 PM, Sheng > wrote:

Re: SortingMergePolicy in Lucene 6

2016-05-25 Thread Sheng
test Lucene's current master > and confirm points and index-time sorting work correctly for you? > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, May 25, 2016 at 1:10 PM, Sheng > wrote: > >> It makes a call to SlowCompositeReaderWrapper in line 1

SortingMergePolicy in Lucene 6

2016-05-25 Thread Sheng
It makes a call to SlowCompositeReaderWrapper in line 103, which checks whether any field has point values in line 68. If so, it throws an exception: "cannot wrap points". Does this essentially mean SortingMergePolicy cannot be used for an index that has point values? If yes, what is the rationale behind it?

Re: 500 millions document for loop.

2016-04-21 Thread Sheng
If you don't care about search, why not just use the reader to traverse the index? Run a for loop from 0 to reader.maxDoc() - 1, and filter the documents using MultiFields. You can even bucket this procedure and run your statistics calculation in parallel. On Thursday, November 12, 2015, Valentin Popov wrote:
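The bucketing idea above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: BucketedScan, split and countMatching are hypothetical names, and the IntPredicate stands in for whatever per-document filter is applied (for example a live-docs check).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: split the doc ID range [0, maxDoc) into buckets
// and scan the buckets in parallel. Not a Lucene API.
class BucketedScan {
    // Returns contiguous [start, end) ranges covering [0, maxDoc); buckets must be > 0.
    static List<int[]> split(int maxDoc, int buckets) {
        List<int[]> ranges = new ArrayList<>();
        // Ceiling division, done in long arithmetic to avoid int overflow.
        int size = (int) (((long) maxDoc + buckets - 1) / buckets);
        for (long start = 0; start < maxDoc; start += size) {
            int end = (int) Math.min(start + size, (long) maxDoc);
            ranges.add(new int[] {(int) start, end});
        }
        return ranges;
    }

    // Counts the doc IDs accepted by the filter, one bucket per parallel task.
    static long countMatching(int maxDoc, int buckets, IntPredicate accept) {
        return split(maxDoc, buckets).parallelStream()
                .mapToLong(range -> {
                    long matches = 0;
                    for (int doc = range[0]; doc < range[1]; doc++) {
                        if (accept.test(doc)) {
                            matches++;
                        }
                    }
                    return matches;
                })
                .sum();
    }
}
```

In a real traversal the loop body would read the stored fields or doc values for each doc ID instead of just counting, and the predicate must be safe to call from several threads.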

Re: What is the propper replacement for Filters working in DocValue fields?

2016-03-23 Thread Sheng
One possible workaround I can think of is to use CustomScoreQuery to do a posteriori scoring: let documents not matching your criteria have score 0, and use a PositiveScoresOnlyCollector to harvest the search result. Now the problem with using CustomScoreQuery is that FieldCache is deprecated too, but you

Re: Weird Lucene 5 filter behavior

2016-02-10 Thread Sheng
MUST instead? And is it guaranteed that the behavior would be the same as that written in Filter? On Wednesday, February 10, 2016, Sheng wrote: > question is asked on SO, > > > http://stackoverflow.com/questions/35320661/weird-filter-behavior-in-lucene-5 > > I am behind the firm proxy th

Weird Lucene 5 filter behavior

2016-02-10 Thread Sheng
The question is asked on SO: http://stackoverflow.com/questions/35320661/weird-filter-behavior-in-lucene-5 I am behind the firm's proxy, which forces me to type this on my phone to send it to the mailing list. I apologize in advance for any inconvenience in reading!

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
uld use a version of > SpanPayloadCheckQuery? There isn't anything that combines checking and > scoring for payloads at the moment, but I don't think it would be too > difficult to write one. > > Alan Woodward > www.flax.co.uk > > > On 22 Oct 2015, at 16:21, Sh

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
e > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Sheng [mailto:sheng...@gmail.com] > > Sent: Thursday, October 22, 2015 4:06 PM > > To: java-user@lucene.apache.org > > Subject: Re: ConjunctionScorer access > > > > That

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
A and term B in > "payload_field" will not necessarily have term A in "excluded_field" -- > only the ones that you don't want to see in the result set. > > Regards, > András > > On Thu, Oct 22, 2015 at 4:06 PM, Sheng wrote: > > > That's

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
want to leverage this. On Thu, Oct 22, 2015 at 9:22 AM, Alan Woodward wrote: > You should be able to use a FilterScorer that wraps a ConjunctionScorer > and overrides score(). > > Alan Woodward > www.flax.co.uk > > > On 22 Oct 2015, at 13:43, Sheng wrote: > > > Tha

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
till private - and that's good. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Sheng [mailto:sheng...@gmail.com] > > Sent: Wednesday,

ConjunctionScorer access

2015-10-21 Thread Sheng
It's a bummer Lucene makes the constructor of ConjunctionScorer non-public. I wanted to extend from this class in order to tweak its behavior for my use case. Is it possible to change it to protected in future releases ?

similarity per query

2015-10-08 Thread Sheng
Let's say I have a boolean query "a AND b". Is it possible to run the search for this boolean query with similarity "Sa" set for query "a" and similarity "Sb" set for query "b"?

Facet label index exception

2015-07-30 Thread Sheng
This is the first time I have come across an error like this: label already exists: Facet label: ..., prev ordinal: ... It shows the error happened at line 131 of CompactLabelToOrdinal.java. Any idea what could have gone wrong? I am using Lucene 4.10.2. Thanks!

Re: drilldown query with null base query

2015-07-24 Thread Sheng
. On Fri, Jul 24, 2015 at 2:06 PM, Sheng wrote: > Just found out more, drill down query will MatchAllDocsQuery as base query > will work if only one path is added, and starts to return empty results if > more than 1 path are added. This is very strange... > > > > On Fri, Jul

Re: drilldown query with null base query

2015-07-24 Thread Sheng
Just found out more: a drill-down query with MatchAllDocsQuery as the base query works if only one path is added, and starts to return empty results if more than one path is added. This is very strange... On Fri, Jul 24, 2015 at 12:12 PM, Sheng wrote: > This is what I am going to achi

drilldown query with null base query

2015-07-24 Thread Sheng
This is what I am trying to achieve: running a drill-down query with baseQuery = null / MatchAllDocsQuery(), and expecting the index to return all the documents that match the drill-down path(s). It returns nothing back to me; however, as long as I make the base query search a specific term

Re: Using lucene queries to search StringFields

2015-06-19 Thread Sheng
1. Which analyzer are you using for indexing? 2. You cannot fuzzy-match a field name; that will certainly throw an exception. 3. I would start from a simple, deterministic query object to rule out all unlikely possibilities first, before resorting to a parser to generate it for you. On Fri, Jun 1

Re: Exception while updating a lucene document

2015-04-25 Thread Sheng
Seems like you forgot to call facetsConfig.setMultiValued(`field`, true) too. On Sat, Apr 25, 2015 at 7:37 AM, Gimantha Bandara wrote: > Hi, > > I was able to fix the problem.. the issue was with my wrong usage of > FacetConfig class. I was creating Document using facetConfig.build per each > fac

Customscorequery and payload

2015-02-11 Thread Sheng
the document level during search. I am using latest 4.10.x Lucene. Thanks, Sheng

Re: IndexSearcher creation policy question

2014-08-22 Thread Sheng
Your best bet is to use a searcher manager to manage the searcher instance, and only refresh the manager if writes are committed. This way the same searcher instances can be shared by multiple threads. For the paging, if you want to have a guaranteed consistent view, you have to keep around the se

Re: WhiteSpaceTokenizer

2014-08-15 Thread Sheng
ra/browse/SOLR-4148 > > I actually filed a Jira for this already. No action so far, but PLEASE > feel free to comment on it: > https://issues.apache.org/jira/browse/LUCENE-5785 > > -- Jack Krupansky > > -Original Message- From: Sheng > Sent: Thursday, August 14, 2014

WhiteSpaceTokenizer

2014-08-14 Thread Sheng
The length of a token has to be shorter than 255; otherwise this tokenizer behaves unpredictably. I see 255 is set as a private final constant in the source code, but there is no documentation that explicitly addresses that. Can we either make that number configurable (if not an option, I'd like

Re: Lucene newbie in need of a hint

2014-08-14 Thread Sheng
As a side note, there is a race condition in your code: what if a search on the old reader is in progress while you call reader.close()? You need to call incRef on the reader (it should be tryIncRef, as you need to consider the case where the reader is closed at the moment you call incRef on it) and decRef wheneve
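The reference-counting pattern described above can be illustrated with a small stand-alone sketch. RefCountedResource is a hypothetical stand-in for a shared reader, not Lucene's IndexReader; it only models the tryIncRef / decRef protocol, and the "closed" flag takes the place of actually releasing files.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared reader; it models the protocol only.
class RefCountedResource {
    // Starts at 1: the reference held by whoever opened the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Takes a reference only if the resource is still alive.
    // Returns false instead of resurrecting a fully released resource.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    // Drops one reference; the last one to leave closes the resource.
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real reader would release its files here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

A search thread would wrap its work as: if tryIncRef() succeeds, search inside try and call decRef() in finally; if it fails, fetch the current reader and retry. The thread that swaps readers calls decRef() on the old one instead of closing it directly, so an in-flight search keeps it open until it finishes.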

Re: Questions for facets search

2014-08-13 Thread Sheng
like a map is quite similar to how we store the payload :) We use an integer as payload for each token, and store more complicated information in another Lucene index with the integer payload as the key for each document. Sheng On Wednesday, August 13, 2014, Shai Erera wrote: > Sheng,

Questions for facets search

2014-08-12 Thread Sheng
whole Lucene cache, since they are separated? We have a dynamic list of faceted fields, so being able to quickly rebuild the whole facet Lucene cache would be quite desirable. Again, I am using Lucene 4.7; thanks in advance for your answers! Sheng

Problem of calling indexWriterConfig.clone()

2014-08-12 Thread Sheng
sion.LUCENE_47, null); > // set whatever you need on this instance > . > > IndexWriter writer = new IndexWriter(directory, masterCfg.clone()); > > Wouldn't this just work? If not, could you paste the stack trace of the > exception you're get

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Sheng
ctor of IndexWriter. > > > On Mon, Aug 11, 2014 at 7:12 PM, Sheng wrote: > > > So the indexWriterConfig.clone() failed at this step: > > clone.indexerThreadPool = indexerThreadPool > > < > > > http://grepcode.com/file/repo1.maven.org/maven2/org.apache.

Re: Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Sheng
, the corresponding indexWriterConfig object cannot be called with .clone() at all? On Mon, Aug 11, 2014 at 9:52 PM, Vitaly Funstein wrote: > Looks like you have to clone it prior to using with any IndexWriter > instances. > > > On Mon, Aug 11, 2014 at 2:49 PM, Sheng wrot

Problem of calling indexWriterConfig.clone()

2014-08-11 Thread Sheng
I tried to create a clone of IndexWriterConfig with "indexWriterConfig.clone()" in order to create a new IndexWriter, but then I got this very annoying IllegalStateException: "clone this object before it is used". Why does this exception happen, and how can I get around it? Thanks!