date:20161118

Re: Exclusion List for standard tokenizer

2016-11-18 Thread lukes

Actually, ClassicTokenizer seems to do the job. Are there any side effects of using ClassicTokenizer rather than StandardTokenizer? Regards.

Exclusion List for standard tokenizer

2016-11-18 Thread lukes

Hi, Is there any exclusion list of characters that can be defined for StandardTokenizer? In my case, I want to use StandardTokenizer (as it solves many problems of when to tokenize across languages), but I don't want to split the stream on certain characters, for example '@'. Is there a wa

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless

So when a query arrives, you know the query is only allowed to match either module:1 (analyzed terms) or module:2 (not analyzed) but never both? If so, you should be fine. Though relevance will be sort of wonky, in case that matters, because you are polluting the unique term space; you would get

Re: enhancement for SynonymFilter

2016-11-18 Thread Michael McCandless

Hmm I didn't realize there was that change in behavior between versions. But, in 6.3.0, can't you look for a token of type SYNONYM whose posInc=0 and then know that the previous (posInc>0) token had caused that synonym? You just need a bit of caching, until all synonyms for a given token have bee
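The caching idea above can be sketched in plain Java. This is only an illustration under stated assumptions: `Tok` and `SynonymGrouping` are hypothetical names that stand in for a token stream and do not use the Lucene API, and the sketch assumes each original term appears only once in the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a token; not part of the Lucene API.
final class Tok {
    final String term;
    final String type;  // e.g. "word" or "SYNONYM"
    final int posInc;   // position increment relative to the previous token

    Tok(String term, String type, int posInc) {
        this.term = term;
        this.type = type;
        this.posInc = posInc;
    }
}

final class SynonymGrouping {
    // Maps each original token (posInc > 0) to the SYNONYM tokens stacked
    // on the same position (posInc == 0). Assumes original terms are distinct.
    static Map<String, List<String>> group(List<Tok> stream) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null; // cache for the most recent posInc > 0 token
        for (Tok t : stream) {
            if (t.posInc > 0) {
                current = new ArrayList<>();
                result.put(t.term, current);
            } else if (current != null && "SYNONYM".equals(t.type)) {
                current.add(t.term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> stream = Arrays.asList(
            new Tok("fast", "word", 1),
            new Tok("quick", "SYNONYM", 0),
            new Tok("rapid", "SYNONYM", 0),
            new Tok("car", "word", 1));
        System.out.println(group(stream));
    }
}
```

In a real filter the same bookkeeping would presumably live inside the token loop, holding the cache until the next token with posInc > 0 arrives.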

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-18 Thread Michael McCandless

I think you've summed up exactly the differences! And, yes, it would be possible to emulate hierarchical facets on top of flat facets, if the hierarchy is fixed depth like year/month/day. But if it's variable depth, it's trickier (but I think still possible). See e.g. the Committed Paths drill-d
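One way to picture the fixed-depth emulation is to expand each hierarchical value into all of its ancestor paths and treat every path as a separate flat label. The sketch below is plain Java with a hypothetical helper name (`FacetPaths.expand`) and an assumed "/" separator; it does not call any Lucene facet API.

```java
import java.util.ArrayList;
import java.util.List;

final class FacetPaths {
    // Expands hierarchy components into cumulative path labels,
    // e.g. ("2016", "11", "18") -> ["2016", "2016/11", "2016/11/18"].
    // The "/" separator is an assumption made for this illustration.
    static List<String> expand(String... components) {
        List<String> labels = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (String c : components) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(c);
            labels.add(path.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        // Each label would be indexed as its own flat facet value for the document.
        System.out.println(expand("2016", "11", "18"));
    }
}
```

Counting a given level then amounts to counting the flat labels with the matching number of components, and drilling down means filtering on one label and reading the counts of the labels one level deeper.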

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Michael McCandless

You can do this, Lucene will let you, but it's typically a bad idea for search relevance because some documents will return only if you search for precisely the same whole token, others if you search for an analyzed token, giving the user a broken experience. Mike McCandless http://blog.mikemcca

Re: Multi-field IDF

2016-11-18 Thread Will Martin

In this work, we aim to improve the field weighting for structured document retrieval. We first introduce the notion of field relevance as the generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for f

Re: indexing analyzed and not_analyzed values in same field

2016-11-18 Thread Kumaran Ramasubramanian

Hi All, Can anyone say whether it is advisable to have an index with both analyzed and not_analyzed values in one field? Use case: I have custom fields in my product which can be configured differently (ANALYZED and NOT_ANALYZED) in different modules. -- Kumaran R On Wed, Oct 26, 2016 at 12:0

Re: Multi-field IDF

2016-11-18 Thread Ahmet Arslan

Hi Nicholas, Aha, I see that you are into field-based scoring, which is an unsolved problem. Then, you might find BlendedTermQuery and SynonymQuery relevant. Ahmet On Friday, November 18, 2016 12:22 AM, Nicolás Lichtmaier wrote: That depends on what you want. In this case I want to use a

Re: enhancement for SynonymFilter

2016-11-18 Thread Bernd Fehling

On 18.11.2016 at 08:58, Bernd Fehling wrote: > Hi Mike, > > let me explain. > > First, after looking deeper inside I noticed that the Filters are used > like a stack and called backwards. So the first incrementToken goes > to the last filter in the chain. That one also uses incrementToken and

Re: enhancement for SynonymFilter

2016-11-18 Thread Bernd Fehling

Hi Mike, let me explain. First, after looking deeper inside I noticed that the Filters are used like a stack and called backwards. So the first incrementToken goes to the last filter in the chain. That one also uses incrementToken and calls its predecessor in the chain and so on. So everythin
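The call order described here can be modelled with a small pull-based chain. This is a sketch under assumptions: `Stage`, `ChainDemo` and the stage names are hypothetical and only imitate the idea of incrementToken() pulling from a predecessor; they are not Lucene's TokenStream API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical pull-based stage; returns the next token, or null when exhausted.
interface Stage {
    String next();
}

final class ChainDemo {
    // Records the order in which the stages are entered.
    static final List<String> calls = new ArrayList<>();

    static Stage source(List<String> words) {
        Iterator<String> it = words.iterator();
        return () -> {
            calls.add("source");
            return it.hasNext() ? it.next() : null;
        };
    }

    static Stage lowerCase(Stage input) {
        return () -> {
            calls.add("lowerCase");
            String t = input.next(); // pull from the predecessor
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        };
    }

    static Stage dropWord(Stage input, String unwanted) {
        return () -> {
            calls.add("dropWord");
            String t = input.next();
            while (t != null && t.equals(unwanted)) {
                t = input.next(); // skip unwanted tokens by pulling again
            }
            return t;
        };
    }

    public static void main(String[] args) {
        Stage chain = dropWord(lowerCase(source(Arrays.asList("The", "Fox"))), "the");
        System.out.println(chain.next()); // the consumer calls the last stage first
        System.out.println(calls);
    }
}
```

The consumer only ever touches the last stage; each stage reaches back to its predecessor on demand, which is why the chain appears to run backwards.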
