Re: Favoring Terms Occurring in Close Proximity

2016-06-27 Thread Daniel Bigham
climates OR "temperate climates"~5^100 > ahmet > On Friday, June 24, 2016 5:07 PM, Daniel Bigham wrote: > Something significant that I've noticed about using the default Lucene > query parser is that if your user enters a query like: > "temperate climates"

Favoring Terms Occurring in Close Proximity

2016-06-24 Thread Daniel Bigham
Something significant that I've noticed about using the default Lucene query parser is that if your user enters a query like: "temperate climates" ... it will get turned into an OR query: temperate OR climates This means that a document that contains the literal substring "temperate climates
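A minimal sketch of one way to favor close-proximity matches, assuming the classic Lucene query syntax, where `~N` after a quoted phrase sets the slop and `^N` sets the boost. The class and method names are hypothetical, the helper only builds a query string (it does not call Lucene), and it assumes the user input is plain words with no query operators of its own. The slop of 5 and boost of 100 in the usage line mirror the values quoted in the reply snippet in this listing.

```java
public final class ProximityBoost {
    private ProximityBoost() {}

    /**
     * Returns the user's terms followed by an extra OR clause: the same terms
     * as a sloppy phrase with a boost, so documents where the terms occur
     * close together should score higher. Hypothetical helper, not a Lucene API.
     */
    public static String withProximityBoost(String userQuery, int slop, int boost) {
        String trimmed = userQuery.trim();
        // A single term (or an empty query) has no proximity to reward.
        if (trimmed.isEmpty() || trimmed.split("\\s+").length < 2) {
            return trimmed;
        }
        // Escape backslashes and quotes so the phrase clause stays well-formed.
        String escaped = trimmed.replace("\\", "\\\\").replace("\"", "\\\"");
        return trimmed + " OR \"" + escaped + "\"~" + slop + "^" + boost;
    }

    public static void main(String[] args) {
        // Prints: temperate climates OR "temperate climates"~5^100
        System.out.println(withProximityBoost("temperate climates", 5, 100));
    }
}
```

The resulting string would then be handed to the query parser as usual; how much the boost should outweigh the individual term matches is a tuning decision for the index in question.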

Re: analyzers-common VS analyzers-icu

2016-06-08 Thread Daniel Bigham
affect > your > analysis chain (there may be surprises with stemmers or stop lists that were > designed without it), but, generally, that's a really important filter. > I haven't looked deeply into the diffs between the StandardTokenizer and the > ICUTokenizer and c

analyzers-common VS analyzers-icu

2016-06-01 Thread Daniel Bigham
Hi, I recently set up my code to choose the appropriate analyzer from analyzers-common depending on the language of the user's index/field. I then extended the existing source code so that, for any language, things like stemming and case sensitivity can be turned on or off. Today I discovered ana

Re: Boosting Documents

2016-05-27 Thread Daniel Bigham
Found the answer here: https://lucene.apache.org/core/4_1_0/MIGRATE.html - On May 27, 2016, at 12:36 PM, danielb wrote: > I've noticed that the Document.setBoost method appears to have been > removed at some point. > What should be used now to boost a document? > ---

Boosting Documents

2016-05-27 Thread Daniel Bigham
I've noticed that the Document.setBoost method appears to have been removed at some point. What should be used now to boost a document?

Re: Multiple values for a field: Stable order?

2016-05-26 Thread Daniel Bigham
o assume). > On Wed, May 25, 2016 at 22:00, Daniel Bigham wrote: > > I've recently become aware that Lucene allows duplicate field names, > > which essentially allows multiple values to be associated with a field. > > A follow-up question is whether the order of the val

Multiple values for a field: Stable order?

2016-05-25 Thread Daniel Bigham
I've recently become aware that Lucene allows duplicate field names, which essentially allows multiple values to be associated with a field. A follow-up question is whether the order of the values is maintained... if I store the values "A", "B", and then "C" in a given field for a document, an

Re: Synonym Query Expansion / Gaps / UnsupportedOperationException wrt SpanNearQuery

2016-05-16 Thread Daniel Bigham
www.flax.co.uk > On 13 May 2016, at 22:33, Daniel Bigham wrote: >> I am experimenting with supporting synonyms on the query side by doing query > > expansion. >> For example, the query "open webpage" can be expanded if the following things > > are synonyms:

Synonym Query Expansion / Gaps / UnsupportedOperationException wrt SpanNearQuery

2016-05-13 Thread Daniel Bigham
I am experimenting with supporting synonyms on the query side by doing query expansion. For example, the query "open webpage" can be expanded if the following things are synonyms: "open" | "go to" This becomes the following: (I'm using both the stop word filter and the stemming filter) sp

Re: SpanNearQuery, Multiple Fields

2016-05-12 Thread Daniel Bigham
your reply. - On May 12, 2016, at 6:48 PM, Alan Woodward wrote: > Try adding your multiple SpanNearQuery objects to a BooleanQuery? > Alan Woodward > www.flax.co.uk > On 12 May 2016, at 20:35, Daniel Bigham wrote: >> I'm very interested in SpanNearQuery, beca

SpanNearQuery With Inner Phrases

2016-05-12 Thread Daniel Bigham
When constructing boolean queries, the "parts" can themselves be phrases, and can be parsed as follows: QueryBuilder(analyzer).createPhraseQuery(fieldName, phrase) The above call is handy in that, even if the part is a single word, it will get tokenized and turned into the appropriate term.

SpanNearQuery, Multiple Fields

2016-05-12 Thread Daniel Bigham
I'm very interested in SpanNearQuery, because it allows for quite powerful phrasal searching. However, unlike BooleanQuery, there doesn't seem to be any way to have it search multiple fields. I thought I might be able to wrap multiple SpanNearQueries, each of them searching a different field

Re: StopFilterFactory with french_stop.txt

2016-05-05 Thread Daniel Bigham
For the time being I seem to be able to do this by using a custom TokenFilterFactory class as follows. If there is a better approach, or if this approach seems flawed, let me know. Thanks. package com.wolfram.textsearch; import java.io.IOException; import java.io.Reader; import java.nio.

StopFilterFactory with french_stop.txt

2016-05-05 Thread Daniel Bigham
I'd like to use CustomAnalyzer to create an analyzer that is much like the FrenchAnalyzer. In doing that, I'm using StopFilterFactory. But I'm unsure how to point it to use "french_stop.txt", i.e., what FrenchAnalyzer is using here: public final class FrenchAnalyzer extends StopwordAnalyzerBas

Query Expansion for Synonyms

2016-04-28 Thread Daniel Bigham
I'm investigating various ways of supporting synonyms in Lucene. One such approach that looks potentially interesting is to do a kind of "query expansion". For example, if the user searches for "us 1888", one might expand the query as follows: SpanNearQuery query = new SpanNearQuery