Re: Searching for words begining with "or"

2013-07-18 Thread Jack Krupansky
Just so you know, the presence of a wildcard in a term means that the term will not be analyzed. So, state:OR* should fail since "OR" will not be in the index - because it would index as "or" (lowercase). Hmmm... why does "or" seem familiar...? Ah yeah... right!... The standard analyzer in
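A minimal sketch of the point above, assuming the query string is assembled by hand and the field was lowercased by the analyzer at index time. The class and method names are hypothetical and are not part of Lucene; the helper only lowercases the wildcard text so that it can line up with the lowercased tokens in the index.

```java
import java.util.Locale;

// Hypothetical helper (not a Lucene API): wildcard terms skip analysis,
// so the caller lowercases them to match what the analyzer indexed.
final class WildcardTerms {
    static String lowercaseWildcard(String field, String pattern) {
        // Locale.ROOT avoids locale-specific case mappings.
        return field + ":" + pattern.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, lowercaseWildcard("state", "OR*") produces state:or*, which is the form the message above says would be present in the index.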

Re: Partial word match using n-grams

2013-07-18 Thread Shai Erera
There are several options: As Allison suggested, pad your words with ##, so that "quota tom" becomes "##quota## ##tom##" at indexing time, and the query "quota to" becomes either "##quota ##to", or if you want to optimize, only pad query terms < 3 characters, so it becomes "quota ##to". That shoul
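The padding idea above can be sketched in plain Java, outside of any Lucene analyzer. The class and method names are hypothetical, and the sketch assumes words are separated by whitespace and that trigrams are in use, so only query terms shorter than 3 characters are padded in the optimized variant.

```java
// Hypothetical illustration of the "##" padding described above.
final class NgramPadding {
    static final String PAD = "##";

    // Index time: pad every whitespace-separated word on both sides.
    static String padForIndex(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return "";
        StringBuilder out = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(PAD).append(word).append(PAD);
        }
        return out.toString();
    }

    // Query time, optimized variant: left-pad only terms shorter than 3 characters.
    static String padQuery(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) return "";
        StringBuilder out = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(word.length() < 3 ? PAD + word : word);
        }
        return out.toString();
    }
}
```

Here padForIndex("quota tom") gives "##quota## ##tom##" and padQuery("quota to") gives "quota ##to", matching the examples in the message. The padded query term "##to" yields the trigrams "##t" and "#to", both of which also occur among the trigrams of "##tom##".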

Re: Searching for words begining with "or"

2013-07-18 Thread ABlaise
When I make my query, everything goes well until I add the last part: (city:or* OR state:or*). I tried the first solution that was given to me, but using \OR and \AND doesn't seem to be the solution. The query is actually well built; it has no problem parsing the query with OR or \OR, since the q

Re: Searching for words begining with "or"

2013-07-18 Thread Jack Krupansky
Break your query down into simpler pieces for testing. What pieces seem to have what problems? Be specific about the symptom, and how you "know" that something is wrong. You wrote: stored,indexed,tokenized,omitNorms>. But... the standard analyzer would have lowercased that term. Did it, or are

RE: Searching for words begining with "or"

2013-07-18 Thread Doug Turnbull
This seems relevant, though admittedly I haven't tried it: http://stackoverflow.com/questions/10337908/how-to-properly-escape-or-and-and-in-lucene-query From: ABlaise Sent: 7/18/2013 9:52 PM To: java-user@lucene.apache.org Subject: Searching for words begining with "or"

Searching for words begining with "or"

2013-07-18 Thread ABlaise
Hi everyone, I am new to this forum. I have done some research on my question but I can't seem to find an answer. I am using Lucene for a project and I know for sure that somewhere in my Lucene index I have this document with these elements: Document stored,indexed,tokenized,omitNorms st

RE: Partial word match using n-grams

2013-07-18 Thread Becker, Thomas
Thanks for the reply, Tim. I really should have been clearer. Let's say I have an object named "quota_tommy_1234". I'd like to match that object with any substring of that name that is 3 or more characters long, for example: quo, tom, 234, quota, etc. Further, at search time I'm splitting input on white

Re: ShingleFilter

2013-07-18 Thread Malgorzata Urbanska
Thanks!
On Thu, Jul 18, 2013 at 5:30 PM, Allison, Timothy B. wrote:
> Need to set outputUnigrams = false with something like:
>
> StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
> TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, sour

RE: Partial word match using n-grams

2013-07-18 Thread Allison, Timothy B.
Tommy, I'm sure that I don't fully understand your use case and your data. Some thoughts: 1) I assume that fuzzy term search (edit distance <= 2) isn't meeting your needs or else you wouldn't have gone the ngram route. If fuzzy term search + phrase/proximity search would meet your needs, se

RE: ShingleFilter

2013-07-18 Thread Allison, Timothy B.
Need to set outputUnigrams = false with something like:
StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);

ShingleFilter

2013-07-18 Thread Malgorzata Urbanska
Hello, For some time I have been trying to apply ShingleFilter. I have a string: "The users get program in the User RPC API in Apache Rave" and I would like to get: [the users get] [users get program] [get program in] [program in the] [in the user] [the user rpc] [user rpc api] [rpc api in] [a
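To make the desired output concrete, here is a plain-Java sketch of 3-word shingles with no unigrams. It is only an illustration of the expected result, not Lucene's ShingleFilter; the class and method names are hypothetical, and it assumes lowercasing and whitespace tokenization stand in for the analyzer chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration: fixed-size word shingles, no unigrams emitted.
final class WordShingles {
    static List<String> shingles(String text, int size) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || size < 1) return out;
        // Lowercase and split on whitespace (stand-in for tokenizer + LowerCaseFilter).
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        // Slide a window of `size` words across the token list.
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }
}
```

Calling shingles("The users get program in the User RPC API in Apache Rave", 3) starts with "the users get" and "users get program", as in the list requested above.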

Re: Another question on sorting documents

2013-07-18 Thread Adrien Grand
Hi,
On Thu, Jul 18, 2013 at 7:15 AM, Sriram Sankar wrote:
> The approach we have discussed in an earlier thread uses:
>
> writer.addIndexes(new SortingAtomicReader(...));
>
> I want to confirm (this is not absolutely clear to me yet) that the above
> call will not create multiple segments - i.e.,

Partial word match using n-grams

2013-07-18 Thread Becker, Thomas
One of our main use-cases for search is to find objects based on partial name matches. I've implemented this using n-grams and it works pretty well. However we're currently using trigrams and that causes an interesting problem when searching for things like "abc ab" since we first split on whi
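The trigram problem described above can be reproduced with a small plain-Java sketch. The names are hypothetical and this is not Lucene's n-gram tokenizer; it assumes, as the message suggests, that the query is split on whitespace first and that each piece is then cut into character trigrams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: character n-grams of a single term.
final class Trigrams {
    static List<String> ngrams(String term, int n) {
        List<String> out = new ArrayList<>();
        // A term shorter than n produces no grams at all.
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }
}
```

For the query "abc ab", ngrams("abc", 3) yields the single gram "abc", while ngrams("ab", 3) yields an empty list, so under these assumptions the second query term contributes nothing to the search.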

Re: Indexing into SolrCloud

2013-07-18 Thread Jack Krupansky
Sorry, but you need to resend this message to the Solr user list - this is the Lucene user list.
-- Jack Krupansky
-----Original Message-----
From: Beale, Jim (US-KOP)
Sent: Thursday, July 18, 2013 12:34 PM
To: java-user@lucene.apache.org
Subject: Indexing into SolrCloud
Hey folks, I've bee

Indexing into SolrCloud

2013-07-18 Thread Beale, Jim (US-KOP)
Hey folks, I've been migrating an application which indexes about 15M documents from straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3-node ZooKeeper ensemble, using HAProxy for load balancing. The documents are processed on a quad-core machine with 6 threads and indexed into

Re: PostingsHighlighter to highlight the first Match in the document

2013-07-18 Thread Michael McCandless
But for this one document, where you get only the first sentence back from PH without "android" in it, does "android" in fact occur in that field for that document? I.e., it could be that the document was returned because another field (e.g. title) matched, but the body field you are highlighting on did