Just so you know, the presence of a wildcard in a term means that the term
will not be analyzed. So, state:OR* should fail since "OR" will not be in
the index - because it would index as "or" (lowercase). Hmmm... why does
"or" seem familiar...?
Ah yeah... right!... The standard analyzer in
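The point about wildcard terms skipping analysis can be illustrated without Lucene at all. What follows is only a rough sketch in plain Java: `WildcardCaseSketch`, `index`, and `anyTermStartsWith` are hypothetical names, and the sorted set is a stand-in for an index whose analyzer lowercases terms, not Lucene's actual data structures.

```java
import java.util.Locale;
import java.util.TreeSet;

// Toy model (an assumption, not Lucene code): terms are lowercased when
// indexed, but a wildcard prefix is looked up exactly as typed.
final class WildcardCaseSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    // Mimics an analyzer that lowercases at indexing time.
    void index(String rawTerm) {
        terms.add(rawTerm.toLowerCase(Locale.ROOT));
    }

    // Prefix lookup with no analysis applied to the prefix.
    boolean anyTermStartsWith(String prefix) {
        String candidate = terms.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    public static void main(String[] args) {
        WildcardCaseSketch idx = new WildcardCaseSketch();
        idx.index("OR"); // stored as "or"
        System.out.println(idx.anyTermStartsWith("OR")); // false
        System.out.println(idx.anyTermStartsWith("or")); // true
    }
}
```

Under that assumption, lowercasing the wildcard term on the client side before building the query is what makes the prefix line up with the indexed form.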
There are several options:
As Allison suggested, pad your words with ##, so that "quota tom" becomes
"##quota## ##tom##" at indexing time, and the query "quota to" becomes
either "##quota ##to", or if you want to optimize, only pad query terms < 3
characters, so it becomes "quota ##to". That shoul
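The padding idea above might be sketched roughly as follows. This is plain Java rather than a Lucene analysis chain, and `PaddedTrigrams`, `padForIndex`, `padForQuery`, and `trigrams` are hypothetical helper names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "##" padding combined with character trigrams (assumed helpers,
// not part of any Lucene API).
final class PaddedTrigrams {
    static final String PAD = "##";

    // Index time: pad both ends of every word, "quota tom" -> "##quota## ##tom##".
    static String padForIndex(String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(PAD).append(word).append(PAD);
        }
        return sb.toString();
    }

    // Query time (optimized variant): left-pad only terms shorter than 3 characters.
    static String padForQuery(String query) {
        StringBuilder sb = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(word.length() < 3 ? PAD + word : word);
        }
        return sb.toString();
    }

    // All 3-character substrings of a single token.
    static List<String> trigrams(String token) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= token.length(); i++) {
            out.add(token.substring(i, i + 3));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(padForIndex("quota tom")); // ##quota## ##tom##
        System.out.println(padForQuery("quota to"));  // quota ##to
        // Every trigram of the padded query term also occurs in the padded indexed word.
        System.out.println(trigrams("##tom##").containsAll(trigrams("##to"))); // true
    }
}
```

The padding gives a two-character query term enough characters to produce trigrams, and those trigrams only match at the start of an indexed word.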
When I make my query, everything goes well until I add the last part:
(city:or* OR state:or*).
I tried the first solution that was given to me, but writing \OR and \AND
doesn't seem to be the solution. The query is actually well built; the parser
has no problem parsing it with either OR or \OR since the q
Break your query down into simpler pieces for testing. What pieces seem to
have what problems? Be specific about the symptom, and how you "know" that
something is wrong.
You wrote:
stored,indexed,tokenized,omitNorms>.
But... the standard analyzer would have lowercased that term. Did it, or are
This seems relevant, though admittedly I haven't tried it:
http://stackoverflow.com/questions/10337908/how-to-properly-escape-or-and-and-in-lucene-query
Sent from my Windows Phone
From: ABlaise
Sent: 7/18/2013 9:52 PM
To: java-user@lucene.apache.org
Subject: Searching for words begining with "or"
Hi everyone,
I am new to this forum. I have done some research on my question, but I
can't seem to find an answer to it.
I am using Lucene for a project and I know for sure that my Lucene index
contains a document with these elements:
Document
stored,indexed,tokenized,omitNorms
st
Thanks for the reply Tim. I really should have been clearer. Let's say I have
an object named "quota_tommy_1234". I'd like to match that object with any 3
character (or more) substring of that name. So for example:
quo
tom
234
quota
etc.
Further, at search time I'm splitting input on white
Thanks!
On Thu, Jul 18, 2013 at 5:30 PM, Allison, Timothy B. wrote:
> Need to set outputUnigrams = false with something like:
>
> StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43,
> reader);
> TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, sour
Tommy,
I'm sure that I don't fully understand your use case and your data. Some
thoughts:
1) I assume that fuzzy term search (edit distance <= 2) isn't meeting your
needs or else you wouldn't have gone the ngram route. If fuzzy term search +
phrase/proximity search would meet your needs, se
Need to set outputUnigrams = false with something like:
StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43,
reader);
TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);
Hello,
For some time I have been trying to apply ShingleFilter. I have a string:
"The users get program in the User RPC API in Apache Rave"
and I would like to get:
[the users get] [users get program] [get program in] [program in
the] [in the user] [the user rpc] [user rpc api] [rpc api in] [a
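For reference, the three-word groups asked for above can be produced with a plain sliding window over lowercased, whitespace-split words. This is only a sketch of the desired output, not the `ShingleFilter` configuration itself; `WordShingles` and `shingles` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sliding-window word shingles in plain Java (an illustration of the target
// output, assuming simple whitespace tokenization and lowercasing).
final class WordShingles {
    static List<String> shingles(String text, int size) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= words.length; start++) {
            // Join `size` consecutive words into one shingle.
            out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The users get program in the User RPC API in Apache Rave";
        for (String shingle : shingles(text, 3)) {
            System.out.println("[" + shingle + "]");
        }
    }
}
```

Note that only full three-word windows are emitted here, with no single words or pairs mixed in.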
Hi,
On Thu, Jul 18, 2013 at 7:15 AM, Sriram Sankar wrote:
> The approach we have discussed in an earlier thread uses:
>
> writer.addIndexes(new SortingAtomicReader(...));
>
> I want to confirm (this is not absolutely clear to me yet) that the above
> call will not create multiple segments - i.e.,
One of our main use-cases for search is to find objects based on partial name
matches. I've implemented this using n-grams and it works pretty well.
However we're currently using trigrams and that causes an interesting problem
when searching for things like "abc ab" since we first split on whi
Sorry, but you need to resend this message to the Solr user list - this is
the Lucene user list.
-- Jack Krupansky
-Original Message-
From: Beale, Jim (US-KOP)
Sent: Thursday, July 18, 2013 12:34 PM
To: java-user@lucene.apache.org
Subject: Indexing into SolrCloud
Hey folks,
I've bee
Hey folks,
I've been migrating an application which indexes about 15M documents from
straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3
zookeeper ensemble using HAProxy for load balancing. The documents are
processed on a quad core machine with 6 threads and indexed into
But for this one document, where you get only the first sentence back
from PH without "android" in it, does "android" in fact occur in that
field for that document?
Ie, it could be that document was returned because another field (e.g.
title) matched, but the body field you are highlighting on did