Hi Ahmet,
As per your suggestion, I have posted the request with an example on the
LUCENE-5205 JIRA ticket.
Thanks,
Modassar
On Wed, Mar 5, 2014 at 8:44 PM, Ahmet Arslan wrote:
> Hi Modassar,
>
> Can you post your request (with an example if possible) to the LUCENE-5205
> JIRA ticket too? If you don't have a JIRA account, anyone can create one.
Hi Modassar,
Can you post your request (with an example if possible) to the LUCENE-5205 JIRA
ticket too? If you don't have a JIRA account, anyone can create one.
Thanks,
Ahmet
On Wednesday, March 5, 2014 9:40 AM, Modassar Ather
wrote:
Hi,
Phrases with stop words in them are not getting searched
SpanFirstQuery is the clean option. Another option is to add a "start
token" to each title. Then, search for "startToken oil spill". This
will be faster than SpanFirstQuery. But it also requires doing
something weird to the field.
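A rough, untested sketch of the SpanFirstQuery approach (Lucene span API of that era; the field name "title" is just an example):

// "title starts with the phrase oil spill": the span must end within the first 2 positions
SpanQuery oil = new SpanTermQuery(new Term("title", "oil"));
SpanQuery spill = new SpanTermQuery(new Term("title", "spill"));
SpanQuery phrase = new SpanNearQuery(new SpanQuery[] { oil, spill }, 0, true); // in order, no gaps
Query startsWith = new SpanFirstQuery(phrase, 2);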
Lance
On Thu, Jun 17, 2010 at 3:19 PM, Michael McCandless
wrote:
SpanFirstQuery?
Mike
On Thu, Jun 17, 2010 at 3:23 PM, rakesh rakesh wrote:
> Hi,
>
> I have thousands of article titles in a Lucene index. So for a query "Oil
> spill" I want to return all the article titles that start with "Oil spill". I do
> not want those titles which have this phrase but do not start with it.
Sent: March 04, 2010 8:54 AM
To: java-user@lucene.apache.org
Subject: Re: Phrase search on NOT_ANALYZED content
I'm still struggling with your overall goal here, but...
It sounds like what you're looking for is an exact match
in some cases but not others? In which case you could
think about in
> -Original Message-
> From: java-user-return-45278-paul.b.murdoch=saic@lucene.apache.org On Behalf Of Erick Erickson
> Sent: Wednesday, March 03, 2010 4:30 PM
> To: java-user@lucene.apache.org
em.
Thanks,
Paul
-Original Message-
From: java-user-return-45278-paul.b.murdoch=saic@lucene.apache.org On Behalf Of Erick Erickson
Sent: Wednesday, March 03, 2010 4:30 PM
To: java-user@lucene.apache.org
NOT_ANALYZED is probably not what you want.
NOT_ANALYZED stores the entire input as
a *single* token, so you can never match on
anything except the entire input.
What did you hope to accomplish by indexing
NOT_ANALYZED? That's actually a pretty
specialized thing to do, perhaps there's a better
way
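To make that concrete, a small untested illustration (Lucene 2.9-era API; the field name and values are only examples):

// NOT_ANALYZED: the whole value becomes ONE indexed term
doc.add(new Field("content", "something in the index",
                  Field.Store.YES, Field.Index.NOT_ANALYZED));

// matches, because the term equals the entire original string
Query exact = new TermQuery(new Term("content", "something in the index"));

// finds nothing: there are no separate "something", "in", "the", "index" terms to line up
PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("content", "something"));
phrase.add(new Term("content", "in"));
phrase.add(new Term("content", "the"));
phrase.add(new Term("content", "index"));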
-Original Message-
From: java-user-return-45156-paul.b.murdoch=saic@lucene.apache.org On Behalf Of Murdoch, Paul
Sent: Wednesday, February 24, 2010 5:11 PM
To: java-user@lucene.apache.org
Subject: RE: Phrase Search and NOT_ANALYZED
To: java-user@lucene.apache.org
Subject: RE: Phrase Search and NOT_ANALYZED
Thanks,
I've been looking at that one too. I'm trying to make it happen with the
StandardAnalyzer. Unfortunately, I think I see some redesign for more
robustness in the future.
Cheers,
Paul
-Original Message-
From: java-user-return-45154-paul.b.murdoch=saic@lucene.apache.org On Behalf Of Robert Muir
Sent: Wednesday, February 24, 2010 4:55 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase Search and NOT_ANALYZED
check out KeywordAnalyzer!
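A quick untested sketch of what that looks like in practice (Lucene 2.9-era API; field names are only examples):

// KeywordAnalyzer keeps the entire field value as a single token,
// both at index time and at query time
Analyzer analyzer = new KeywordAnalyzer();

Document doc = new Document();
doc.add(new Field("keywordField", "phrase test", Field.Store.YES, Field.Index.ANALYZED));

// QueryParser strips the quotes and hands "phrase test" to the analyzer,
// which returns it as one token, so the parsed query is effectively a TermQuery
QueryParser parser = new QueryParser(Version.LUCENE_29, "keywordField", analyzer);
Query q = parser.parse("\"phrase test\"");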
On Wed, Feb 24, 2010 at 4:51 PM, Murdoch, Paul wrote:
ead.
>
> Thanks,
>
> Paul
>
>
> -Original Message-
> From: java-user-return-45149-paul.b.murdoch=saic@lucene.apache.org On Behalf Of Erick Erickson
> Sent: Wednesday, February 24, 2010
From: Digy
Sent: Wednesday, February 24, 2010 4:45 PM
To: java-user@lucene.apache.org
Subject: RE: Phrase Search and NOT_ANALYZED
Since it is not analyzed, your text is stored as a single term in the
index
[something in the index].
But the query
name:"someth
From: Erick Erickson
Sent: Wednesday, February 24, 2010 4:23 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase Search and NOT_ANALYZED
What does Luke's explain show you? That'll show you a lot about how
the query gets transformed
Since it is not analyzed, your text is stored as a single term in the index
[something in the index].
But the query
name:"something in the index"
is translated as:
find 4 consecutive terms which have the values "something", "in", "the" and
"index" respectively,
or if stop words are removed
What does Luke's explain show you? That'll show you a lot about how
the query gets transformed..
My first guess is that stop words are messing you up
Erick
On Wed, Feb 24, 2010 at 3:51 PM, Murdoch, Paul wrote:
> Hi,
>
>
>
> I'm indexing a field using the StandardAnalyzer 2.9.
>
>
>
> fi
Hello,
You could use a PhraseQuery with the terms "cool", "gaming", and "computer" and
set the slop factor you reckon is right. Then you could assign a
boost to this query only, which will make it bubble up the list.
I don't think you can get away without specifying a slop factor, though (like
in the
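One way to wire that up, untested (the surrounding BooleanQuery, field name and numbers are just an example):

PhraseQuery pq = new PhraseQuery();
pq.add(new Term("title", "cool"));
pq.add(new Term("title", "gaming"));
pq.add(new Term("title", "computer"));
pq.setSlop(3);      // how far apart the terms may drift and still count as the phrase
pq.setBoost(2.0f);  // bubble close matches up the result list

BooleanQuery query = new BooleanQuery();
query.add(pq, BooleanClause.Occur.SHOULD);  // proximity clause just adds score
query.add(new TermQuery(new Term("title", "cool")), BooleanClause.Occur.SHOULD);
query.add(new TermQuery(new Term("title", "gaming")), BooleanClause.Occur.SHOULD);
query.add(new TermQuery(new Term("title", "computer")), BooleanClause.Occur.SHOULD);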
On Fri, Jun 5, 2009 at 21:31, Abhi wrote:
> Say I have indexed the following strings:
>
> 1. "cool gaming laptop"
> 2. "cool gaming lappy"
> 3. "gaming laptop cool"
>
> Now when I search with a query, say "cool gaming computer", I want strings 1
> and 2 to appear on top (where the search terms are closer
You're going to want to change your TokenFilter so that it emits the split-piece
tokens immediately after the original token and with a
positionIncrement of "0" .. don't buffer them up and wait for the entire
stream to finish first.
it true order of the tokens in the tokenstream and the posit
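A rough sketch of that "emit the pieces at the same position" idea, against the old pre-2.9 Token API (untested; the class name and the split rule are made up):

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class SplitTokenFilter extends TokenFilter {
  private final LinkedList<Token> pending = new LinkedList<Token>();

  public SplitTokenFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    if (!pending.isEmpty()) {
      return pending.removeFirst();          // pieces already stamped with increment 0
    }
    Token original = input.next();
    if (original == null) {
      return null;                           // end of stream
    }
    String[] pieces = original.termText().split("[-_]");   // hypothetical split rule
    if (pieces.length > 1) {
      for (String piece : pieces) {
        Token t = new Token(piece, original.startOffset(), original.endOffset());
        t.setPositionIncrement(0);           // stacked at the original token's position
        pending.add(t);
      }
    }
    return original;                         // original first, pieces follow immediately
  }
}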
Hi Grant,
Thanks for the advice. It turns out it was all my own stupidity; I
had commented out (for whatever reason) setPositionIncrement(0) on my
synonym analyzer.
Cheers,
Spence
On 8/23/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> I would suggest starting with: http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
I would suggest starting with:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
especially the part on Luke.
Luke will let you try out the various queries and show you what they
look like before being submitted.
Cheers,
Grant
On Aug 23, 2007, at
OK, thanks. I have tried to index the address field as UN_TOKENIZED and search
using the above query; it returns nothing. How can I specify "NOT tokenized"
in the query?
--Thanks,
On 6/18/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Phrase queries won't help you here
Your particular issue can be
: Another good old trick is to index field values (tokenized) with
: appended special starting and ending tokens, e.g. instead of "Hiran
: Magri" use "_start_ Hiran Magri _end_". Then you can query for fields
: that are exactly equal to a phrase, while still retaining the
: possibility to search b
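A small untested illustration of that trick (it assumes an analyzer that lowercases but keeps the "_start_" / "_end_" sentinels as whole tokens, e.g. whitespace-based analysis plus a lowercase filter; StandardAnalyzer may split them differently):

// index time: wrap the value in sentinel tokens
doc.add(new Field("name", "_start_ Hiran Magri _end_",
                  Field.Store.YES, Field.Index.TOKENIZED));

// "field is exactly this phrase": the sentinels pin the match to the whole value
PhraseQuery exact = new PhraseQuery();
exact.add(new Term("name", "_start_"));
exact.add(new Term("name", "hiran"));
exact.add(new Term("name", "magri"));
exact.add(new Term("name", "_end_"));

// an ordinary PhraseQuery without the sentinels still works for phrases inside the value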
Erick Erickson wrote:
Phrase queries won't help you here
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening i
Phrase queries won't help you here
Your particular issue can be addressed, but I'm not sure it's a
reasonable long-term solution
If you indexed your address field as UN_TOKENIZED, and
did NOT tokenize your query, it should give you what you want.
What's happening is that StandardAnalyzer
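In code, untested, that combination looks roughly like this (2.x-era API; the address value is made up):

// index time: the whole address is a single term, exactly as given
doc.add(new Field("address", "12 Main Street", Field.Store.YES, Field.Index.UN_TOKENIZED));

// query time: build the query by hand instead of going through QueryParser,
// so nothing tokenizes or lowercases the text
Query q = new TermQuery(new Term("address", "12 Main Street"));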
: Sorry for the confusion and thanks for taking the time to educate me. So, if
: I am just indexing literal values, what is the best way to do that (what
: analyzer)? Sounds like this approach, even though it works, is not the
: preferred method.
if you truly want just the literal values then
D it will still work.
>
>
> do you have an example of something that *isn't* working the way you want?
> ... if not i don't see what your problem is, all your tests are passing :)
>
>
> : Date: Tue, 5 Sep 2006 14:06:13 -0700 (PDT)
> : From: Philip Brown <[EMAIL PROTECTED]>
: From: Philip Brown <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Re: Phrase search using quotes -- special Tokenizer
:
:
: Here's a little sample program (borrowed some code from Erick Erickson :)).
: Whether I add as TOKENIZED or UN_TOKENIZED
Here's a little sample program (borrowed some code from Erick Erickson :)).
Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in
the output. Is this what you'd expect?
- Philip
package com.test;
import java.io.IOException;
import java.util.HashSet;
import java.util.regex.
Some info to help you on your journey :)
1. If you add a field as untokenized then it will not be analyzed when added
to the index. However, QueryParser will not know that this happened and will
tokenize queries on that field.
2. The solution that Hoss has explained to you is to leave the defa
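The usual shape of that solution, as an untested sketch (Lucene 2.x-era API; field names and values are only examples):

// analyze most fields normally, but keep the "keyword" field whole, and use the
// same wrapper at both index time and query time
Directory directory = new RAMDirectory();
PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("keywordField", new KeywordAnalyzer());

IndexWriter writer = new IndexWriter(directory, analyzer, true);
Document doc = new Document();
doc.add(new Field("keywordField", "phrase test", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("contents", "a phrase test document about laptops", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc);
writer.close();

// the quotes keep the whole phrase together through QueryParser, and
// KeywordAnalyzer turns it into a single term for keywordField
QueryParser parser = new QueryParser("contents", analyzer);
Query q = parser.parse("keywordField:\"phrase test\" AND contents:laptops");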
: So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
: StandardAnalyzer) then I still need to enclose in quotes the phrases
: (keywords with spaces) when I issue the search, and they are only returned
Yes, quotes will be necessary to tell the QueryParser "this
is one chunk of text
So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
StandardAnalyzer) then I still need to enclose in quotes the phrases
(keywords with spaces) when I issue the search, and they are only returned
in the results if the case is identical to how it was added? (This seems to
be what
: Yeah, they are more complex than the "exactish" match -- basically, there are
: more fields involved -- combined sometimes with AND and sometimes with OR,
: and sometimes negated field values, sometimes groupings, etc. These other
: field values are all single words (no spaces), and a search mi
More to consider:
perhaps there is some way to get what you want by overriding
getFieldQuery(String, String) instead. I have not been able to come up
with anything simple off the top of my head, but overriding
getFieldQuery would free you from having to make that line change on
every Lucene upgrade.
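A rough, untested sketch of what overriding getFieldQuery might look like (2.x-era QueryParser; the field-name check is purely hypothetical):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ExactishQueryParser extends QueryParser {
  public ExactishQueryParser(String defaultField, Analyzer analyzer) {
    super(defaultField, analyzer);
  }

  protected Query getFieldQuery(String field, String queryText) throws ParseException {
    if ("keywordField".equals(field)) {                   // hypothetical "exact match" field
      return new TermQuery(new Term(field, queryText));   // bypass analysis entirely
    }
    return super.getFieldQuery(field, queryText);         // everything else as usual
  }
}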
Yeah, they are more complex than the "exactish" match -- basically, there are
more fields involved -- combined sometimes with AND and sometimes with OR,
and sometimes negated field values, sometimes groupings, etc. These other
field values are all single words (no spaces), and a search might invo
Keeping in mind that Hoss's input is much more valuable than mine...
It sounds like you want what I originally gave you. You want to be able
to perform complex queries with the QueryParser and you want '-' and '_'
to not break words, and you want quoted words to be tokenized as one
token with
: Thanks for your input. I'm sure I could do as you suggest (and maybe that
: will end up being my best option), but I had hoped to use a string for
: creating the query object, particularly as some of my queries are a bit
: complex.
you have to clarify what you mean by "use a string for creating the query object"
Thanks for your input. I'm sure I could do as you suggest (and maybe that
will end up being my best option), but I had hoped to use a string for
creating the query object, particularly as some of my queries are a bit
complex.
Thanks.
Chris Hostetter wrote:
>
>
> I haven't really been followi
Yeah, what he said
On 9/3/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
I haven't really been following this thread, but it's gotten so long
i got interested.
from what i can tell skimming the discussion so far, it seems like the
biggest confusion is about the definition of a "phrase" a
I haven't really been following this thread, but it's gotten so long
i got interested.
from what i can tell skimming the discussion so far, it seems like the
biggest confusion is about the definition of a "phrase" and what analyzers
do with "quote" characters and what the QueryParser does with "q
Just as you, I would PREFER not to change any of the base Lucene code -- and
I imagine there is still some way to do what I want (possibly by extending
some other existing class) with what is already available.
Regarding point 0) -- You are right in that if I add "test phrase" to index
as UN_TOKENIZED
Disclaimer: Of course I'm not as familiar with your problem space as you
are, so I may be way out in left field, but...
I *still* think you're making waay too much work for yourself and need
to examine your assumptions.
0> But when you index something UN_TOKENIZED as in your example, I don't
I tend to agree with Mark. I tried a query like so...
TermQuery query = new TermQuery(new Term("keywordField", "phrase test"));
IndexSearcher searcher = new IndexSearcher(activeIdx);
Hits hits = searcher.search(query);
And this produced the expected results. Whe
I think if he wants to use the queryparser to parse his search strings
that he has no choice but to modify it. It will eat any pair of quotes
going through it no matter what analyzer is used.
- Mark
Well, you're flying blind. Is the behavior rooted in the indexing or
querying? Since you can't
Well, you're flying blind. Is the behavior rooted in the indexing or
querying? Since you can't answer that, you're reduced to trying random
things hoping that one of them works. A little like voodoo. I've wasted
far too much time trying to solve what I was *sure* was the problem
only to f
Well Philip... bad news. I should have thought of this before... I think
the query parser is the problem. You are tokenizing "all in the quotes" to
one token... but when QueryParser sees that, it doesn't matter what
analyzer you use, it's going to see the quotes and strip them right off.
Then it passes
I am out of ideas. If I'm feeling perky I'll build you one in the morning.
No, I've never used Luke. Is there an easy way to examine my RAMDirectory
index? I can create the index with no quoted keywords, and when I search
for a keyword, I get back the expected results (just can't search for a
p
No, I've never used Luke. Is there an easy way to examine my RAMDirectory
index? I can create the index with no quoted keywords, and when I search
for a keyword, I get back the expected results (just can't search for a
phrase that has whitespace in it). If I create the index with phrases in
quo
OK, I've gotta ask. Have you examined your index with Luke to see if what
you *think* is in the index actually *is*???
Erick
On 9/1/06, Philip Brown <[EMAIL PROTECTED]> wrote:
Interesting...just ran a test where I put double quotes around everything
(including single keywords) of source text
Interesting...just ran a test where I put double quotes around everything
(including single keywords) of source text and then ran searches for a known
keyword, with and without double quotes -- neither search finds it.
Mark Miller-5 wrote:
>
> Sorry to hear you're having trouble. You indeed nee
Added the new token definition to the other section, reran javacc, and imported the
new files... but I still get the same result -- no results. (Quotes are in
the source text and in the query string.) Anything else I might be missing?
Philip
Mark Miller-5 wrote:
>
> Sorry to hear you're having trouble. You indee
Sorry to hear you're having trouble. You indeed need the double quotes in
the source text. You will also need them in the query string. Make sure they
are in both places. My machine is hosed right now or I would do it for you
real quick. My guess is that I forgot to mention... not only do you need t
Well, I tried that, and it doesn't seem to work still. I would be happy to
zip up the new files, so you can see what I'm using -- maybe you can get it
to work. The first time, I tried building the documents without quotes
surrounding each phrase. Then, I retried by enclosing every phrase within
That is a good point. I was just thinking that it would be a pain for
searchers to have to include the quotes when searching, but I guess
there is little way around it. The best you could do is have an option
that specified a quoted search...and you might as well make that option
be to put the
Thanks, but I don't "think" I need that. But curious, how will it know it's
a phrase if it's not enclosed in quotes? Won't all its terms be treated
separately then?
Philip
Mark Miller-5 wrote:
>
> One more tip...if you would like to be able to search phrases without
> putting in the quotes
One more tip... if you would like to be able to search phrases without
putting in the quotes, you must strip them with the analyzer. In
StandardFilter (in the standard analyzer code) add this:
private static final String QUOTED_TYPE = tokenImage[QUOTED];
- you'll see where to put that
and you'll s
So this will recognize anything in quotes as a single token and '_' and
'-' will not break up words. There may be some repercussions for the NUM
token but nothing I'd worry about. Maybe you want to use Unicode for '-'
and '_' as well...I wouldn't worry about it myself.
- Mark
TOKEN : {
Philip Brown wrote:
Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm
not seeing StandardAnalyzer.jj in the Lucene source download.
Mark Miller-5 wrote:
Philip Brow
Do you mean StandardTokenizer.jj (org.apache.lucene.analysis.standard)? I'm
not seeing StandardAnalyzer.jj in the Lucene source download.
Mark Miller-5 wrote:
>
> Philip Brown wrote:
>> Hi,
>>
>
Philip Brown wrote:
Hi,
After running some tests using the StandardAnalyzer, and getting 0 results
from the search, I believe I need a special Tokenizer/Analyzer. Does
anybody have something that parses like the following:
- doesn't parse apart phrases (in quotes)
- doesn't parse/separate hyphenated words