WhitespaceTokenizer breaks at spaces, tabs and newlines. This will leave
Cla$$War as one word. If you want Cla$$War to become one word, use a
CharFilter to filter out all $.
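
To make the difference concrete, here is a minimal plain-Java sketch that only imitates the behaviour described above; it does not use Lucene. The names WhitespaceSketch, splitOnWhitespace and stripDollars are made up for illustration, and stripDollars merely stands in for what a CharFilter that removes every $ would do before tokenization.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: imitates whitespace-only tokenization.
class WhitespaceSketch {

    // Split on runs of spaces, tabs and newlines only; '$' is left untouched.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a CharFilter that drops every '$' before tokenizing.
    static String stripDollars(String text) {
        return text.replace("$", "");
    }

    public static void main(String[] args) {
        String title = "Cla$$War the\tmovie";
        System.out.println(splitOnWhitespace(title));               // keeps Cla$$War intact
        System.out.println(splitOnWhitespace(stripDollars(title))); // yields ClaWar instead
    }
}
```

Whichever variant is chosen, the same processing would have to be applied at index time and at query time, otherwise the indexed token and the query token will not match.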
Otherwise, Lucene has debug features to show you exactly how these are
broken up. The easiest way to explore them is to instal
I appreciate your input. However, my question is which analyzer and tokenizer
to choose.
-- Original --
From: "Uwe Schindler";
Date: Wed, Aug 15, 2012 00:52 AM
To: "java-user";
Subject: RE: Re:RE: Does the string "Cla$$War" affect Lucene?
Please read my
We have recently moved to 3.6 from Lucene 2.2 and have seen that the way
tokens get indexed is not the same.
Although we are open to reindexing the data which was initially indexed
with 2.2, I would like to know if there is a way I can avoid reindexing?
I am using IndexUpgrader tool to update the
Please read my answer posted before, it explains exactly what happens - so
you can imagine what type of search input produces this. If you want to
change the behavior rethink your tokenization.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Another phrase "$FREE.99" causes the same problem.
What are the ultimate solutions? How many cases would cause this problem?
Thanks
-- Original --
From: "dyzc2010 "<1393975...@qq.com>;
Date: Tue, Aug 14, 2012 11:27 PM
To: "java-user";
Subject: Re: Re:R
I know the reason for the missing hits.
Without configuring autoGeneratePhraseQueries, a term like "I love you" is
split into "I", "love", and "you", therefore getting quite a lot of hits.
With it configured, the term is not split, and there are no hits.
-- Original --
From: "Jack K
I should have made it more clear.
When I said no hits, I referred to no hits for other ordinary terms such as "Gone
with Wind".
I do analyze the query. When "True" is on for autoGeneratePhraseQueries, the
term is parsed as "cla war" with a space in between.
When "False", it becomes two phras
Try enclosing "Cla$$War" in quotes, which should have the same effect as
turning on auto-phrase query generation.
qp.parse("\"Cla$$War\"")
(You only need to use "escape" for characters which are query syntax
characters.)
And do a q.toString to see how the term was analyzed.
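
As a rough illustration of why "$" needs no escaping, the following plain-Java sketch imitates what an escape helper such as QueryParser.escape does. It is not Lucene code: the class name EscapeSketch and the character list are assumptions written for this example, and the list should be checked against the Lucene version in use.

```java
// Hypothetical imitation of a query-syntax escaper; not the Lucene implementation.
class EscapeSketch {

    // Assumed set of query syntax characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every assumed syntax character with a backslash; leave the rest alone.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Wrap the text in double quotes so a parser would treat it as one phrase.
    static String quote(String s) {
        return "\"" + escape(s) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escape("Cla$$War")); // '$' is not in the assumed list, so unchanged
        System.out.println(escape("a+b"));      // '+' gets a backslash in front
        System.out.println(quote("Cla$$War"));  // quoted form to pass to the parser
    }
}
```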
I'm surprised th
Sounds like some other analyzer can do the trick?
Anyway, I don't want a slower Lucene, and I want to treat "Cla$$War" as a whole
word.
What solutions are left?
Thanks.
-- Original --
From: "Uwe Schindler";
Date: Tue, Aug 14, 2012 04:56 PM
To: "java-us
No results are returned after I add qp.setAutoGeneratePhraseQueries(true).
After I remove qp.setAutoGeneratePhraseQueries(true), I can get results back.
Is there anything that should be taken care of before/after adding the line of code?
-- Original --
From: "Jack Krupan
Hi to all,
In pruning package, for pruneAllPositions(TermPositions termPositions, Term
t) method it is said that:
"termPositions - positioned term positions. Implementations MUST NOT advance
this by calling TermPositions methods that advance either the position
pointer (next, skipTo) or term poi
Uwe, if I look at the TestSpanMultiTermQueryWrapper.testPrefix test in 3.6,
it doesn't rewrite the SMTQW, but works. What's the difference? Is that test
wrong/broken?
public void testPrefix() throws Exception {
WildcardQuery wq = new WildcardQuery(new Term("field", "extrem*"));
SpanQuery swq
Add qp.setAutoGeneratePhraseQueries(true) before calling qp.parse.
Otherwise, the query (clause of the larger BooleanQuery) will be the same as
"cla" OR "war", which will match all "war" documents, plus any "cla"
documents you may have.
-- Jack Krupansky
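
The difference between the "cla" OR "war" clause and a phrase query can be sketched in plain Java. This is only a toy model, not Lucene: documents are represented as lists of already-analyzed tokens, and the names PhraseVsOrSketch, matchesAny and matchesPhrase are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (not Lucene): a document is a list of analyzed tokens.
class PhraseVsOrSketch {

    // Models "cla" OR "war": a single matching term is enough.
    static boolean matchesAny(List<String> doc, List<String> terms) {
        for (String term : terms) {
            if (doc.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Models the phrase "cla war": terms must sit at consecutive positions, in order.
    static boolean matchesPhrase(List<String> doc, List<String> terms) {
        if (terms.isEmpty()) {
            return false;
        }
        for (int start = 0; start + terms.size() <= doc.size(); start++) {
            if (doc.subList(start, start + terms.size()).equals(terms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("cla", "war");
        List<String> claWar = Arrays.asList("cla", "war");
        List<String> warAndPeace = Arrays.asList("war", "and", "peace");

        System.out.println(matchesAny(warAndPeace, query));    // OR also hits the unrelated "war" title
        System.out.println(matchesPhrase(warAndPeace, query)); // phrase does not
        System.out.println(matchesPhrase(claWar, query));      // phrase hits the intended title
    }
}
```

The phrase check has to look at token positions, which is the extra work a phrase query pays for compared with a plain term match.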
-Original Message-
From: zh
BooleanQuery bq;
QueryParser qp;
qp = new QueryParser(version, "title", analyzer);
bq.add(qp.parse(QueryParser.escape("Cla$$War")), Occur.valueOf("MUST"));
version = Version.LUCENE_35;
analyzer = new LimitTokenCountAnalyzer(new StandardAnalyzer(
Version.LUCENE_35,
Please send such inquiries to the Solr user email list, not the Lucene user
list.
-- Jack Krupansky
-Original Message-
From: Ralf Heyde
Sent: Tuesday, August 14, 2012 7:45 AM
To: java-user@lucene.apache.org
Subject: Solr adding Documents / Commit in different Threads
Hello,
we curre
Try sending only 100 or so (or maybe even only 20) of the documents at a
time, and only send commit with the last batch.
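
A minimal sketch of that batching idea, in plain Java with no Solr client involved. The names BatchSketch, partition and sendAll are hypothetical, and sendAll only records what would be sent; the real call to the server is an assumption left to the reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching helper; the actual network call is deliberately left out.
class BatchSketch {

    // Split docs into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    // Records one line per batch; only the last batch carries commit=true.
    static List<String> sendAll(List<String> docs, int batchSize) {
        List<String> log = new ArrayList<>();
        List<List<String>> batches = partition(docs, batchSize);
        for (int i = 0; i < batches.size(); i++) {
            boolean commit = (i == batches.size() - 1);
            // A real implementation would post batches.get(i) to the server here.
            log.add("send " + batches.get(i).size() + " docs, commit=" + commit);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            docs.add("doc" + i);
        }
        for (String line : sendAll(docs, 100)) {
            System.out.println(line);
        }
    }
}
```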
Sometimes network-related components along the way have trouble dealing with
very large requests unless carefully configured. In other words, maybe you
can find a way to mak
Hi,
This stack trace has nothing to do with Solr. The problem here is that while
Solr tries to respond, the connection to the client was already closed. This
might happen if timeouts in your tomcat configuration or reverse proxy in front
of your tomcat servers are too low. Can you try this with
Hello,
we are currently facing a problem in which updates for some documents may be lost during
adding / committing.
The infrastructure: we have a main Solr, which gets documents and distributes
them to a lot of slaves.
The situation: we have a Job, which runs scheduled every minute (no run, if a
prev
14 August 2012, Apache Lucene™ 4.0-beta available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.0-beta
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires
Yes, cast is safe.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Carsten Schnober [mailto:schno...@ids-mannheim.de]
> Sent: Tuesday, August 14, 2012 11:07 AM
> To: java-user@lucene.apache.org
> Subject
Am 14.08.2012 11:00, schrieb Uwe Schindler:
> You have to rewrite the wrapper query.
Thanks, Uwe! I had tried that way but it failed because the rewrite()
method would return a Query (not a SpanQuery) object. A cast seems to
solve the problem, I'm re-posting the code snippet to the list for the
sa
You have to rewrite the wrapper query.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Carsten Schnober [mailto:schno...@ids-mannheim.de]
> Sent: Tuesday, August 14, 2012 10:59 AM
> To: java-user
> Subje
Dear list,
I am trying to combine a WildcardQuery and a SpanQuery because I need to
extract spans from the index for further processing. I realise that
there have been a few public discussions about this topic, but I
still fail to get what I am missing here. My code is this (Lucene 3.6.0):
Hi,
If you are using StandardAnalyzer, then "Cla$$War" is split at the $ signs,
so it searches for two tokens, "cla" and "war". If autogenerate phrase
queries is enabled for QueryParser, it will then create a phrase query "cla
war" out of it, which is slower because positions are involved. If
auto
Sounds extremely unlikely. What is the query? What analyzer? What
version of lucene? What about other strings containing $$?
--
Ian.
On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008 wrote:
> Hi,
>
>
> I have a big index, and when I searched it with a title string "Cla$$War",
> Lucene became
Hi,
I have a big index, and when I search it with the title string "Cla$$War",
Lucene becomes very slow. This doesn't happen when I search with other title
strings such as "Gone with Wind". Does the "$$" affect search performance?
Thanks,
Cheng
27 matches