Re: RE: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Lance Norskog
WhiteSpaceTokenizer breaks at spaces, tabs & newlines. This will leave Cla$$War as one word. If you want Cla$$War to become one word, use a CharFilter to filter out all $. Otherwise, Lucene has debug features to show you exactly how these are broken up. The easiest way to explore them is to instal
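The CharFilter idea above (strip every "$" before the tokenizer sees the text) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene's CharFilter or tokenizer classes, and the class and method names (DollarStripSketch, stripDollars, tokenize) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DollarStripSketch {
    // Hypothetical stand-in for a char filter: remove every '$' from the raw text.
    static String stripDollars(String text) {
        return text.replace("$", "");
    }

    // Hypothetical stand-in for whitespace tokenization, applied after the filter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : stripDollars(text).trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The filtered text reaches the tokenizer as a single word.
        System.out.println(tokenize("Cla$$War"));
    }
}
```

The same filtering has to be applied at both index time and query time, otherwise the indexed token and the query token will not match.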

Re:RE: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread zhoucheng2008
I appreciate your input. However, my question is which analyzer and tokenizer to choose. -- Original -- From: "Uwe Schindler"; Date: Wed, Aug 15, 2012 00:52 AM To: "java-user"; Subject: RE: Re:RE: Does the string "Cla$$War" affect Lucene? Please read my

IndexUpgrader

2012-08-14 Thread sunil Kumar Verma
We have recently moved from Lucene 2.2 to 3.6 and have seen that the way tokens get indexed is not the same. Although we are open to reindexing the data which was initially indexed with 2.2, I would like to know if there is a way I can avoid reindexing. I am using the IndexUpgrader tool to update the

RE: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Uwe Schindler
Please read my answer posted before, it explains exactly what happens - so you can imagine what type of search input produces this. If you want to change the behavior rethink your tokenization. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de

Re: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread zhoucheng2008
Another phrase "$FREE.99" causes the same problem. What are the ultimate solutions? How many cases would cause this problem? Thanks -- Original -- From: "dyzc2010 "<1393975...@qq.com>; Date: Tue, Aug 14, 2012 11:27 PM To: "java-user"; Subject: Re: Re:R

Re: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread dyzc
I know the reason for the missing hits. Without configuring autoGeneratePhraseQueries, a term like "I love you" is split into "I", "love", and "you", therefore getting quite a lot of hits. With it configured, the term is not split, and there are no hits. -- Original -- From: "Jack K

Re: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread dyzc
I should have been clearer. When I said no hits, I meant no hits for other ordinary terms such as "Gone with Wind". I do analyze the query. When "True" is on for autoGeneratePhraseQueries, the term is parsed as "cla war" with a space in between. When "False", it becomes two phras

Re: Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Jack Krupansky
Try enclosing "Cla$$War" in quotes, which should have the same effect as turning on auto-phrase query generation. qp.parse("\"Cla$$War\"") (You only need to use "escape" for characters which are query syntax characters.) And do a q.toString to see how the term was analyzed. I'm surprised th

Re:RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread zhoucheng2008
Sounds like some other analyzer can do the trick? Anyway, I don't want a slower Lucene, and I want to treat "Cla$$War" as a whole word. What solution is left? Thanks. -- Original -- From: "Uwe Schindler"; Date: Tue, Aug 14, 2012 04:56 PM To: "java-us

Re: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread zhoucheng2008
No results are returned after I add qp.setAutoGeneratePhraseQueries(true). After I remove qp.setAutoGeneratePhraseQueries(true), I can get results back. Is there anything that should be taken care of before/after adding the line of code? -- Original -- From: "Jack Krupan

Re: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread dyzc
No results are returned after I add qp.setAutoGeneratePhraseQueries(true). After I remove qp.setAutoGeneratePhraseQueries(true), I can get results back. Is there anything that should be taken care of before/after adding the line of code? -- Original -- From: "Jack Krupans

pruning package- question about termpositions && skipTo

2012-08-14 Thread Zeynep P.
Hi to all, In the pruning package, for the pruneAllPositions(TermPositions termPositions, Term t) method, it is said that: "termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or term poi

Re: UnsupportedOperationException: Query should have been rewritten

2012-08-14 Thread Jack Krupansky
Uwe, if I look at the TestSpanMultiTermQueryWrapper.testPrefix test in 3.6, it doesn't rewrite the SMTQW, but works. What's the difference? Is that test wrong/broken? public void testPrefix() throws Exception { WildcardQuery wq = new WildcardQuery(new Term("field", "extrem*")); SpanQuery swq

Re: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Jack Krupansky
Add qp.setAutoGeneratePhraseQueries(true) before calling qp.parse. Otherwise, the query (clause of the larger BooleanQuery) will be the same as "cla" OR "war", which will match all "war" documents, plus any "cla" documents you may have. -- Jack Krupansky -Original Message- From: zh
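The difference between the two query shapes described above ("cla" OR "war" versus the phrase "cla war") can be sketched without Lucene. This is a simplified model that treats a document as a plain list of tokens; the class and method names are hypothetical, and real Lucene matching works on an inverted index with term positions rather than on token lists.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrVersusPhraseSketch {
    // OR semantics: the document matches if it contains at least one query term.
    static boolean matchesAny(List<String> docTokens, List<String> terms) {
        for (String term : terms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Phrase semantics: all query terms must appear adjacent and in order.
    static boolean matchesPhrase(List<String> docTokens, List<String> terms) {
        return !terms.isEmpty() && Collections.indexOfSubList(docTokens, terms) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("cla", "war");
        List<String> warTitle = Arrays.asList("the", "war", "of", "the", "worlds");
        List<String> claWarTitle = Arrays.asList("cla", "war");

        System.out.println(matchesAny(warTitle, query));
        System.out.println(matchesPhrase(warTitle, query));
        System.out.println(matchesPhrase(claWarTitle, query));
    }
}
```

In this model the OR form matches any title containing "war", while the phrase form matches only titles where "cla" is immediately followed by "war".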

Re: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread zhoucheng2008
BooleanQuery bq; QueryParser qp; qp = new QueryParser(version, "title", analyzer); bq.add(qp.parse(QueryParser.escape("Cla$$War")), Occur.valueOf("MUST")); version = Version.LUCENE_35; analyzer = new LimitTokenCountAnalyzer(new StandardAnalyzer( Version.LUCENE_35,

Re: Solr adding Documents / Commit in different Threads

2012-08-14 Thread Jack Krupansky
Please send such inquiries to the Solr user email list, not the Lucene user list. -- Jack Krupansky -Original Message- From: Ralf Heyde Sent: Tuesday, August 14, 2012 7:45 AM To: java-user@lucene.apache.org Subject: Solr adding Documents / Commit in different Threads Hello, we curre

Re: Solr adding Documents / Commit in different Threads

2012-08-14 Thread Jack Krupansky
Try sending only 100 or so (or maybe even only 20) of the documents at a time, and only send commit with the last batch. Sometimes network-related components along the way have trouble dealing with very large requests unless carefully configured. In other words, maybe you can find a way to mak
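The batching advice above can be sketched as follows. This is a minimal illustration, not Solr client code: the class and method names are hypothetical, the documents are plain strings, and the actual send and commit calls are left as comments because they depend on the client in use.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Split the documents into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            docs.add("doc-" + i);
        }
        List<List<String>> batches = partition(docs, 100);
        for (int i = 0; i < batches.size(); i++) {
            boolean commit = (i == batches.size() - 1); // commit only with the last batch
            // A real job would send batches.get(i) here and request a commit when 'commit' is true.
            System.out.println("batch " + i + ": size=" + batches.get(i).size() + " commit=" + commit);
        }
    }
}
```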

RE: Solr adding Documents / Commit in different Threads

2012-08-14 Thread Uwe Schindler
Hi, This stack trace has nothing to do with Solr. The problem here is that while Solr tries to respond, the connection to the client was already closed. This might happen if timeouts in your tomcat configuration or reverse proxy in front of your tomcat servers are too low. Can you try this with

Solr adding Documents / Commit in different Threads

2012-08-14 Thread Ralf Heyde
Hello, we are currently facing a problem in which updates for some documents may be lost during adding / committing. The infrastructure: we have a main Solr, which gets documents and distributes them to a lot of slaves. The situation: we have a job, which runs on a schedule every minute (no run, if a prev

[ANNOUNCE] Apache Lucene 4.0-beta released.

2012-08-14 Thread Robert Muir
14 August 2012, Apache Lucene™ 4.0-beta available. The Lucene PMC is pleased to announce the release of Apache Lucene 4.0-beta. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires

RE: UnsupportedOperationException: Query should have been rewritten

2012-08-14 Thread Uwe Schindler
Yes, cast is safe. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Carsten Schnober [mailto:schno...@ids-mannheim.de] > Sent: Tuesday, August 14, 2012 11:07 AM > To: java-user@lucene.apache.org > Subject

Re: UnsupportedOperationException: Query should have been rewritten

2012-08-14 Thread Carsten Schnober
On 14.08.2012 11:00, Uwe Schindler wrote: > You have to rewrite the wrapper query. Thanks, Uwe! I had tried that, but it failed because the rewrite() method returns a Query (not a SpanQuery) object. A cast seems to solve the problem; I'm re-posting the code snippet to the list for the sa

RE: UnsupportedOperationException: Query should have been rewritten

2012-08-14 Thread Uwe Schindler
You have to rewrite the wrapper query. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Carsten Schnober [mailto:schno...@ids-mannheim.de] > Sent: Tuesday, August 14, 2012 10:59 AM > To: java-user > Subje

UnsupportedOperationException: Query should have been rewritten

2012-08-14 Thread Carsten Schnober
Dear list, I am trying to combine a WildcardQuery and a SpanQuery because I need to extract spans from the index for further processing. I realise that there have been a few public discussions about this topic around, but I still fail to get what I am missing here. My code is this (Lucene 3.6.0):

RE: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Uwe Schindler
Hi, If you are using StandardAnalyzer, then "Cla$$War" is split at the $ signs, so it searches for two tokens, "cla" and "war". If autogenerate phrase queries is enabled for QueryParser, it will then create a phrase query "cla war" out of it, which is slower because positions are involved. If auto
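The splitting described above can be approximated in plain Java to see why "Cla$$War" turns into two tokens. This is a rough sketch, not Lucene's actual tokenization: a regular expression that breaks at anything other than letters and digits stands in for StandardAnalyzer, a whitespace split stands in for a whitespace-only tokenizer, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenSplitSketch {
    // Approximation of a StandardAnalyzer-style split: break at every run of
    // characters that are neither letters nor digits, then lowercase.
    static List<String> splitAtPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Approximation of whitespace-only tokenization: '$' stays inside the token.
    static List<String> splitAtWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitAtPunctuation("Cla$$War")); // prints [cla, war]
        System.out.println(splitAtWhitespace("Cla$$War"));  // prints [Cla$$War]
    }
}
```

With the first split, a search for "Cla$$War" becomes a search for the two terms "cla" and "war"; with the second, it stays a single term, which only matches if the index was built with the same tokenization.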

Re: Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread Ian Lea
Sounds extremely unlikely. What is the query? What analyzer? What version of lucene? What about other strings containing $$? -- Ian. On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008 wrote: > Hi, > > > I have a big index, and when I searched it with a title string "Cla$$War", > Lucene became

Does the string "Cla$$War" affect Lucene?

2012-08-14 Thread zhoucheng2008
Hi, I have a big index, and when I searched it with a title string "Cla$$War", Lucene became very slow. It doesn't happen when I search with other title strings such as "Gone with Wind". Does the "$$" affect the search performance? Thanks, Cheng