.net lucene doubt

2009-07-15 Thread m.harig
Hello all, I am using .NET Lucene for my search application. How do I index non-English pages? Are there any analyzers for this? I am struggling with a UTF-8 problem, so any help would be appreciated. -- View this message in context: http://www.nabble.com/.net-lucene-doubt-tp24510928p24510928.html

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
well, QA team is not there, and I am "abusing" customer's sysadmin, and it will cost me only a beer if I stop now :) Will post traces tomorrow, daylight does better ... I will have them done on trunk version (fixed two bugs) ... - Original Message > From: Michael McCandless > To

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 7:13 PM, eks dev wrote: >>Are you sure when you ran the test you called >> setAllowDocsOutOfOrder(true)? > > right, just a second this is static... we have two indices, something > runs first and sets it to false... ouch, I hate statics... they make you > beleive you

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
warmduscher :) good night - Original Message > From: Uwe Schindler > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 1:06:30 > Subject: RE: speed of BooleanQueries on 2.9 > > Same here, too late! Good night! > And the blood glucose level is very low, too - very bad

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
>Are you sure when you ran the test you called > setAllowDocsOutOfOrder(true)? right, just a second this is static... we have two indices, something runs first and sets it to false... ouch, I hate statics... they make you believe you can set them during construction... traces come in a coup

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
Same here, too late! Good night! And the blood glucose level is very low, too - very bad for such problems... Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemc

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 6:52 PM, eks dev wrote: > Also not really expected, but this query runs over BS2, shouldn't  +( > whatewer whatever1...)  run as BS? what does it mean to have MUST +() at the > top level? Your query is +(((X Y Z))^2). In BQ.rewrite, any single-clause query that hasn't h

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
I just do not see how... Also not really expected, but this query runs over BS2, shouldn't +( whatever whatever1...) run as BS? what does it mean to have MUST +() at the top level? it is a bit late here, I am going to bed ... Thanks a lot to all involved! Eks - Original Message -

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
I think that query should rewrite to a BQ that would in turn use BS. Are you sure when you ran the test you called setAllowDocsOutOfOrder(true)? (How else can we explain that BS is in the "hung" stack trace, and that setAllowDocsOutOfOrder alters the behavior?) Mike On Wed, Jul 15, 2009 at 6:03

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
There is also this one: https://issues.apache.org/jira/browse/LUCENE-1744 Maybe this fixed this for Eks? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On B

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
You can look into the JavaDocs, which lists all child classes. From there you can click through it - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Beha

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
On Wed, Jul 15, 2009 at 5:57 PM, eks dev wrote: > it works with current trunk, 10 Minutes ago built?! Hmmm, OK, maybe it was the DISI bug. Do we have any Scorers in Lucene that forgot to implement advance() and hence got the slow default version??? Not sure how to ask the IDE for that info... -Yo

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
Mike's instrumented version is not printing anything on this query and it works fine with trunk version BS2 gets executed (top Query Required... +((( )))? again the Query: Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632 NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:marea^0.28

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
it works with current trunk, 10 Minutes ago built?! if I put lucene from yesterday, the same symptoms like yesterday... Mike's instrumented version is running ... - Original Message > From: Yonik Seeley > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 23:34:29

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
On Wed, Jul 15, 2009 at 4:37 PM, Uwe Schindler wrote: > And the fix only affects custom DocIdSetIterators. And custom Queries (via Scorer) since Scorer inherits from DISI. But as Mike says, it shouldn't be the issue behind in this thread. -Yonik http://www.lucidimagination.com --

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
I do, but not on this Query... the same happens when I use Luke - Original Message > From: Uwe Schindler > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 22:37:04 > Subject: RE: speed of BooleanQueries on 2.9 > > And the fix only affects custom DocIdSetIterators.

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
OK let's start w/ the attached patch? It'll produce a ridiculous amount of output (one line for each doc collected). If that's a problem you can comment out the "BS collect" line. Mike On Wed, Jul 15, 2009 at 4:27 PM, Michael McCandless wrote: > OK I'll instrument. > > Mike > > On Wed, Jul 15,

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
And the fix only affects custom DocIdSetIterators. The ones from Lucene core all implement the new API and do it more effectively than the example code :-) Or does Eks Dev use custom DocIdSetIterators? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
OK I'll instrument. Mike On Wed, Jul 15, 2009 at 3:28 PM, eks dev wrote: > >> If I make a patch that adds verbosity to what BS is doing, can you run >> it & post the output? > > can do, it can take some time > > > > - Original Message >> From: Michael McCandless >> To: java-user@lucene.

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
I just committed Uwe's fix for that (thanks Uwe!), but I don't think it's causing eks' slowdown because eks' case is a straight OR query, which doesn't use advance. Mike On Wed, Jul 15, 2009 at 3:23 PM, Yonik Seeley wrote: > Could this perhaps have anything to do with the changes to DocIdSetItera

RE: speed of BooleanQueries on 2.9

2009-07-15 Thread Uwe Schindler
To correctly implement the backwards-compatibility pattern, it should call skipTo: public int advance(int target) throws IOException { return doc = skipTo(target) ? doc() : NO_MORE_DOCS; } This is how nextDoc is implemented. New iterators that override advance() work correctly, older ones implementing ski
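The bridge described in this message can be tried out in isolation. The following is only a sketch under stated assumptions: `SketchIterator`, `LegacyArrayIterator` and the `skipToCalls` counter are hypothetical stand-ins rather than Lucene classes, and the checked `IOException` from the quoted snippet is left out to keep the example short.

```java
import java.util.Arrays;

/** Simplified stand-in for Lucene's DocIdSetIterator (hypothetical, not the real class). */
abstract class SketchIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    protected int doc = -1;

    // Old-style API that legacy subclasses already implement.
    abstract boolean next();
    abstract boolean skipTo(int target);
    abstract int doc();

    // New-style API, bridged onto the old methods as the message suggests.
    int nextDoc() {
        return doc = next() ? doc() : NO_MORE_DOCS;
    }

    int advance(int target) {
        return doc = skipTo(target) ? doc() : NO_MORE_DOCS;
    }
}

/** Legacy iterator over a sorted doc-ID array; it only knows the old API. */
class LegacyArrayIterator extends SketchIterator {
    private final int[] docs;
    private int pos = -1;
    int skipToCalls = 0; // instrumentation for the demo only

    LegacyArrayIterator(int[] docs) { this.docs = docs; }

    @Override boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    @Override boolean skipTo(int target) {
        skipToCalls++;
        int from = pos + 1;
        if (from >= docs.length) { pos = docs.length; return false; }
        // Binary search for the first doc >= target among the remaining docs.
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1;
        return pos < docs.length;
    }

    @Override int doc() { return docs[pos]; }
}

class AdvanceBridgeDemo {
    public static void main(String[] args) {
        LegacyArrayIterator it = new LegacyArrayIterator(new int[] {2, 5, 9, 14});
        System.out.println(it.advance(6));   // 9: one skipTo call, no linear scan
        System.out.println(it.nextDoc());    // 14
        System.out.println(it.skipToCalls);  // 1
    }
}
```

With this shape, a legacy iterator that already has an efficient skipTo keeps that efficiency when callers switch to advance.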

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
> If I make a patch that adds verbosity to what BS is doing, can you run > it & post the output? can do, it can take some time - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Wednesday, 15 July, 2009 20:54:25 > Subject: Re: speed of BooleanQuer

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Yonik Seeley
Could this perhaps have anything to do with the changes to DocIdSetIterator? Glancing at the default implementation of advance makes me wince a bit: public int advance(int target) throws IOException { while (nextDoc() < target) {} return doc; } IMO, this is a back-compatibility anti-pa
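To make the concern about the default implementation concrete, here is a rough sketch that counts how many nextDoc() calls the quoted loop needs. `LinearAdvanceIterator` and its `nextDocCalls` counter are hypothetical and model a dense iterator over doc IDs 0..maxDoc-1; nothing here is Lucene code, and the checked exception is omitted.

```java
/** Hypothetical dense iterator that inherits the loop-based default advance(). */
class LinearAdvanceIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int maxDoc;
    private int doc = -1;
    int nextDocCalls = 0; // instrumentation for the demo only

    LinearAdvanceIterator(int maxDoc) { this.maxDoc = maxDoc; }

    int nextDoc() {
        nextDocCalls++;
        if (doc != NO_MORE_DOCS) {
            doc = (doc + 1 < maxDoc) ? doc + 1 : NO_MORE_DOCS;
        }
        return doc;
    }

    // Same shape as the default quoted above: step one doc at a time.
    int advance(int target) {
        while (nextDoc() < target) {}
        return doc;
    }
}

class LinearAdvanceDemo {
    public static void main(String[] args) {
        LinearAdvanceIterator it = new LinearAdvanceIterator(1_000_000);
        System.out.println(it.advance(500_000)); // 500000
        System.out.println(it.nextDocCalls);     // 500001 calls for a single advance
    }
}
```

The cost of one advance grows with the distance skipped, which is why an iterator with a fast skipTo loses out if advance does not delegate to it.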

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 2:30 PM, eks dev wrote: > >> Weird.  Have you run CheckIndex? > nope, I guess it brings nothing: two times built index; Bug provoked by > changing one parameter  that controls only search caused it => no corrupt > index? > > You think we should give it a try? Hell, why not

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
Well, skipTo does in fact throw UOE. And BS.next() does in fact work, which is interesting, but it will next() through docs out-of-order, which BS2 won't like. Does anyone know of any cases where BS.next() is in fact used? Mike On Wed, Jul 15, 2009 at 2:15 PM, Paul Elschot wrote: > As long as n

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
> Weird. Have you run CheckIndex? nope, I guess it brings nothing: two times built index; Bug provoked by changing one parameter that controls only search caused it => no corrupt index? You think we should give it a try? Hell, why not :) What do you mean by "Can you do a binary search to loca

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
As long as next(), skipTo(), doc() and score() on a Scorer work, the search will be done. I hope the results are correct in this case, but I'm not sure. Regards, Paul Elschot On Wednesday 15 July 2009 19:08:00 Michael McCandless wrote: > I don't think a toplevel BS2 is able to use BS as sub-score

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
OK thanks for the updates. Yes, we are on the hunt now ;) Something nasty is lurking... Weird. Have you run CheckIndex? Can you do a binary search to locate the term(s) that's causing it? It's great you see 10% speedup in searching overall (excluding these ones...)! Mike On Wed, Jul 15, 200

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
> Is it possible for you to make the problem happen such that we get > line numbers in this traceback? sure, I will build lucene trunk with debug/line numbers enabled and ask customer's QA to run it again... > Is CPU pegged when it's stuck? Yes!, One core was 100% hot - Original Mes

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
1. Please forget minNumberShouldMatch; it is NOT set on this particular query (minNumberShouldMatch is determined dynamically, depending on semantics of user query... sometimes triggers, sometimes not...). This Exact Query here causes search to take longer than 180 Seconds with allowDocsOutOfO

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 11:41 AM, eks dev wrote: > You see it on stack trace taken while "stuck" > o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(UnknownSource) Is it possible for you to make the problem happen such that we get line numbers in this traceback? Is CPU pe

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
But, that query can't accept a minNumberShouldMatch -- are you really setting that? (You get 0 results if you set it, because the top boolean query has a single required clause). Maybe you set it only on the inner large OR-query? (But then I don't see the ~2 on that inner clause). I've tested a

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
I don't think a toplevel BS2 is able to use BS as sub-scorers? BS2 needs to do doc-at-once, for all sub-scorers, but BS can't do that. I think? Mike On Wed, Jul 15, 2009 at 12:10 PM, Paul Elschot wrote: > On Wednesday 15 July 2009 17:16:23 Michael McCandless wrote: >> So now I'm confused.  Sinc

searching for c++, c#, etc...

2009-07-15 Thread Chris Salem
Hello, I'm trying to search for the terms like c++ but the parser is stripping off the ++. I tried escaping the ++ with slashes but it's still stripping it off. I could replace + with "plus", is that the best way to do it? How come escaping isn't working? thanks Sincerely, Chris Salem
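One way to try the "replace + with plus" idea from this message is to rewrite such terms before the text reaches the analyzer, and to apply the same rewrite to the query string. This is only a sketch: it assumes (the message does not confirm it) that the analyzer rather than the query parser is dropping the symbols, and `SymbolTermNormalizer` with its replacement words `cplusplus` and `csharp` is a hypothetical helper, not a Lucene API.

```java
/** Hypothetical pre-processing step, applied to both indexed text and query text. */
class SymbolTermNormalizer {
    static String normalize(String text) {
        return text
            // "c++" as a standalone term, any case ("abc++" is left alone).
            .replaceAll("(?i)\\bc\\+\\+", "cplusplus")
            // "c#" as a standalone term, any case.
            .replaceAll("(?i)\\bc#", "csharp");
    }

    public static void main(String[] args) {
        System.out.println(normalize("c++ and C# jobs")); // cplusplus and csharp jobs
    }
}
```

Because the same function runs at index time and at query time, a search for c++ is matched against the rewritten term and no longer depends on how the analyzer treats punctuation.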

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
On Wednesday 15 July 2009 17:16:23 Michael McCandless wrote: > So now I'm confused. Since your query has required (+) clauses, the > setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk. Probably the top level BQ is using BS2 because of the required clauses, but the nested BQ's ar

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
sorry for confusion, here is exact query that runs forever with setAllowDocsOutOfOrder: You see it on stack trace taken while "stuck" o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(UnknownSource) Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632 NAME:m

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Tue, Jul 14, 2009 at 6:24 PM, eks dev wrote: > org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown > Source) > org.apache.lucene.search.BooleanScorer.score(Unknown Source) > org.apache.lucene.search.BooleanScorer.score(Unknown Source) > org.apache.lucen

[REMINDER] NYC Meetup July 22nd

2009-07-15 Thread Grant Ingersoll
For those in NYC, there will be a Lucene ecosystem (Lucene/Solr/Mahout/ Nutch/Tika/Droids/Lucene ports) Meetup on July 22, hosted by MTV Networks and co-sponsored with Lucid Imagination. For more info and to RSVP, see http://www.meetup.com/NYC-Apache-Lucene-Solr-Meetup/ . There is limited seati

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
So now I'm confused. Since your query has required (+) clauses, the setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk. BooleanQuery only uses BooleanScorer when there are no required terms, and allowDocsOutOfOrder is true. So I can't explain why you see this setting changing a
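The rule stated in this message can be written down as a small decision function. The sketch below models only the two conditions named here (no required clauses, and allowDocsOutOfOrder set to true); the real BooleanQuery may check more than that, and `ScorerChoiceSketch`, `Occur` and `chooseScorer` are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the scorer choice as described in the message, not Lucene code. */
class ScorerChoiceSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static String chooseScorer(List<Occur> topLevelClauses, boolean allowDocsOutOfOrder) {
        boolean hasRequired = topLevelClauses.contains(Occur.MUST);
        // Out-of-order BooleanScorer only when nothing is required and the flag is on.
        return (allowDocsOutOfOrder && !hasRequired) ? "BooleanScorer" : "BooleanScorer2";
    }

    public static void main(String[] args) {
        List<Occur> pureOr = Arrays.asList(Occur.SHOULD, Occur.SHOULD, Occur.SHOULD);
        List<Occur> singleRequired = Arrays.asList(Occur.MUST); // shape of +(...)

        System.out.println(chooseScorer(pureOr, true));          // BooleanScorer
        System.out.println(chooseScorer(pureOr, false));         // BooleanScorer2
        System.out.println(chooseScorer(singleRequired, true));  // BooleanScorer2
    }
}
```

Under this model a top-level query of the form +(...) never selects BooleanScorer, so the flag should make no difference for it.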

Re: Stream field values

2009-07-15 Thread Günter Ladwig
Hi, thanks for your answer. I know about lazy loading fields, but my question is whether fields are always loaded as a whole or if it is possible in some way to stream a field's contents. Regards, Günter -- Dipl.-Inform. Günter Ladwig Institute AIFB, University of Karlsruhe, D-76128 Karlsru

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread eks dev
something weird happening w/ BooleanScorer... indeed, my first impression was a JVM bug triggered on some rare conditions... but we tried the old JVM (1.5), the latest 1.6 U14, and -client instead of -XBatch -server: no changes. We never managed to wait so long to see it finish, so I am not sure if

Re: Custom FieldComparator and incorrect sort order

2009-07-15 Thread Michael McCandless
On Wed, Jul 15, 2009 at 7:51 AM, Shalin Shekhar Mangar wrote: > On Wed, Jul 15, 2009 at 4:49 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> OK I opened & fixed https://issues.apache.org/jira/browse/LUCENE-1744. >> >> > Wow, that was fast! Thanks! Well, I had the easy part ;) Yo

Re: Custom FieldComparator and incorrect sort order

2009-07-15 Thread Shalin Shekhar Mangar
On Wed, Jul 15, 2009 at 4:49 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK I opened & fixed https://issues.apache.org/jira/browse/LUCENE-1744. > > Wow, that was fast! Thanks! -- Regards, Shalin Shekhar Mangar.

Re: Index doubling in size when adding extra terms

2009-07-15 Thread Michael McCandless
It looks like your "text_substrings" field will have many more unique terms than the original text, right? And, since it's indexed (I assume), the docIDs will in fact be stored twice (once in the postings for your orig text and once in the postings for text_substrings). So I think it's expected t
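The growth in unique terms can be estimated with a quick count. The sketch below assumes the extra field holds every suffix of every word, which is a guess at the scheme (the original description is cut off in this archive); `SuffixTermCount` and its methods are hypothetical, and whitespace splitting stands in for whatever analyzer is really used.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical comparison of unique terms: plain words vs. all word suffixes. */
class SuffixTermCount {
    static Set<String> wordTerms(String text) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) terms.add(token);
        }
        return terms;
    }

    static Set<String> suffixTerms(String text) {
        Set<String> terms = new TreeSet<>();
        for (String token : wordTerms(text)) {
            // "quick" contributes quick, uick, ick, ck, k.
            for (int i = 0; i < token.length(); i++) terms.add(token.substring(i));
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumped over the lazy dogs";
        System.out.println(wordTerms(text).size());   // 8
        System.out.println(suffixTerms(text).size()); // 34
    }
}
```

Even on one short sentence the suffix field holds about four times as many unique terms, each with its own postings, which fits the explanation that a larger index is expected.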

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote: > > I do not know exactly why, but > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but > with setAllowDocsOutOfOrder(false);  no problems whatsoever That toggles between using BooleanScorer vs BooleanScorer2. The odd thing i

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Michael McCandless
OK I'll dig on this one. Maybe I can repro w/ a Wikipedia index. Mike On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote: > > I do not know exactly why, but > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but > with setAllowDocsOutOfOrder(false);  no problems whatsoever > >

Re: Custom FieldComparator and incorrect sort order

2009-07-15 Thread Michael McCandless
OK I opened & fixed https://issues.apache.org/jira/browse/LUCENE-1744. Thanks Shalin! Mike On Wed, Jul 15, 2009 at 7:04 AM, Michael McCandless wrote: > OK this is a bug in BooleanScorer2!  I'll open it shortly... thanks Shalin! > > Mike > > On Wed, Jul 15, 2009 at 6:32 AM, Michael > McCandless w

Re: Custom FieldComparator and incorrect sort order

2009-07-15 Thread Michael McCandless
OK this is a bug in BooleanScorer2! I'll open it shortly... thanks Shalin! Mike On Wed, Jul 15, 2009 at 6:32 AM, Michael McCandless wrote: > I'll look into this... > > Mike > > On Wed, Jul 15, 2009 at 3:55 AM, Shalin Shekhar > Mangar wrote: >> Hello, >> >> Over in Solr land, I'm facing a problem

RE: Index doubling in size when adding extra terms

2009-07-15 Thread Uwe Schindler
Field.Store.NO used for the text_substrings field? :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Gregory Tarr [mailto:gregory.t...@detica.com] > Sent: Wednesday, July 15, 2009 12:49 PM > To: java-us

Index doubling in size when adding extra terms

2009-07-15 Thread Gregory Tarr
I have added a new field to each document in my index containing substrings of another field to speed up initial-wildcard searches. Each document has a field "text" which might contain "the quick brown fox jumped over the lazy dogs" The new field - "text_substrings" would then contain "the quick u

Re: re-ranking ....

2009-07-15 Thread KK
Fetch all the search results along with their corresponding values for all the terms used for scoring, and then use those values to re-rank your results to your heart's content. --kk On Wed, Jul 15, 2009 at 11:28 AM, henok sahilu wrote: > what i want to do is re

Re: Custom FieldComparator and incorrect sort order

2009-07-15 Thread Michael McCandless
I'll look into this... Mike On Wed, Jul 15, 2009 at 3:55 AM, Shalin Shekhar Mangar wrote: > Hello, > > Over in Solr land, I'm facing a problem while upgrading the lucene version > to trunk. Solr has a QueryElevationComponent which is used to boost certain > documents to the top. It pre-processes

Custom FieldComparator and incorrect sort order

2009-07-15 Thread Shalin Shekhar Mangar
Hello, Over in Solr land, I'm facing a problem while upgrading the lucene version to trunk. Solr has a QueryElevationComponent which is used to boost certain documents to the top. It pre-processes the query to add a few boolean clauses of its own and uses a FieldComparator for the sorting part. Th