Hello all,
I am using Lucene.Net for my search application. How do I index non-English
pages? Are there any analyzers for this? I am struggling with a UTF-8
problem. Please, can anyone help me?
--
View this message in context:
http://www.nabble.com/.net-lucene-doubt-tp24510928p24510928.html
Well, the QA team is not there, and I am "abusing" the customer's sysadmin, and it will
cost me only a beer if I stop now :)
I will post traces tomorrow, daylight does better ... I will have them done on
the trunk version (fixed two bugs) ...
- Original Message
> From: Michael McCandless
> To
On Wed, Jul 15, 2009 at 7:13 PM, eks dev wrote:
>>Are you sure when you ran the test you called
>> setAllowDocsOutOfOrder(true)?
>
> right, just a second this is static... we have two indices, something
> runs first and sets it to false... ouch, I hate statics... they make you
> believe you
warmduscher :)
good night
- Original Message
> From: Uwe Schindler
> To: java-user@lucene.apache.org
> Sent: Thursday, 16 July, 2009 1:06:30
> Subject: RE: speed of BooleanQueries on 2.9
>
> Same here, too late! Good night!
> And the blood glucose level is very low, too - very bad
>Are you sure when you ran the test you called
> setAllowDocsOutOfOrder(true)?
right, just a second this is static... we have two indices, something runs
first and sets it to false... ouch, I hate statics... they make you believe you
can set them during construction... traces come in a coup
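The pitfall described here, a process-wide static switch that a later caller silently overwrites, can be sketched in plain Java. This is only an illustration: `ToyBooleanQuery` and `ToySearcher` are hypothetical stand-ins, not Lucene classes, and the only detail taken from the thread is that the setter is static.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class with a static, process-wide setting.
final class ToyBooleanQuery {
    private static boolean allowDocsOutOfOrder = false;

    static void setAllowDocsOutOfOrder(boolean allow) { allowDocsOutOfOrder = allow; }
    static boolean getAllowDocsOutOfOrder() { return allowDocsOutOfOrder; }
}

// Hypothetical searcher that "configures" the flag in its constructor.
final class ToySearcher {
    private final String name;

    ToySearcher(String name, boolean wantOutOfOrder) {
        this.name = name;
        // Looks like per-instance configuration, but mutates shared state.
        ToyBooleanQuery.setAllowDocsOutOfOrder(wantOutOfOrder);
    }

    String describe() {
        return name + " sees allowDocsOutOfOrder=" + ToyBooleanQuery.getAllowDocsOutOfOrder();
    }
}

final class StaticFlagDemo {
    static List<String> run() {
        ToySearcher first = new ToySearcher("indexA", true);
        ToySearcher second = new ToySearcher("indexB", false); // overwrites the flag
        List<String> out = new ArrayList<>();
        out.add(first.describe());
        out.add(second.describe());
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Both searchers end up reporting the value written last, which is why the order in which components initialize matters with a static setting.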
Same here, too late! Good night!
And the blood glucose level is very low, too - very bad for such problems...
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemc
On Wed, Jul 15, 2009 at 6:52 PM, eks dev wrote:
> Also not really expected, but this query runs over BS2, shouldn't +(
> whatever whatever1...) run as BS? what does it mean to have MUST +() at the
> top level?
Your query is +(((X Y Z))^2). In BQ.rewrite, any single-clause query
that hasn't h
I just do not see how...
Also not really expected, but this query runs over BS2, shouldn't +( whatever
whatever1...) run as BS? what does it mean to have MUST +() at the top level?
it is a bit late here, I am going to bed ...
Thanks a lot to all involved!
Eks
- Original Message -
I think that query should rewrite to a BQ that would in turn use BS.
Are you sure when you ran the test you called
setAllowDocsOutOfOrder(true)?
(How else can we explain that BS is in the "hung" stack trace, and
that setAllowDocsOutOfOrder alters the behavior?)
Mike
On Wed, Jul 15, 2009 at 6:03
There is also this one: https://issues.apache.org/jira/browse/LUCENE-1744
Maybe this fixed this for Eks?
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On B
You can look into the JavaDocs, which lists all child classes. From there
you can click through it
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Beha
On Wed, Jul 15, 2009 at 5:57 PM, eks dev wrote:
> it works with current trunk, 10 Minutes ago built?!
Hmmm, OK, maybe it was the DISI bug.
Do we have any Scorers in Lucene that forgot to implement advance()
and hence got the slow default version???
Not sure how to ask the IDE for that info...
-Yo
Mike's instrumented version is not printing anything on this query
and it works fine with trunk version
BS2 gets executed (top Query Required... +((( )))?
again the Query:
Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632
NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:marea^0.28
It works with current trunk, built 10 minutes ago?!
If I use the Lucene build from yesterday, I get the same symptoms as yesterday...
Mike's instrumented version is running ...
- Original Message
> From: Yonik Seeley
> To: java-user@lucene.apache.org
> Sent: Wednesday, 15 July, 2009 23:34:29
On Wed, Jul 15, 2009 at 4:37 PM, Uwe Schindler wrote:
> And the fix only affects custom DocIdSetIterators.
And custom Queries (via Scorer) since Scorer inherits from DISI.
But as Mike says, it shouldn't be the issue behind in this thread.
-Yonik
http://www.lucidimagination.com
--
I do, but not on this Query... the same happens when I use Luke
- Original Message
> From: Uwe Schindler
> To: java-user@lucene.apache.org
> Sent: Wednesday, 15 July, 2009 22:37:04
> Subject: RE: speed of BooleanQueries on 2.9
>
> And the fix only affects custom DocIdSetIterators.
OK let's start w/ the attached patch? It'll produce a ridiculous
amount of output (one line for each doc collected). If that's a
problem you can comment out the "BS collect" line.
Mike
On Wed, Jul 15, 2009 at 4:27 PM, Michael
McCandless wrote:
> OK I'll instrument.
>
> Mike
>
> On Wed, Jul 15,
And the fix only affects custom DocIdSetIterators. The ones from Lucene core
all implement the new API and do it more efficiently than the example code :-)
Or does Eks Dev use custom DocIdSetIterators?
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@
OK I'll instrument.
Mike
On Wed, Jul 15, 2009 at 3:28 PM, eks dev wrote:
>
>> If I make a patch that adds verbosity to what BS is doing, can you run
>> it & post the output?
>
> can do, it can take some time
>
>
>
> - Original Message
>> From: Michael McCandless
>> To: java-user@lucene.
I just committed Uwe's fix for that (thanks Uwe!), but I don't think
it's causing eks' slowdown because eks' case is a straight OR query,
which doesn't use advance.
Mike
On Wed, Jul 15, 2009 at 3:23 PM, Yonik Seeley wrote:
> Could this perhaps have anything to do with the changes to DocIdSetItera
To correctly implement the backwards-compatibility pattern, it should call skipTo:
  public int advance(int target) throws IOException {
    return doc = skipTo(target) ? doc() : NO_MORE_DOCS;
  }
This is how nextDoc is implemented. New iterators that override advance()
work correctly, older ones implementing ski
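The delegation idea in this message can be sketched with a toy iterator hierarchy. This is a minimal sketch under stated assumptions: `ToyDocIdIterator` and `SortedArrayIterator` are hypothetical classes written for illustration, not Lucene's own, and the old-style `skipTo` is assumed to move to the first doc at or after the target that lies beyond the current position.

```java
import java.util.Arrays;

// Hypothetical base class mimicking an old API (next/skipTo/doc) plus a
// new API (advance) whose default delegates to the old skipTo.
abstract class ToyDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    abstract boolean next();             // old API: move to the next doc
    abstract boolean skipTo(int target); // old API: first doc >= target beyond current
    abstract int doc();                  // old API: current doc id

    // New API: reuse whatever skipTo the subclass already optimized.
    int advance(int target) {
        return skipTo(target) ? doc() : NO_MORE_DOCS;
    }
}

// Old-style implementation that only knows the old API.
final class SortedArrayIterator extends ToyDocIdIterator {
    private final int[] docs;
    private int pos = -1;
    int skipToCalls = 0;

    SortedArrayIterator(int[] sortedDocs) { this.docs = sortedDocs; }

    @Override boolean next() { return ++pos < docs.length; }

    @Override boolean skipTo(int target) {
        skipToCalls++;
        // Search only the part of the array after the current position.
        int from = Math.min(pos + 1, docs.length);
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1; // insertion point when not found
        return pos < docs.length;
    }

    @Override int doc() { return docs[pos]; }
}
```

With this shape, an older subclass that already has a fast `skipTo` gets a fast `advance` for free, instead of falling back to a linear scan.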
> If I make a patch that adds verbosity to what BS is doing, can you run
> it & post the output?
can do, it can take some time
- Original Message
> From: Michael McCandless
> To: java-user@lucene.apache.org
> Sent: Wednesday, 15 July, 2009 20:54:25
> Subject: Re: speed of BooleanQuer
Could this perhaps have anything to do with the changes to DocIdSetIterator?
Glancing at the default implementation of advance makes me wince a bit:
  public int advance(int target) throws IOException {
    while (nextDoc() < target) {}
    return doc;
  }
IMO, this is a back-compatibility anti-pa
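To make the cost of a linear default concrete, here is a small sketch that counts how many `nextDoc()` calls a scan-based `advance` needs. `CountingIterator` is a hypothetical toy over a sorted `int[]`, not a Lucene class; only the shape of `advance` mirrors the snippet quoted above.

```java
// Hypothetical iterator over a sorted int[]; not a Lucene class.
final class CountingIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private int pos = -1;
    private int doc = -1;
    int nextDocCalls = 0;

    CountingIterator(int[] sortedDocs) { this.docs = sortedDocs; }

    int nextDoc() {
        nextDocCalls++;
        pos++;
        doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        return doc;
    }

    // Same shape as the quoted default: a linear scan via nextDoc().
    int advance(int target) {
        while (nextDoc() < target) {}
        return doc;
    }
}

final class LinearAdvanceDemo {
    // Builds the doc ids 0, 2, 4, ..., 2*(n-1).
    static int[] evenDocs(int n) {
        int[] docs = new int[n];
        for (int i = 0; i < n; i++) docs[i] = 2 * i;
        return docs;
    }

    public static void main(String[] args) {
        CountingIterator it = new CountingIterator(evenDocs(1000));
        int landed = it.advance(1500);
        System.out.println("landed on " + landed + " after " + it.nextDocCalls + " nextDoc() calls");
    }
}
```

The number of calls grows with the distance skipped, so an iterator that had an efficient skip under the old API loses that advantage if it only inherits a scan like this one.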
On Wed, Jul 15, 2009 at 2:30 PM, eks dev wrote:
>
>> Weird. Have you run CheckIndex?
> Nope, I guess it would bring nothing: the index was built twice, and the bug is
> provoked by changing one parameter that only controls search => no corrupt
> index?
>
> You think we should give it a try? Hell, why not
Well, skipTo does in fact throw UOE.
And BS.next() does in fact work, which is interesting, but it will
next() through docs out-of-order, which BS2 won't like. Does anyone
know of any cases where BS.next() is in fact used?
Mike
On Wed, Jul 15, 2009 at 2:15 PM, Paul Elschot wrote:
> As long as n
> Weird. Have you run CheckIndex?
Nope, I guess it would bring nothing: the index was built twice, and the bug is
provoked by changing one parameter that only controls search => no corrupt index?
You think we should give it a try? Hell, why not :)
What do you mean by "Can you do a binary search to loca
As long as next(), skipTo(), doc() and score() on a Scorer work,
the search will be done. I hope the results are correct in this
case, but I'm not sure.
Regards,
Paul Elschot
On Wednesday 15 July 2009 19:08:00 Michael McCandless wrote:
> I don't think a toplevel BS2 is able to use BS as sub-score
OK thanks for the updates. Yes, we are on the hunt now ;) Something
nasty is lurking...
Weird. Have you run CheckIndex?
Can you do a binary search to locate the term(s) that's causing it?
It's great you see 10% speedup in searching overall (excluding these ones...)!
Mike
On Wed, Jul 15, 200
> Is it possible for you to make the problem happen such that we get
> line numbers in this traceback?
Sure, I will build Lucene trunk with debug info and line numbers enabled and ask
the customer's QA to run it again...
> Is CPU pegged when it's stuck?
Yes! One core was 100% hot
- Original Mes
1. Please forget minNumberShouldMatch; it is NOT set on this particular query
(minNumberShouldMatch is determined dynamically, depending on the semantics of the user
query... sometimes it triggers, sometimes not...).
This exact query causes the search to take longer than 180 seconds with
allowDocsOutOfO
On Wed, Jul 15, 2009 at 11:41 AM, eks dev wrote:
> You see it on stack trace taken while "stuck"
> o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(UnknownSource)
Is it possible for you to make the problem happen such that we get
line numbers in this traceback?
Is CPU pe
But, that query can't accept a minNumberShouldMatch -- are you really
setting that? (You get 0 results if you set it, because the top
boolean query has a single required clause). Maybe you set it only on
the inner large OR-query? (But then I don't see the ~2 on that inner
clause).
I've tested a
I don't think a toplevel BS2 is able to use BS as sub-scorers? BS2
needs to do doc-at-once, for all sub-scorers, but BS can't do that. I
think?
Mike
On Wed, Jul 15, 2009 at 12:10 PM, Paul Elschot wrote:
> On Wednesday 15 July 2009 17:16:23 Michael McCandless wrote:
>> So now I'm confused. Sinc
Hello,
I'm trying to search for terms like c++, but the parser is stripping off the
++. I tried escaping the ++ with slashes, but it's still stripping it off. I
could replace + with "plus"; is that the best way to do it? How come escaping
isn't working?
thanks
Sincerely,
Chris Salem
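One possible explanation, stated here as an assumption rather than something confirmed in this thread: escaping only stops the query parser from treating + as an operator, while the analyzer that runs afterwards may still drop punctuation from the term. The sketch below separates the two steps with hypothetical helpers (`escape` and `letterTokens` are illustrative, not Lucene APIs, and the toy tokenizer only imitates an analyzer that keeps letters and digits).

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeVsAnalyze {
    // Hypothetical helper: backslash-escape a few query-syntax characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("+-!(){}[]^\"~*?:\\".indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical toy "analyzer": keeps runs of letters/digits, lowercased,
    // and silently drops everything else, including '+'.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                tokens.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(escape("c++"));       // parser-level escaping
        System.out.println(letterTokens("c++")); // what such an analyzer would keep
    }
}
```

If this is what is happening, the term would need to survive analysis at both index and query time, for example by using an analyzer that does not split on + or by rewriting the term before analysis, as the "plus" substitution in the question does.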
On Wednesday 15 July 2009 17:16:23 Michael McCandless wrote:
> So now I'm confused. Since your query has required (+) clauses, the
> setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk.
Probably the top level BQ is using BS2 because of the required clauses,
but the nested BQ's ar
Sorry for the confusion, here is the exact query that runs forever with
setAllowDocsOutOfOrder:
You see it on stack trace taken while "stuck"
o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(UnknownSource)
Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632
NAME:m
On Tue, Jul 14, 2009 at 6:24 PM, eks dev wrote:
> org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown
> Source)
> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
> org.apache.lucen
For those in NYC, there will be a Lucene ecosystem (Lucene/Solr/Mahout/
Nutch/Tika/Droids/Lucene ports) Meetup on July 22, hosted by MTV
Networks and co-sponsored with Lucid Imagination.
For more info and to RSVP, see http://www.meetup.com/NYC-Apache-Lucene-Solr-Meetup/
. There is limited seati
So now I'm confused. Since your query has required (+) clauses, the
setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk.
BooleanQuery only uses BooleanScorer when there are no required terms,
and allowDocsOutOfOrder is true. So I can't explain why you see this
setting changing a
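The rule stated in this message can be written down as a tiny decision function. This is a simplified sketch based only on that sentence; `ScorerChoice` and its enum are hypothetical names, and the real selection logic in Lucene may take further conditions into account.

```java
// Hypothetical, simplified decision rule; not Lucene's actual code.
final class ScorerChoice {
    enum Kind { BOOLEAN_SCORER, BOOLEAN_SCORER_2 }

    static Kind choose(boolean allowDocsOutOfOrder, int requiredClauses) {
        // Only a query without required clauses may use the out-of-order scorer,
        // and only when the caller has allowed out-of-order collection.
        if (allowDocsOutOfOrder && requiredClauses == 0) {
            return Kind.BOOLEAN_SCORER;
        }
        return Kind.BOOLEAN_SCORER_2;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 0));
        System.out.println(choose(true, 1));
        System.out.println(choose(false, 0));
    }
}
```

Under this rule, a query whose top level has a required (+) clause would never pick the out-of-order scorer, which is why the reported sensitivity to the flag is surprising.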
Hi,
thanks for your answer. I know about lazy loading fields, but my
question is whether fields are always loaded as a whole or if it is
possible in some way to stream a field's contents.
Regards,
Günter
--
Dipl.-Inform. Günter Ladwig
Institute AIFB, University of Karlsruhe, D-76128 Karlsru
something weird happening w/ BooleanScorer...
indeed, my first impression was a JVM bug triggered under some rare conditions...
but we tried an old JVM (1.5), the latest 1.6 U14, and -client instead of -Xbatch
-server... no changes
We never managed to wait so long to see it finish, so I am not sure if
On Wed, Jul 15, 2009 at 7:51 AM, Shalin Shekhar
Mangar wrote:
> On Wed, Jul 15, 2009 at 4:49 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> OK I opened & fixed https://issues.apache.org/jira/browse/LUCENE-1744.
>>
>>
> Wow, that was fast! Thanks!
Well, I had the easy part ;) Yo
On Wed, Jul 15, 2009 at 4:49 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> OK I opened & fixed https://issues.apache.org/jira/browse/LUCENE-1744.
>
>
Wow, that was fast! Thanks!
--
Regards,
Shalin Shekhar Mangar.
It looks like your "text_substrings" field will have many more unique
terms than the original text, right? And, since it's indexed (I
assume), the docIDs will in fact be stored twice (once in the postings
for your orig text and once in the postings for text_substrings). So
I think it's expected t
On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote:
>
> I do not know exactly why, but
> when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but
> with setAllowDocsOutOfOrder(false); no problems whatsoever
That toggles between using BooleanScorer vs BooleanScorer2.
The odd thing i
OK I'll dig on this one. Maybe I can repro w/ a Wikipedia index.
Mike
On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote:
>
> I do not know exactly why, but
> when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but
> with setAllowDocsOutOfOrder(false); no problems whatsoever
>
>
OK I opened & fixed https://issues.apache.org/jira/browse/LUCENE-1744.
Thanks Shalin!
Mike
On Wed, Jul 15, 2009 at 7:04 AM, Michael
McCandless wrote:
> OK this is a bug in BooleanScorer2! I'll open it shortly... thanks Shalin!
>
> Mike
>
> On Wed, Jul 15, 2009 at 6:32 AM, Michael
> McCandless w
OK this is a bug in BooleanScorer2! I'll open it shortly... thanks Shalin!
Mike
On Wed, Jul 15, 2009 at 6:32 AM, Michael
McCandless wrote:
> I'll look into this...
>
> Mike
>
> On Wed, Jul 15, 2009 at 3:55 AM, Shalin Shekhar
> Mangar wrote:
>> Hello,
>>
>> Over in Solr land, I'm facing a problem
Field.Store.NO used for the text_substrings field? :-)
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Gregory Tarr [mailto:gregory.t...@detica.com]
> Sent: Wednesday, July 15, 2009 12:49 PM
> To: java-us
I have added a new field to each document in my index containing
substrings of another field to speed up initial-wildcard searches.
Each document has a field "text" which might contain "the quick brown
fox jumped over the lazy dogs"
The new field - "text_substrings" would then contain "the quick u
fetch all the search results along with their corresponding values for all
the terms used for scoring, and then use those values and play around
with them to re-rank your results to your heart's content.
--kk
On Wed, Jul 15, 2009 at 11:28 AM, henok sahilu wrote:
> what i want to do is re
I'll look into this...
Mike
On Wed, Jul 15, 2009 at 3:55 AM, Shalin Shekhar
Mangar wrote:
> Hello,
>
> Over in Solr land, I'm facing a problem while upgrading the lucene version
> to trunk. Solr has a QueryElevationComponent which is used to boost certain
> documents to the top. It pre-processes
Hello,
Over in Solr land, I'm facing a problem while upgrading the lucene version
to trunk. Solr has a QueryElevationComponent which is used to boost certain
documents to the top. It pre-processes the query to add a few boolean
clauses of its own and uses a FieldComparator for the sorting part. Th