Re: uncorrect results

Jan Thu, 18 Nov 2010 10:53:23 -0800

Hmm, OK, I tried it, but to no avail.
To be honest, it would have confused me even more if it had worked.
Actually, I would rather not have used a Document Collector at all, because I
am supposed to return all results, even for a query like "the". What I mean is
that I don't need the score at all. I just didn't know how to avoid it ;)
Anyway, I think a scoring system should not "invent" tokens.


But thanks,
jan
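
A minimal sketch, in Python rather than Lucene, of the isolation check discussed in this thread: index only the problem document, run the same "phrase in either field" query, and count occurrences without any scoring. Everything here is an assumption made for illustration: the toy analyzer (lowercase, split on non-alphanumerics) and the names tokenize, build_index, phrase_counts, and search are hypothetical and do not mirror Lucene's API. The document text is copied from PMID 896889 as quoted below, with the abstract shortened to one sentence.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Toy analyzer (an assumption): lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a positional index: (field, token) -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for pos, tok in enumerate(tokenize(text)):
                index[(field, tok)].setdefault(doc_id, []).append(pos)
    return index


def phrase_counts(index, field, phrase):
    """Return {doc_id: number of exact phrase occurrences} for one field."""
    tokens = tokenize(phrase)
    if not tokens:
        return {}
    hits = {}
    # Candidate start positions come from the first token of the phrase.
    for doc_id, starts in index.get((field, tokens[0]), {}).items():
        count = 0
        for start in starts:
            # Every following token must sit at the next consecutive position.
            if all(start + i in index.get((field, tok), {}).get(doc_id, ())
                   for i, tok in enumerate(tokens)):
                count += 1
        if count:
            hits[doc_id] = count
    return hits


def search(index, fields, phrase):
    """Evaluate +(f1:"phrase" f2:"phrase"): a document matches if the phrase
    occurs in at least one field. Returns {doc_id: total occurrences}."""
    totals = defaultdict(int)
    for field in fields:
        for doc_id, n in phrase_counts(index, field, phrase).items():
            totals[doc_id] += n
    return dict(totals)


# Index only the problem document, as suggested in the thread.
docs = {
    "896889": {
        "ArticleTitle": (
            "Estrogen-induced sexual receptivity and localization of "
            "3H-estradiol in brains of female mice: effects of 5 "
            "alpha-reduced androgens, progestins and cyproterone acetate."
        ),
        "AbstractText": (
            "Receptivity in E2 B-primed females was also blocked by "
            "concurrent treatment with cyproterone acetate and 3 alpha-, "
            "but not 3 beta-adrostanediol."
        ),
    },
}
index = build_index(docs)
fields = ["AbstractText", "ArticleTitle"]

print(search(index, fields, "experiment"))           # {}
print(search(index, fields, "cyproterone acetate"))  # {'896889': 2}
```

In an index built this way, a document can only show up in a result if the token was actually indexed for it, so a hit reported as "occurs 0 times" would suggest a mismatch between what was indexed and what is being counted, rather than something the scoring introduced.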

On Thursday, 18 Nov 2010, at 13:05 -0500, Pulkit Singhal wrote:
> Briefly looked at your code and there is no way that I'm right about
> this but I'll say it anyway:
> None of the fields you index have NORMS, so how will the
> scoring happen?
> It probably happens based on the matches at query time, but it's not
> like you are specifying any boosts in your query.
> Lucene has a complex scoring formula that I don't claim to fully
> understand ... but what if somehow (stay with me, don't shoot the
> messenger) due to the fact that you have no NORMS at all, the results
> being collected somehow give a score to the document that doesn't have
> a match at all and therefore present it in the results?
> 
> Just a theory (a bad one perhaps) ... but one which can be easily
> blown away by using ANALYZED in your indexer and then trying again.
> 
> - Pulkit
> 
> On Thu, Nov 18, 2010 at 12:55 PM, Pulkit Singhal
> <pulkitsing...@gmail.com> wrote:
> > Wow, you live in a really great country and attend an awesome
> > university where they have classes like "Text Analytics". I'm gonna
> > send my kid there to study :)
> >
> > In all seriousness I think the problem may be with how you are
> > collecting your results.
> >
> > I find this very amusing:
> >> 80. 896889 phrase occurs 0 times
> >
> > How can it claim there are zero hits and still be returning you a result? 
> > Weird.
> >
> > Have you tried removing all other docs and then only leaving the one
> > problem child in there indexing just that and seeing what comes back?
> >
> > On Wed, Nov 17, 2010 at 1:19 PM, Jan <fajer...@informatik.hu-berlin.de> wrote:
> >> That's what I figured... I can't find out what I'm doing wrong, though ;)
> >>
> >> So the query is "experiment" (I know, not really a phrase, but the
> >> assignment asks for exactly that). The program constructs the following
> >> query:
> >>
> >> +(AbstractText:"experiment" ArticleTitle:"experiment")
> >>
> >> which looks good to me. The results look like this:
> >>
> >> Found 95 hits.
> >> 1. 19810 phrase occurs 3 times
> >> 2. 587340 phrase occurs once
> >> ...
> >> 80. 896889 phrase occurs 0 times
> >> ...
> >> 95. 900325 phrase occurs once
> >>
> >> So here is document 896889:
> >> PMID
> >>        896889
> >> ArticleTitle
> >>        Estrogen-induced sexual receptivity and localization of 3H-estradiol in
> >> brains of female mice: effects of 5 alpha-reduced androgens, progestins
> >> and cyproterone acetate.
> >> AbstractText
> >>        Sexual receptivity induced in ovariectomized CD-1 mice with chronic
> >> daily administration of estradiol benzoate (E2 B) was blocked by
> >> concurrent administration of the 5 alpha-reduced androgen,
> >> dihydrotestosterone (DHT). Receptivity was restored in these females
> >> with progesterone-, but not with dihydroprogesterone-priming 6 hr prior
> >> to testing. Delaying the DHT injections until 12 hr after the E2 B
> >> injections greatly reduced its inhibitory properties. Receptivity in E2
> >> B-primed females was also blocked by concurrent treatment with
> >> cyproterone acetate and 3 alpha-, but not 3 beta-adrostanediol.
> >> Pretreatment with DHT, or 3 alpha- or 3 beta-androstanediol failed to
> >> consistently affects 3H-estradiol accumulation in crude nuclear and
> >> supernatant fractions from brain and pituitary
> >>
> >> So unless I am doing something wrong while indexing/analyzing (the text
> >> above is from the XML, but I double-checked: it is put in the index
> >> with these text fragments), the token "experiment" does not even
> >> occur. That's what baffles me.
> >>
> >> Thanks for the very quick reaction,
> >> jan
> >>
> >> On Wednesday, 17 Nov 2010, at 12:57 -0500, Donna L Gresh wrote:
> >>> As it is probably more likely that you're doing something incorrect than
> >>> that Lucene is reporting incorrect results :), it might help if you
> >>> reported the exact query that is being submitted to the IndexSearcher and
> >>> then showed us the document that was incorrectly returned. My guess is
> >>> that either looking at the query itself will immediately reveal the
> >>> problem to you, or that the query in combination with the document and
> >>> knowledge of which analyzers you are using will reveal the problem.
> >>>
> >>> Donna
> >>>
> >>>
> >>> Jan <fajer...@informatik.hu-berlin.de> wrote on 11/17/2010 11:47:49 AM:
> >>>
> >>> > Subject: uncorrect results
> >>> > From: Jan
> >>> > To: java-user
> >>> > Date: 11/17/2010 11:51 AM
> >>> > Please respond to java-user
> >>> >
> >>> > Hi,
> >>> > I have an assignment in my Text Analytics class. I am supposed to create
> >>> > an index and search it. The corpus is a PubMed-like XML file. It is
> >>> > possible to query terms (programcall a few terms) and phrases
> >>> > (programcall "a phrase").
> >>> > When a phrase is queried, the program should report how often the phrase
> >>> > occurred.
> >>> > The problem is that, for certain queries, the IndexSearcher returns some
> >>> > documents that do not contain the query in any of their fields.
> >>> > I'd be delighted if someone could tell me what I am doing wrong.
> >>> > See the source code in my GitHub repo:
> >>> > https://github.com/jangingnicht/TextAnalytics2/tree/master/src/textanalytics2/
> >>> >
> >>> > Thanks in advance
> >>> > jan
> >>> >
> >>> > PS: I use Lucene 3.0.2 and the OpenJDK Runtime Environment (IcedTea6
> >>> > 1.8.2) on a 64-bit Linux machine.
