Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
Thanks. It might be that Nutch sets some values. I am not able to find anything in the config files, though. We are using Nutch's solrindex. -- Ole-Martin Mørk http://twitter.com/olemartin http://flickr.com/olemartin On Mon, Oct 5, 2009 at 2:28 PM, Simon Willnauer < simon.willna...@googlemail.com>

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
Have a look at your schema definition. I guess that's the place where boosts are set if they are not defined in the data you send to your Solr instance. simon On Mon, Oct 5, 2009 at 2:14 PM, Ole-Martin Mørk wrote: > That might be true. The document boost did not change, but maybe the field > boost changed.

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
That might be true. The document boost did not change, but maybe the field boost changed. Is it possible to retrieve the field boost from solr? -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 2:01 PM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > I still guess that the document has been i

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Could it be that the tokenization schema for URL has changed between the times you added documents? I.e. yielding more tokens when you got the low fieldNorm value. The number of documents should not impact the fieldNorm; the value is based on the number of tokens in the field, field and document b
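To make the point about token counts concrete, here is a minimal sketch of how a fieldNorm value can be reproduced by hand. It assumes the default length normalization of Lucene releases from that period (boost times 1/sqrt(number of tokens)) and the lossy one-byte float encoding Lucene applied to norms, which I believe is SmallFloat.floatToByte315. The class and method names below are made up for illustration and are not Lucene API.

```java
public class FieldNormSketch {

    // Assumed to mirror Lucene's one-byte norm encoding (SmallFloat.floatToByte315):
    // keep the top bits of the float and re-base the exponent.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;          // drop the low 21 mantissa bits
        int zero = (63 - 15) << 3;       // exponent re-basing constant
        if (small <= zero) {
            // zero and negatives map to 0, underflow maps to the smallest non-zero byte
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return (byte) -1;            // overflow maps to the largest value
        }
        return (byte) (small - zero);
    }

    // Inverse of encode(): expand the byte back into a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical helper: combined boost times 1/sqrt(numTokens),
    // then rounded through the one-byte encoding as it would be in the index.
    static float fieldNorm(int numTokens, float boost) {
        float raw = boost * (float) (1.0 / Math.sqrt(numTokens));
        return decode(encode(raw));
    }

    public static void main(String[] args) {
        for (int n = 6; n <= 12; n++) {
            System.out.println(n + " tokens, boost 1.0 -> " + fieldNorm(n, 1.0f));
        }
        System.out.println("8 tokens, boost 2.0 -> " + fieldNorm(8, 2.0f));
    }
}
```

Under these assumptions, 8, 9 and 10 tokens all land on 0.3125 at boost 1.0, while 7 tokens give 0.375 and 11 tokens give 0.25. Because the encoding is this coarse, one extra token changes the stored norm only when it crosses a bucket boundary, and a changed boost can shift the value just as easily.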

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
I still guess that the document has been indexed with different boost factors the first time if you did not change the length of the URL. Can you make sure this did not happen? simon On Mon, Oct 5, 2009 at 12:45 PM, Ole-Martin Mørk wrote: > I did not change the url. The length of the title was i

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
I did not change the URL. The length of the title was increased by 1, from 41 to 42 characters. -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 12:39 PM, Karl Wettin wrote: > sorry, I meant title. > > On 5 Oct 2009, at 11:57, Simon Willnauer wrote: > > > Ole-Martin, did you mention that you did not chang

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Sorry, I meant title. On 5 Oct 2009, at 11:57, Simon Willnauer wrote: Ole-Martin, did you mention that you did not change the URL value but the title? simon On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin wrote: Hi Ole-Martin, how many characters were in the URL before and after the update?

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
Ole-Martin, did you mention that you did not change the URL value but the title? simon On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin wrote: > Hi Ole-Martin, > > how many characters were in the URL before and after the update? > > > karl > > On 5 Oct 2009, at 10:21, Ole-Martin Mørk wrote: > > >

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Hi Ole-Martin, how many characters were in the URL before and after the update? karl On 5 Oct 2009, at 10:21, Ole-Martin Mørk wrote: Hi. I am trying to understand Lucene's scoring algorithm. We're getting some strange results. First we search for a given page by its URL. We get this resul

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
Did another update:

9.707364 = fieldWeight(url:"our super secret url" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  31.063566 = idf(url: www=7329 host=323 com=7329 article=2458 something=4 something=46 704290075=3)
  0.3125 = fieldNorm(field=url, doc=0)

The fieldNorm value did not change this time.
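The explain output above is a plain product of its three factors, so it can be checked by hand. The sketch below only redoes that arithmetic with the numbers from the message. The class and method names are invented for illustration, and the second call uses a hypothetical lower norm of 0.25 to show how much the score moves when only the fieldNorm changes.

```java
public class FieldWeightSketch {

    // fieldWeight as shown in the explain output: tf * idf * fieldNorm.
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        // Values copied from the explain output in the message.
        float reported = fieldWeight(1.0f, 31.063566f, 0.3125f);
        System.out.println("with fieldNorm 0.3125: " + reported);

        // Hypothetical: same tf and idf, but the next lower norm bucket.
        float lower = fieldWeight(1.0f, 31.063566f, 0.25f);
        System.out.println("with fieldNorm 0.25:   " + lower);
    }
}
```

Since tf and idf are identical for the same query against the same index statistics, a score difference between two versions of a document in this field comes down to the fieldNorm alone.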

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
I don't think I changed any boost values, at least not on purpose. I think the reason for the changed document id is that, to my knowledge, an update is a delete and an add. The code for my solrj update: public void updateDocument(SolrDocument document) { SolrServer server = new CommonsHtt

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
Did you change any boost values for the URL field or the document while reindexing the document, by any chance? Or are you looking at different documents? One has internal id 0 and the other has internal id 22; the latter could be the updated one. Just curious whether that might be the cause. simon On Mon, Oct 5, 2009 at 10