Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Mark Miller
I keep considering a full response to this, but I just can't get over the hump and spend the time writing something up. Figured someone else would get to it - perhaps they still will. I will make a comment here though: >Before Lucene 2.9, I don't think this made any difference, as (I think) the

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Michael Busch
On 10/5/09 5:30 PM, Nigel wrote: Before Lucene 2.9, I don't think this made any difference, as (I think) the only advantage to calling reopen vs. just creating another IndexReader was having reopen figure out whether the index had actually changed. (And we have a different way to figure that out

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Jason Rutherglen
I'm not sure I understand the question. You're trying to reopen the segments that you've replicated and you're wondering what's changed in Lucene? On Mon, Oct 5, 2009 at 5:30 PM, Nigel wrote: > Anyone have any ideas here? I imagine a lot of other people will have a > similar question when trying

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Nigel
Anyone have any ideas here? I imagine a lot of other people will have a similar question when trying to take advantage of the reopen improvements in 2.9. Thanks, Chris On Thu, Oct 1, 2009 at 5:15 PM, Nigel wrote: > I have a question about the reopen functionality in Lucene 2.9. As I > underst

Re: German article about Lucene 2.9

2009-10-05 Thread Simon Willnauer
Here is the English version of the article for those who are interested. Lucene version 2.9 released Content-Management systems like the ones powering the channels at AOL, social networks like LinkedIn, the cloud nebula cloud computing platform at NASA: Nearly no application that does not need to

German article about Lucene 2.9

2009-10-05 Thread Simon Willnauer
Hey Lucene Users, Heise.de (http://www.heise.de/open/artikel/Such-Engine-Lucene-in-Version-2-9-erschienen-810377.html) has just published an article about the new 2.9 release. Unfortunately they only published the German version, although we tried to get the English one published too. Thanks to Isabel (http://

InstantiatedIndex feedback

2009-10-05 Thread David Causse
Hi, some time ago I was asked for some feedback about InstantiatedIndex. I lost the identity of the person who asked, so I decided to post some info on this list, sorry. It is not a benchmark but just raw debug results; do not expect scientific/usable results. We index a small set of documents an

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
Thanks. It might be that Nutch sets some values. I am not able to find anything in the config files though. We are using Nutch's solrindex. -- Ole-Martin Mørk http://twitter.com/olemartin http://flickr.com/olemartin On Mon, Oct 5, 2009 at 2:28 PM, Simon Willnauer < simon.willna...@googlemail.com>

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
Have a look at your schema definition. I guess that's the place where boosts are set if they are not defined in the data you send to your Solr instance. simon On Mon, Oct 5, 2009 at 2:14 PM, Ole-Martin Mørk wrote: > That might be true. The document boost did not change, but maybe the field > boost changed.

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
That might be true. The document boost did not change, but maybe the field boost changed. Is it possible to retrieve the field boost from solr? -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 2:01 PM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > I still guess that the document has been i

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Could it be that the tokenization schema for URL has changed between the times you added documents? I.e. yielding more tokens when you got the low fieldNorm value. Number of documents should not impact the fieldNorm, the value is based on number of tokens in the field, field and document b
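To illustrate the point above, here is a minimal sketch of how the norm depends on token count and boosts but not on the number of documents. It assumes Lucene's default length normalization of 1/sqrt(numTokens); the class and method names are hypothetical, and the sketch omits the lossy single-byte encoding Lucene applies before storing the value, so stored norms will be coarser than what this prints.

```java
// Hypothetical helper, not part of Lucene: mirrors the default
// length normalization (1 / sqrt(number of tokens in the field)).
public class FieldNormSketch {

    // More tokens in the field means a lower norm.
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Norm before Lucene's one-byte quantization (omitted here):
    // document boost * field boost * length normalization.
    static float rawNorm(float docBoost, float fieldBoost, int numTokens) {
        return docBoost * fieldBoost * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        // Same boosts, different token counts: only the token count changes the norm.
        System.out.println(rawNorm(1.0f, 1.0f, 4));
        System.out.println(rawNorm(1.0f, 1.0f, 16));
        // Same token count, higher field boost.
        System.out.println(rawNorm(1.0f, 2.0f, 16));
    }
}
```

Note that the document count never appears as an input, which is why adding or removing other documents should leave the value unchanged.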

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
I still guess that the document has been indexed with different boost factors the first time if you did not change the length of the URL. Can you make sure this did not happen? simon On Mon, Oct 5, 2009 at 12:45 PM, Ole-Martin Mørk wrote: > I did not change the url. The length of the title was i

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
I did not change the URL. The length of the title was increased by 1, from 41 to 42 characters. -- Ole-Martin Mørk On Mon, Oct 5, 2009 at 12:39 PM, Karl Wettin wrote: > sorry, I meant title. > > 5 okt 2009 kl. 11.57 skrev Simon Willnauer: > > > Ole-Martin, did you mention that you did not chang

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Sorry, I meant title. 5 okt 2009 kl. 11.57 skrev Simon Willnauer: Ole-Martin, did you mention that you did not change the URL value but the title? simon On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin wrote: Hi Ole-Martin, how many characters were in the URL before and after the update?

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
Ole-Martin, did you mention that you did not change the URL value but the title? simon On Mon, Oct 5, 2009 at 11:52 AM, Karl Wettin wrote: > Hi Ole-Martin, > > how many characters were in the URL before and after the update? > > > karl > > 5 okt 2009 kl. 10.21 skrev Ole-Martin Mørk: > > >

Re: Help understanding fieldNorm

2009-10-05 Thread Karl Wettin
Hi Ole-Martin, how many characters were in the URL before and after the update? karl 5 okt 2009 kl. 10.21 skrev Ole-Martin Mørk: Hi. I am trying to understand Lucene's scoring algorithm. We're getting some strange results. First we search for a given page by its URL. We get this resul

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
Did another update: 9.707364 = fieldWeight(url:"our super secret url" in 0), product of: 1.0 = tf(phraseFreq=1.0) 31.063566 = idf(url: www=7329 host=323 com=7329 article=2458 something=4 something=46 704290075=3) 0.3125 = fieldNorm(field=url, doc=0) FieldNorm value is not changed this time.
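As a quick check of the explain output above, the fieldWeight line is the product of the three factors listed under it (tf, idf, fieldNorm). The sketch below simply multiplies the numbers quoted in the message; the class and method names are hypothetical and are not Lucene APIs.

```java
// Hypothetical helper that reproduces the arithmetic of the quoted
// explain output: fieldWeight = tf * idf * fieldNorm.
public class FieldWeightSketch {

    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float tf = 1.0f;          // tf(phraseFreq=1.0)
        float idf = 31.063566f;   // idf summed over the phrase terms
        float fieldNorm = 0.3125f; // fieldNorm(field=url, doc=0)

        // Should agree with the reported 9.707364 up to float rounding.
        System.out.println(fieldWeight(tf, idf, fieldNorm));
    }
}
```

Because tf and idf are the same across the updates, any change in the final score for this query has to come from the fieldNorm factor.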

Re: Help understanding fieldNorm

2009-10-05 Thread Ole-Martin Mørk
I don't think I changed any boost values, at least not on purpose. I think the reason for the changed document id is that, to my knowledge, an update is a delete and an add. The code for my solrj update: public void updateDocument(SolrDocument document) { SolrServer server = new CommonsHtt

Re: Help understanding fieldNorm

2009-10-05 Thread Simon Willnauer
Did you change any boost values for the URL field or the document while reindexing the document, by any chance? Or are you looking at different documents? One is internal id 0 and the other is internal id 22 - this could be the updated one. Just curious if that might be the cause?! simon On Mon, Oct 5, 2009 at 10

Re: TimeLimitedCollector hang on, VM process doesn't die (TOMCAT)

2009-10-05 Thread Mani EZZAT
Mark Miller wrote: Mani EZZAT wrote: Mark Miller wrote: That thread will only be stopped if it's interrupted. So it would appear there is not a path that leads to it being interrupted ... why that is would be the next question ... I found someone (a Japanese user) who had the sa