Re: Modifying Length Normalization calculation

2011-06-13 Thread Lahiru Samarakoon
Hi Ian, The order is right and your method is working for me. Thanks Lahiru On Mon, Jun 13, 2011 at 7:15 PM, Ian Lea wrote: > This is getting beyond my level of expertise, but I'll have a go at > your questions. Hopefully someone better informed will step in with > corrections or confir

Re: Index size and performance degradation

2011-06-13 Thread Shai Erera
> > but you'll still cache the results - so again this isn't viable when RT > search, or even an NRT, is a requirement > No I don't cache the results. The Filter is an OpenBitSet of all docs that match the filter (e.g. have the specified language field's value) and it is refreshed whenever new seg
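
The idea of keeping a bit set of matching documents, rather than cached results, can be sketched in plain Java. This is only an illustration under stated assumptions: it uses java.util.BitSet in place of Lucene's OpenBitSet, a String array stands in for the stored language field, and the class and method names (LanguageBitsSketch, buildBits, keepMatching) are hypothetical, not Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java illustration only; no Lucene classes are used here.
final class LanguageBitsSketch {

    // Build one bit per document: set when the document's language matches.
    // docLanguages[i] stands in for the language field value of doc i (assumption).
    static BitSet buildBits(String[] docLanguages, String wanted) {
        BitSet bits = new BitSet(docLanguages.length);
        for (int doc = 0; doc < docLanguages.length; doc++) {
            if (wanted.equals(docLanguages[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keep only the hits whose bit is set. No query results are cached;
    // only the per-document membership bits are reused between queries.
    static List<Integer> keepMatching(int[] hits, BitSet bits) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : hits) {
            if (bits.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] langs = {"en", "fr", "en", "de"};
        BitSet english = buildBits(langs, "en");
        System.out.println(keepMatching(new int[] {0, 1, 2, 3}, english));
    }
}
```

The bits would need rebuilding whenever the set of documents changes, which is the refresh step the message refers to.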

Re: So many compiling errors in the 3.3 source release when added into Eclipse

2011-06-13 Thread Erick Erickson
How did you do this? Did you execute the "ant eclipse" target first? See the instructions at: http://wiki.apache.org/solr/HowToContribute#Eclipse_.28Galileo.2C_J2EE_version_1.2.2.20100217-2310.2C_but_any_relatively_recent_Eclipse_should_do.29: Best Erick On Sat, Jun 11, 2011 at 2:16 AM, dyzc2010

Re: Index size and performance degradation

2011-06-13 Thread Jason Rutherglen
> deletions made by readers merely mark it for > deletion, and once a doc has been marked for deletions it is deleted for all > intents and purposes, right? There's the point-in-timeness of a reader to consider. > Does the N in NRT represent only the cost of reopening a searcher? Aptly put, and

Re: Index size and performance degradation

2011-06-13 Thread Itamar Syn-Hershko
Since there should only be one writer, I'm not sure why you'd need transactional storage for that? deletions made by readers merely mark it for deletion, and once a doc has been marked for deletions it is deleted for all intents and purposes, right? But perhaps I need to refresh my memory on th

Re: Index size and performance degradation

2011-06-13 Thread Jason Rutherglen
> I don't think we'd do the post-filtering solution, but instead maybe > resolve the deletes "live" and store them in a transactional data I think Michael B. aptly described the sequence ID approach for 'live' deletes? On Mon, Jun 13, 2011 at 3:00 PM, Michael McCandless wrote: > Yes, adding dele

Re: Index size and performance degradation

2011-06-13 Thread Michael McCandless
Yes, adding deletes to Twitter's approach will be a challenge! I don't think we'd do the post-filtering solution, but instead maybe resolve the deletes "live" and store them in a transactional data structure of some kind... but even then we will pay a perf hit to lookup del docs against it. So, y

Re: Index size and performance degradation

2011-06-13 Thread Itamar Syn-Hershko
Thanks Mike, much appreciated. Wouldn't Twitter's approach fall into the exact same pitfall you described for Zoie (as it does, or did) once it handles deletes too? I don't think there is any way of handling deletes other than post-filtering results. But perhaps the IW cache would be smaller tha

Re: Index size and performance degradation

2011-06-13 Thread Itamar Syn-Hershko
On 13/06/2011 06:23, Shai Erera wrote: A Language filter is one -- different users search in different languages and want to view pages in those languages only. If you have a field attached to your documents that identifies the language of the document, you can use it to filter the queries to retur

So many compiling errors in the 3.3 source release when added into Eclipse

2011-06-13 Thread dyzc2010
Hi, I added a Java project in Eclipse for the 3.3 src release and found hundreds of compile errors. Most are because the MockAnalyzer class is missing a few constructors. Does anyone have the same experience? How has this incomplete version been released? Thanks

Re: Index size and performance degradation

2011-06-13 Thread Michael McCandless
Here's a blog post describing some details of Twitter's approach: http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html And here's a talk Michael did last October (Lucene Revolutions): http://www.lucidimagination.com/events/revolution2010/video-Realtime-Search-Wit

Re: need help

2011-06-13 Thread Marlen
thank you very much Ian. On 13/06/2011 9:17, Ian Lea wrote: Hello Lucene can be used for searching pretty much anything. But it is a library, not an application, and you'll have to write code to make it do what you want. You might be better off using Solr. It uses lucene but provides lots of

Re: Modifying Length Normalization calculation

2011-06-13 Thread Ian Lea
This is getting beyond my level of expertise, but I'll have a go at your questions. Hopefully someone better informed will step in with corrections or confirmation. > ... > The application calls the *writer.addDocument(d);* method and in this > process the *lengthNorm(String fieldName, int numTer

Re: need help

2011-06-13 Thread Ian Lea
Hello Lucene can be used for searching pretty much anything. But it is a library, not an application, and you'll have to write code to make it do what you want. You might be better off using Solr. It uses lucene but provides lots of stuff on top. http://lucene.apache.org/solr/features.html -

need help

2011-06-13 Thread Marlen
Hello, I'm new to Lucene. I would like to know if I can use it to search my web site and FTP, and whether it can search PDF, *.doc, and other non-plain-text files. Thanks

Re: Slow ness of IndexWriter.close()

2011-06-13 Thread Erick Erickson
My first question is "what are you trying to do at a higher level"? Because asking people to check your code without telling us what you're trying to accomplish makes it difficult to know what to look at. You might review: http://wiki.apache.org/solr/UsingMailingLists That said, at a guess, your

Re: WordBoundTokenFilter

2011-06-13 Thread Denis Bazhenov
Okay, now I'm experiencing one of those "Simpsons already did it" moments in my life :) Nevertheless, it's nice to know that this problem is already solved and I don't need to write any code at all. Thanks a lot! On 13.06.2011, at 22:11, Uwe Schindler wrote: > In Lucene trunk (will be version 4.0), all analyze

Slow ness of IndexWriter.close()

2011-06-13 Thread Yogesh Dabhi
Hi, I am trying to add and update documents in an index. At the start it takes only 1 to 2 seconds, but after 50 to 60 documents have been added and updated it takes 40 to 50 seconds, and sometimes more than 1 minute. Is there any way to improve performance? Please help me. Please check my code

RE: WordBoundTokenFilter

2011-06-13 Thread Uwe Schindler
In Lucene trunk (will be version 4.0), all analyzers/tokenizers/tokenfilters were moved to a new shared analyzer module. So WDF is now part of a shared Lucene/Solr module. In 3.x, you still have to add the Solr JARS to use it. This TokenFilter should do what you intend to do (see the Solr document
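
The kind of splitting discussed in this thread can be sketched in plain Java. This is a simplified illustration, not the Solr filter itself: it only splits on non-alphanumeric delimiters, on letter/digit transitions, and on lower-to-upper case changes, and the class name WordPartsSketch and the sample token are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; the real filter has many more options.
final class WordPartsSketch {

    // Split a token into sub-words at delimiters and at character-type changes.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                flush(current, parts);          // delimiter such as '-' or '_'
                continue;
            }
            if (current.length() > 0) {
                char prev = current.charAt(current.length() - 1);
                boolean kindChange = Character.isDigit(prev) != Character.isDigit(c);
                boolean caseChange = Character.isLowerCase(prev) && Character.isUpperCase(c);
                if (kindChange || caseChange) {
                    flush(current, parts);      // e.g. "SD|500" or "Power|Shot"
                }
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    // Move the pending sub-word, if any, into the result list.
    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("PowerShot-SD500"));
    }
}
```

Indexing the sub-words alongside the original token is what lets a partial model name match the full product name.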

Re: WordBoundTokenFilter

2011-06-13 Thread Em
Yes, it's part of Solr. And even in Solr there was no documentation in the API, at least when I searched for it the last time. Regards, Em On 13.06.2011 12:56, Denis Bazhenov wrote: > It seems so. Interestingly I can't find any mentions of > WordDelimiterTokenFilter using google. Is it part of

Re: WordBoundTokenFilter

2011-06-13 Thread Denis Bazhenov
It seems so. Interestingly I can't find any mentions of WordDelimiterTokenFilter using Google. Is it part of the Solr codebase? On 13.06.2011, at 21:49, Em wrote: > Hi, > > sounds like the WordDelimiterTokenFilter from Solr, doesn't it? > > Regards, > Em > > On 13.06.2011 12:06, Denis Bazh

Re: WordBoundTokenFilter

2011-06-13 Thread Em
Hi, sounds like the WordDelimiterTokenFilter from Solr, doesn't it? Regards, Em On 13.06.2011 12:06, Denis Bazhenov wrote: > Some time ago I needed to tune our home grown search engine based on lucene to > perform well on product searches. Product search is search where users come > with part

Re: Modifying Length Normalization calculation

2011-06-13 Thread Lahiru Samarakoon
Hi Ian, Thank you very much for the reply. The application calls the *writer.addDocument(d);* method and in this process the *lengthNorm(String fieldName, int numTerms)* method is called. I can extend the *DefaultSimilarity* class and override the *lengthNorm* method, but how can I explicitly spe

WordBoundTokenFilter

2011-06-13 Thread Denis Bazhenov
Some time ago I needed to tune our home grown search engine based on lucene to perform well on product searches. Product search is search where users come with part of a product name and we should find the product. The problem here is that users don't provide the full model name. For instance id prod

Re: Modifying Length Normalization calculation

2011-06-13 Thread Ian Lea
org.apache.lucene.search.Similarity would be the place to look, specifically computeNorm(String field, FieldInvertState state). There is comprehensive info in the javadocs. Note that values are calculated at indexing and stored in the index encoded, with some loss of precision. -- Ian. On Mon,
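
The arithmetic behind length normalization can be sketched without any Lucene dependency. The first method follows the 1/sqrt(numTerms) formula used by DefaultSimilarity.lengthNorm; the flattened variant and the names LengthNormSketch, flattenedLengthNorm and minTerms are hypothetical, shown only as an example of what an overriding subclass might compute. The lossy encoding of the stored value is not modelled here.

```java
// Plain-Java illustration only; in Lucene this logic would live in a
// Similarity subclass rather than in static helper methods.
final class LengthNormSketch {

    // Default-style normalization: longer fields get a smaller norm.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical alternative: fields shorter than minTerms are not
    // rewarded further, so very short fields stop dominating the ranking.
    static float flattenedLengthNorm(int numTerms, int minTerms) {
        return defaultLengthNorm(Math.max(numTerms, minTerms));
    }

    // The norm written at indexing time combines the boost with the
    // length norm (assumed here to be a simple product).
    static float norm(float boost, int numTerms) {
        return boost * defaultLengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println(defaultLengthNorm(4));
        System.out.println(flattenedLengthNorm(4, 100));
        System.out.println(norm(2.0f, 4));
    }
}
```

Because the value is computed at indexing time, changing the formula only affects documents indexed afterwards; existing documents would need to be reindexed.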