Re: How to tune Analyzer for Text Extraction

2009-08-11 Thread Shai Erera
If this file has a predefined construct, e.g.: title: something location: new york then you can write a simple parser that extracts that information. But I think otherwise this falls outside the scope of Lucene, unless I misunderstood you. If I had to give it a long shot though, I'd try to in
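A minimal sketch of the "simple parser" idea, using only the JDK and no Lucene classes. It assumes one "key: value" pair per line, which is a guess at the file layout; the class name FieldParser and the method parse are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FieldParser {
    /**
     * Parses lines of the assumed form "key: value" into an ordered map.
     * Lines without a key before the first colon are skipped.
     */
    public static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // no colon, or nothing before it: not a field line
            }
            // Keys are normalized to lower case; values keep their case.
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (!key.isEmpty()) {
                fields.put(key, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("title: something\nlocation: new york");
        System.out.println(fields.get("location")); // prints "new york"
    }
}
```

Only the first colon on a line separates key from value, so a value such as a time ("10:30") survives intact.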

Re: Different Analyzers

2009-08-11 Thread Shai Erera
you should also make sure the data is indexed twice, once w/ the original case and once w/o. It's like putting a TokenFilter after WhitespaceTokenizer which returns two tokens - lowercased and the original, both in the same position (set posIncr to 0). On Wed, Aug 12, 2009 at 6:20 AM, Max Lynch w
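The two-tokens-per-position idea can be sketched in plain Java, without the Lucene API. This is not a real TokenFilter: the Token class below is a hypothetical stand-in that only carries a term and a position increment, and splitting on whitespace stands in for WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DualCaseTokens {
    /** A term plus its position increment relative to the previous token. */
    public static final class Token {
        public final String term;
        public final int posIncr;

        Token(String term, int posIncr) {
            this.term = term;
            this.posIncr = posIncr;
        }

        @Override
        public String toString() {
            return term + "(+" + posIncr + ")";
        }
    }

    /**
     * Splits on whitespace and emits each word in its original case
     * (increment 1), followed by its lowercased form at the same
     * position (increment 0) when the two differ.
     */
    public static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields one empty string from split
            }
            out.add(new Token(word, 1));
            String lower = word.toLowerCase(Locale.ROOT);
            if (!lower.equals(word)) {
                out.add(new Token(lower, 0)); // stacked on the same position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("New York city"));
        // prints [New(+1), new(+0), York(+1), york(+0), city(+1)]
    }
}
```

Skipping the second token when the word is already lower case is a choice made in this sketch to avoid indexing the same term twice at one position; the message itself describes always returning both.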

Different Analyzers

2009-08-11 Thread Max Lynch
I just want to see if it's safe to use two different analyzers for the following situation: I have an index that I want to preserve case with so I can do case-sensitive searches with my WhitespaceAnalyzer. However, I also want to do case insensitive searches. What I did was create a custom Analy

Re: Query Boosting

2009-08-11 Thread bourne71
Thanks, I understand how boosting works. What I need is a boost in the query that raises the score, and so the ranking, of a page when all of the query's keywords are found in it. I tried all sorts of combinations and none of them worked. Can anyone offer a suggestion? Simon Willnauer wr

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Michael McCandless
Phew! Thank you for raising this... it was a sneaky one. Mike On Tue, Aug 11, 2009 at 4:13 PM, Jibo John wrote: > Mike, > > Yes, it works perfect ! > > I did observe a dip in the indexing throughput (1855 recs/sec vs. 2200 > recs/sec previously), but, more importantly, no data is lost this time.

Re: How to tune Analyzer for Text Extraction

2009-08-11 Thread Michael Wechner
xs2Abhishek wrote: Hi, I am trying to decide whether or not I can use Lucene for my requirements, which mainly include data tagging. I have to be able to parse or index a .txt file and then be able to extract text accordingly. E.g., if the input document has some text like: "Loca

How to tune Analyzer for Text Extraction

2009-08-11 Thread xs2Abhishek
Hi, I am trying to decide whether or not I can use Lucene for my requirements, which mainly include data tagging. I have to be able to parse or index a .txt file and then be able to extract text accordingly. E.g., if the input document has some text like: "Location: New York" , so f

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Jibo John
Mike, Yes, it works perfect ! I did observe a dip in the indexing throughput (1855 recs/sec vs. 2200 recs/sec previously), but, more importantly, no data is lost this time. Thanks for helping me nail this down. -Jibo On Aug 11, 2009, at 11:12 AM, Michael McCandless wrote: OK I found th

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Michael McCandless
OK I found the problem! It was losing docs from the queue, when shutting down the thread pool, because we were calling super's addDocument(doc) not addDocument(doc, analyzer). IndexWriter was simply forwarding that call to ThreadedIndexWriter's addDocument(doc, analyzer) which in turn would do no
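The dispatch pitfall described here can be reproduced in a few lines of plain Java. This is an illustrative sketch only: Writer and QueueingWriter are hypothetical stand-ins for IndexWriter and ThreadedIndexWriter, documents and analyzers are plain strings, and the override simply re-queues the document, because what the real override did next is cut off in the snippet.

```java
import java.util.ArrayList;
import java.util.List;

public class DispatchPitfall {
    /** Stand-in for the base writer. */
    public static class Writer {
        public final List<String> indexed = new ArrayList<>();

        /** Forwards to the two-argument overload through a virtual call. */
        public void addDocument(String doc) {
            addDocument(doc, "default");
        }

        public void addDocument(String doc, String analyzer) {
            indexed.add(doc);
        }
    }

    /** Stand-in for the subclass that queues documents for worker threads. */
    public static class QueueingWriter extends Writer {
        public final List<String> queue = new ArrayList<>();

        @Override
        public void addDocument(String doc, String analyzer) {
            queue.add(doc);
        }

        /** Buggy path: the base one-arg method dispatches back to the override. */
        public void indexBuggy(String doc) {
            super.addDocument(doc);
        }

        /** Fixed path: calls the base two-arg method directly. */
        public void indexFixed(String doc) {
            super.addDocument(doc, "default");
        }
    }

    public static void main(String[] args) {
        QueueingWriter w = new QueueingWriter();
        w.indexBuggy("a");
        System.out.println(w.indexed + " " + w.queue); // prints [] [a]
        w.indexFixed("b");
        System.out.println(w.indexed + " " + w.queue); // prints [b] [a]
    }
}
```

The point is that `super.` only selects the base implementation of the method named in that call; any overloads the base method calls internally are still resolved against the subclass.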

Re: Efficient optimization of large indexes?

2009-08-11 Thread Nigel
Mike, thanks very much for your comments! I won't have time to try these ideas for a little while but when I do I'll definitely post the results. On Fri, Aug 7, 2009 at 12:15 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Thu, Aug 6, 2009 at 5:30 PM, Nigel wrote: > >> Actually I

Re: ThreadedIndexWriter vs. IndexWriter

2009-08-11 Thread Michael McCandless
I'm baffled why you're losing docs w/ ThreadedIndexWriter. One question: your Lucene core JAR seems to be newer than the last MEAP update. Did you update it manually? Also, your indexes were optimized, but your algs don't have an optimize step -- did you separately run an optimize? Could you zi

Re: Query Boosting

2009-08-11 Thread Simon Willnauer
Hi there, well, where to start? I would suggest you look at the output of Query#explain() first to see how the score is calculated. You might start with a simpler query, since the output can be quite cryptic the first time you see it. To completely understand what the output mea

Query Boosting

2009-08-11 Thread bourne71
Hi, I am fairly new to Lucene and have encountered a problem with the search function I am trying to create using Lucene. When I search for, let's say, "news sharing", the results are returned and displayed. It's fine up to this point, until I check the ranking. Some results, although they match only 1 of the 2

Re: How to solve this problem

2009-08-11 Thread Chuan SHI
Although you have not worked with kaffe, your suggestion is right. Thank you. 2009/8/11 Alexander Aristov > I suspect that you might use incompatible versions of lucene and kaffe. > Though I have never worked with kaffe before and so might be wrong. > > Best Regards > Alexander Aristov > > > 200

Re: How to solve this problem

2009-08-11 Thread Chuan SHI
Yeah you are quite right, Malo. I recompiled lucene143 with a lower JDK and it works. Thank you. 2009/8/11 Malo Pichot > 石川 wrote: > > Hi, > > I am a newbie in lucene and am trying the 'indexing and searching' > > demo of lucene 1.4.3 using kaffe 1.0.6. After inputting the query, an > error

Re: How to solve this problem

2009-08-11 Thread Malo Pichot
石川 wrote: > Hi, > I am a newbie in lucene and am trying the 'indexing and searching' > demo of lucene 1.4.3 using kaffe 1.0.6. After inputting the query, an error > occurs as follows: > >Query: stringSearching for: string > java.lang.NoSuchMethodError: > org/apache/lucene/search/Se