Re: Multiple Terms, Delete From Index

2006-09-26 Thread Josh Joy
Hi Otis, Won't that delete all documents with term1, then all documents with term2...rather than deleting all documents that contain only term1 and term2...or am I missing the obvious and doing something wrong? Thanks, Josh Otis Gospodnetic wrote: > Heh, I have to try the obvious - two reader.

Re: Lucene In Action Book vs Lucene 2.0

2006-09-26 Thread KEGan
Otis, What about the internals of Lucene? Are there any major changes in there? LIA is such a great book. Any date when LIA2 is coming? I definitely must get it :) On 9/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hi, I think you'll find most of the book to still be useful (but then ag

Re: Lucene In Action Book vs Lucene 2.0

2006-09-26 Thread Otis Gospodnetic
Hi, I think you'll find most of the book to still be useful (but then again, I'm the co-author, so maybe I'm not 100% objective). One thing where the API changed is Fields. They are now constructed differently, so the code in the book won't match the current API. We have LIA code working unde

Re: spell checker

2006-09-26 Thread Otis Gospodnetic
The code works with Lucene 2.0, I've used it. However, it did change slightly. If you look in JIRA you'll find some comments about it. If I recall correctly, some changes I made to LuceneDictionary(?) class now require the index directory to exist, I think. Otis - Original Message ---

Re: Multiple Terms, Delete From Index

2006-09-26 Thread Otis Gospodnetic
Heh, I have to try the obvious - two reader.delete(term) calls? Otis - Original Message From: Josh Joy <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, September 26, 2006 10:04:13 PM Subject: Multiple Terms, Delete From Index Hi All, I need to delete from the index wh

Re: Very high fieldNorm for a field resulting in bad results

2006-09-26 Thread Mek
Thanks a lot Chris for the detailed & patient response. The value of the field norm for any field named "A" is typically the lengthNorm of the field, times the document boost, times the field boost for *each* Field instance added to the document with the name "A". (lengthNorm is by default
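
The product described above can be checked with a little arithmetic. The following is a minimal sketch in plain Java, not Lucene code: `FieldNormSketch` and its methods are hypothetical names, the 1/sqrt(numTerms) length norm is what Lucene's DefaultSimilarity uses (it is not spelled out in the excerpt), and the boosts 3.5 and 1.8 are the figures quoted elsewhere in this thread. The count of three Field instances is purely an assumption for illustration. Lucene also stores norms in a lossy one-byte encoding, which this sketch ignores.

```java
final class FieldNormSketch {
    /** Length norm as in Lucene's DefaultSimilarity: 1 / sqrt(number of terms). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /**
     * lengthNorm * document boost * the boost of EACH Field instance
     * that was added to the document under the same field name.
     */
    static double fieldNorm(int numTerms, double docBoost, double... fieldBoosts) {
        double norm = lengthNorm(numTerms) * docBoost;
        for (double boost : fieldBoosts) {
            norm *= boost;
        }
        return norm;
    }

    public static void main(String[] args) {
        // One Field instance named "A" with boost 3.5, four terms, doc boost 1.8.
        System.out.println(fieldNorm(4, 1.8, 3.5));
        // Three Field instances named "A", each boosted 3.5: the boosts multiply.
        System.out.println(fieldNorm(4, 1.8, 3.5, 3.5, 3.5));
    }
}
```

If the same field name is added several times with a boost each time, the boosts compound, which is one plausible way to end up with a much larger norm than a single boost of 3.5 would suggest.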

Multiple Terms, Delete From Index

2006-09-26 Thread Josh Joy
Hi All, I need to delete from the index where 2 terms are matching, rather than just one term. For example, IndexReader reader = IndexReader.open(dir); Term[] terms = new Term[2]; terms[0] = new Term("city","city1"); terms[1] = new Term("state","state1"); reader.delete(terms); reader.close(); A
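
The difference between the two behaviours discussed in this thread is union versus intersection: deleting by term1 and then by term2 removes every document containing either term, while the question asks for documents containing both. Here is a minimal sketch of the intersection in plain Java, with no Lucene classes; the `postings` map and `docsMatchingAll` are hypothetical stand-ins for what the index would supply.

```java
import java.util.*;

final class DeleteByAllTerms {
    /** Returns the ids of documents listed under every one of the given terms. */
    static SortedSet<Integer> docsMatchingAll(Map<String, Set<Integer>> postings,
                                              String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start with its documents
            } else {
                result.retainAll(docs);         // later terms: keep only the common ones
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        // Hypothetical postings: term -> ids of documents containing it.
        Map<String, Set<Integer>> postings = new HashMap<>();
        postings.put("city:city1", new HashSet<>(Arrays.asList(1, 2, 5)));
        postings.put("state:state1", new HashSet<>(Arrays.asList(2, 3, 5)));
        // Only these ids should be deleted; ids 1 and 3 match a single term and stay.
        System.out.println(docsMatchingAll(postings, "city:city1", "state:state1"));
    }
}
```

In Lucene itself, one possible route (an assumption, not something stated in the thread) is to search with a BooleanQuery in which both terms are required and delete each matching document by its document number.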

Re: spell checker

2006-09-26 Thread Chris Hostetter
I've added a FAQ that may help you with this, "How do i get code written for Lucene 1.4.x to work with Lucene 2.x?" http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-d09fdfc8a6335eab4e3f3dc8ac41a40a3666318e : Date: Tue, 26 Sep 2006 20:56:57 - : From: Chris Salem <[EMAIL PROTECTED]> : R

Re: cache persistent Hits

2006-09-26 Thread Chris Hostetter
: > IndexSearcher[] searchers; : > searchers=new IndexSearcher[3]; : > String path="/home/sn/public_html/"; : > searchers[0]=new IndexSearcher(path+"index1"); : > searchers[1]=new IndexSearcher(path+"index2"); : > searchers[2]=new IndexSearcher(path+"

Re: how to get results without getting total number of found documents?

2006-09-26 Thread Andrzej Bialecki
Vlad, Please check published papers on sampling inverted indexes and multi-level caching - this is most probably what Google and other major search engines use. You can see a simple implementation of this principle in Nutch - the index is sorted in decreasing order by a PageRank-like score (

RE: how to get results without getting total number of found documents?

2006-09-26 Thread Vladimir Olenin
Thanks, Mark, that clears things up a bit. No need to apologise - I am quite a novice with Lucene. To explain my concern a bit, assume that your inverted index is queried with 'or' query for the most 'common' terms (ie, after excluding such denominators as 'a', 'the', etc). Let's say, you have fo

Re: cache persistent Hits

2006-09-26 Thread Erick Erickson
Glad I could help. I don't read a word of German, but even I could see the 227 milliseconds at the bottom. Glad things are working for you. Erick On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote: Hi Erick, the problem was this piece of code I don't need anymore. for(int i=0;ihttp://www.suchste.

Re: how to get results without getting total number of found documents?

2006-09-26 Thread markharw00d
>>- get the top 1000 results WITHOUT executing query across whole data set (Apologies if this is telling something you are already fully aware of ) - Counting matches doesn't involve scanning the text of all the docs so may be less expensive than you think for a single index. It very quickly l

Re: Re[2]: how to enhance speed of sorted search

2006-09-26 Thread eks dev
Paul's Matcher in Jira will almost enable this, indirectly but possible - Original Message From: karl wettin <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, 26 September, 2006 11:30:24 PM Subject: Re: Re[2]: how to enhance speed of sorted search On 9/26/06, Chris Hoste

how to get results without getting total number of found documents?

2006-09-26 Thread Vladimir Olenin
Hi. I couldn't find the answer to this question in the mailing list archive. In case I missed it, please let me know the keyword phrase I should be looking for, if not a direct link. All the 'Lucene' powered implementations I saw (well, primarily those utilizing Solr) return exact count of the

Re: Re[2]: how to enhance speed of sorted search

2006-09-26 Thread karl wettin
On 9/26/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: if you are seeing "slow" performance from sorted searches, the time spent "scoring" the results isn't the biggest contributor to how long the search takes -- it tends to be negligible for most queries. I've many times wished for a visiting

term OR term OR term OR .... query question

2006-09-26 Thread Vladimir Olenin
Hi. I have a question regarding Lucene scoring algorithm. Provided I have a query "a OR b OR c OR d OR e OR f", and two documents: doc1 "a b c d" and doc2 "d e", will doc1 score higher than doc2? In other words, does Lucene take into account the number of terms matched in the document in case o
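
One part of Lucene's default scoring that bears on this question is the coordination factor, coord(overlap, maxOverlap) = overlap / maxOverlap, which rewards documents that match more of the query's optional terms. The sketch below computes only that factor for the two documents in the question. It is plain Java with hypothetical names (`CoordSketch`, `coord`) and leaves out tf, idf and length normalization, so it does not by itself settle which document ends up with the higher total score.

```java
import java.util.*;

final class CoordSketch {
    /** Fraction of the query's terms that occur in the document. */
    static float coord(Set<String> queryTerms, Set<String> docTerms) {
        int overlap = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                overlap++;
            }
        }
        return overlap / (float) queryTerms.size();
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("a", "b", "c", "d", "e", "f"));
        Set<String> doc1 = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> doc2 = new HashSet<>(Arrays.asList("d", "e"));
        System.out.println("doc1 coord = " + coord(query, doc1)); // 4 of 6 terms matched
        System.out.println("doc2 coord = " + coord(query, doc2)); // 2 of 6 terms matched
    }
}
```

Because doc2 is shorter, its length norm is larger, so the coordination factor is only one of several competing influences on the final ranking.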

Re: cache persistent Hits

2006-09-26 Thread Gaston
Hi Erick, the problem was this piece of code I don't need anymore. for(int i=0;iNow it is very fast, thank you very much for your email that is written in detail. Here is my application, that still is in development phase. http://www.suchste.de Greetings Gaston P.S. The search for 'web' del

spell checker

2006-09-26 Thread Chris Salem
Does anyone have sample code on how to build a dictionary? I found this article online, but it uses version 1.4.3 and it doesn't seem to work on 2.0.0: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html?page=1 Here's the code I have: indexReader = IndexReader.open(originalIndexDi

Re: cache persistent Hits

2006-09-26 Thread Erick Erickson
See below. On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote: hi, first thank you for the fast reply. I use MultiSearcher that opens 3 indexes, so this makes the whole operation surely slower, but 20 seconds for 5260 results out of a 212MB index is much too slow. Another reason can of course be m

Lucene In Action Book vs Lucene 2.0

2006-09-26 Thread KEGan
Hi, I bought the Lucene In Action book more than a year ago, and was using Lucene 1.x during that time. Now, I have a new project with Lucene and Lucene is now 2.0. Many APIs seem to have changed. I would like to ask the experts here, what are the important or substantial changes from

Re: cache persistent Hits

2006-09-26 Thread Gaston
hi, first thank you for the fast reply. I use MultiSearcher that opens 3 indexes, so this makes the whole operation surely slower, but 20 seconds for 5260 results out of a 212MB index is much too slow. Another reason can of course be my ISP. Here is my code: IndexSearcher[] searcher

Re: Very high fieldNorm for a field resulting in bad results

2006-09-26 Thread Chris Hostetter
: The symptom: : Very high fieldNorm for field A.(explain output pasted below) The boost i am : applying to the troublesome field is 3.5 & the max boost applied per doc is : 1.8 : Given that information, the very high fieldNorm is very surprising to me. : Based on what I read, FieldNorm = 1 / s

Re[3]: how to enhance speed of sorted search

2006-09-26 Thread Chris Hostetter
: I am thinking should be this faster The ConstantScoreQuery wrapped around the QueryFilter might in fact be faster than the raw query -- have you tried it to see? you might be able to shave a little bit of speed off by accessing the bits from the Filter directly and iterating over them yourse
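
In Lucene 2.0 a Filter exposes its matches as a java.util.BitSet, so "iterating over the bits yourself" comes down to the standard nextSetBit loop. A minimal sketch of that loop follows. `BitSetIteration` and `matchingDocs` are hypothetical names, and the BitSet is filled by hand here instead of being obtained from a Filter.

```java
import java.util.*;

final class BitSetIteration {
    /** Collects up to `limit` set bit positions (document numbers) in ascending order. */
    static List<Integer> matchingDocs(BitSet bits, int limit) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = bits.nextSetBit(0);
             doc >= 0 && docs.size() < limit;
             doc = bits.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(7);
        bits.set(42);
        // Visits only the set bits, skipping the gaps between them.
        System.out.println(matchingDocs(bits, 2));
    }
}
```

The loop never computes a score, which is where the possible saving comes from; whether it is measurably faster for a given index would need to be timed.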

Re: cache persistent Hits

2006-09-26 Thread Erick Erickson
Well, my index is over 1.4G, and others are reporting very large indexes in the 10s of gigabytes. So I suspect your index size isn't the issue. I'd be very, very, very surprised if it was. Three things spring immediately to mind. First, opening an IndexSearcher is a slow operation. Are you openi

Re: searching for the part of a term.

2006-09-26 Thread Chris Hostetter
: Since the overhead in first is the speed of the system, i think adopting : second method will be better. : : Is there any other solution for this problem?? Am i going in right : direction?? you're definitely on the right path -- those are the two big solutions i can think of, which approach you

Re: Where to find drill-down examples (source code)

2006-09-26 Thread Simon Willnauer
Either you grab the next best svn client and check out the branch of 2.0 or you just download the source dist from a mirror. use this one http://mirrorspace.org/apache/lucene/java/ best regards simon On 9/26/06, djd0383 <[EMAIL PROTECTED]> wrote: Is there a link to a zip file where I can ge

cache persistent Hits

2006-09-26 Thread Gaston
Hi, Lucene itself has a volatile caching mechanism provided by a weak HashMap. Is there a possibility to serialize the Hits Object? I think of a HashMap that for each found result, caches the first 100 results. Is it possible to implement such a feature or is there such an extension? My problem
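
The idea of keeping the first 100 results per query can be sketched without touching Hits at all, by caching plain document ids in a bounded, least-recently-used map. This is a minimal sketch under stated assumptions: `TopHitsCache` is a hypothetical class, the query string is used as the key, and document ids are only meaningful for the index reader that produced them, so such a cache would have to be dropped whenever the index is reopened.

```java
import java.util.*;

final class TopHitsCache {
    private final int maxHitsPerQuery;
    private final Map<String, List<Integer>> cache;

    TopHitsCache(final int maxQueries, int maxHitsPerQuery) {
        this.maxHitsPerQuery = maxHitsPerQuery;
        // accessOrder = true makes the map evict the least recently used query first.
        this.cache = new LinkedHashMap<String, List<Integer>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Integer>> eldest) {
                return size() > maxQueries;
            }
        };
    }

    /** Stores at most maxHitsPerQuery leading document ids for the query. */
    synchronized void put(String query, List<Integer> rankedDocIds) {
        int n = Math.min(rankedDocIds.size(), maxHitsPerQuery);
        cache.put(query, new ArrayList<>(rankedDocIds.subList(0, n)));
    }

    /** Returns the cached ids, or null if the query is not cached. */
    synchronized List<Integer> get(String query) {
        return cache.get(query);
    }

    public static void main(String[] args) {
        TopHitsCache cache = new TopHitsCache(1000, 100);
        cache.put("web", Arrays.asList(12, 7, 99));
        System.out.println(cache.get("web"));
    }
}
```

Since the cached values are ordinary lists of integers, they could also be written to disk with standard Java serialization, which is one way to approach the persistence part of the question.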

Re: spell checker with lucene

2006-09-26 Thread Bill Taylor
On Sep 26, 2006, at 8:50 AM, Bhavin Pandya wrote: Hi, Does anybody have an idea for a spell checker in Java? I want to use it with Lucene, but it must work well for phrases too... -Bhavin pandya When I googled "java spell check open source" I found http://jazzy.sourceforge.net/ I have looked at

Re: Where to find drill-down examples (source code)

2006-09-26 Thread djd0383
Is there a link to a zip file where I can get the entire package of source files (version 2, please)? I know I am able to view them in the Source Repository (http://svn.apache.org/viewvc/lucene/java/trunk/), but I do not really feel like going through each of those to download them all. I am look

Re: Ordered positions

2006-09-26 Thread Paul Elschot
On Tuesday 26 September 2006 17:57, Virlouvet Olivier wrote: > Hi > > In javadoc IndexReader.termPositions() maps to the definition : > Term => <docNum, freq, <pos_1, pos_2, ... pos_freq-1>>* > where returned enumeration is ordered by doc number. > > Are positions ordered for each doc or not ? The po

Re: spell checker with lucene

2006-09-26 Thread karl wettin
On 9/26/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote: Hi, Does anybody have an idea for a spell checker in Java? I want to use it with Lucene, but it must work well for phrases too... You are welcome to try this: https://issues.apache.org/jira/browse/LUCENE-626 it is good with phrases, is trained b

Re: Advice on Custom Sorting

2006-09-26 Thread Paul Lynch
Thanks again Erick for taking the time. I agree that the CachingWrapperFilter as described under "using a custom filter" in LIA is probably my best bet. I wanted to check if anything had been added in Lucene releases since the book was written that I wasn't aware of. Cheers again. --- Erick Erickson

RE: Caused by: java.io.IOException: The handle is invalid

2006-09-26 Thread Van Nguyen
I'm running this on Windows 2003 server (NTFS). The Java VM version is 1.5.0_06. This exception is not consistent, but it is not intermittent either. It does not throw it at any particular point while rebuilding the index, but it will throw this exception at some point (it could be 1/3 way throu

Ordered positions

2006-09-26 Thread Virlouvet Olivier
Hi In javadoc IndexReader.termPositions() maps to the definition : Term => <docNum, freq, <pos_1, pos_2, ... pos_freq-1>>* where returned enumeration is ordered by doc number. Are positions ordered for each doc or not ? Thanks Olivier

Re: searching in social networks

2006-09-26 Thread Otis Gospodnetic
Hi Sharad, I've done this on Simpy.com. You can see it in action if you create some Watchlists and Watchlists Filters in Simpy. The neighbourhood is pulled from the DB, and the search uses a MultiSearcher to search that neighbourhood. Otis - Original Message From: Sharad Agarwal <[EM

Re: How to remove duplicate records from result

2006-09-26 Thread Otis Gospodnetic
You could do it with a custom HitCollector, no? Otis - Original Message From: Bhavin Pandya <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, September 26, 2006 8:43:56 AM Subject: How to remove duplicate records from result Hi, I searched the index and i found say 100

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-09-26 Thread Otis Gospodnetic
Look at LingPipe from Alias-i.com. Look at Named Entity extraction and its classifiers. Otis - Original Message From: Vladimir Olenin <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, September 25, 2006 9:49:31 PM Subject: does anyone know of a 'smart' categorizing tex

Re: Caused by: java.io.IOException: The handle is invalid

2006-09-26 Thread Michael McCandless
Van Nguyen wrote: I only get this error when using the server version of jvm.dll with my JBoss app server… but when I use the client version of jvm.dll, the same index builds just fine. This is an odd error. Which OS are you running on? And, what kind of filesystem is the index directory

Re: spell checker with lucene

2006-09-26 Thread Otis Gospodnetic
Lucene-based one is described on the Wiki. Another one is the one from LingPipe. It may not be free, depending on what you do with it. Otis - Original Message From: Bhavin Pandya <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, September 26, 2006 8:50:14 AM Subject:

Re: searching for the part of a term.

2006-09-26 Thread heritrix . lucene
Hi, While I was searching the forum for my problem of searching for a substring, I found a few very good links. http://www.gossamer-threads.com/lists/lucene/java-user/39753?search_string=Bitset%20filter;#39753 http://www.gossamer-threads.com/lists/lucene/java-user/7813?search_string=substring;#7813 http://w

spell checker with lucene

2006-09-26 Thread Bhavin Pandya
Hi, Does anybody have an idea for a spell checker in Java? I want to use it with Lucene, but it must work well for phrases too... -Bhavin pandya

How to remove duplicate records from result

2006-09-26 Thread Bhavin Pandya
Hi, I searched the index and found, say, 1000 records, but out of those 1000 records I want to filter duplicate records based on the value of one field. Is there any way other than looping through the whole Hit object? That won't work when the number of hits is too large... Thanks. Bhavin pandya
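
Whatever mechanism delivers the matching documents, the de-duplication step itself is a single pass that remembers which field values have already been seen. A minimal sketch in plain Java follows. `DedupByField` and `firstPerValue` are hypothetical names, and the map from document id to field value stands in for whatever lookup the index provides.

```java
import java.util.*;

final class DedupByField {
    /**
     * Keeps the first (best ranked) document for each distinct field value.
     * Documents with no value (null) are treated as one group.
     */
    static List<Integer> firstPerValue(List<Integer> rankedDocIds,
                                       Map<Integer, String> fieldValueByDoc) {
        Set<String> seen = new HashSet<>();
        List<Integer> unique = new ArrayList<>();
        for (Integer doc : rankedDocIds) {
            String value = fieldValueByDoc.get(doc);
            if (seen.add(value)) {   // add() returns false for a value seen before
                unique.add(doc);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<Integer, String> url = new HashMap<>();
        url.put(1, "example.org/a");
        url.put(2, "example.org/b");
        url.put(3, "example.org/a");   // duplicate of document 1
        System.out.println(firstPerValue(Arrays.asList(1, 2, 3), url));
    }
}
```

Memory grows with the number of distinct values, not with the number of hits, and the pass can stop early once enough unique results have been collected.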

Re: [Lucene2.0]How to not highlight keywords in some fields?

2006-09-26 Thread zhu jiang
Thx a lot! 2006/9/26, markharw00d <[EMAIL PROTECTED]>: Pass a field name to the QueryScorer constructor. See "testFieldSpecificHighlighting" method in the Junit test for the highlighter for an example. Cheers Mark zhu jiang wrote: > Hi all, > >For example, if I have a document with two f

Re: How to tell if IndexSearcher/IndexReader was closed?

2006-09-26 Thread Simon Willnauer
I guess there are many possibilities to implement some control structure to track the references to your searcher / reader. As it is best practice to have one single searcher open you can track the reference to the searcher while one reference is held by the class you request your searcher from. I
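
One such control structure is a small reference-counting holder: the owning class keeps one reference, each user acquires and releases its own, and the underlying resource is closed when the count reaches zero. The sketch below is plain Java and only an illustration. `RefCounted` is a hypothetical class, and it assumes the searcher is wrapped in something implementing java.io.Closeable, which would need a thin adapter of your own.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RefCounted<T extends Closeable> {
    private final T resource;
    private int refs = 1;   // the reference held by the owning class

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Hands out the resource and counts the new reference. */
    synchronized T acquire() {
        if (refs == 0) {
            throw new IllegalStateException("resource is already closed");
        }
        refs++;
        return resource;
    }

    /** Drops one reference; closes the resource when the last one is gone. */
    synchronized void release() {
        if (refs == 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        refs--;
        if (refs == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** True once every reference, including the owner's, has been released. */
    synchronized boolean isClosed() {
        return refs == 0;
    }

    public static void main(String[] args) {
        RefCounted<Closeable> holder =
                new RefCounted<Closeable>(() -> System.out.println("closing"));
        holder.acquire();   // a search starts
        holder.release();   // the search ends
        holder.release();   // the owner lets go, so the resource is closed here
        System.out.println(holder.isClosed());
    }
}
```

The holder answers the "is it closed?" question directly through isClosed(), without the searcher having to expose that state.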

How to tell if IndexSearcher/IndexReader was closed?

2006-09-26 Thread Frank Kunemann
Hi all, after I delete some entries from the index, I close the IndexSearcher to ensure that the changes are done. But after this I couldn't figure out a way to tell if the searcher is closed or not. Any ideas? Regards Frank -

Re[3]: how to enhance speed of sorted search

2006-09-26 Thread Yura Smolsky
Hello, Chris. CH> 3) most likely, if you are seeing "slow" performance from sorted searches, CH> the time spent "scoring" the results isn't the biggest contributor to how CH> long the search takes -- it tends to be negligible for most queries. A CH> better question is: are you reusing the exact sa

Re: [Lucene2.0]How to not highlight keywords in some fields?

2006-09-26 Thread markharw00d
Pass a field name to the QueryScorer constructor. See "testFieldSpecificHighlighting" method in the Junit test for the highlighter for an example. Cheers Mark zhu jiang wrote: Hi all, For example, if I have a document with two fields text and num like this: text:foo bar