Re: search within search

2006-11-03 Thread spinergywmy
Hi, Doron, thanks for the advice. regards, Wooi Meng

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Chris Hostetter
: It seems that there is no simple function to ask the weight for a term : in a document directly. So I decide not to iterate the documents of a as i said: it depends on what you mean by "term weight" ... : term or the terms of a document. I'm iterating the terms of the index, : searching for th

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Soeren Pekrul
Chris Hostetter wrote: I don't really know what a "term matrix" is, but when you ask about "weight" is it possible you are just looking for the TermDocs.freq() of the term/doc pair? Thank you Chris, that was also my first idea. I wanted to get the document frequency indexreader.docFreq(
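The thread leaves open which "weight" is meant, so the following is only a sketch under one assumption: a textbook tf-idf weight, tf * ln(N / df), built from the two quantities mentioned above (term frequency within a document and document frequency across the index). It uses a toy in-memory corpus instead of a Lucene index; the class TermWeights and its methods are hypothetical names, and this is not Lucene's scoring formula.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: tf-idf weights over a toy corpus of plain strings. */
class TermWeights {

    /** Term frequency: how often each lower-cased token occurs in one document. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Document frequency: number of documents that contain the term. */
    static int docFreq(List<String> docs, String term) {
        int df = 0;
        for (String doc : docs) {
            if (termFreqs(doc).containsKey(term)) {
                df++;
            }
        }
        return df;
    }

    /** One cell of the document-term matrix: tf * ln(N / df), or 0 if the term is absent. */
    static double weight(List<String> docs, int docIndex, String term) {
        Integer tf = termFreqs(docs.get(docIndex)).get(term);
        int df = docFreq(docs, term);
        if (tf == null || df == 0) {
            return 0.0;
        }
        return tf * Math.log((double) docs.size() / df);
    }
}
```

Calling weight(docs, i, term) for every document index and every term fills the matrix one cell at a time. A term that occurs in every document gets weight 0 under this particular formula.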

Re: Intermittent search performance problem

2006-11-03 Thread Ben Dotte
Good suggestion, I tried watching the GCs in YourKit while testing but unfortunately they don't seem to line up with the searches that take forever. They also don't last long enough to make up that kind of time. I have our heap limited to 1GB right now and it's using around 768MB of that. On 11/3/

Re: Intermittent search performance problem

2006-11-03 Thread Yonik Seeley
On 11/3/06, Ben Dotte <[EMAIL PROTECTED]> wrote: I'm trying to figure out a way to troubleshoot a performance problem we're seeing when searching against a memory-based index. What happens is we will run a search against the index and it generally returns in 1 second or less. But every once in a

RE: TooManyClauses with MultiTermQueries

2006-11-03 Thread Silvy Mathews
Hi All, I also need to resolve this issue. What is the best way to catch this exception? Thanks Mathews -Original Message- From: Eric Louvard [mailto:[EMAIL PROTECTED] Sent: Friday, November 03, 2006 8:36 AM To: java-user@lucene.apache.org Subject: TooManyClauses with MultiTermQueries He

Re: Modelling relational data in Lucene Index?

2006-11-03 Thread Chris Lu
I personally like your effort, but technically I would disagree. The SOLR project, and the project I am working on, DBSight, have a detached approach which is implementation agnostic, no matter if it's java, ruby, php, .net. The returned results can be rendered HTML, JSON, or XML. I don't think yo

Re: Suspected problem in the QueryParser

2006-11-03 Thread Chris Hostetter
: When I enter the query: "Table AND NOT Chair" I get one hit, doc3 : When I enter the query: "Table AND (NOT Chair)" I get 0 hits. : : I had thought that both queries would return the same results. Is this a : bug, or, am I not understanding the query language correctly? it's a confusing eccen
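A toy model of the two queries above, assuming the usual explanation: a boolean query made only of prohibited clauses has nothing to select from, so it matches no documents, and requiring that empty result alongside "Table" leaves nothing. The class ToyBoolean, its eval method, and the document ids are hypothetical; this is a sketch of the set logic, not the QueryParser's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a boolean query over sets of matching document ids. */
class ToyBoolean {

    /**
     * Documents matching every required clause and no prohibited clause.
     * With no required clause there is nothing to subtract from, so the result is empty.
     */
    static Set<Integer> eval(List<Set<Integer>> required, List<Set<Integer>> prohibited) {
        if (required.isEmpty()) {
            return new TreeSet<Integer>();
        }
        Set<Integer> result = new TreeSet<Integer>(required.get(0));
        for (Set<Integer> r : required) {
            result.retainAll(r);
        }
        for (Set<Integer> p : prohibited) {
            result.removeAll(p);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> table = new TreeSet<Integer>(Arrays.asList(1, 3)); // docs containing "table"
        Set<Integer> chair = new TreeSet<Integer>(Arrays.asList(1, 2)); // docs containing "chair"

        // Table AND NOT Chair: one query with a required and a prohibited clause.
        Set<Integer> flat = eval(Arrays.asList(table), Arrays.asList(chair));

        // Table AND (NOT Chair): the parenthesised part is evaluated on its own first.
        Set<Integer> inner = eval(new ArrayList<Set<Integer>>(), Arrays.asList(chair));
        Set<Integer> nested = eval(Arrays.asList(table, inner), new ArrayList<Set<Integer>>());

        System.out.println("flat=" + flat + " nested=" + nested);
    }
}
```

In this model the flat form keeps the prohibited clause in the same query as a required one, which is why it can return a hit while the nested form cannot.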

Re: Modelling relational data in Lucene Index?

2006-11-03 Thread Emmanuel Bernard
Hi, What exactly are your concerns about the "non-detached" approach (see below)? Chris Lu wrote: I would prefer a detached approach instead of Hibernate or EJB's approach, which is kind of too tightly coupled with any system. How to it is probably going to be couple with yours ;-) rebuild

Re: Re: for admins: mailing list like spam

2006-11-03 Thread Mike Klaas
On 11/3/06, Patrick Turcotte <[EMAIL PROTECTED]> wrote: > > It will make mails list more easy to read (I am using gmail and I do > not have client-side filters). That is not true. You can have labels, and, if you look at the top of the page, right beside the "Search the Web" button, you have

Re: Modelling relational data in Lucene Index?

2006-11-03 Thread Emmanuel Bernard
Hi No, he is talking about http://www.hibernate.org/hib_docs/annotations/reference/en/html/lucene.html Also note that I'm about to release a new version much more flexible http://www.mail-archive.com/hibernate-dev%40lists.jboss.org/msg00392.html and for the future (but flexible) http://www.mail-a

Intermittent search performance problem

2006-11-03 Thread Ben Dotte
Hi, I'm trying to figure out a way to troubleshoot a performance problem we're seeing when searching against a memory-based index. What happens is we will run a search against the index and it generally returns in 1 second or less. But every once in a while it takes 15-20 seconds for the exact sa

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Chris Hostetter
I don't really know what a "term matrix" is, but when you ask about "weight" is it possible you are just looking for the TermDocs.freq() of the term/doc pair? : Date: Thu, 02 Nov 2006 12:45:30 +0100 : From: Soeren Pekrul <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@

Re: Announcement: Lucene powering Monster job search index (Beta)

2006-11-03 Thread Peter Keegan
Daniel, Yes, this is correct if you happen to be doing a radius search and sorting by mileage. Peter On 11/3/06, Daniel Rosher <[EMAIL PROTECTED]> wrote: Hi Peter, Does this mean you are calculating the Euclidean distance twice ... once for the HitCollector to filter 'out of range' documents,

Re: search within search

2006-11-03 Thread Doron Cohen
spinergywmy <[EMAIL PROTECTED]> wrote on 03/11/2006 00:40:42: >I have another problem is I do not perform the real search within search > feature which according to the way that I have coded, because for the second > time searching, I actually go back to the index directory to search the > ent

Re: Announcement: Lucene powering Monster job search index (Beta)

2006-11-03 Thread Peter Keegan
Paramasivam, Take a look at Solr, in particular the DocSetHitCollector class. The collector simply sets a bit in a BitSet, or saves the docIds in an array (for low hit counts). Solr's BitSet was optimized (by Yonik, I believe) to be faster than Java's BitSet, so this HitCollector is very fast. Th
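The collector described above can be sketched roughly as follows. This is not Solr's DocSetHitCollector: DocIdCollector is a hypothetical standalone class that uses java.util.BitSet instead of Solr's optimized bit set, and the smallLimit threshold is an assumed parameter. The collect(int, float) method only mirrors the shape of Lucene's HitCollector callback, and the sketch assumes each document id is collected at most once.

```java
import java.util.BitSet;

/** Hypothetical collector: keeps doc ids in an array for few hits, a BitSet for many. */
class DocIdCollector {
    private final int maxDoc;     // upper bound on doc ids, used to size the BitSet
    private final int[] small;    // storage while the hit count is low
    private int count = 0;        // number of collected hits
    private BitSet bits = null;   // allocated once the array is full

    DocIdCollector(int maxDoc, int smallLimit) {
        this.maxDoc = maxDoc;
        this.small = new int[smallLimit];
    }

    /** Called once per matching document; the score is ignored here. */
    void collect(int doc, float score) {
        if (bits != null) {
            bits.set(doc);
        } else if (count < small.length) {
            small[count] = doc;
        } else {
            // Array is full: move everything into a BitSet and continue there.
            bits = new BitSet(maxDoc);
            for (int i = 0; i < small.length; i++) {
                bits.set(small[i]);
            }
            bits.set(doc);
        }
        count++;
    }

    int size() {
        return count;
    }

    boolean usesBitSet() {
        return bits != null;
    }

    boolean contains(int doc) {
        if (bits != null) {
            return bits.get(doc);
        }
        for (int i = 0; i < count; i++) {
            if (small[i] == doc) {
                return true;
            }
        }
        return false;
    }
}
```

Setting a bit is a constant-time operation with no per-hit allocation, which is the reason such a collector stays cheap even when a query matches a large share of the index.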

Multi valued fields

2006-11-03 Thread Seeta Somagani
Hi all, Our company has a set of assets and we use meta-data (XML files) to describe each asset. My job is to index and search over the meta-data associated with the assets. The interesting aspect of my problem is that an asset can have more than one meta-data file associated with it, depending

TooManyClauses with MultiTermQueries

2006-11-03 Thread Eric Louvard
Hello, I've been working with Lucene for several years. One of my biggest problems was Lucene's inability to search with wildcards, so I developed my own MultiTermQueries. Now there's a standard class for this, but you'll always get an exception if your search is too generic, 'a*' for ex
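To illustrate why a generic prefix such as 'a*' runs into the limit: a multi-term query is expanded into one clause per matching index term, and the expansion fails once a maximum is exceeded. In Lucene the exception is BooleanQuery.TooManyClauses (a runtime exception) and the limit can be raised with BooleanQuery.setMaxClauseCount, 1024 by default. The sketch below is a standalone model with hypothetical names (PrefixExpansion, TooManyClausesException, expandOrEmpty); it shows the expansion and one way to catch the failure, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical model of expanding a prefix query into one clause per index term. */
class PrefixExpansion {

    /** Stand-in for the exception thrown when the expansion grows too large. */
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String message) {
            super(message);
        }
    }

    /** Collects every term starting with the prefix; fails past maxClauses. */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<String>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesException("prefix '" + prefix + "' is too generic");
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** One way to handle the failure: catch it and fall back to an empty result. */
    static List<String> expandOrEmpty(SortedSet<String> terms, String prefix, int maxClauses) {
        try {
            return expand(terms, prefix, maxClauses);
        } catch (TooManyClausesException e) {
            return Collections.<String>emptyList();
        }
    }
}
```

Catching the exception and asking the user for a more specific query keeps memory bounded; raising the limit trades that safety for broader wildcard support.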

Re: Announcement: Lucene powering Monster job search index (Beta)

2006-11-03 Thread Daniel Rosher
Hi Peter, Does this mean you are calculating the Euclidean distance twice ... once for the HitCollector to filter 'out of range' documents, and then again for the custom Comparator to sort the returned documents? especially since the filtering is done outside Lucene? Regards, Dan Joe, Fields

RE: Any experience with spring's lucene support?

2006-11-03 Thread Vladimir Olenin
Haven't used them, but had a look at them some time ago. Seems like a nice set of helper factory classes to manage Lucene engine through Spring IoC. Can't do much wrong in here I guess... If you'd be using Spring in your app, you'd have to come up with similar factories either way, so probably it'd

RE: experiences with lingpipe

2006-11-03 Thread Vladimir Olenin
> You need to increase the memory for java. I think 32-bit Java is limited to a 1.3 gig heap but > could be wrong. No heuristics at the tip of my fingers. 32-bit JVM under Linux/Windows. Solaris runs OK. Limit on the heap is ~1.7 - 1.8 GB. -Original Message- From: Breck Baldwin [mailto:[EM

Re: experiences with lingpipe

2006-11-03 Thread Breck Baldwin
Martin Braun wrote: Hi Breck, i have tried your tutorial and built (hopefully) a successful SpellCheck.model File with 49M. My Lucene Index directory is 2,4G. When I try to read the Model with the readmodel function, i get an "Exception in thread "main" java.lang.OutOfMemoryError: Java heap sp

Re: for admins: mailing list like spam

2006-11-03 Thread Patrick Turcotte
> It will make mails list more easy to read (I am using gmail and I do not have client-side filters). That is not true. You can have labels, and, if you look at the top of the page, right beside the "Search the Web" button, you have a "create filter" link. Patrick

Re: Modelling relational data in Lucene Index?

2006-11-03 Thread Erick Erickson
One thing it took me a while to grasp, and is not automatic for folks with significant database backgrounds, is that the fields in a Lucene document are only related to those of any other document by the meaning you, as a programmer, understand. That is, document 1 may have fields a, b, c. Document

Suspected problem in the QueryParser

2006-11-03 Thread Lucifer Hammer
Hi, I recently stumbled across what I think might be a bug in the QueryParser. Before I enter it as a bug, I wanted to run it by this group to see if I'm just not looking at the boolean expression correctly. Here's the issue: I created an index with 5 documents, all have one field: "text", with

Re: simple (?) question about scoring

2006-11-03 Thread Michele Amoretti
Yes! I modified the example to be compliant with the 2.1 API, and I added the hits.score() call for each discovered result. It works! [java] Hits for "freedom" were found in quotes by: [java] 1. Mohandas Gandhi with score = 0.53033006 [java] 2. Ayn Rand with score = 0.25 [java]

Re: for admins: mailing list like spam

2006-11-03 Thread Erik Hatcher
On Nov 3, 2006, at 3:20 AM, Michele Amoretti wrote: why not to put a [LUCENE USER] automatic tag at the beginning of e-mails subjects? Because the To and Reply-to headers indicate the list. All Apache e-mail lists operate the same, and we are not going to change this behavior. E

Re: simple (?) question about scoring

2006-11-03 Thread Michele Amoretti
http://javatechniques.com/public/java/docs/basics/lucene-memory-search.html is this good? it seems to be good.. On 11/3/06, Michele Amoretti <[EMAIL PROTECTED]> wrote: Ok, sorry I did not read it in depth. Now, where can I find an example of: - building the RAMDirectory - scoring all document

Re: simple (?) question about scoring

2006-11-03 Thread Michele Amoretti
Ok, sorry I did not read it in depth. Now, where can I find an example of: - building the RAMDirectory - scoring all documents against the query? thanks On 11/3/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I have a question: is the score for a document different if I have : only that doc

Re: Announcement: Lucene powering Monster job search index (Beta)

2006-11-03 Thread Paramasivam Srinivasan
Hi Peter, When I use the CustomHitCollector, it affects the application performance. How do you accomplish grouping the results without affecting performance? Also, if possible, please give a code snippet for a custom HitCollector. TIA Sri "Peter Keegan" <[EMAIL PROTECTED]> wrote in message n

Re: search within search

2006-11-03 Thread spinergywmy
Hi, Doron, good call, thanks. I have another problem is I do not perform the real search within search feature which according to the way that I have coded, because for the second time searching, I actually go back to the index directory to search the entire indices again rather than cache

for admins: mailing list like spam

2006-11-03 Thread Michele Amoretti
Hi, why not to put a [LUCENE USER] automatic tag at the beginning of e-mails subjects? It will make mails list more easy to read (I am using gmail and I do not have client-side filters). -- Michele Amoretti, Ph.D. Distributed Systems Group Dipartimento di Ingegneria dell'Informazione Università

Re: simple (?) question about scoring

2006-11-03 Thread Paul Elschot
Michele, On Friday 03 November 2006 07:07, Michele Amoretti wrote: > I have a question: is the score for a document different if I have > only that document in my index, or if I have N documents? > If the answer is yes, I will put all N documents together, otherwise I > will evaluate them one by o