Re: [ no subject ]

2009-04-30 Thread Anshum
As far as I know, you'd have to index one of the docs and then run a query (the second doc) to get the similarity score. Also, the default similarity takes into account more factors than the regular VSM, so you'd have to look into that as well. You may write code that on the fly creates a volati

Re: dbsight

2009-04-30 Thread Otis Gospodnetic
- Original Message > From: Erik Hatcher > > On Apr 30, 2009, at 10:32 PM, Michael Masters wrote: > > Sweet! I'll look more into solr. I wasn't under the impression solr could > index a database like dbsight. > > It's not point-and-clickable, but Solr's DataImportHandler has sophistic

Re: dbsight

2009-04-30 Thread Erik Hatcher
On Apr 30, 2009, at 10:32 PM, Michael Masters wrote: Sweet! I'll look more into solr. I wasn't under the impression solr could index a database like dbsight. It's not point-and-clickable, but Solr's DataImportHandler has sophisticated configuration capabilities for indexing any JDBC acces

Re: dbsight

2009-04-30 Thread Michael Masters
Sweet! I'll look more into solr. I wasn't under the impression solr could index a database like dbsight. -Mike On Apr 30, 2009, at 4:42 PM, Grant Ingersoll wrote: Solr (http://lucene.apache.org/solr) can import from a DB, if that is what you are after. I haven't done a full feature com

Re: kamikaze

2009-04-30 Thread John Wang
You are right, Grant. Michael, Anmol, let's move this to the kamikaze mailing list: http://groups.google.com/group/kamikaze-users Michael, I have added you by default. -John On Thu, Apr 30, 2009 at 4:37 PM, Grant Ingersoll wrote: > Does Kamikaze have a mailing list? It seems like, to me anyway,

Re: dbsight

2009-04-30 Thread Grant Ingersoll
Solr (http://lucene.apache.org/solr) can import from a DB, if that is what you are after. I haven't done a full feature comparison between DB Sight and Solr, but it appears there is a fair amount of overlap based on the front page. HTH, Grant On Apr 30, 2009, at 3:36 PM, Michael Masters w

Re: How to get the similarity between two string vectors?

2009-04-30 Thread Grant Ingersoll
Yes and no. You can have a look at More Like This in the contrib package. Additionally, you can just get the TermVectors out of Lucene and write your own. You could use a MemoryIndex which contains one document and represent the other document as a query and then "search". But, no, there
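
For the "write your own" route mentioned above, here is a minimal sketch of plain cosine similarity over raw term counts. It deliberately uses no Lucene classes, and it is not what Lucene's default similarity computes; the class and method names (TermVectorCosine, termFrequencies, cosine) are made up for illustration. In practice the term lists might come from TermVectors pulled out of an index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine similarity on raw term counts.
class TermVectorCosine {

    // Count how often each term occurs in the list.
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : terms) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sumOfSquares = 0.0;
        for (int count : tf.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine of the angle between the two term-frequency vectors.
    // Returns 0.0 if either vector is empty.
    static double cosine(List<String> a, List<String> b) {
        Map<String, Integer> tfA = termFrequencies(a);
        Map<String, Integer> tfB = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : tfA.entrySet()) {
            Integer other = tfB.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double normA = norm(tfA);
        double normB = norm(tfB);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // The two vectors from the original question share no terms.
        List<String> v1 = Arrays.asList("term1", "term2", "term3");
        List<String> v2 = Arrays.asList("term4", "term5", "term5");
        System.out.println(cosine(v1, v2));
    }
}
```

With the two vectors from the question, no term is shared, so the dot product and therefore the score is zero. Weighting such as IDF or length normalization is left out on purpose.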

Re: kamikaze

2009-04-30 Thread Grant Ingersoll
Does Kamikaze have a mailing list? It seems like, to me anyway, this conversation would be more appropriate for that list as it is about Kamikaze, not Lucene. -Grant On Apr 30, 2009, at 2:42 PM, molz wrote: Right on. -1 if not found, index in the sorted set if found. Anmol Michael M

dbsight

2009-04-30 Thread Michael Masters
I posted this on java-...@lucene.apache.org and it was suggested that I pose this question here: Hello Everyone, I just started to use lucene recently. Great project BTW. I was wondering if anyone has suggested making an open source version of dbsight (www.dbsight.net/). I've just started using i

RE: kamikaze

2009-04-30 Thread molz
Right on. -1 if not found, index in the sorted set if found. Anmol Michael Mastroianni wrote: > > Thanks, Anmol. Just so I'm clear on this: findWithIndex(foo) returns -1 > if foo is not found, and some positive integer if it is? > > regards, > Michael > > -Original Message- > Fro
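
To illustrate the contract described above (-1 if not found, index in the sorted set if found), here is a rough sketch over a plain sorted int array. This is not Kamikaze's implementation and does not touch P4DocIdSet; SortedSetLookup and its findWithIndex are hypothetical stand-ins that only mirror the return convention.

```java
import java.util.Arrays;

// Hypothetical stand-in, not Kamikaze code: same return convention on a sorted int[].
class SortedSetLookup {

    // Returns the position of target in sortedDocIds, or -1 if it is absent.
    // The array must already be sorted in ascending order.
    static int findWithIndex(int[] sortedDocIds, int target) {
        int pos = Arrays.binarySearch(sortedDocIds, target);
        // binarySearch returns a negative insertion-point encoding on a miss;
        // collapse every miss to -1.
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 42};
        System.out.println(findWithIndex(docIds, 7));
        System.out.println(findWithIndex(docIds, 5));
    }
}
```

In this sketch positions are zero-based, so a hit on the first element returns 0. A caller should therefore test for `>= 0` rather than `> 0`; whether Kamikaze's own index is zero-based is not stated in the thread.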

How to get the similarity between two string vectors?

2009-04-30 Thread Kamal Najib
Hi, I am new to Lucene and I want to get the similarity between two vectors of strings. Is there a method that does that? For example, if I have the vectors: Vector1: <"term1","term2","term3"> Vector2: <"term4","term5","term5"> is there a method to get the similarity between them in Lucene, or is there

RE: kamikaze

2009-04-30 Thread Michael Mastroianni
Thanks, Anmol. Just so I'm clear on this: findWithIndex(foo) returns -1 if foo is not found, and some positive integer if it is? regards, Michael -Original Message- From: molz [mailto:anmol.bha...@gmail.com] Sent: Thursday, April 30, 2009 3:33 PM To: java-user@lucene.apache.org Subject:

RE: kamikaze

2009-04-30 Thread molz
Hi, That method needs to be deprecated. Please use findWithIndex() instead. I will deprecate that method in the next release. Also, I will enable line numbers in it. Anmol Michael Mastroianni wrote: > > Hi-- Using the 1.0.7 jar file, I am having problems with occasional > ArrayIndexOutOfBo

Re: Indexing becomes slow with time

2009-04-30 Thread mark harwood
If you're CPU-bound - I've had issues before with GC in long-running indexing tasks loading very large volumes (100s of millions) of docs. I was seeing lots of CPU usage tied up in GC. I solved all these problems by firing batches of indexing activity off in separate processes then immediately

RE: kamikaze

2009-04-30 Thread Michael Mastroianni
Hi-- Using the 1.0.7 jar file, I am having problems with occasional ArrayIndexOutOfBoundsExceptions and StackOverflowErrors when trying to do a find in a P4DocIdSet. Here is a unit test that I can reliably get to generate a StackOverflowError. Have you seen this before? Since I'm using the jar file

IndexSearcher and out of memory error

2009-04-30 Thread Bill.Chesky
Hello, I'm using Lucene 2.2.0. I've got a query class that wraps an IndexSearcher object. Right now, we create a new IndexSearcher each time my query class gets instantiated and then it gets used throughout the life of the query class. Multiple queries get made against the IndexSearcher object

[ no subject ]

2009-04-30 Thread Kamal Najib
Hi, I am new to Lucene and I want to get the similarity between two vectors of strings. Is there a method that does that? For example, assume the vectors: Vector1: <"term1","term2","term3"> Vector2: <"term4","term5","term5"> is there a method to get the similarity between them in Lucene, or is ther

Re: Indexing becomes slow with time

2009-04-30 Thread Erick Erickson
This is surprising behavior, which is another way of saying that, given what you've said so far, this shouldn't be happening. I'd really look at system metrics, like whether you're swapping etc. In particular you might want to try varying how big you allow your memory footprint to grow before you f

RE: kamikaze

2009-04-30 Thread Michael Mastroianni
Hi Anmol-- Thanks for bringing up the version I was using: when I switched back to the official jar file, this test passed. The correctness problem only seems to exist in the snapshot I grabbed. The only reason I started using a snapshot was that the jar file had line numbers turned off in the co

RE: kamikaze

2009-04-30 Thread Michael Mastroianni
Hi Anmol-- 1. I'm using a recent snapshot of your svn repo (I tried using the jar file, but line numbers were turned off, and I couldn't debug at all: I can try with the jar file from your most recent release and see how it turns out) from something like 3 days ago. 2. I just tried the snippet yo

Re: Searching for partial matches

2009-04-30 Thread Ian Lea
Hi, this is possible. There is an entry on wildcards in the FAQ. See also RegexQuery and search the mailing lists for ngrams. Depending on your setup and requirements you may need to be aware of the performance implications of wildcard searching, particularly leading wildcards as will be requi
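
As a rough illustration of the ngrams idea mentioned above: if every word is broken into overlapping character n-grams at index time, a partial keyword such as "in" becomes an exact match on one of those grams, with no leading wildcard needed. The sketch below only generates the grams in plain Java; CharNGrams and ngrams are made-up names, and wiring this into an analyzer is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: overlapping character n-grams of a word.
class CharNGrams {

    // Returns every substring of length n, in order of position.
    // A word shorter than n yields an empty list.
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= word.length(); start++) {
            grams.add(word.substring(start, start + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The example words from the original question all contain the gram "in".
        for (String word : new String[] {"string", "meaning", "trinity"}) {
            System.out.println(word + " -> " + ngrams(word, 2));
        }
    }
}
```

The trade-off is index size: each word of length k produces k - n + 1 grams per gram length indexed, so this trades disk space for query speed compared with wildcard queries.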

Searching for partial matches

2009-04-30 Thread Huntsman84
Hello, I am new to Lucene, and I don't know if it is possible to obtain results providing part of the keyword. For example, if I try to search "in", it should return all matches with "string", "meaning", "trinity"... Am I expecting too much? Thank you so much!

Re: Indexing becomes slow with time

2009-04-30 Thread liat oren
Yes, I do run optimize... I did start looking at these tips in the last few days, but didn't think the optimize makes it so slow. Thanks! 2009/4/30 Ian Lea > Are you maybe running optimize after every n documents? There are > lots of tips in > http://wiki.apache.org/lucene-java/ImproveIndexin

Re: Phrase Highlighting

2009-04-30 Thread Michael McCandless
On Thu, Apr 30, 2009 at 12:15 AM, Max Lynch wrote: > You should switch to the SpanScorer (in o.a.l.search.highlighter). >> That fragment scorer should only match true phrase matches. >> >> Mike >> > > Thanks Mike.  I gave it a try and it wasn't working how I expected.  I am > using pylucene right

Re: Indexing becomes slow with time

2009-04-30 Thread Ian Lea
Are you maybe running optimize after every n documents? There are lots of tips in http://wiki.apache.org/lucene-java/ImproveIndexingSpeed. -- Ian. On Thu, Apr 30, 2009 at 8:29 AM, liat oren wrote: > Hi, > > I noticed that when I start to index, it indexes 7 documents a second. After > 30 minu

Indexing becomes slow with time

2009-04-30 Thread liat oren
Hi, I noticed that when I start to index, it indexes 7 documents a second. After 30 minutes it goes down to 3 documents a second. After two hours it becomes very slow (I stopped it when it reached 320MB and did 1 document in almost a minute). As you can see, it happens only after 2000, 3000 doc