As far as I know, you'd have to index one of the docs and then run the
second doc as a query to get the similarity score.
Also, the default similarity takes more factors into account than the
regular VSM, so you'd have to look into that as well.
You may write code that on the fly creates a volati
- Original Message
> From: Erik Hatcher
>
> On Apr 30, 2009, at 10:32 PM, Michael Masters wrote:
> > Sweet! I'll look more into solr. I wasn't under the impression solr could
> > index a database like dbsight.
>
> It's not point-and-clickable, but Solr's DataImportHandler has sophistic
On Apr 30, 2009, at 10:32 PM, Michael Masters wrote:
Sweet! I'll look more into solr. I wasn't under the impression solr
could index a database like dbsight.
It's not point-and-clickable, but Solr's DataImportHandler has
sophisticated configuration capabilities for indexing any JDBC
acces
Sweet! I'll look more into solr. I wasn't under the impression solr
could index a database like dbsight.
-Mike
On Apr 30, 2009, at 4:42 PM, Grant Ingersoll
wrote:
Solr (http://lucene.apache.org/solr) can import from a DB, if that
is what you are after. I haven't done a full feature com
You are right, Grant. Michael, Anmol, let's move this to the kamikaze mailing
list:
http://groups.google.com/group/kamikaze-users
Michael, I have added you by default.
-John
On Thu, Apr 30, 2009 at 4:37 PM, Grant Ingersoll wrote:
> Does Kamikaze have a mailing list? It seems like, to me anyway,
Solr (http://lucene.apache.org/solr) can import from a DB, if that is
what you are after. I haven't done a full feature comparison between
DB Sight and Solr, but it appears there is a fair amount of overlap
based on the front page.
HTH,
Grant
On Apr 30, 2009, at 3:36 PM, Michael Masters w
Yes and no. You can have a look at More Like This in the contrib
package. Additionally, you can just get the TermVectors out of Lucene
and write your own. You could use a MemoryIndex which contains one
document and represent the other document as a query and then "search".
But, no, there
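For the "write your own" route mentioned above, here is a minimal sketch of a plain term-frequency cosine between two lists of terms. It uses no Lucene classes; the class and method names are made up for illustration, and the score is the textbook VSM cosine, not what Lucene's default similarity computes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: raw term-frequency cosine.
final class TermVectorSimilarity {

    // Count how often each term occurs in the list.
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sum = 0.0;
        for (int v : tf.values()) {
            sum += (double) v * v;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two vectors; 0.0 if either is empty.
    static double cosine(List<String> a, List<String> b) {
        Map<String, Integer> fa = termFrequencies(a);
        Map<String, Integer> fb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : fa.entrySet()) {
            Integer other = fb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(fa);
        double nb = norm(fb);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("term1", "term2", "term3");
        List<String> v2 = Arrays.asList("term4", "term5", "term5");
        System.out.println(cosine(v1, v2));
    }
}
```

The two example vectors from the question share no terms, so their cosine is zero under this measure; anything beyond raw term counts (idf weighting, length normalization) would have to be added on top.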
Does Kamikaze have a mailing list? It seems like, to me anyway, this
conversation would be more appropriate for that list as it is about
Kamikaze, not Lucene.
-Grant
On Apr 30, 2009, at 2:42 PM, molz wrote:
Right on.
-1 if not found, index in the sorted set if found.
Anmol
Michael M
I posted this on java-...@lucene.apache.org and it was suggested that
I pose this question here:
Hello Everyone,
I just started to use lucene recently. Great project BTW. I was
wondering if anyone has suggested making an open source version of
dbsight (www.dbsight.net/). I've just started using i
Right on.
-1 if not found, index in the sorted set if found.
Anmol
Michael Mastroianni wrote:
>
> Thanks, Anmol. Just so I'm clear on this: findWithIndex(foo) returns -1
> if foo is not found, and some positive integer if it is?
>
> regards,
> Michael
>
> -Original Message-
> Fro
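To make the "-1 if not found, index in the sorted set if found" contract concrete, here is a minimal sketch over a sorted int array. This is not Kamikaze's implementation: the class name is hypothetical, and only the method name findWithIndex is borrowed from the thread.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sorted doc-id set; NOT Kamikaze's P4DocIdSet.
final class SortedIntSetSketch {
    private final int[] sorted;

    SortedIntSetSketch(int... values) {
        // Keep a sorted, duplicate-free copy so binary search is valid.
        this.sorted = Arrays.stream(values).sorted().distinct().toArray();
    }

    // Returns the position of target in the sorted set, or -1 if absent.
    int findWithIndex(int target) {
        int pos = Arrays.binarySearch(sorted, target);
        // binarySearch returns a negative insertion hint on a miss;
        // collapse every miss to -1 to match the described contract.
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        SortedIntSetSketch set = new SortedIntSetSketch(30, 10, 20);
        System.out.println(set.findWithIndex(20));
        System.out.println(set.findWithIndex(25));
    }
}
```

In this sketch the index is zero-based, so 0 is a valid hit and callers should test for `>= 0` rather than `> 0`. Whether Kamikaze counts from zero is not stated in the thread.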
Hi,
I am new to Lucene and I want to get the similarity between two vectors of
strings. Is there a method that does that?
For example, if I have the vectors:
Vector1: <"term1","term2","term3">
Vector2: <"term4","term5","term5">
is there a method to get the similarity between them in Lucene, or is there
Thanks, Anmol. Just so I'm clear on this: findWithIndex(foo) returns -1
if foo is not found, and some positive integer if it is?
regards,
Michael
-Original Message-
From: molz [mailto:anmol.bha...@gmail.com]
Sent: Thursday, April 30, 2009 3:33 PM
To: java-user@lucene.apache.org
Subject:
Hi,
That method needs to be deprecated. Please use findWithIndex() instead. I
will deprecate that method in the next release. Also, I will enable line
numbers in it.
Anmol
Michael Mastroianni wrote:
>
> Hi-- Using the 1.0.7 jar file, I am having problems with occasional
> ArrayIndexOutOfBo
If you're CPU-bound - I've had issues before with GC in long-running indexing
tasks loading very large volumes (100s of millions) of docs. I was seeing lots
of CPU usage tied up in GC.
I solved all these problems by firing batches of indexing activity off in
separate processes then immediately
Hi-- Using the 1.0.7 jar file, I am having problems with occasional
ArrayIndexOutOfBoundsExceptions and StackOverflowErrors when trying to
do a find in a P4DocIdSet. Here is a unit test that I can reliably get
to generate a StackOverflowError. Have you seen this before? Since I'm
using the jar file
Hello,
I'm using Lucene 2.2.0. I've got a query class that wraps an
IndexSearcher object. Right now, we create a new IndexSearcher each
time my query class gets instantiated and then it gets used throughout
the life of the query class. Multiple queries get made against the
IndexSearcher object
Hi,
I am new to Lucene and I want to get the similarity between two vectors of
strings. Is there a method that does that?
For example, assume the vectors:
Vector1: <"term1","term2","term3">
Vector2: <"term4","term5","term5">
is there a method to get the similarity between them in Lucene, or is ther
This is surprising behavior, which is another way of saying that,
given what you've said so far, this shouldn't be happening. I'd
really look at system metrics, like whether you're swapping
etc. In particular you might want to try varying how big you
allow your memory footprint to grow before you f
Hi Anmol--
Thanks for bringing up the version I was using: when I switched back to
the official jar file, this test passed. The correctness problem only
seems to exist in the snapshot I grabbed.
The only reason I started using a snapshot was that the jar file had
line numbers turned off in the co
Hi Anmol--
1. I'm using a recent snapshot of your svn repo (I tried using the jar
file, but line numbers were turned off, and I couldn't debug at all: I
can try with the jar file from your most recent release and see how it
turns out) from something like 3 days ago.
2. I just tried the snippet yo
Hi
This is possible. There is an entry on wildcards in the FAQ. See
also RegexQuery and search the mailing lists for ngrams.
Depending on your setup and requirements you may need to be aware of
the performance implications of wildcard searching, particularly
leading wildcards as will be requi
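As a rough illustration of the n-gram idea mentioned above: if every character n-gram of a term is indexed as its own token, a query such as "in" becomes an exact term match instead of a leading-wildcard scan. The sketch below only generates the n-grams in plain Java; the class name is hypothetical, and wiring this into a Lucene analyzer is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a term into overlapping character n-grams.
final class NGramSketch {

    // Returns all substrings of length n, in order of appearance.
    // A term shorter than n yields an empty list.
    static List<String> ngrams(String term, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"string", "meaning", "trinity"}) {
            System.out.println(word + " -> " + ngrams(word, 2));
        }
    }
}
```

All three example words from the question produce the bigram "in", so a two-character query would match them directly. The trade-off is a larger index, since each term contributes many extra tokens.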
Hello,
I am new to Lucene, and I don't know if it is possible to obtain results
by providing only part of the keyword.
For example, if I try to search "in", it should return all matches with
"string", "meaning", "trinity"...
Am I expecting too much?
Thank you so much!
--
View this me
Yes, I do run optimize...
I did start looking at these tips in the last few days, but didn't think that
optimize would make it so slow.
Thanks!
2009/4/30 Ian Lea
> Are you maybe running optimize after every n documents? There are
> lots of tips in
> http://wiki.apache.org/lucene-java/ImproveIndexin
On Thu, Apr 30, 2009 at 12:15 AM, Max Lynch wrote:
> You should switch to the SpanScorer (in o.a.l.search.highlighter).
>> That fragment scorer should only match true phrase matches.
>>
>> Mike
>>
>
> Thanks Mike. I gave it a try and it wasn't working how I expected. I am
> using pylucene right
Are you maybe running optimize after every n documents? There are
lots of tips in
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.
--
Ian.
On Thu, Apr 30, 2009 at 8:29 AM, liat oren wrote:
> Hi,
>
> I noticed that when I start to index, it indexes 7 documents a second. After
> 30 minu
Hi,
I noticed that when I start to index, it indexes 7 documents a second. After
30 minutes it goes down to 3 documents a second.
After two hours it becomes very slow (I stopped it when it reached 320MB
and was taking almost a minute per document).
As you can see, it happens only after 2000, 3000 doc
26 matches