Re: JDK version with lucene 1.4.3

2005-11-02 Thread Otis Gospodnetic
Just use 1.5 if you can. I've been using it for months on simpy.com along with Lucene 1.9 and haven't had any problems so far. Otis --- "Sharma, Siddharth" <[EMAIL PROTECTED]> wrote: > I have downloaded Lucene 1.4.3 > I am trying to narrow down on the JRE version to use. > We have the flexibili

Re: Use spell checker in lucene?

2005-11-02 Thread Otis Gospodnetic
Xin, Look for a Lucene-based spell checker in Lucene's contrib directory (in SVN). Otis --- Xin Herbert Wu <[EMAIL PROTECTED]> wrote: > Anyone plug-in a spell checker into lucene to implement google-like > function > "do you mean .?" for wrong spelled word or phrase? > > Also, which spell ch
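The contrib spell checker's own classes are not reproduced here. As a rough, stand-alone illustration of the basic "did you mean" idea, the sketch below ranks candidate words (for example, terms already present in the index) by Levenshtein edit distance to the misspelled input. `DidYouMean`, `suggest`, the tiny dictionary and the distance threshold are all hypothetical, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DidYouMean {
    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute all cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Up to 'max' dictionary words within 'maxDistance' edits of 'word', closest first.
    static List<String> suggest(String word, List<String> dictionary, int maxDistance, int max) {
        List<String> hits = new ArrayList<>();
        for (String candidate : dictionary) {
            if (!candidate.equals(word) && editDistance(word, candidate) <= maxDistance) {
                hits.add(candidate);
            }
        }
        hits.sort(Comparator.comparingInt((String c) -> editDistance(word, c))
                .thenComparing(Comparator.naturalOrder()));
        return hits.subList(0, Math.min(max, hits.size()));
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("lucene", "license", "listen", "search", "spell");
        System.out.println(suggest("lucine", dictionary, 2, 3));
    }
}
```

Scanning every term is only practical for small dictionaries; a real index-backed checker would first narrow the candidates (for example by shared character n-grams) before computing distances.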

RE: Lucene or DB?

2005-11-02 Thread lmuxer-mailinglists
Thanks. I will take a look at those classes. I do need to support search queries like:
- Find all files that are named foo.doc.
- Find all the files that have not been accessed in the last 6 months (atime).
- Find all PDF files with size > 2 MB.
The HW requirements are flexible in terms of memory and CP
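Lucene 1.4 evaluates range queries by comparing terms as strings, so numeric and date metadata such as size or atime need an encoding whose string order matches numeric order. A minimal sketch, assuming sizes are zero-padded to a fixed width (13 digits here, an arbitrary choice) and days are stored as yyyyMMdd strings; `SortableFields`, `padSize` and `day` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class SortableFields {
    // Assumed width: large enough for any size below 10^13 bytes.
    static final int SIZE_WIDTH = 13;

    // Left-pad a non-negative number with zeros so that string order equals numeric order.
    static String padSize(long bytes) {
        if (bytes < 0) throw new IllegalArgumentException("negative size");
        return String.format("%0" + SIZE_WIDTH + "d", bytes);
    }

    // yyyyMMdd strings sort chronologically when compared as plain strings.
    static String day(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        // Cutoff for "not accessed in the last 6 months", relative to the date of this thread.
        LocalDate cutoff = LocalDate.of(2005, 11, 2).minusMonths(6);
        System.out.println(padSize(twoMb)); // 0000002097152
        System.out.println(day(cutoff));    // 20050502
    }
}
```

Without padding, "10" sorts before "9" as a string; with padding both become 13-digit strings in numeric order. A "size > 2 MB" search could then be an exclusive range such as size:{0000002097152 TO 9999999999999}. Note that a 1.4 RangeQuery expands into one BooleanQuery clause per matching term, so a field with many distinct values can exceed the default limit of 1024 clauses.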

RE: Lucene or DB?

2005-11-02 Thread Pasha Bizhan
Hi, > From: [EMAIL PROTECTED] > > I am looking at Lucene to index and search file metadata - > filename, size, permissions, mtime, ctime, atime, etc. > > I do not need to index and search the contents of the file. I > was wondering if Lucene is the right choice for such an > application. Th

Lucene or DB?

2005-11-02 Thread lmuxer-mailinglists
Hi, I am looking at Lucene to index and search file metadata - filename, size, permissions, mtime, ctime, atime, etc. I do not need to index and search the contents of the file. I was wondering if Lucene is the right choice for such an application. This will be at enterprise level so there could

RE: lucene jar and war

2005-11-02 Thread Sharma, Siddharth
Place the lucene jar file in the WEB-INF/lib directory of your web application prior to creating its war. If your ISP inspects the war and removes all jar files within it, then I suppose you might just have to place all the lucene classes under WEB-INF/classes of your web application as 'loose cla

Use spell checker in lucene?

2005-11-02 Thread Xin Herbert Wu
Has anyone plugged a spell checker into Lucene to implement a Google-like "do you mean .?" function for misspelled words or phrases? Also, which spell checker product is good? Thanks! -Xin

lucene jar and war

2005-11-02 Thread Gaston
Hello, My provider only allows me to upload war files. My problem: I make a war archive out of the lucene-1.4.3.jar file and my JSP pages based on Lucene, and this does not work. I have one solution to solve my problem: I have to unpack the lucene-1.4.3.jar file and pack it again with my .j

JDK version with lucene 1.4.3

2005-11-02 Thread Sharma, Siddharth
I have downloaded Lucene 1.4.3 I am trying to narrow down on the JRE version to use. We have the flexibility to use 1.3.1 or later. Which JVM will be the best for running Lucene? I saw a note on the FAQ that said that Lucene will run on 1.3.1 but will require 1.4 to compile. Why would anyone want to com

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
> Ah, so the fact that "1" actually appears many times in the string you > give Lucene is important. Neat application! > > Sounds like the custom Analyzer (really a custom TokenStream) approach > suggested by others may be the way for you to go. If the information > you get from the MySQL profile

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Michael D. Curtin
Richard Jones wrote: If you're willing to continue subsetting / summarizing the data out into Lucene, how about subsetting it out into a dedicated MySQL instance for this purpose? 100 artists * 1M profiles * 2 ints * 4 bytes/int = roughly 1 GB of data, which would easily fit into RAM. Queries
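Checking the arithmetic behind that estimate (raw integer storage only; MySQL's per-row and index overhead would come on top):

$$
100 \times 10^{6} \times 2 \times 4\ \text{bytes} = 8 \times 10^{8}\ \text{bytes} \approx 0.8\ \text{GB}
$$

so "roughly 1 GB" is a fair round-up once some overhead is allowed for.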

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
> If you're willing to continue subsetting / summarizing the data out into > Lucene, how about subsetting it out into a dedicated MySQL instance for > this purpose? 100 artists * 1M profiles * 2 ints * 4 bytes/int = > roughly 1 GB of data, which would easily fit into RAM. Queries should > be pret

Re: Vector Model and Relevance Feedback

2005-11-02 Thread mark harwood
> since Lucene doesn't > currently support negative boosts See here for an approach to negative boosts: http://wiki.apache.org/jakarta-lucene/CommunityContributions Cheers Mark

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Michael D. Curtin
Richard Jones wrote: The data i'm dealing with is stored over a few mysql dbs on different machines, horizontally partitioned so each user is assigned to a single db. The queries i'm doing can be done in SQL in parallel over all machines then combined, which i've tested - it's unacceptably slo
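Because each user is assigned to exactly one database, the global "top N" for an artist must lie within the union of each shard's local top N, so the combine step only needs to merge those short lists. A minimal sketch of that merge, assuming each shard has already returned its top rows; the SQL itself is not shown, and `ShardMerge`, `Fan` and `mergeTopN` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    // One (user, listen count) row from a shard's "top fans of artist X" query.
    record Fan(long userId, int count) {}

    // Users never span shards, so merging each shard's local top n yields the global top n.
    static List<Fan> mergeTopN(List<List<Fan>> perShard, int n) {
        Comparator<Fan> byCountAsc = Comparator.comparingInt(Fan::count);
        PriorityQueue<Fan> heap = new PriorityQueue<>(byCountAsc); // min-heap holding at most n rows
        for (List<Fan> shard : perShard) {
            for (Fan f : shard) {
                heap.offer(f);
                if (heap.size() > n) heap.poll(); // evict the weakest candidate
            }
        }
        List<Fan> result = new ArrayList<>(heap);
        result.sort(byCountAsc.reversed()); // highest count first
        return result;
    }

    public static void main(String[] args) {
        List<Fan> shardA = List.of(new Fan(1, 50), new Fan(2, 30));
        List<Fan> shardB = List.of(new Fan(3, 40), new Fan(4, 10));
        System.out.println(mergeTopN(List.of(shardA, shardB), 2));
    }
}
```

The merge itself is cheap; the slowness described here comes from the per-shard queries, which this step does not address.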

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Grant Ingersoll
Not sure if this is feasible, but is there some way you could use a "fake" analyzer that you constructed using your hashtable/termvector and then have it output the tokens directly from the hashtable via the TokenStream? Maybe you would have to pass in an empty/dummy string to the field constru
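The core of that "fake analyzer" idea is a token source that walks the term-to-count table and emits each term as many times as its count, without ever building the long string. A stand-alone sketch, assuming a `next()` that returns null when exhausted (the same convention as a Lucene 1.4 TokenStream); it does not extend any Lucene class, and the wrapper that would turn it into a real TokenStream is not shown. `CountExpandingTokens` is a hypothetical name.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class CountExpandingTokens {
    private final Iterator<Map.Entry<String, Integer>> entries;
    private String currentTerm;
    private int remaining; // copies of currentTerm still to emit

    CountExpandingTokens(Map<String, Integer> termCounts) {
        this.entries = termCounts.entrySet().iterator();
    }

    // Next token text, or null when every term has been emitted 'count' times.
    String next() {
        while (remaining == 0) {
            if (!entries.hasNext()) return null;
            Map.Entry<String, Integer> e = entries.next();
            currentTerm = e.getKey();
            remaining = Math.max(0, e.getValue()); // ignore non-positive counts
        }
        remaining--;
        return currentTerm;
    }

    public static void main(String[] args) {
        CountExpandingTokens tokens = new CountExpandingTokens(new TreeMap<>(Map.of("1", 3, "2", 2)));
        for (String t = tokens.next(); t != null; t = tokens.next()) {
            System.out.print(t + " ");
        }
    }
}
```

Memory stays proportional to the number of distinct terms rather than to the total listen count.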

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
Hi Erik Our lucene-powered music search went live this week, so your search should work now: http://www.last.fm/explore/search.php?q=Michael+Hedges Before we discovered lucene our search sucked *really* badly ;) Adding multiple fields like this is similar to what i'm doing now (i am using whites

Re: Vector Model and Relevance Feedback

2005-11-02 Thread Grant Ingersoll
Others can correct me if I am wrong, but I don't think a "pure" Rocchio feedback loop is possible in the current state, since Lucene doesn't currently support negative boosts (http://lucene.apache.org/java/docs/queryparsersyntax.html). Having said that, what we do, in a nutshell, is similar to w
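Given a table of expanded term weights, the boost syntax from the query parser page (`term^boost`) can express only the positive part. A minimal sketch, assuming plain single-word terms that need no escaping, which renders the weights as a query string and drops every term whose weight is zero or negative; `BoostedQueryString` is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class BoostedQueryString {
    // Render "term^weight" clauses in sorted term order, skipping non-positive weights
    // because they cannot be written as boosts.
    static String toQueryString(Map<String, Double> weights) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Double> e : new TreeMap<>(weights).entrySet()) {
            if (e.getValue() > 0) {
                query.add(e.getKey() + "^" + String.format(Locale.ROOT, "%.2f", e.getValue()));
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(Map.of("radiohead", 1.5, "coldplay", 0.5, "beck", -0.2)));
    }
}
```

The only negative operator in that syntax, prohibition with `-term`, excludes matching documents outright rather than lowering their scores, which is why the negative part of a Rocchio update is simply lost here.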

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
> I can think of a few ways. If elegance is your goal, then a little > relational database theory might help. Specifically, instead of having > one record per listener, have one record per listener-artist > combination, with three fields: listenerid, artistid, and count. Your > example above wo
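With one record per listener-artist combination, "top fans" becomes a filter on artistid plus a sort on count. A minimal in-memory sketch of that query (roughly `SELECT listenerid ... WHERE artistid = ? ORDER BY count DESC LIMIT n` in SQL, with a hypothetical table); `TopFans` and `Listen` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopFans {
    // One row per listener-artist combination: listenerid, artistid, count.
    record Listen(long listenerId, long artistId, int count) {}

    // Listener ids with the highest counts for the given artist, highest first.
    static List<Long> topFans(List<Listen> rows, long artistId, int n) {
        return rows.stream()
                .filter(r -> r.artistId() == artistId)
                .sorted(Comparator.comparingInt(Listen::count).reversed())
                .limit(n)
                .map(Listen::listenerId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Listen> rows = List.of(
                new Listen(1, 1, 10), new Listen(2, 1, 5),
                new Listen(3, 1, 7), new Listen(1, 2, 3));
        System.out.println(topFans(rows, 1, 2));
    }
}
```

In a database, an index on (artistid, count) would let this run without scanning every row.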

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Erik Hatcher
On 2 Nov 2005, at 08:10, Richard Jones wrote: If i've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times and Beck (id 3) 2 times, the field would look like this "1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3" I use this index for quickly finding "top fans" of an artist or combination of
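If that field is scored with Lucene's default similarity, the repeated-token encoding does not rank purely by listen count. A stand-alone sketch, assuming DefaultSimilarity's formulas tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), and ignoring idf, query normalization and boosts (identical for every document in a single-term query) as well as the lossy one-byte storage of norms; `RepeatedTokenScoring` is a hypothetical name.

```java
public class RepeatedTokenScoring {
    // Mirrors DefaultSimilarity: tf grows with the square root of the term frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Mirrors DefaultSimilarity: longer fields (more total listens) are penalized.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Per-document factor for a one-artist query on the listens field.
    static double fieldScore(int artistCount, int totalListens) {
        return tf(artistCount) * lengthNorm(totalListens);
    }

    public static void main(String[] args) {
        System.out.println(fieldScore(10, 17)); // the example field: 10 Radiohead listens of 17
        System.out.println(fieldScore(5, 5));   // a user whose 5 listens are all Radiohead
    }
}
```

Under these assumptions the example field scores sqrt(10/17), about 0.77, while a user with only 5 listens, all Radiohead, scores 1.0 (up to rounding), so ranking follows each user's share of listens more than the absolute count.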

Re: Vector Model and Relevance Feedback

2005-11-02 Thread Ian Soboroff
Stefan Gusenbauer <[EMAIL PROTECTED]> writes: > Is there an add on for lucene to get a real vector representation? > Does anyone have experience with this issue? No code, but some small thinking. You can do hacks with boosts and whatnot, but I think in the end you really want a new Query subclas

Vector Model and Relevance Feedback

2005-11-02 Thread Stefan Gusenbauer
I have some thoughts about Lucene and relevance feedback. I want to implement a variation of the Rocchio formula, and there lies the problem. The formula is: Query(new) = alpha * Query(old) + beta * Sum(Relevant Documents) - gamma * Sum(Non-Relevant Documents). The relevant documents in
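The formula above maps directly onto sparse term-to-weight tables. A minimal sketch, assuming the query and each document are already available as such maps (how the weights are obtained from the index is not shown); `Rocchio`, `addScaled` and `expand` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Rocchio {
    // acc += scale * vector, over sparse term -> weight maps.
    static void addScaled(Map<String, Double> acc, Map<String, Double> vector, double scale) {
        for (Map.Entry<String, Double> e : vector.entrySet()) {
            acc.merge(e.getKey(), scale * e.getValue(), Double::sum);
        }
    }

    // Query(new) = alpha * Query(old) + beta * Sum(relevant) - gamma * Sum(nonRelevant)
    static Map<String, Double> expand(Map<String, Double> oldQuery,
                                      List<Map<String, Double>> relevant,
                                      List<Map<String, Double>> nonRelevant,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> q = new HashMap<>();
        addScaled(q, oldQuery, alpha);
        for (Map<String, Double> d : relevant) addScaled(q, d, beta);
        for (Map<String, Double> d : nonRelevant) addScaled(q, d, -gamma);
        return q;
    }

    public static void main(String[] args) {
        Map<String, Double> q = expand(
                Map.of("a", 1.0),
                List.of(Map.of("a", 1.0, "b", 2.0)),
                List.of(Map.of("b", 1.0, "c", 1.0)),
                1.0, 0.5, 0.25);
        System.out.println(q);
    }
}
```

The common centroid variant divides by the number of judged documents; passing beta / |relevant| and gamma / |nonRelevant| gives that. Terms that end up with negative weights (like "c" here) are exactly the ones that cannot be expressed as Lucene boosts.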

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Michael D. Curtin
Richard Jones wrote: Hi, I'm using lucene (which rocks, btw ;) behind the scenes at www.last.fm for various things, and i've run into a situation that seems somewhat inelegant regarding populating fields which i already know the termvector for. I'm creating a document for each user (last.fm t

Indexing derived data

2005-11-02 Thread Max Pfingsthorn
Dear fellow users, I was wondering if anyone is using Lucene right now to index data derived from business object models. My general problem is to index data which may be the result of an expensive computation involving a graph of objects (for example computing which customer has which items in

Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
Hi, I'm using lucene (which rocks, btw ;) behind the scenes at www.last.fm for various things, and i've run into a situation that seems somewhat inelegant regarding populating fields which i already know the termvector for. I'm creating a document for each user (last.fm tracks music taste for pe
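The field format discussed in this thread repeats each artist id once per listen, so that a whitespace tokenizer produces a term frequency equal to the listen count (Radiohead id 1 ten times, Coldplay id 2 five times, Beck id 3 twice). A minimal sketch of building that string from an id-to-count map; `ListenField` and `buildField` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ListenField {
    // Repeat each artist id once per listen, space-separated, in the map's iteration order.
    static String buildField(Map<Integer, Integer> listensByArtist) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : listensByArtist.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> listens = new LinkedHashMap<>();
        listens.put(1, 10); // Radiohead
        listens.put(2, 5);  // Coldplay
        listens.put(3, 2);  // Beck
        System.out.println(buildField(listens));
    }
}
```

The string grows with the total number of listens rather than the number of artists, which is the inefficiency that motivates feeding the counts to Lucene more directly.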

Re: indexing records in hierachy

2005-11-02 Thread Paul . Illingworth
Hello, You could try looking at http://www.nabble.com/Hierarchical-Documents-t242604.html#a677841 where this has been discussed a little before. Regards Paul I. Urvashi Gadi