Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
> Ah, so the fact that "1" actually appears many times in the string you give Lucene is important. Neat application!
>
> Sounds like the custom Analyzer (really a custom TokenStream) approach suggested by others may be the way for you to go. If the information you get from the MySQL profile
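
For readers following the thread, here is a rough sketch of the "term appears many times in the string" idea in plain Java. It only shows the string-building step, with no Lucene calls; the class name `RepeatedTermField`, the method `expand`, and the sample ids and counts are made up for illustration and are not taken from the original messages.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns known term counts into a whitespace-separated
// string in which each term is repeated as many times as its count.
final class RepeatedTermField {

    static String expand(Map<String, Integer> termCounts) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            // Repeat the term once per occurrence so a whitespace tokenizer
            // would see the same term frequency.
            for (int i = 0; i < entry.getValue(); i++) {
                joined.add(entry.getKey());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Illustrative data only: term "1" three times, term "7" once.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("1", 3);
        counts.put("7", 1);
        System.out.println(expand(counts)); // prints "1 1 1 7"
    }
}
```

The resulting string grows with the total count, which is presumably part of why a custom TokenStream came up as an alternative.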

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
> If you're willing to continue subsetting / summarizing the data out into Lucene, how about subsetting it out into a dedicated MySQL instance for this purpose? 100 artists * 1M profiles * 2 ints * 4 bytes/int = roughly 1 GB of data, which would easily fit into RAM. Queries should be pret
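
The size estimate quoted above can be checked with a few lines of Java. This is just the multiplication from the quote spelled out; the class and method names are invented for the example, and reading "100 artists" as a per-profile figure is an assumption.

```java
// Hypothetical helper that reproduces the back-of-the-envelope estimate:
// artists * profiles * ints per entry * bytes per int.
final class SubsetSizeEstimate {

    static long bytesNeeded(long artists, long profiles, long intsPerEntry, long bytesPerInt) {
        return artists * profiles * intsPerEntry * bytesPerInt;
    }

    public static void main(String[] args) {
        // Figures taken from the quoted message.
        long bytes = bytesNeeded(100, 1_000_000L, 2, 4); // 800,000,000 bytes
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println(bytes + " bytes, roughly " + gib + " GiB");
    }
}
```

The exact product is 800,000,000 bytes, so "roughly 1 GB" is a slightly generous round-up, before any index or row overhead in MySQL.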

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
Hi Erik,

Our Lucene-powered music search went live this week, so your search should work now: http://www.last.fm/explore/search.php?q=Michael+Hedges

Before we discovered Lucene our search sucked *really* badly ;)

Adding multiple fields like this is similar to what I'm doing now (I am using whites

Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
> I can think of a few ways. If elegance is your goal, then a little relational database theory might help. Specifically, instead of having one record per listener, have one record per listener-artist combination, with three fields: listenerid, artistid, and count. Your example above wo
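
As a sketch of the layout suggested in the quote (one record per listener-artist combination with listenerid, artistid, and count), the following Java turns one listener's artist counts into such rows. The class `ListenerArtistRows`, its `Row` type, and the sample ids are hypothetical and only meant to illustrate the shape of the data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "one record per listener-artist combination".
final class ListenerArtistRows {

    // One row with the three fields named in the quote.
    static final class Row {
        final int listenerId;
        final int artistId;
        final int count;

        Row(int listenerId, int artistId, int count) {
            this.listenerId = listenerId;
            this.artistId = artistId;
            this.count = count;
        }
    }

    // Flattens a single listener's artistId -> count map into rows.
    static List<Row> normalize(int listenerId, Map<Integer, Integer> artistCounts) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : artistCounts.entrySet()) {
            rows.add(new Row(listenerId, entry.getKey(), entry.getValue()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        counts.put(1, 5);
        counts.put(9, 2);
        for (Row row : normalize(42, counts)) {
            System.out.println(row.listenerId + " " + row.artistId + " " + row.count);
        }
    }
}
```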

Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage)

2005-11-02 Thread Richard Jones
Hi,

I'm using Lucene (which rocks, btw ;) behind the scenes at www.last.fm for various things, and I've run into a situation that seems somewhat inelegant: populating fields for which I already know the termvector.

I'm creating a document for each user (last.fm tracks music taste for pe