Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng
Oh, thanks. Shuai On Fri, 20 Aug 2010, Otis Gospodnetic wrote: Hi, Are you actually talking about Solr? Sounds like it. Check the solr-u...@lucene list. Maybe you need to treat those words as protected words? See the protwords.txt file in the conf dir. Otis Sematext :: http://sematext

Re: lucene indexing configuration

2010-08-20 Thread Otis Gospodnetic
Hi, Are you actually talking about Solr? Sounds like it. Check the solr-u...@lucene list. Maybe you need to treat those words as protected words? See the protwords.txt file in the conf dir. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://se

Re: Calculate Term Co-occurrence Matrix

2010-08-20 Thread Otis Gospodnetic
There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and a few other things: http://sematext.com/products/key-phrase-extractor/index.html One of the demos that uses news data is at http://sematext.com/demo/kpe/index.html Otis Sematext :: http://sematext.com/ :: Solr - Lu

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng
Hey, We have indexed some biological full-text pages, and I was wondering how to configure schema.xml so that the gene names 'met1', 'met2', and 'met3' are treated as different words. Currently they are all mapped to 'met'. Thanks, Shuai
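One possible explanation for the behaviour described above (an assumption, since the thread does not show the schema) is an analysis step that splits tokens at letter/digit boundaries, so that 'met1' yields 'met'. The sketch below is a toy Python model of that effect and of a protected-word list in the spirit of the protwords.txt suggestion. The names `PROTECTED`, `split_on_digits`, and `analyze` are hypothetical and are not Solr or Lucene APIs.

```python
import re

# Hypothetical protected-word list, standing in for the entries one
# might put in protwords.txt (illustrative only, not a Solr API).
PROTECTED = {"met1", "met2", "met3"}


def split_on_digits(token, protected=frozenset()):
    """Toy analysis step: split a token at letter/digit boundaries
    unless the lowercased token is in the protected set."""
    lowered = token.lower()
    if lowered in protected:
        return [lowered]
    # Runs of letters and runs of digits become separate parts.
    return re.findall(r"[a-z]+|[0-9]+", lowered)


def analyze(text, protected=frozenset()):
    """Whitespace-tokenize, then apply the splitting step to each token."""
    terms = []
    for token in text.split():
        terms.extend(split_on_digits(token, protected))
    return terms
```

Without the protected set, `analyze("met1 met2")` produces the shared part 'met' for both names; passing `PROTECTED` keeps each gene name as a single distinct term.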

RE: Tokenization / Analyzer question

2010-08-20 Thread Beard, Brian
So I've been thinking about this more and what seems most plausible is to just use Store.NO for the delimiters, so they will have payloads encoded correctly but not affect the stored data, and store separate instances of the subId information which can be retrieved along the boundaries. There shoul

Re: How to convert WAR application into console application (Making Unicorn has console application)

2010-08-20 Thread Erick Erickson
Please do not cross-post to multiple users lists with questions like this; it's considered quite rude. Take the time to find the appropriate list and post to that one. Erick On Fri, Aug 20, 2010 at 7:44 AM, Ranjith wrote: > Hi all, > Hi all, Unicorn just provide a URI and push the button. I

Re: Calculate Term Co-occurrence Matrix

2010-08-20 Thread Grant Ingersoll
You might also be interested in Mahout's collocations package: http://cwiki.apache.org/confluence/display/MAHOUT/Collocations -Grant On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote: > Hi all, > > I need to know if there is a Lucene plug-in or a Lucene-based API for > calculating the term co-
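For readers who only need the basic idea, here is a minimal Python sketch of a document-level term co-occurrence count. It is not Mahout's collocations package or any Lucene API; it assumes whitespace tokenization and that "co-occurrence" means two terms appearing in the same document, which the truncated question does not confirm. The function name `cooccurrence_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered pair of distinct terms, the number of
    documents in which both terms appear."""
    counts = Counter()
    for text in documents:
        # Use the set of terms so a pair is counted once per document;
        # sorting gives each pair a single canonical (a, b) order.
        terms = sorted(set(text.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts
```

The result is a sparse representation of the symmetric matrix: look up a pair with its terms in alphabetical order, for example `counts[("lucene", "search")]`. For a large index this all-pairs approach grows quadratically with the number of distinct terms per document, which is one reason a distributed tool might be preferred.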

Re: asking about incremental update

2010-08-20 Thread Grant Ingersoll
On Aug 19, 2010, at 7:55 AM, Yakob wrote: > do you reckon I should use a timer or a thread instead to periodically > update the index? That's likely what most people do: set up something to watch a directory or check a timestamp. If your data is in a DB, then you can do a query to get what's c
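The "check a timestamp" approach can be sketched roughly as follows, in Python rather than against the Lucene API. This is a minimal illustration under assumptions: the data lives in files on disk, modification times are reliable, and `reindex` is a hypothetical callback that would add or update one document in the index. The names `files_changed_since` and `poll_once` are also illustrative.

```python
import os


def files_changed_since(directory, last_run):
    """Return paths of files under `directory` whose modification time
    is strictly newer than `last_run` (seconds since the epoch)."""
    changed = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_run:
                changed.append(path)
    return sorted(changed)


def poll_once(directory, last_run, reindex):
    """One polling pass: hand each changed file to `reindex` and return
    the newest modification time seen, to use as `last_run` next time."""
    newest = last_run
    for path in files_changed_since(directory, last_run):
        reindex(path)
        newest = max(newest, os.path.getmtime(path))
    return newest
```

A timer or background thread would call `poll_once` at a fixed interval and carry the returned timestamp forward. Note that this sketch does not detect deleted files; those would need separate handling.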

How to convert WAR application into console application (Making Unicorn has console application)

2010-08-20 Thread Ranjith
Hi all, With Unicorn, you just provide a URI and push the button. It will call a series of validation services and report the results. I have already downloaded and installed Unicorn. The source code is only available for download from the Mercurial repository. To download it, use