date:20100820

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng

Oh, thanks. Shuai On Fri, 20 Aug 2010, Otis Gospodnetic wrote: Hi, Are you actually talking about Solr? Sounds like it. Check solr-u...@lucene list. Maybe you need to treat those words as protected words? See the protwords.txt file in the conf dir. Otis Sematext :: http://sematext

Re: lucene indexing configuration

2010-08-20 Thread Otis Gospodnetic

Hi, Are you actually talking about Solr? Sounds like it. Check solr-u...@lucene list. Maybe you need to treat those words as protected words? See the protwords.txt file in the conf dir. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://se
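Otis's suggestion relies on conf/protwords.txt, which the Solr example schema references through a `protected` attribute on some filters (for example the Snowball stemmer). The thread does not say which filter in Shuai's chain collapses met1 to met. The following is a plain-Java sketch, not Solr or Lucene code, of the idea only: a hypothetical normalizer that strips trailing digits (standing in for the unknown culprit) is skipped for any token found in a protected-word set. The class name, the stripping rule, and the word list are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analyzer sketch (not Solr/Lucene code): it mimics a filter chain
// that reduces "met1" to "met", and shows how a protected-word list bypasses it.
class ProtectedWordsSketch {

    // Stand-in for whatever filter collapses gene names: drop trailing digits.
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        int end = t.length();
        while (end > 0 && Character.isDigit(t.charAt(end - 1))) {
            end--;
        }
        // Keep purely numeric tokens unchanged rather than emitting "".
        return end == 0 ? t : t.substring(0, end);
    }

    // Lowercase, split on whitespace, and normalize every token that is not protected.
    static List<String> analyze(String text, Set<String> protectedWords) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            out.add(protectedWords.contains(lower) ? lower : normalize(lower));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> none = new HashSet<>();
        // What one might list in conf/protwords.txt, one word per line.
        Set<String> prot = new HashSet<>(List.of("met1", "met2", "met3"));
        System.out.println(analyze("MET1 met2 met3", none)); // [met, met, met]
        System.out.println(analyze("MET1 met2 met3", prot)); // [met1, met2, met3]
    }
}
```

In Solr itself the equivalent step would be listing the gene names in protwords.txt, checking that the filter responsible actually reads that file, and reindexing, since analysis changes only apply to newly indexed text. Solr's admin analysis page shows each filter's output and is a quick way to find the culprit.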

Re: Calculate Term Co-occurrence Matrix

2010-08-20 Thread Otis Gospodnetic

There is also a non-Mahout Key Phrase Extractor for Collocations, SIPs, and a few other things: http://sematext.com/products/key-phrase-extractor/index.html One of the demos that uses news data is at http://sematext.com/demo/kpe/index.html Otis Sematext :: http://sematext.com/ :: Solr - Lu
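Neither message says how collocations are scored. One widely used statistic, and as far as I understand the one behind Mahout's collocation job, is Dunning's log-likelihood ratio (G²) over a 2x2 contingency table of bigram counts. A minimal Java sketch, assuming the text is already tokenized; the class and method names are hypothetical and this is not the Sematext or Mahout implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of Dunning's log-likelihood ratio (G^2) for ranking adjacent word
// pairs as collocation candidates. Class and method names are hypothetical.
class BigramLlr {

    // x * ln(x), defined as 0 when x == 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N ln N - sum(k ln k), where N = sum(k).
    static double entropy(long... counts) {
        long n = 0;
        double terms = 0.0;
        for (long k : counts) {
            terms += xLogX(k);
            n += k;
        }
        return xLogX(n) - terms;
    }

    // 2x2 contingency table for the pair (A, B):
    // k11 = A followed by B, k12 = A followed by something else,
    // k21 = something else followed by B, k22 = all remaining bigrams.
    static double llr(long k11, long k12, long k21, long k22) {
        double row = entropy(k11 + k12, k21 + k22);
        double col = entropy(k11 + k21, k12 + k22);
        double mat = entropy(k11, k12, k21, k22);
        // Clamp small negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (row + col - mat));
    }

    // Scores every distinct adjacent pair in an already-tokenized stream.
    static Map<List<String>, Double> scoreBigrams(List<String> tokens) {
        Map<List<String>, Long> pairs = new HashMap<>();
        Map<String, Long> asFirst = new HashMap<>();
        Map<String, Long> asSecond = new HashMap<>();
        long total = 0;
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String a = tokens.get(i);
            String b = tokens.get(i + 1);
            pairs.merge(List.of(a, b), 1L, Long::sum);
            asFirst.merge(a, 1L, Long::sum);
            asSecond.merge(b, 1L, Long::sum);
            total++;
        }
        Map<List<String>, Double> scores = new HashMap<>();
        for (Map.Entry<List<String>, Long> e : pairs.entrySet()) {
            long k11 = e.getValue();
            long k12 = asFirst.get(e.getKey().get(0)) - k11;
            long k21 = asSecond.get(e.getKey().get(1)) - k11;
            long k22 = total - k11 - k12 - k21;
            scores.put(e.getKey(), llr(k11, k12, k21, k22));
        }
        return scores;
    }
}
```

A pair that occurs no more often than chance scores near zero, and a pair whose words almost always appear together scores high. Sorting by score and keeping the top entries gives a candidate collocation list.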

Re: lucene indexing configuration

2010-08-20 Thread Shuai Weng

Hey, Currently we have indexed some biological full text pages, and I was wondering how to configure schema.xml so that the gene names 'met1', 'met2', 'met3' will be treated as different words. Currently they are all mapped to 'met'. Thanks, Shuai

RE: Tokenization / Analyzer question

2010-08-20 Thread Beard, Brian

So I've been thinking about this more and what seems most plausible is to just use Store.NO for the delimiters, so they will have payloads encoded correctly but not affect the stored data, and store separate instances of the subId information which can be retrieved along the boundaries. There shoul

Re: How to convert WAR application into console application (Making Unicorn has console application)

2010-08-20 Thread Erick Erickson

Please do not cross-post to multiple users lists with questions like this, it's considered quite rude. Take the time to find the appropriate list and post to that one. Erick On Fri, Aug 20, 2010 at 7:44 AM, Ranjith wrote: > Hi all, > Hi all, Unicorn just provide a URI and push the button. I

Re: Calculate Term Co-occurrence Matrix

2010-08-20 Thread Grant Ingersoll

You might also be interested in Mahout's collocations package: http://cwiki.apache.org/confluence/display/MAHOUT/Collocations -Grant On Aug 19, 2010, at 11:39 AM, ahmed algohary wrote: > Hi all, > > I need to know if there is a Lucene plug-in or a Lucene-based API for > calculating the term co-
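The question quoted above asks for a term co-occurrence matrix. Independent of any Lucene or Mahout API, the core computation is small. A sketch assuming documents are already tokenized and that "co-occur" means "appear in the same document" (a sliding window is a common alternative); the matrix is stored sparsely as nested maps and kept symmetric. Getting per-document term lists out of a Lucene index (for example from stored term vectors) is not covered, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: sparse, symmetric document-level term co-occurrence counts.
// Class and method names are hypothetical.
class CoOccurrence {

    // m.get(a).get(b) = number of documents containing both a and b.
    static Map<String, Map<String, Integer>> build(List<List<String>> docs) {
        Map<String, Map<String, Integer>> m = new TreeMap<>();
        for (List<String> doc : docs) {
            // Count each distinct term once per document.
            List<String> terms = new ArrayList<>(new TreeSet<>(doc));
            // Quadratic in distinct terms per document; fine for short texts.
            for (int i = 0; i < terms.size(); i++) {
                for (int j = i + 1; j < terms.size(); j++) {
                    increment(m, terms.get(i), terms.get(j));
                    increment(m, terms.get(j), terms.get(i));
                }
            }
        }
        return m;
    }

    private static void increment(Map<String, Map<String, Integer>> m, String a, String b) {
        m.computeIfAbsent(a, k -> new TreeMap<>()).merge(b, 1, Integer::sum);
    }

    // Looks up a cell, returning 0 for pairs that never co-occurred.
    static int count(Map<String, Map<String, Integer>> m, String a, String b) {
        return m.getOrDefault(a, Map.of()).getOrDefault(b, 0);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> m = build(List.of(
                List.of("lucene", "index", "search"),
                List.of("lucene", "search", "search"),
                List.of("mahout", "index")));
        System.out.println(count(m, "lucene", "search")); // 2
        System.out.println(count(m, "lucene", "mahout")); // 0
    }
}
```

For large corpora the same pairwise counting is what a MapReduce job would distribute, which is where Mahout becomes relevant.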

Re: asking about incremental update

2010-08-20 Thread Grant Ingersoll

On Aug 19, 2010, at 7:55 AM, Yakob wrote: > do you reckon I should use a timer or a thread instead to periodically > update the index? That's likely what most people do: set up something to watch a directory or check a timestamp. If your data is in a DB, then you can do a query to get what's c
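Grant's "watch a directory or check a timestamp" approach can be sketched with a `ScheduledExecutorService` from the JDK. This is a minimal sketch, assuming the source data are files in one directory; the indexing callback is hypothetical (with Lucene it might wrap `IndexWriter.updateDocument(...)` keyed on the file path, followed by a commit). A safety margin is subtracted from the last-run time because some filesystems record modification times coarsely, at the cost of occasionally re-indexing a file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Sketch: poll a directory on a fixed delay and pass new or modified files to
// a hypothetical indexing callback.
class IncrementalPoller {
    // Assumed margin to tolerate coarse filesystem timestamps.
    static final long SAFETY_MARGIN_MILLIS = 2000;

    private final Path dir;
    private final Consumer<Path> indexer;
    private long lastRunMillis;

    IncrementalPoller(Path dir, Consumer<Path> indexer, long startMillis) {
        this.dir = dir;
        this.indexer = indexer;
        this.lastRunMillis = startMillis;
    }

    // Regular files under dir modified at or after sinceMillis.
    static List<Path> changedSince(Path dir, long sinceMillis) throws IOException {
        List<Path> changed = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(dir)) {
            for (Path p : (Iterable<Path>) walk::iterator) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toMillis() >= sinceMillis) {
                    changed.add(p);
                }
            }
        }
        return changed;
    }

    // One pass. The start time is taken before scanning, so a file touched
    // during the scan is seen again next pass; that is harmless when the
    // indexer replaces documents by key instead of adding duplicates.
    synchronized List<Path> pollOnce() throws IOException {
        long started = System.currentTimeMillis();
        List<Path> changed = changedSince(dir, lastRunMillis);
        for (Path p : changed) {
            indexer.accept(p);
        }
        lastRunMillis = started - SAFETY_MARGIN_MILLIS;
        return changed;
    }

    // Runs pollOnce on a background thread. Exceptions are caught because an
    // uncaught exception would cancel all later runs of a scheduled task.
    ScheduledExecutorService start(long delaySeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (IOException | RuntimeException e) {
                e.printStackTrace();
            }
        }, 0, delaySeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

For example, `new IncrementalPoller(dir, p -> reindex(p), 0L).start(60)` would check once a minute, where `reindex` stands for whatever hypothetical method updates the index. For the database case Grant mentions, the same loop would run a "modified since" query instead of walking a directory.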

How to convert WAR application into console application (Making Unicorn has console application)

2010-08-20 Thread Ranjith

Hi all, With Unicorn you just provide a URI and push the button; it calls a series of validation services and reports the results. I have already downloaded and installed Unicorn. The source code is only available for download from the Mercurial repository. To download it, use
