Re: Modifying IDF

2010-02-01 Thread Franz Allan Valencia See
Hmm. My analyzer is a dictionary-based analyzer, so it only recognizes tokens in its dictionary. Adding every URL (or domain) is not a viable solution. So how could I include that in my analyzer? A Lucene Filter? A FilterReader? Thanks, -- Franz Allan Valencia See | Java Software Engineer
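One possibility, not proposed anywhere in this thread, is to leave the dictionary-based analyzer alone and pull URL-like tokens out of the text before analysis, so that the hosts can be indexed separately. The sketch below uses plain Java only. The class name `UrlPrePass`, its method names, and the regular expression are all hypothetical, and the pattern only recognizes `http` and `https` URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a pre-pass that runs before the dictionary-based analyzer.
final class UrlPrePass {
    // Assumption: only http/https URLs matter; group 1 captures the host.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://([A-Za-z0-9.-]+)[^\\s]*");

    // Collects the host of every URL found in the text, lower-cased.
    static List<String> hostsIn(String text) {
        List<String> hosts = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            hosts.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return hosts;
    }

    // Replaces each URL with a single space so the remaining text
    // can be handed to the existing analyzer unchanged.
    static String withoutUrls(String text) {
        return URL_PATTERN.matcher(text).replaceAll(" ");
    }
}
```

The hosts returned by `hostsIn` could then go into their own index field, while `withoutUrls` yields the text for the dictionary-based analyzer. Whether this fits depends on how the analyzer is wired into the indexing code, which the thread does not show.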

Re: Modifying IDF

2010-01-30 Thread Ian Lea
Are you asking how to get lucene.apache.org out of http://lucene.apache.org/ or how to get apache.org out of lucene.apache.org? The getHost() method of java.net.URL will give you the former. Or use a regexp. I don't know an easy way to do the latter, but depending on your requirements you could s
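A minimal sketch of the two cases described above. `hostOf` relies on `java.net.URL.getHost()` as mentioned in the message. `lastTwoLabels` is only a naive heuristic of my own for the second case, not necessarily what was suggested here: it assumes the domain is the last two dot-separated labels, which is wrong for suffixes such as `co.uk` and for IP addresses. The class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper illustrating host and domain extraction.
final class DomainSketch {
    // Returns the host of an absolute URL via java.net.URL.getHost().
    // The URL is built through URI.toURL() because the URL(String)
    // constructor is deprecated in recent JDKs.
    static String hostOf(String url) {
        try {
            URL parsed = URI.create(url).toURL();
            return parsed.getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    // Naive assumption: the domain is the last two labels of the host.
    // Hosts with one or two labels are returned unchanged.
    static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }
}
```

With these definitions, `DomainSketch.hostOf("http://lucene.apache.org/")` gives `lucene.apache.org`, and `DomainSketch.lastTwoLabels("lucene.apache.org")` gives `apache.org`. Handling multi-part suffixes correctly would need a suffix list, which this sketch does not attempt.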

Re: Modifying IDF

2010-01-29 Thread Franz Allan Valencia See
How should I go about identifying the domain? Thanks, -- Franz Allan Valencia See | Java Software Engineer franz@gmail.com LinkedIn: http://www.linkedin.com/in/franzsee Twitter: http://www.twitter.com/franz_see On Fri, Jan 29, 2010 at 6:42 PM, Ian Lea wrote: > Instead of playing around wi

Re: Modifying IDF

2010-01-29 Thread Ian Lea
Instead of playing around with tf/idf, how about just indexing and searching the domain? -- Ian. On Fri, Jan 29, 2010 at 3:43 AM, Franz Allan Valencia See wrote: > Good day, > > I am currently using Lucene for my searches, and one of the problems that I'm > facing is when the keyword is a URL. The