Re: URL Tokenization

2010-06-25 Thread Sudha Verma
Thanks, that worked from the Lucene API. Because the code is not fully released, some of it had build errors. Nothing big. I ran into a few compile errors because the path for some of the analysis classes got changed to standard/ or core/... A lot of the import statements in the Solr source from that trun

RE: URL Tokenization

2010-06-24 Thread Steven A Rowe
that, change directory to the root directory of the checked out working copy and apply the patch, like you did previously.
Steve
> -----Original Message-----
> From: Sudha Verma [mailto:verma.su...@gmail.com]
> Sent: Thursday, June 24, 2010 12:57 PM
> To: java-user@lucene.apache.org

Re: URL Tokenization

2010-06-24 Thread Sudha Verma
Hi Steve, Thanks for the quick reply and for implementing support for URL tokenization. Another newbie question about applying this patch. I have the Lucene 3.0.2 source and I downloaded the patch and tried to apply it:
lucene-3.0.2> patch -p0 < LUCENE-2167.patch
Comes back with the error message:

RE: URL Tokenization

2010-06-23 Thread Steven A Rowe
Hi Sudha, There is such a tokenizer, named NewStandardTokenizer, in the most recent patch on the following JIRA issue: https://issues.apache.org/jira/browse/LUCENE-2167 It keeps (HTTP(S), FTP, and FILE) URLs together as single tokens, and e-mails too, in accordance with the relevant IETF R
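To illustrate the behaviour described above (URLs and e-mail addresses kept together as single tokens), here is a rough, self-contained Java sketch. It is not the NewStandardTokenizer from the LUCENE-2167 patch and does not use the Lucene API; the class name UrlAwareTokenizer, the method tokenize, and the regular expression are all hypothetical, and the pattern is far looser than the IETF grammars the patch follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT the tokenizer from LUCENE-2167.
public class UrlAwareTokenizer {

    // Alternatives are tried in order at each position:
    //   1. http(s)/ftp/file URLs, not ending in common trailing punctuation
    //   2. a simplified e-mail address
    //   3. a plain run of word characters
    private static final Pattern TOKEN = Pattern.compile(
        "(?:https?|ftp|file)://[^\\s<>\"]*[^\\s<>\".,;:!?)\\]]"
        + "|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
        + "|\\w+",
        Pattern.CASE_INSENSITIVE);

    // Returns the tokens found in the text, in order of appearance.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "Visit https://lucene.apache.org/core/, or mail dev@example.org.";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

A word-splitting tokenizer would break the URL in the sample at the punctuation characters; here the URL and the address each come back as one token, which is the effect the patch is described as providing.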