> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
-----Original Message-----
From: Vihari Piratla [mailto:viharipira...@gmail.com]
Sent: Monday, January 12, 2015 8:51 AM
To: java-user@lucene.apache.org
Subject: Custom tokenizer
Hi,
I am trying to implement a custom tokenizer for my application and I have
a few queries regarding the same.
1. Is there a way to provide an existing analyzer (say EnglishAnalyzer)
with the custom tokenizer and make it use this tokenizer instead of, say,
StandardTokenizer?
2. Why are analyzers such as
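On question 1: the stock analyzers hard-wire their tokenizer, so the usual answer is to write your own Analyzer that builds your tokenizer followed by the same filter chain (in recent Lucene via Analyzer#createComponents). The following self-contained sketch, with all class and method names invented for illustration (no Lucene dependency), shows the composition pattern:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class PipelineSketch {
    // A "tokenizer" turns raw text into tokens; "filters" transform the token list.
    interface Tokenizer extends Function<String, List<String>> {}
    interface TokenFilter extends Function<List<String>, List<String>> {}

    // Mimics Analyzer#createComponents: a fixed filter chain over a pluggable tokenizer.
    static List<String> analyze(String text, Tokenizer tokenizer, List<TokenFilter> filters) {
        List<String> tokens = tokenizer.apply(text);
        for (TokenFilter f : filters) {
            tokens = f.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Tokenizer whitespace = s -> Arrays.asList(s.trim().split("\\s+"));
        TokenFilter lowercase = ts -> {
            List<String> out = new ArrayList<>();
            for (String t : ts) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        };
        // Swapping in a different Tokenizer leaves the filter chain untouched.
        System.out.println(analyze("Custom Tokenizer DEMO", whitespace, List.of(lowercase)));
        // [custom, tokenizer, demo]
    }
}
```

In real Lucene you would subclass Analyzer, construct your Tokenizer in createComponents, and wrap it with the same filters the stock analyzer uses.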
If you already know the set of phrases you need to detect then you can
use Lucene's SynonymFilter to spot them and insert a new token.
Mike McCandless
http://blog.mikemccandless.com
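Mike's SynonymFilter suggestion amounts to spotting known word sequences in the token stream and emitting a single token for each. Here is a dependency-free Java sketch of that idea (the class name, phrase table, and greedy longest-match strategy are illustrative, not SynonymFilter's actual internals):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PhraseSpotter {
    // Known multi-word phrases mapped to the single token to emit.
    private final Map<List<String>, String> phrases;
    private final int maxLen;

    PhraseSpotter(Map<List<String>, String> phrases) {
        this.phrases = phrases;
        int m = 1;
        for (List<String> p : phrases.keySet()) m = Math.max(m, p.size());
        this.maxLen = m;
    }

    List<String> tokenize(String text) {
        List<String> words = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            boolean matched = false;
            // Greedy longest match starting at position i.
            for (int len = Math.min(maxLen, words.size() - i); len >= 2 && !matched; len--) {
                String mapped = phrases.get(words.subList(i, i + len));
                if (mapped != null) {
                    out.add(mapped);
                    i += len;
                    matched = true;
                }
            }
            if (!matched) out.add(words.get(i++));
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseSpotter spotter = new PhraseSpotter(Map.of(
            List.of("international", "business", "machine"), "international business machine"));
        System.out.println(spotter.tokenize("International Business Machine logo"));
        // [international business machine, logo]
    }
}
```

In a real Lucene chain you would build a SynonymMap and let SynonymFilter do the matching rather than hand-rolling it.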
On Thu, Feb 20, 2014 at 7:21 AM, Benson Margulies wrote:
It sounds like you've been asked to implement Named Entity Recognition.
OpenNLP has some capability here. There are also, um, commercial
alternatives.
On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar wrote:
Hi,
I have a requirement to write a custom tokenizer using the Lucene framework.
It should have the capability to match multiple words as one token. For
example, when the user passes a string such as "International Business
Machine logo" or "IBM logo", it should return "International Business
Machine" as one token.
You can also string together one of a myriad of TokenFilters; see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
I'd recommend spending some time on the admin/analysis page
to understand what all the combinations do. I'd also recommend
against dealing with punctuation etc. by using wi
Hi;
StandardAnalyzer includes those by default:
StandardFilter, LowerCaseFilter and StopFilter.
You can also consider char filters. Did you read here:
https://cwiki.apache.org/confluence/display/solr/CharFilterFactories
Thanks;
Furkan KAMACI
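Char filters, as Furkan notes, rewrite the raw character stream before the tokenizer ever sees it. A minimal dependency-free sketch of the idea follows; the specific mappings are made up for illustration, and Lucene's MappingCharFilter does this properly, including offset correction:

```java
import java.util.Arrays;

public class CharFilterSketch {
    // A char filter runs BEFORE tokenization: it rewrites characters so the
    // tokenizer sees normalized input. Example mappings: typographic
    // apostrophe -> ASCII apostrophe, '&' -> " and ".
    static String charFilter(String raw) {
        return raw.replace('\u2019', '\'').replace("&", " and ");
    }

    public static void main(String[] args) {
        String filtered = charFilter("AT&T\u2019s filings");
        // The tokenizer then splits the already-normalized text.
        System.out.println(Arrays.asList(filtered.trim().split("\\s+")));
        // [AT, and, T's, filings]
    }
}
```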
2013/12/5
Hi,
I have used StandardAnalyzer in my code and it is working fine. One of the
challenges I face is that this Analyzer by default tokenizes on some special
characters such as the hyphen, apart from the SPACE character.
I want to tokenize only on the SPACE character. Could you please
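Splitting only on spaces is essentially what Lucene's WhitespaceAnalyzer/WhitespaceTokenizer do, rather than StandardAnalyzer. A dependency-free sketch of that behavior (class name illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceOnlyTokenizer {
    // Split ONLY on runs of spaces, so hyphenated terms survive intact.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split(" +")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("state-of-the-art search engine"));
        // [state-of-the-art, search, engine]
    }
}
```

StandardTokenizer would break "state-of-the-art" at the hyphens; a whitespace-only split keeps it as one token.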
Hello,
I have created a custom Tokenizer and am trying to set and extract my own
positions for each Token using:
reusableToken.reinit(word.getWord(), tokenStart, tokenEnd);
later, when querying my index using a SpanTermQuery, the start() and end()
values don't correspond to these values but se
Hello,
The issue is about lucene 1.9. Can you test it with lucene 2.2? Perhaps the
issue is already addressed and solved...
Regards Ard
The issue continues to exist with nightly 146 from Jul 10, 2007.
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/146/
Thank you for the reply Ard,
The tokens exist in the index and are returned accurately, except for
the offsets. In this case I am not dealing with the positions, so the
term vector is specified as using 'with_offsets'. I have left the term
position increment at its default. Looking at the exist
Hello,
> Hi,
> I am storing custom values in the Tokens provided by a Tokenizer but
> when retrieving them from the index the values don't match.
What do you mean by retrieving? Do you mean retrieving terms, or do you
mean doing a search with words you know should be in, but you do not find?
Hi,
I am storing custom values in the Tokens provided by a Tokenizer but
when retrieving them from the index the values don't match. I've looked
in the LIA book but it's not current since it mentioned term vectors
aren't stored. I'm using Lucene Nightly 146 but the same thing has
happened with
Because I wanted to use the JavaCC input code from Lucene. 99.99% of
what the standard parser did was VERY GOOD. Having worked with
computer-generated compilers in the past, I realized that if I were to
modify the parser itself, I would eventually get into real trouble. So
I took the time to
Your problem is that StandardTokenizer doesn't fit your requirements.
Since you know how to implement a new one, just do it.
If you just want to modify StandardTokenizer, you can get the code and
rename it to your class, then modify whatever you dislike. I think
it's such simple stuff, why
On Aug 29, 2006, at 7:12 PM, Mark Miller wrote:
2. The ParseException that is generated when making the
StandardAnalyzer must be killed because there is another
ParseException class (maybe in queryparser?) that must be used
instead. The lucene build file excludes the StandardAnalyzer
Parse
Bill Taylor wrote:
I have copied Lucene's StandardTokenizer.jj into my directory, renamed
it, and did a global change of the names to my class name,
LogTokenizer.
The issue is that the generated LogTokenizer.java does not compile for
2 reasons:
1) in the constructor, this(new FastCharStream(reader)); fails bec
Tucked away in the contrib section of Lucene (I'm using 2.0) there is
org.apache.lucene.index.memory.PatternAnalyzer
which takes a regular expression and tokenizes with it. Would that help?
Word of warning... the regex determines what is NOT a token, not what IS a
token (as I remember).
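That warning is worth a concrete illustration: with split-style APIs the regex describes the separators between tokens, not the tokens themselves. A plain-Java demonstration (no Lucene involved):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternSplitDemo {
    // The pattern matches the SEPARATORS (what is NOT a token): everything
    // between matches becomes a token, as with String.split / Pattern.split.
    private static final Pattern NON_TOKEN = Pattern.compile("[^A-Za-z0-9]+");

    static String[] tokens(String text) {
        return NON_TOKEN.split(text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.asList(tokens("foo, bar!! baz")));
        // [foo, bar, baz]
    }
}
```

So to keep hyphens inside tokens, the separator pattern must simply not match '-'.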
On Aug 29, 2006, at 2:47 PM, Chris Hostetter wrote:
: Have a look at PerFieldAnalyzerWrapper:
:
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
...which can be specified in the constructors for IndexWriter and
QueryParser.
-Hoss
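PerFieldAnalyzerWrapper is essentially a default analyzer plus a map of per-field overrides. A dependency-free sketch of that dispatch (the class name and the list-of-strings analyzer representation are illustrative, not Lucene's API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class PerFieldSketch {
    // Mimics PerFieldAnalyzerWrapper: a default analyzer plus per-field overrides.
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField;

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer,
                   Map<String, Function<String, List<String>>> perField) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.perField = perField;
    }

    List<String> analyze(String field, String text) {
        // Look up the field's analyzer, falling back to the default.
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        Function<String, List<String>> lowercasing =
            s -> Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\s+"));
        Function<String, List<String>> verbatim = List::of; // whole value as one token
        PerFieldSketch wrapper = new PerFieldSketch(lowercasing, Map.of("id", verbatim));
        System.out.println(wrapper.analyze("body", "Hello World")); // [hello, world]
        System.out.println(wrapper.analyze("id", "DOC-42"));        // [DOC-42]
    }
}
```

Passing the real PerFieldAnalyzerWrapper to IndexWriter and QueryParser, as Hoss says, gives exactly this per-field behavior at index and query time.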
On Aug 29, 2006, at 1:46 PM, Erick Erickson wrote:
I'm in a real rush here, so pardon my brevity, but one of the
constructors for IndexWriter takes an Analyzer as a parameter, which can be
a PerFieldAnalyzerWrapper. That, if I understand your issue, should fix you
right up.
Same kind of thing for a Query.
Erick
On 8/29/06, Bill Taylor <[EMAIL PROTECTED]> wrote:
/* pass this tokenstream through other filters you are
interested in */
}
}
Krovi.
-----Original Message-----
From: Bill Taylor [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 29, 2006 8:10 PM
To: java-user@lucene.apache.org
Subject: Installing a custom tokenizer
I am indexing documents which are filled with government jargon. As
one would expect, the standard tokenizer has problems with
governmenteese.
In particular, the documents use words such as 310N-P-Q as references
to other documents. The standard tokenizer breaks this "word" at the
dashes so
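One way to keep reference codes like 310N-P-Q intact is to define tokens positively, as runs of letters and digits joined by internal dashes. A self-contained sketch of such a tokenizer follows (the class name and pattern are illustrative, not Lucene's StandardTokenizer grammar):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JargonTokenizer {
    // A token is a run of letters/digits, optionally joined by internal
    // dashes, so a reference such as 310N-P-Q survives as one token while
    // surrounding punctuation is still dropped.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*");

    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("See document 310N-P-Q, section 4."));
        // [See, document, 310N-P-Q, section, 4]
    }
}
```

Wrapped in a Lucene Tokenizer subclass, the same pattern-driven scan would replace StandardTokenizer for these documents.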