Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
better? That said, I'd love to hear more specific requirements for the Tokenizer, to avoid the odd results above :) regards Valery

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
Simon Willnauer wrote:
> you could do the whole job in a Tokenizer but this would not be a good
> separation of concerns right!?
Right, it wouldn't be a good separation of concerns. That's why I wanted to know what you consider as "Tokenizer's job".

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
thin the token filter > [...] > I would wait for Simon's answer to the question "What do you expect from the Tokenizer?" Then I will give my 2 cents on this, and perhaps then I could sum up all opinions and we could reach a common conclusion. :) regards Valery

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-21 Thread Valery
Hi Simon,
Simon Willnauer wrote:
> Valery, have you tried to use WhitespaceTokenizer / CharTokenizer and
> [...]?!
>
> simon
Yes, I did; please find the info in the initial message. Here are the excerpts:
Valery wrote:
> 2) WhitespaceTokenizer gives me

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-20 Thread Valery
", "R/3", "SAP R/3"} later. I try to follow the principle that a token (or its lexeme) should usually never be parsed again. One can build more complex (compound) things from the tokens. However, one usually never chops a lexeme into smaller pieces. What do you think, Robert?
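
The idea of building compounds from tokens, rather than re-parsing a lexeme, could be sketched roughly as follows. This is only an illustration in Python, not Lucene code: the function name compounds and the max_size parameter are hypothetical, and the input is assumed to be a list of tokens that an earlier stage has already produced.

```python
def compounds(tokens, max_size=2):
    """Combine adjacent tokens into compounds of up to max_size tokens.

    The tokens themselves are never split or re-parsed; they are only
    joined with a single space. (Hypothetical helper for illustration.)
    """
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for a compound of this size
            out.append(" ".join(tokens[start:end]))
    return out


if __name__ == "__main__":
    # Two tokens yield both single tokens and their two-token compound.
    print(compounds(["SAP", "R/3"]))
```

The single tokens stay intact in the output, so a later stage can still index "R/3" on its own as well as the compound "SAP R/3".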

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-20 Thread Valery
maybe even both?.. regards, Valery
Ken Krugler wrote:
> Hi Valery,
>
> From our experience at Krugle, we wound up having to create our own
> tokenizers (actually kind of specialized parser) for the different
> languages. It didn't seem like a good option to try

Re: Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-20 Thread Valery
Hi Robert, thanks for the hint. Indeed, a natural way to go, especially if one builds a Tokenizer of the same quality as StandardTokenizer. OTOH, do you mean that the out-of-the-box stuff is indeed not customizable for this task?.. regards Valery Robert Muir wrote: >

Any Tokenizator friendly to C++, C#, .NET, etc ?

2009-08-20 Thread Valery
d also react to delimiting characters and emit the token. However, it should distinguish between delimiters like whitespace along with ";,?" and delimiters like "./&". Indeed, delimiters like whitespace and ";,?" should be thrown away from the lexeme
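
A minimal sketch of the two delimiter classes described above, written in Python rather than against the Lucene API. The excerpt is cut off before it says what should happen to "./&", so emitting them as tokens of their own is an assumption here, as are the names DROPPED, KEPT and tokenize.

```python
# Delimiters that end a token and are thrown away.
DROPPED = set(" \t\r\n;,?")
# Delimiters that end a token but are themselves emitted
# (assumption: the truncated message does not state this).
KEPT = set("./&")


def tokenize(text):
    """Split text on both delimiter classes, keeping only the KEPT ones."""
    tokens = []
    current = []
    for ch in text:
        if ch in DROPPED or ch in KEPT:
            if current:
                # A delimiter closes the token collected so far.
                tokens.append("".join(current))
                current = []
            if ch in KEPT:
                tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(tokenize("SAP R/3, C++; .NET?"))
```

Characters in neither class, such as "+" and "#", simply stay inside the current token, which is what lets "C++" survive as one token in this sketch.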