Re: [commons-text] Additional CaseUtils type functionality that can handle snake, kebab, camel, pascal, and others

Daniel Watson Wed, 09 Aug 2023 16:50:09 -0700

Currently those exceptions do capture token and character index
information, but I think I'm only using it to build the message. I get what
you're saying, but without them testing becomes less accurate. If IAE is
being thrown all over the place, then asserting a failure can't actually
guarantee that it failed in the expected way.
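
To illustrate the testing point, here is a minimal sketch, not the actual
implementation. The exception names come from this thread; the constructor
signatures, the tokenIndex/charIndex fields, and the KebabFormatSketch helper
are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Name from the thread; the fields and constructor are assumed for illustration.
class InvalidCharacterException extends IllegalArgumentException {
    final int tokenIndex;
    final int charIndex;

    InvalidCharacterException(String message, int tokenIndex, int charIndex) {
        super(message);
        this.tokenIndex = tokenIndex;
        this.charIndex = charIndex;
    }
}

// Name from the thread; thrown when the reserved delimiter appears inside a token.
class ReservedCharacterException extends InvalidCharacterException {
    ReservedCharacterException(String message, int tokenIndex, int charIndex) {
        super(message, tokenIndex, charIndex);
    }
}

// Hypothetical toy formatter, standing in for a kebab-style Case.
class KebabFormatSketch {
    static String format(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        for (int t = 0; t < tokens.size(); t++) {
            String token = tokens.get(t);
            for (int c = 0; c < token.length(); c++) {
                char ch = token.charAt(c);
                if (ch == '-') {
                    throw new ReservedCharacterException(
                        "Reserved character '-' in token " + t + " at index " + c, t, c);
                }
                if (!Character.isLetterOrDigit(ch)) {
                    throw new InvalidCharacterException(
                        "Invalid character in token " + t + " at index " + c, t, c);
                }
            }
            if (t > 0) {
                out.append('-');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        try {
            format(Arrays.asList("my", "comp-onent"));
        } catch (ReservedCharacterException e) {
            // A test can assert on the specific type and position,
            // rather than on "some IllegalArgumentException was thrown".
            System.out.println(e.getMessage());
        }
    }
}
```

With a bare IAE, a test for the reserved-character path would also pass if the
call failed for an unrelated reason. Catching the subclass rules that out.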



In regards to what Elliotte said...


Not every set of tokens can actually be represented deterministically in
every case, which is why I think exceptions are needed.

my-component-1

Is a valid kebab cased string, with tokens my,component,1

However, this cannot be formatted in camel case or Pascal case, because their
tokens are delimited by uppercase alpha characters and the token "1" has none.

If those tokens were passed to those cases, I would expect an exception to
be thrown; otherwise the result is not reciprocal, e.g. MyComponent1 parses as
only two PascalCase tokens.
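
A rough sketch of that round trip, under assumptions: PascalRoundTripSketch and
its three methods are hypothetical and only approximate the PascalCase rules
from the spec quoted below. They are not the proposed API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy, not the proposed PascalCase class.
class PascalRoundTripSketch {

    // Start a new token at every uppercase letter.
    static List<String> parse(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (Character.isUpperCase(ch) && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            current.append(ch);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Lenient: capitalizes whatever it is given and skips empty tokens.
    static String formatLenient(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue;
            }
            out.append(Character.toUpperCase(token.charAt(0)));
            out.append(token.substring(1).toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    // Strict: rejects tokens that cannot start with an uppercase letter.
    static String formatStrict(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.isEmpty() || !Character.isLetter(token.charAt(0))) {
                throw new IllegalArgumentException(
                    "Token " + i + " cannot begin a PascalCase token: \"" + token + "\"");
            }
        }
        return formatLenient(tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("my", "component", "1");
        String lenient = formatLenient(tokens);
        System.out.println(lenient);        // MyComponent1
        System.out.println(parse(lenient)); // [My, Component1]
        try {
            formatStrict(tokens);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Three tokens go in and two come back out, so the lenient version silently
breaks the format()/parse() contract. The strict version fails loudly instead.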

On Wed, Aug 9, 2023, 7:36 PM Daniel Watson <dcwatso...@gmail.com> wrote:

> Meant to add...
>
> The reason I would favor exceptions is that the underlying implementation
> can be easily customized. If the user needs to allow non alphanumeric
> characters there is a boolean flag in the underlying abstract class
> (AbstractConfigurableCase) that will simply turn that validation off. I
> don't think we need to make any specific implementation be significantly
> error tolerant.
>
> An extension of snake case to allow all characters should look like:
>
>
> class MySnakeCase extends SnakeCase {
>     MySnakeCase() {
>         super();
>         // Turn off the alphanumeric-only validation from AbstractConfigurableCase.
>         this.alphanumeric = false;
>     }
> }
>
>
> On Wed, Aug 9, 2023, 7:29 PM Daniel Watson <dcwatso...@gmail.com> wrote:
>
>> Currently I'm planning a set of exceptions that are thrown for various
>> reasons. I created multiple classes to allow for clearer testing.
>>
>> ReservedCharacterException (extends InvalidCharacterException below) -
>> thrown specifically when a reserved character is encountered within a token.
>>
>> InvalidCharacterException (extends IllegalArgumentException) thrown
>> directly any time an illegal character is encountered.
>>
>> ZeroLengthTokenException (extends IllegalArgumentException) - thrown when a
>> zero-length token is encountered and the Case does not support it.
>>
>> There are a few other error cases I believe. I'm not looking at the code
>> right this moment but I'm fairly certain about the need for the above 3.
>>
>>
>> On Wed, Aug 9, 2023, 6:08 PM Elliotte Rusty Harold <elh...@ibiblio.org>
>> wrote:
>>
>>> What happens when a token contains an unpermitted character?
>>>
>>> On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <dcwatso...@gmail.com>
>>> wrote:
>>> >
>>> > Here's my stab at a spec. Wanted to clarify some parts of the Case
>>> > interface first before jumping into the implementations. Wondering
>>> what a
>>> > good package name for this stuff is, given that "case" is a reserved
>>> word?
>>> >
>>> > Case (interface)
>>> > The Case interface defines two methods:
>>> > * String format(Iterable<String> tokens)
>>> > The format method accepts an Iterable of String tokens and returns a
>>> single
>>> > String formatted according to the implementation. The format method is
>>> > intended to handle transforming between cases, thus tokens passed to
>>> the
>>> > format() method need not be properly formatted for the given Case
>>> instance,
>>> > though they must still respect any reserved character restrictions.
>>> > * List<String> parse(String string)
>>> > The parse method accepts a single string and returns a List of string
>>> > tokens that abide by the Case implementation.
>>> > Note: format() and parse() methods must be fully reciprocal, i.e. on a
>>> > single Case instance, when calling parse() with a valid string, and
>>> passing
>>> > the resulting tokens into format(), a matching string should be
>>> returned.
>>> >
>>> > DelimitedCase (base class for kebab and snake)
>>> > Defines a Case where all tokens are separated by a single character
>>> > delimiter. The delimiter is considered a reserved character and is not
>>> > allowed to appear within tokens when formatting. No further
>>> restrictions
>>> > are placed on token contents by this base implementation. Tokens can
>>> > contain any valid Java String character. DelimitedCases can support
>>> > zero-length tokens, which can occur if there are no characters between
>>> two
>>> > instances of the delimiter or if the parsed string begins or ends with
>>> the
>>> > delimiter.
>>> > Note: Other Case implementations may not support zero-length tokens,
>>> and
>>> > attempts to call format(...) with empty tokens may fail.
>>> >
>>> > KebabCase
>>> > Extends DelimitedCase and initializes the delimiter as the hyphen '-'
>>> > character. This case allows only alphanumeric characters within tokens.
>>> >
>>> > SnakeCase
>>> > Extends DelimitedCase and initializes the delimiter as the underscore
>>> '_'
>>> > character. This case allows only alphanumeric characters within tokens.
>>> >
>>> > PascalCase
>>> > Defines a Case where tokens begin with an uppercase alpha character.
>>> All
>>> > subsequent token characters must be lowercase alpha or numeric
>>> characters.
>>> > Whenever an uppercase alpha character is encountered, the previous
>>> token is
>>> > considered complete and a new token begins, with the uppercase
>>> character
>>> > being the first character of the new token. PascalCase does not allow
>>> > zero-length tokens when formatting, as it would violate the reciprocal
>>> > contract of format() and parse().
>>> >
>>> > CamelCase
>>> > Extends PascalCase and sets one additional restriction - that the first
>>> > character of the first token (i.e. the first character of the full
>>> string)
>>> > must be a lowercase alpha character (rather than the uppercase
>>> requirement
>>> > of PascalCase). All other restrictions of PascalCase apply.
>>> >
>>> >
>>> > On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <dcwatso...@gmail.com>
>>> wrote:
>>> >
>>> > > Kebab case is extremely common for web identifiers, eg html element
>>> ids,
>>> > > classes, attributes, etc.
>>> > >
>>> > > In regards to PascalCase, I agree that most people won't understand
>>> the
>>> > > reasoning behind the name, but it is nevertheless a widely accepted
>>> term
>>> > > for that case style. If an alternative is deemed necessary then
>>> > > "ProperCase" might work - since that is also how English proper
>>> nouns are
>>> > > cased. Understanding that name just depends on your knowledge of
>>> English
>>> > > grammar.
>>> > >
>>> > > A spec can definitely be written for the 4 provided concrete
>>> > > implementations. And... I may eat these words but... the spec should
>>> not be
>>> > > all that complex. I will take a stab at it.
>>> > >
>>> > > Thanks for the feedback!
>>> > > Any other thoughts or comments are welcome!
>>> > >
>>> > > Dan
>>> > >
>>> > >
>>> > > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <
>>> elh...@ibiblio.org>
>>> > > wrote:
>>> > >
>>> > >> This is a good idea and seems like useful functionality. In order to
>>> > >> accept it into commons, it needs solid documentation and excellent
>>> > >> test coverage. I've worked on code like this in another language
>>> (not
>>> > >> Java) and the production bugs were bad. E.g. what happens when a
>>> > >> string contains numbers as well as letters?
>>> > >>
>>> > >> I'd like to see a full spec that unambiguously defines how every
>>> > >> Unicode string is converted into camel/snake/kebab case. The spec
>>> > >> should be independent of the code. That's not easy to write but it's
>>> > >> essential.
>>> > >>
>>> > >> I don't want any loose/strict modes. It should all be strict
>>> according to
>>> > >> spec.
>>> > >>
>>> > >> I've never heard of kebab cases before. Is that a common name? I'd
>>> > >> also like to rename Pascal case. How many programmers under 40 have
>>> > >> even heard of Pascal, much less are familiar with its case
>>> > >> conventions?
>>> > >>
>>> > >> Long story short - a PR is premature until there's an agreed upon
>>> spec.
>>> > >>
>>> > >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <dcwatso...@gmail.com>
>>> > >> wrote:
>>> > >> >
>>> > >> > I have a bit of code that adds the ability to parse and format
>>> strings
>>> > >> into
>>> > >> > various case patterns. Wanted to check if it's of worth and
>>> in-scope for
>>> > >> > commons-text...
>>> > >> >
>>> > >> > It's a bit broader than the existing CaseUtils.toCamelCase(...)
>>> method.
>>> > >> > Rather than simply formatting tokens into the case, this API adds
>>> the
>>> > >> > additional goal of being able to transform one case to another.
>>> e.g.
>>> > >> >
>>> > >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns
>>> > >> > My_Pascal_String
>>> > >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns
>>> > >> > mySnakeString
>>> > >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns
>>> > >> > my-Camel-String
>>> > >> > //Note that kebab and snake do not alter the alphabetic case of
>>> the
>>> > >> tokens,
>>> > >> > as they are essentially case agnostic joining, according to this
>>> > >> > implementation. Though this can be overridden by end users.
>>> > >> >
>>> > >> > The API has one core interface: Case, which has format and parse
>>> > >> methods.
>>> > >> > There is a single abstract implementation of it -
>>> > >> AbstractConfigurableCase
>>> > >> > - which is a configuration driven way to create a case pattern.
>>> It has
>>> > >> > enough options to accommodate the 4 popular cases, and thus the
>>> > >> subclasses
>>> > >> > just have to configure these options rather than implement them
>>> > >> directly.
>>> > >> > Any further extensions can override or extend the api as
>>> necessary.
>>> > >> >
>>> > >> > There are five core concrete implementations:
>>> > >> >
>>> > >> > PascalCase
>>> > >> > CamelCase (extends PascalCase)
>>> > >> > DelimitedCase
>>> > >> > KebabCase (extends DelimitedCase)
>>> > >> > SnakeCase (extends DelimitedCase)
>>> > >> >
>>> > >> > Each has a static INSTANCE field to avoid redundant instantiation.
>>> > >> >
>>> > >> > Some of my reasoning / concerns...
>>> > >> >
>>> > >> > * I considered bundling all of this logic into static methods,
>>> similar
>>> > >> to
>>> > >> > CaseUtils, but that prevents the user from truly customizing or
>>> > >> extending
>>> > >> > the code for odd cases. This approach is, in my opinion, far
>>> easier to
>>> > >> > understand, extend, and debug.
>>> > >> > * I believe the parsing side should potentially have a loose /
>>> strict
>>> > >> mode,
>>> > >> > in that the logic can ignore non-critical rules on the parsing
>>> side.
>>> > >> e.g.
>>> > >> > the command CamelCase.parse("MyString") should work, even though
>>> the
>>> > >> input
>>> > >> > is not strictly camel case. Strict parsing would ensure (if
>>> possible)
>>> > >> that
>>> > >> > the input abides by all elements of the format.
>>> > >> > * I'm still unsure about how best to handle reserved characters
>>> when
>>> > >> > translating. e.g. How should
>>> > >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the
>>> hyphen?
>>> > >> > Should the kebab case strip the reserved character from the token
>>> > >> values?
>>> > >> >
>>> > >> > Long story short - is this worth pursuing in the form of a pull
>>> request
>>> > >> for
>>> > >> > review? Or is it out of scope for commons-text?
>>> > >> >
>>> > >> > Dan
>>> > >>
>>> > >>
>>> > >>
>>> > >> --
>>> > >> Elliotte Rusty Harold
>>> > >> elh...@ibiblio.org
>>> > >>
>>> > >>
>>> ---------------------------------------------------------------------
>>> > >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> > >> For additional commands, e-mail: dev-h...@commons.apache.org
>>> > >>
>>> > >>
>>>
>>>
>>>
>>> --
>>> Elliotte Rusty Harold
>>> elh...@ibiblio.org
>>>
>>>
>>>
