Re: [commons-text] Additional CaseUtils type functionality that can handle snake, kebab, camel, pascal, and others

Daniel Watson Wed, 09 Aug 2023 16:36:47 -0700

Meant to add...

The reason I would favor exceptions is that the underlying implementation
can be easily customized. If the user needs to allow non-alphanumeric
characters, there is a boolean flag in the underlying abstract class
(AbstractConfigurableCase) that simply turns that validation off. I
don't think any specific implementation needs to be particularly
error-tolerant.


An extension of SnakeCase that allows all characters would look something like this:


class MySnakeCase extends SnakeCase {
    MySnakeCase() {
        super();
        // turn off the alphanumeric-only token validation
        this.alphanumeric = false;
    }
}
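For illustration, here is a self-contained sketch of how such a flag could work. AbstractConfigurableCase below is a hypothetical, heavily simplified stand-in: only a delimiter and the alphanumeric flag are modeled, and only format() is shown; the real class presumably has more options. "Alphanumeric" is assumed to mean ASCII letters and digits.

```java
// Hypothetical, heavily simplified stand-in for AbstractConfigurableCase:
// only a delimiter and the alphanumeric flag are modeled, and only format().
abstract class AbstractConfigurableCase {
    protected char delimiter;
    // When true, tokens may contain only ASCII letters and digits (assumption).
    protected boolean alphanumeric = true;

    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            validate(token);
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    private void validate(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == delimiter) {
                // The delimiter stays reserved regardless of the flag.
                throw new IllegalArgumentException("Reserved character '" + c + "' in: " + token);
            }
            boolean asciiAlnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric && !asciiAlnum) {
                throw new IllegalArgumentException("Invalid character '" + c + "' in: " + token);
            }
        }
    }
}

class SnakeCase extends AbstractConfigurableCase {
    SnakeCase() {
        this.delimiter = '_';
    }
}

class MySnakeCase extends SnakeCase {
    MySnakeCase() {
        super();
        this.alphanumeric = false; // allow any character except the delimiter
    }
}
```

With this sketch, new MySnakeCase().format(Arrays.asList("my", "file.txt")) yields "my_file.txt", while SnakeCase rejects the "." with an IllegalArgumentException. The underscore stays reserved in both.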


On Wed, Aug 9, 2023, 7:29 PM Daniel Watson <[email protected]> wrote:

> Currently I'm planning a set of exceptions that are thrown for various
> reasons. I created multiple classes to allow for clearer testing.
>
> ReservedCharacterException (extends InvalidCharacterException, below) -
> thrown specifically when a reserved character is encountered within a token.
>
> InvalidCharacterException (extends IllegalArgumentException) - thrown
> directly any time an illegal character is encountered.
>
> ZeroLengthTokenException (extends IllegalArgumentException) - thrown when a
> zero-length token is encountered and the Case does not support it.
>
> There are a few other error cases, I believe. I'm not looking at the code
> right this moment, but I'm fairly certain about the need for the above three.
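A minimal sketch of that hierarchy, with a hypothetical TokenValidator helper showing which exception each failure would raise. Using Character.isLetterOrDigit as the test for an invalid character is an assumption; it accepts non-ASCII letters, which the spec would need to settle either way.

```java
// Hypothetical exception hierarchy matching the description above.
class InvalidCharacterException extends IllegalArgumentException {
    InvalidCharacterException(String message) { super(message); }
}

class ReservedCharacterException extends InvalidCharacterException {
    ReservedCharacterException(String message) { super(message); }
}

class ZeroLengthTokenException extends IllegalArgumentException {
    ZeroLengthTokenException(String message) { super(message); }
}

// Hypothetical helper: validates one token against a reserved character.
final class TokenValidator {
    private TokenValidator() {}

    static void validate(String token, char reserved, boolean allowEmpty) {
        if (token.isEmpty()) {
            if (!allowEmpty) {
                throw new ZeroLengthTokenException("zero-length token");
            }
            return;
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Check the reserved character first so the more specific exception wins.
            if (c == reserved) {
                throw new ReservedCharacterException("reserved character '" + c + "' in: " + token);
            }
            if (!Character.isLetterOrDigit(c)) {
                throw new InvalidCharacterException("invalid character '" + c + "' in: " + token);
            }
        }
    }
}
```

Because all three extend IllegalArgumentException, callers that don't care about the distinction can catch that one type, while tests can assert on the specific subclass.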
>
>
> On Wed, Aug 9, 2023, 6:08 PM Elliotte Rusty Harold <[email protected]>
> wrote:
>
>> What happens when a token contains an unpermitted character?
>>
>> On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <[email protected]>
>> wrote:
>> >
>> > Here's my stab at a spec. I wanted to clarify some parts of the Case
>> > interface first before jumping into the implementations. What would be a
>> > good package name for this stuff, given that "case" is a reserved word?
>> >
>> > Case (interface)
>> > The Case interface defines two methods:
>> > * String format(Iterable<String> tokens)
>> > The format method accepts an Iterable of String tokens and returns a single
>> > String formatted according to the implementation. The format method is
>> > intended to handle transforming between cases, so tokens passed to the
>> > format() method need not be properly formatted for the given Case instance,
>> > though they must still respect any reserved-character restrictions.
>> > * List<String> parse(String string)
>> > The parse method accepts a single string and returns a List of string
>> > tokens that abide by the Case implementation.
>> > Note: the format() and parse() methods must be fully reciprocal, i.e. on a
>> > single Case instance, calling parse() with a valid string and passing the
>> > resulting tokens into format() should return a matching string.
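The interface and the reciprocity rule can be sketched as below. CaseContracts, roundTrips, and convert are hypothetical helper names, not part of the proposal.

```java
import java.util.List;

// The two methods from the spec.
interface Case {
    String format(Iterable<String> tokens);

    List<String> parse(String string);
}

// Hypothetical helpers expressing the contract in code.
final class CaseContracts {
    private CaseContracts() {}

    // Reciprocity: for a valid input, format(parse(s)) must reproduce s exactly.
    static boolean roundTrips(Case c, String validInput) {
        return c.format(c.parse(validInput)).equals(validInput);
    }

    // Converting between cases is parse with one Case, then format with another.
    static String convert(String input, Case from, Case to) {
        return to.format(from.parse(input));
    }
}
```

convert() is the parse-then-format chaining used in the conversion examples of the original proposal; roundTrips() is the kind of check a test suite could run over every valid sample string for each Case.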
>> >
>> > DelimitedCase (base class for kebab and snake)
>> > Defines a Case where all tokens are separated by a single-character
>> > delimiter. The delimiter is considered a reserved character and is not
>> > allowed to appear within tokens when formatting. This base implementation
>> > places no further restrictions on token contents: tokens can contain any
>> > valid Java String character. DelimitedCases can support zero-length
>> > tokens, which occur when there are no characters between two instances of
>> > the delimiter or when the parsed string begins or ends with the delimiter.
>> > Note: Other Case implementations may not support zero-length tokens, and
>> > calls to format(...) with empty tokens may fail.
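A minimal sketch of DelimitedCase under the rules above, assuming the constructor simply takes the delimiter. It scans characters manually instead of using String.split, which takes a regex (so "." or "|" would need escaping) and drops trailing empty strings unless a negative limit is passed; either pitfall would silently break zero-length-token support.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical DelimitedCase: one reserved delimiter, no other token
// restrictions, zero-length tokens allowed.
class DelimitedCase {
    private final char delimiter;

    DelimitedCase(char delimiter) {
        this.delimiter = delimiter;
    }

    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException("Token contains reserved delimiter: " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < string.length(); i++) {
            if (string.charAt(i) == delimiter) {
                // May be empty for leading or doubled delimiters.
                tokens.add(string.substring(start, i));
                start = i + 1;
            }
        }
        // The final token is empty if the string ends with the delimiter.
        tokens.add(string.substring(start));
        return tokens;
    }
}
```

For example, with '_' as the delimiter, parse("_a__b_") yields ["", "a", "", "b", ""], and formatting that list returns "_a__b_", so leading, trailing, and doubled delimiters survive the round trip.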
>> >
>> > KebabCase
>> > Extends DelimitedCase and initializes the delimiter as the hyphen '-'
>> > character. This case allows only alphanumeric characters within tokens.
>> >
>> > SnakeCase
>> > Extends DelimitedCase and initializes the delimiter as the underscore '_'
>> > character. This case allows only alphanumeric characters within tokens.
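A sketch of KebabCase and SnakeCase as two configurations of a single hypothetical AlnumDelimitedCase class. "Alphanumeric" is assumed to mean ASCII [A-Za-z0-9]; the spec doesn't say whether non-ASCII letters count. This version uses String.split with Pattern.quote and a limit of -1 so empty tokens are kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical: kebab and snake as one delimited case with alphanumeric tokens.
final class AlnumDelimitedCase {
    static final AlnumDelimitedCase KEBAB = new AlnumDelimitedCase('-');
    static final AlnumDelimitedCase SNAKE = new AlnumDelimitedCase('_');

    private final String delimiter;

    private AlnumDelimitedCase(char delimiter) {
        this.delimiter = String.valueOf(delimiter);
    }

    // ASCII-only check (assumption). Empty tokens pass, so zero-length tokens stay supported.
    private static void requireAlnum(String token) {
        if (!token.matches("[A-Za-z0-9]*")) {
            throw new IllegalArgumentException("Non-alphanumeric token: " + token);
        }
    }

    String format(Iterable<String> tokens) {
        for (String token : tokens) {
            requireAlnum(token); // also rejects the delimiter, which is never alphanumeric
        }
        // Tokens are joined as-is; their alphabetic case is not changed.
        return String.join(delimiter, tokens);
    }

    List<String> parse(String string) {
        // Pattern.quote avoids regex surprises; limit -1 keeps trailing empty tokens.
        List<String> tokens = Arrays.asList(string.split(Pattern.quote(delimiter), -1));
        tokens.forEach(AlnumDelimitedCase::requireAlnum);
        return tokens;
    }
}
```

Since neither case changes letter case, converting snake to kebab is just a re-join: SNAKE.parse("my_snake_string") formatted with KEBAB gives "my-snake-string". A token like "snake-string" is rejected by SNAKE because the hyphen is not alphanumeric.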
>> >
>> > PascalCase
>> > Defines a Case where tokens begin with an uppercase alpha character. All
>> > subsequent token characters must be lowercase alpha or numeric characters.
>> > Whenever an uppercase alpha character is encountered, the previous token is
>> > considered complete and a new token begins, with the uppercase character
>> > being the first character of the new token. PascalCase does not allow
>> > zero-length tokens when formatting, as they would violate the reciprocal
>> > contract of format() and parse().
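A sketch of the PascalCase rules, assuming ASCII letters only. One consequence worth noting: under this reading a token can't start with a digit, because the digit would be absorbed into the previous token on parse, so format() rejects such tokens just as it rejects empty ones. This bears on the numbers-and-letters question raised in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical PascalCase sketch: tokens are [A-Z] followed by [a-z0-9]* (ASCII assumed).
class PascalCase {
    private static boolean isUpper(char c) {
        return c >= 'A' && c <= 'Z';
    }

    private static boolean isLowerOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    List<String> parse(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isUpper(c)) {
                // An uppercase letter closes the previous token and starts a new one.
                if (current != null) {
                    tokens.add(current.toString());
                }
                current = new StringBuilder().append(c);
            } else if (isLowerOrDigit(c) && current != null) {
                current.append(c);
            } else {
                throw new IllegalArgumentException("Not PascalCase at index " + i + ": " + s);
            }
        }
        if (current != null) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (token.isEmpty()) {
                // An empty token would vanish on parse, breaking reciprocity.
                throw new IllegalArgumentException("Zero-length token");
            }
            String lower = token.toLowerCase(Locale.ROOT);
            char first = lower.charAt(0);
            if (first < 'a' || first > 'z') {
                // A leading digit would merge into the previous token on parse.
                throw new IllegalArgumentException("Token must start with a letter: " + token);
            }
            for (int i = 1; i < lower.length(); i++) {
                if (!isLowerOrDigit(lower.charAt(i))) {
                    throw new IllegalArgumentException("Invalid character in token: " + token);
                }
            }
            sb.append((char) (first - 'a' + 'A')).append(lower, 1, lower.length());
        }
        return sb.toString();
    }
}
```

With this sketch, "MyPascal2String" parses to ["My", "Pascal2", "String"], and format(["my", "snake", "string"]) gives "MySnakeString". Consecutive capitals become one-letter tokens ("MyURL" parses to ["My", "U", "R", "L"]), which the spec may want to address explicitly.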
>> >
>> > CamelCase
>> > Extends PascalCase and adds one restriction: the first character of the
>> > first token (i.e. the first character of the full string) must be a
>> > lowercase alpha character, rather than the uppercase character PascalCase
>> > requires. All other restrictions of PascalCase apply.
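A sketch of CamelCase as a thin subclass, reusing a compact regex-based PascalCase (ASCII only, repeated here so the block stands alone). parse() checks the extra first-character rule, then delegates by capitalizing the first letter; format() formats as Pascal and lowercases the first character. Class shapes and method visibility are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compact hypothetical PascalCase (ASCII only).
class PascalCase {
    private static final Pattern TOKEN = Pattern.compile("[A-Z][a-z0-9]*");

    List<String> parse(String s) {
        if (!s.matches("([A-Z][a-z0-9]*)*")) {
            throw new IllegalArgumentException("Not PascalCase: " + s);
        }
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            // Rejects empty tokens and tokens that don't start with a letter.
            if (!t.matches("[A-Za-z][A-Za-z0-9]*")) {
                throw new IllegalArgumentException("Invalid token: " + t);
            }
            String lower = t.toLowerCase(Locale.ROOT);
            sb.append(Character.toUpperCase(lower.charAt(0))).append(lower, 1, lower.length());
        }
        return sb.toString();
    }
}

// CamelCase: PascalCase plus one rule on the first character.
class CamelCase extends PascalCase {
    @Override
    List<String> parse(String s) {
        if (s.isEmpty() || s.charAt(0) < 'a' || s.charAt(0) > 'z') {
            throw new IllegalArgumentException("Not camelCase: " + s);
        }
        // Reuse the PascalCase rules by capitalizing the first letter...
        List<String> tokens = super.parse(Character.toUpperCase(s.charAt(0)) + s.substring(1));
        // ...then restore the lowercase first character on the first token.
        String first = tokens.get(0);
        tokens.set(0, Character.toLowerCase(first.charAt(0)) + first.substring(1));
        return tokens;
    }

    @Override
    String format(Iterable<String> tokens) {
        String pascal = super.format(tokens);
        return pascal.isEmpty() ? pascal : Character.toLowerCase(pascal.charAt(0)) + pascal.substring(1);
    }
}
```

This sketch is strict, so CamelCase.parse("MyString") throws rather than being accepted leniently, in line with the request elsewhere in this thread for strict-only behavior.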
>> >
>> >
>> > On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <[email protected]>
>> wrote:
>> >
>> > > Kebab case is extremely common for web identifiers, e.g. HTML element
>> > > ids, classes, attributes, etc.
>> > >
>> > > Regarding PascalCase, I agree that most people won't understand the
>> > > reasoning behind the name, but it is nevertheless a widely accepted term
>> > > for that case style. If an alternative is deemed necessary, then
>> > > "ProperCase" might work, since that is also how English proper nouns are
>> > > cased. Understanding that name just depends on your knowledge of English
>> > > grammar.
>> > >
>> > > A spec can definitely be written for the 4 provided concrete
>> > > implementations. And... I may eat these words, but... the spec should not
>> > > be all that complex. I will take a stab at it.
>> > >
>> > > Thanks for the feedback!
>> > > Any other thoughts or comments are welcome!
>> > >
>> > > Dan
>> > >
>> > >
>> > > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <
>> [email protected]>
>> > > wrote:
>> > >
>> > >> This is a good idea and seems like useful functionality. In order to
>> > >> accept it into commons, it needs solid documentation and excellent
>> > >> test coverage. I've worked on code like this in another language (not
>> > >> Java) and the production bugs were bad. E.g. what happens when a
>> > >> string contains numbers as well as letters?
>> > >>
>> > >> I'd like to see a full spec that unambiguously defines how every
>> > >> Unicode string is converted into camel/snake/kebab case. The spec
>> > >> should be independent of the code. That's not easy to write but it's
>> > >> essential.
>> > >>
>> > >> I don't want any loose/strict modes. It should all be strict according
>> > >> to spec.
>> > >>
>> > >> I've never heard of kebab cases before. Is that a common name? I'd
>> > >> also like to rename Pascal case. How many programmers under 40 have
>> > >> even heard of Pascal, much less are familiar with its case
>> > >> conventions?
>> > >>
>> > >> Long story short - a PR is premature until there's an agreed-upon spec.
>> > >>
>> > >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <[email protected]>
>> > >> wrote:
>> > >> >
>> > >> > I have a bit of code that adds the ability to parse and format strings
>> > >> > into various case patterns. I wanted to check whether it's worthwhile
>> > >> > and in scope for commons-text...
>> > >> >
>> > >> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
>> > >> > Rather than simply formatting tokens into the case, this API adds the
>> > >> > additional goal of being able to transform one case to another, e.g.
>> > >> >
>> > >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
>> > >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
>> > >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
>> > >> > // Note that, in this implementation, kebab and snake do not alter the
>> > >> > // alphabetic case of the tokens, since they essentially just join the
>> > >> > // tokens case-agnostically. End users can override this behavior.
>> > >> >
>> > >> > The API has one core interface, Case, which has format and parse
>> > >> > methods. There is a single abstract implementation of it,
>> > >> > AbstractConfigurableCase, which is a configuration-driven way to create
>> > >> > a case pattern. It has enough options to accommodate the 4 popular
>> > >> > cases, so the subclasses just have to configure these options rather
>> > >> > than implement them directly. Any further extensions can override or
>> > >> > extend the API as necessary.
>> > >> >
>> > >> > There are five core concrete implementations:
>> > >> >
>> > >> > PascalCase
>> > >> > CamelCase (extends PascalCase)
>> > >> > DelimitedCase
>> > >> > KebabCase (extends DelimitedCase)
>> > >> > SnakeCase (extends DelimitedCase)
>> > >> >
>> > >> > Each has a static INSTANCE field to avoid redundant instantiation.
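A condensed, hypothetical sketch of that architecture: one Case interface, one configuration-driven abstract base, and a shared INSTANCE per concrete case. The option names (delimiter, capitalizeTokens, lowerFirstChar) are invented for illustration, DelimitedCase is folded into the base, and validation is omitted, so this only shows how configuration alone can reproduce the example conversions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

interface Case {
    String format(Iterable<String> tokens);

    List<String> parse(String string);
}

// Hypothetical configuration-driven base: subclasses only set options.
abstract class AbstractConfigurableCase implements Case {
    protected Character delimiter;      // null: an uppercase letter starts a new token
    protected boolean capitalizeTokens; // format: first letter upper, rest lower
    protected boolean lowerFirstChar;   // format: force the very first char to lower

    @Override
    public List<String> parse(String s) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (delimiter != null && c == delimiter) {
                tokens.add(s.substring(start, i)); // the delimiter itself is dropped
                start = i + 1;
            } else if (delimiter == null && i > 0 && Character.isUpperCase(c)) {
                tokens.add(s.substring(start, i)); // the uppercase letter begins the next token
                start = i;
            }
        }
        tokens.add(s.substring(start));
        return tokens;
    }

    @Override
    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (!first && delimiter != null) {
                sb.append(delimiter.charValue());
            }
            if (capitalizeTokens && !token.isEmpty()) {
                token = Character.toUpperCase(token.charAt(0))
                        + token.substring(1).toLowerCase(Locale.ROOT);
            }
            sb.append(token); // without capitalizeTokens, the token's case is left as-is
            first = false;
        }
        if (lowerFirstChar && sb.length() > 0) {
            sb.setCharAt(0, Character.toLowerCase(sb.charAt(0)));
        }
        return sb.toString();
    }
}

class SnakeCase extends AbstractConfigurableCase {
    static final SnakeCase INSTANCE = new SnakeCase();
    SnakeCase() { delimiter = '_'; }
}

class KebabCase extends AbstractConfigurableCase {
    static final KebabCase INSTANCE = new KebabCase();
    KebabCase() { delimiter = '-'; }
}

class PascalCase extends AbstractConfigurableCase {
    static final PascalCase INSTANCE = new PascalCase();
    PascalCase() { capitalizeTokens = true; }
}

class CamelCase extends PascalCase {
    static final CamelCase INSTANCE = new CamelCase();
    CamelCase() { lowerFirstChar = true; }
}
```

Since format() and parse() are instance methods in this sketch, the first example would be written SnakeCase.INSTANCE.format(PascalCase.INSTANCE.parse("MyPascalString")), which returns "My_Pascal_String".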
>> > >> >
>> > >> > Some of my reasoning / concerns...
>> > >> >
>> > >> > * I considered bundling all of this logic into static methods, similar
>> > >> > to CaseUtils, but that prevents the user from truly customizing or
>> > >> > extending the code for odd cases. This approach is, in my opinion, far
>> > >> > easier to understand, extend, and debug.
>> > >> > * I believe the parsing side should potentially have a loose/strict
>> > >> > mode, in that the logic could ignore non-critical rules when parsing,
>> > >> > e.g. the call CamelCase.parse("MyString") should work, even though the
>> > >> > input is not strictly camel case. Strict parsing would ensure (if
>> > >> > possible) that the input abides by all elements of the format.
>> > >> > * I'm still unsure about how best to handle reserved characters when
>> > >> > translating, e.g. how should
>> > >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the
>> > >> > hyphen? Should the kebab case strip the reserved character from the
>> > >> > token values?
>> > >> >
>> > >> > Long story short - is this worth pursuing in the form of a pull request
>> > >> > for review? Or is it out of scope for commons-text?
>> > >> >
>> > >> > Dan
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Elliotte Rusty Harold
>> > >> [email protected]
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: [email protected]
>> > >> For additional commands, e-mail: [email protected]
>> > >>
>> > >>
>>
>>
