Re: [commons-text] Additional CaseUtils type functionality that can handle snake, kebab, camel, pascal, and others

Daniel Watson Wed, 09 Aug 2023 13:30:29 -0700

Here's my stab at a spec. I wanted to clarify some parts of the Case
interface first before jumping into the implementations. I'm also wondering
what a good package name for this would be, given that "case" is a reserved
word in Java.

Case (interface)
The Case interface defines two methods:
* String format(Iterable<String> tokens)
The format method accepts an Iterable of String tokens and returns a single
String formatted according to the implementation. The format method is
intended to handle transforming between cases, so tokens passed to format()
need not already be formatted for the given Case instance, though they must
still respect any reserved-character restrictions.
* List<String> parse(String string)
The parse method accepts a single string and returns a List of string
tokens that abide by the Case implementation.
Note: format() and parse() must be fully reciprocal, i.e. on a single Case
instance, calling parse() with a valid string and passing the resulting
tokens into format() should return a string equal to the original.
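
As a rough illustration only (not the code I'm proposing for commons-text),
the interface and its reciprocal contract could be sketched like this. The
roundTrips helper is hypothetical and is not part of the spec; it just
expresses the contract as a check.

```java
import java.util.List;

/** Sketch of the Case interface described in the spec text. */
interface Case {
    /** Joins the tokens into a single string in this case's format. */
    String format(Iterable<String> tokens);

    /** Splits a string in this case's format into its tokens. */
    List<String> parse(String string);

    /**
     * Hypothetical helper (not part of the spec): returns true if
     * parsing a valid string and formatting the tokens gives it back.
     */
    static boolean roundTrips(Case c, String valid) {
        return c.format(c.parse(valid)).equals(valid);
    }
}
```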

DelimitedCase (base class for kebab and snake)
Defines a Case where all tokens are separated by a single character
delimiter. The delimiter is considered a reserved character and is not
allowed to appear within tokens when formatting. No further restrictions
are placed on token contents by this base implementation. Tokens can
contain any valid Java String character. DelimitedCases can support
zero-length tokens, which can occur if there are no characters between two
instances of the delimiter or if the parsed string begins or ends with the
delimiter.
Note: Other Case implementations may not support zero-length tokens, and
attempts to call format(...) with empty tokens may fail.
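
A minimal sketch of the DelimitedCase behaviour described above, assuming a
standalone class (the name DelimitedCaseSketch is made up for illustration).
It shows the reserved delimiter being rejected inside tokens on format, and
zero-length tokens being preserved on parse.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in for DelimitedCase; not the proposed implementation. */
class DelimitedCaseSketch {
    private final char delimiter;

    DelimitedCaseSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Joins tokens with the delimiter; the delimiter may not appear inside a token. */
    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException(
                    "Token contains reserved delimiter: " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    /** Splits on every delimiter, keeping leading, trailing and inner empty tokens. */
    List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < string.length(); i++) {
            if (string.charAt(i) == delimiter) {
                tokens.add(string.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(string.substring(start));
        return tokens;
    }
}
```

With a '-' delimiter, parsing "a--b-" in this sketch yields the four tokens
"a", "", "b", "" and formatting them gives "a--b-" back.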

KebabCase
Extends DelimitedCase and initializes the delimiter as the hyphen '-'
character. This case allows only alphanumeric characters within tokens.

SnakeCase
Extends DelimitedCase and initializes the delimiter as the underscore '_'
character. This case allows only alphanumeric characters within tokens.
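
The text above doesn't yet say whether "alphanumeric" means ASCII only or
any Unicode letter or digit. The sketch below assumes the Unicode reading
via Character.isLetterOrDigit; the class and method names are hypothetical.

```java
/** Illustrative token check for KebabCase / SnakeCase; names are made up. */
class AlphanumericTokens {
    /**
     * Returns true if every code point in the token is a Unicode letter
     * or digit. An empty token passes, since DelimitedCase allows
     * zero-length tokens.
     */
    static boolean isValid(String token) {
        return token.codePoints().allMatch(Character::isLetterOrDigit);
    }
}
```

Under this assumption a token such as "my-token" or "my_token" is rejected
by both cases, because neither delimiter is a letter or digit.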

PascalCase
Defines a Case where tokens begin with an uppercase alpha character. All
subsequent token characters must be lowercase alpha or numeric characters.
Whenever an uppercase alpha character is encountered, the previous token is
considered complete and a new token begins, with the uppercase character
being the first character of the new token. PascalCase does not allow
zero-length tokens when formatting, as it would violate the reciprocal
contract of format() and parse().
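
To make the PascalCase rules concrete, here is a minimal strict sketch,
assuming static methods on a made-up class PascalCaseSketch. It is only an
illustration of the rules as written, not the proposed implementation, and
it relies on Character.toUpperCase / toLowerCase, so characters without a
single-char case mapping are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative stand-in for PascalCase; not the proposed implementation. */
class PascalCaseSketch {
    /** Starts a new token at each uppercase letter; rejects anything else out of place. */
    static List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (Character.isUpperCase(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                current.append(c);
            } else if ((Character.isLowerCase(c) || Character.isDigit(c))
                    && current.length() > 0) {
                current.append(c);
            } else {
                throw new IllegalArgumentException(
                    "Invalid character at index " + i + ": " + c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Capitalizes each token's first letter and lowercases the rest. */
    static String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (token.isEmpty() || !Character.isLetter(token.charAt(0))) {
                throw new IllegalArgumentException(
                    "Token must start with a letter: \"" + token + "\"");
            }
            String lower = token.toLowerCase(Locale.ROOT);
            sb.append(Character.toUpperCase(lower.charAt(0)));
            // Remaining characters must be lowercase letters or digits.
            for (int i = 1; i < lower.length(); i++) {
                char c = lower.charAt(i);
                if (!Character.isLowerCase(c) && !Character.isDigit(c)) {
                    throw new IllegalArgumentException(
                        "Invalid character in token: " + c);
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

In this sketch format() rejects empty tokens outright, which is one way to
keep the reciprocal contract: an empty token would vanish in the output and
could never be recovered by parse().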

CamelCase
Extends PascalCase and sets one additional restriction - that the first
character of the first token (i.e. the first character of the full string)
must be a lowercase alpha character (rather than the uppercase requirement
of PascalCase). All other restrictions of PascalCase apply.
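
For comparison, a hedged sketch of how the CamelCase variant differs. To
keep it self-contained it does not extend anything; the class name
CamelCaseSketch and the strict handling of a leading uppercase character
are my assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative stand-in for CamelCase; not the proposed implementation. */
class CamelCaseSketch {
    /** Lowercases the first token entirely; capitalizes each later token. */
    static String format(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.isEmpty() || !Character.isLetter(token.charAt(0))) {
                throw new IllegalArgumentException(
                    "Token must start with a letter: \"" + token + "\"");
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (i == 0) {
                sb.append(lower);
            } else {
                sb.append(Character.toUpperCase(lower.charAt(0)))
                  .append(lower, 1, lower.length());
            }
        }
        return sb.toString();
    }

    /** Strict parse: the string must start lowercase; each uppercase letter starts a token. */
    static List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        if (string.isEmpty()) {
            return tokens;
        }
        if (!Character.isLowerCase(string.charAt(0))) {
            throw new IllegalArgumentException(
                "Camel case must start with a lowercase letter: " + string);
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (Character.isUpperCase(c)) {
                tokens.add(current.toString());
                current.setLength(0);
                current.append(c);
            } else if (Character.isLowerCase(c) || Character.isDigit(c)) {
                current.append(c);
            } else {
                throw new IllegalArgumentException(
                    "Invalid character at index " + i + ": " + c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }
}
```

Unlike the PascalCase sketch, format() here does not validate the remaining
characters of each token; a real implementation would need to.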

On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <dcwatso...@gmail.com> wrote:

> Kebab case is extremely common for web identifiers, e.g. HTML element ids,
> classes, attributes, etc.
>
> With regard to PascalCase, I agree that most people won't understand the
> reasoning behind the name, but it is nevertheless a widely accepted term
> for that case style. If an alternative is deemed necessary then
> "ProperCase" might work - since that is also how English proper nouns are
> cased. Understanding that name just depends on your knowledge of English
> grammar.
>
> A spec can definitely be written for the 4 provided concrete
> implementations. And... I may eat these words but... the spec should not be
> all that complex. I will take a stab at it.
>
> Thanks for the feedback!
> Any other thoughts or comments are welcome!
>
> Dan
>
>
> On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <elh...@ibiblio.org>
> wrote:
>
>> This is a good idea and seems like useful functionality. In order to
>> accept it into commons, it needs solid documentation and excellent
>> test coverage. I've worked on code like this in another language (not
>> Java) and the production bugs were bad. E.g. what happens when a
>> string contains numbers as well as letters?
>>
>> I'd like to see a full spec that unambiguously defines how every
>> Unicode string is converted into camel/snake/kebab case. The spec
>> should be independent of the code. That's not easy to write but it's
>> essential.
>>
>> I don't want any loose/strict modes. It should all be strict according to
>> spec.
>>
>> I've never heard of kebab cases before. Is that a common name? I'd
>> also like to rename Pascal case. How many programmers under 40 have
>> even heard of Pascal, much less are familiar with its case
>> conventions?
>>
>> Long story short - a PR is premature until there's an agreed upon spec.
>>
>> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <dcwatso...@gmail.com>
>> wrote:
>> >
>> > I have a bit of code that adds the ability to parse and format strings
>> > into various case patterns. Wanted to check if it's of worth and
>> > in-scope for commons-text...
>> >
>> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
>> > Rather than simply formatting tokens into the case, this API adds the
>> > additional goal of being able to transform one case to another. e.g.
>> >
>> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
>> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
>> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
>> > // Note that kebab and snake do not alter the alphabetic case of the
>> > // tokens, as they are essentially case agnostic joining, according to
>> > // this implementation. Though this can be overridden by end users.
>> >
>> > The API has one core interface: Case, which has format and parse
>> > methods. There is a single abstract implementation of it -
>> > AbstractConfigurableCase - which is a configuration driven way to
>> > create a case pattern. It has enough options to accommodate the 4
>> > popular cases, and thus the subclasses just have to configure these
>> > options rather than implement them directly. Any further extensions
>> > can override or extend the api as necessary.
>> >
>> > There are five core concrete implementations:
>> >
>> > PascalCase
>> > CamelCase (extends PascalCase)
>> > DelimitedCase
>> > KebabCase (extends DelimitedCase)
>> > SnakeCase (extends DelimitedCase)
>> >
>> > Each has a static INSTANCE field to avoid redundant instantiation.
>> >
>> > Some of my reasoning / concerns...
>> >
>> > * I considered bundling all of this logic into static methods, similar
>> > to CaseUtils, but that prevents the user from truly customizing or
>> > extending the code for odd cases. This approach is, in my opinion, far
>> > easier to understand, extend, and debug.
>> > * I believe the parsing side should potentially have a loose / strict
>> > mode, in that the logic can ignore non-critical rules on the parsing
>> > side. e.g. the command CamelCase.parse("MyString") should work, even
>> > though the input is not strictly camel case. Strict parsing would
>> > ensure (if possible) that the input abides by all elements of the
>> > format.
>> > * I'm still unsure about how best to handle reserved characters when
>> > translating. e.g. How should
>> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the hyphen?
>> > Should the kebab case strip the reserved character from the token
>> > values?
>> >
>> > Long story short - is this worth pursuing in the form of a pull
>> > request for review? Or is it out of scope for commons-text?
>> >
>> > Dan
>>
>>
>>
>> --
>> Elliotte Rusty Harold
>> elh...@ibiblio.org
>>
