Re: [commons-text] Additional CaseUtils type functionality that can handle snake, kebab, camel, pascal, and others

Gary Gregory Wed, 09 Aug 2023 15:17:10 -0700

Probably should be an IAE...?

Gary


On Wed, Aug 9, 2023, 6:07 PM Elliotte Rusty Harold <[email protected]>
wrote:

> What happens when a token contains an unpermitted character?
>
> On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <[email protected]> wrote:
> >
> > Here's my stab at a spec. Wanted to clarify some parts of the Case
> > interface first before jumping into the implementations. Wondering what a
> > good package name for this stuff is, given that "case" is a reserved word?
> >
> > Case (interface)
> > The Case interface defines two methods:
> > * String format(Iterable<String> tokens)
> > The format method accepts an Iterable of String tokens and returns a single
> > String formatted according to the implementation. The format method is
> > intended to handle transforming between cases, thus tokens passed to the
> > format() method need not be properly formatted for the given Case instance,
> > though they must still respect any reserved character restrictions.
> > * List<String> parse(String string)
> > The parse method accepts a single string and returns a List of string
> > tokens that abide by the Case implementation.
> > Note: format() and parse() methods must be fully reciprocal, i.e., on a
> > single Case instance, when calling parse() with a valid string and passing
> > the resulting tokens into format(), a matching string should be returned.
> >
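A minimal Java sketch of the Case contract described above. The two method signatures come from the spec text; the roundTrips helper is hypothetical, added only to make the reciprocity requirement concrete.

```java
import java.util.List;

/** Sketch of the Case contract; not the proposed implementation. */
interface Case {

    /** Joins tokens into a single string in this case's format. */
    String format(Iterable<String> tokens);

    /** Splits a string in this case's format into its tokens. */
    List<String> parse(String string);

    /**
     * Hypothetical helper: checks the reciprocal contract for one string
     * that is assumed to be valid for the given Case.
     */
    static boolean roundTrips(Case c, String valid) {
        return c.format(c.parse(valid)).equals(valid);
    }
}
```

Any implementation should satisfy roundTrips for every string it accepts in parse().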
> > DelimitedCase (base class for kebab and snake)
> > Defines a Case where all tokens are separated by a single character
> > delimiter. The delimiter is considered a reserved character and is not
> > allowed to appear within tokens when formatting. No further restrictions
> > are placed on token contents by this base implementation. Tokens can
> > contain any valid Java String character. DelimitedCases can support
> > zero-length tokens, which can occur if there are no characters between two
> > instances of the delimiter or if the parsed string begins or ends with the
> > delimiter.
> > Note: Other Case implementations may not support zero-length tokens, and
> > attempts to call format(...) with empty tokens may fail.
> >
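A minimal sketch of a DelimitedCase as described above, assuming a single char delimiter. The class name DelimitedCaseSketch is hypothetical, and throwing IllegalArgumentException for a token that contains the delimiter is an assumption based on the IAE suggestion at the top of this thread, not something the spec settles.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: a single-character delimited case. */
class DelimitedCaseSketch {
    private final char delimiter;

    DelimitedCaseSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Joins tokens with the delimiter; rejects tokens that contain it. */
    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.indexOf(delimiter) >= 0) {
                // Assumption: the exception type is not fixed by the spec.
                throw new IllegalArgumentException(
                    "Token contains reserved delimiter '" + delimiter + "': " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    /** Splits on every delimiter occurrence, keeping zero-length tokens. */
    List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < string.length(); i++) {
            if (string.charAt(i) == delimiter) {
                tokens.add(string.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(string.substring(start)); // trailing token, possibly empty
        return tokens;
    }
}
```

With '-' as the delimiter, parse("-a--b-") yields five tokens, three of them zero-length, and format() of those tokens gives back the original string.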
> > KebabCase
> > Extends DelimitedCase and initializes the delimiter as the hyphen '-'
> > character. This case allows only alphanumeric characters within tokens.
> >
> > SnakeCase
> > Extends DelimitedCase and initializes the delimiter as the underscore '_'
> > character. This case allows only alphanumeric characters within tokens.
> >
> > PascalCase
> > Defines a Case where tokens begin with an uppercase alpha character. All
> > subsequent token characters must be lowercase alpha or numeric characters.
> > Whenever an uppercase alpha character is encountered, the previous token is
> > considered complete and a new token begins, with the uppercase character
> > being the first character of the new token. PascalCase does not allow
> > zero-length tokens when formatting, as it would violate the reciprocal
> > contract of format() and parse().
> >
> > CamelCase
> > Extends PascalCase and sets one additional restriction: the first
> > character of the first token (i.e., the first character of the full string)
> > must be a lowercase alpha character (rather than the uppercase requirement
> > of PascalCase). All other restrictions of PascalCase apply.
> >
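A rough sketch of the PascalCase and CamelCase rules above in one hypothetical class (HumpCaseSketch, with a flag standing in for the subclass relationship). Assumptions not in the spec: "alpha" is tested with java.lang.Character methods, unpermitted characters raise IllegalArgumentException, and format() normalizes each token by lowercasing it and then capitalizing its first letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: PascalCase (upperFirst = true) and camelCase (false). */
class HumpCaseSketch {
    private final boolean upperFirst;

    HumpCaseSketch(boolean upperFirst) {
        this.upperFirst = upperFirst;
    }

    /** Starts a new token at every uppercase character after the first. */
    List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (i == 0) {
                boolean ok = upperFirst ? Character.isUpperCase(c) : Character.isLowerCase(c);
                if (!ok) {
                    throw new IllegalArgumentException("Bad first character: " + c);
                }
            } else if (Character.isUpperCase(c)) {
                tokens.add(current.toString());
                current.setLength(0);
            } else if (!Character.isLowerCase(c) && !Character.isDigit(c)) {
                throw new IllegalArgumentException("Unpermitted character: " + c);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Concatenates tokens, capitalizing each one as the case requires. */
    String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (token.isEmpty()) {
                throw new IllegalArgumentException("Zero-length token");
            }
            if (!Character.isLetter(token.charAt(0))) {
                throw new IllegalArgumentException("Token must start with a letter: " + token);
            }
            for (int i = 0; i < token.length(); i++) {
                if (!Character.isLetterOrDigit(token.charAt(i))) {
                    throw new IllegalArgumentException("Unpermitted character in: " + token);
                }
            }
            String lower = token.toLowerCase(Locale.ROOT);
            // Only the very first token of camelCase keeps a lowercase initial.
            boolean keepLower = !upperFirst && sb.length() == 0;
            sb.append(keepLower ? lower.charAt(0) : Character.toUpperCase(lower.charAt(0)));
            sb.append(lower, 1, lower.length());
        }
        return sb.toString();
    }
}
```

Rejecting zero-length tokens in format() is what keeps the round trip intact: an empty token would vanish from the output and could not be recovered by parse().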
> >
> > On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <[email protected]> wrote:
> >
> > > Kebab case is extremely common for web identifiers, e.g. HTML element ids,
> > > classes, attributes, etc.
> > >
> > > In regards to PascalCase, I agree that most people won't understand the
> > > reasoning behind the name, but it is nevertheless a widely accepted term
> > > for that case style. If an alternative is deemed necessary then
> > > "ProperCase" might work, since that is also how English proper nouns are
> > > cased. Understanding that name just depends on your knowledge of English
> > > grammar.
> > >
> > > A spec can definitely be written for the 4 provided concrete
> > > implementations. And... I may eat these words but... the spec should not be
> > > all that complex. I will take a stab at it.
> > >
> > > Thanks for the feedback!
> > > Any other thoughts or comments are welcome!
> > >
> > > Dan
> > >
> > >
> > > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <[email protected]>
> > > wrote:
> > >
> > >> This is a good idea and seems like useful functionality. In order to
> > >> accept it into commons, it needs solid documentation and excellent
> > >> test coverage. I've worked on code like this in another language (not
> > >> Java) and the production bugs were bad. E.g. what happens when a
> > >> string contains numbers as well as letters?
> > >>
> > >> I'd like to see a full spec that unambiguously defines how every
> > >> Unicode string is converted into camel/snake/kebab case. The spec
> > >> should be independent of the code. That's not easy to write but it's
> > >> essential.
> > >>
> > >> I don't want any loose/strict modes. It should all be strict according to
> > >> spec.
> > >>
> > >> I've never heard of kebab case before. Is that a common name? I'd
> > >> also like to rename Pascal case. How many programmers under 40 have
> > >> even heard of Pascal, much less are familiar with its case
> > >> conventions?
> > >>
> > >> Long story short - a PR is premature until there's an agreed upon spec.
> > >>
> > >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <[email protected]>
> > >> wrote:
> > >> >
> > >> > I have a bit of code that adds the ability to parse and format strings into
> > >> > various case patterns. Wanted to check if it's of worth and in-scope for
> > >> > commons-text...
> > >> >
> > >> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
> > >> > Rather than simply formatting tokens into the case, this API adds the
> > >> > additional goal of being able to transform one case to another. e.g.
> > >> >
> > >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
> > >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
> > >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
> > >> > // Note that kebab and snake do not alter the alphabetic case of the tokens,
> > >> > // as they are essentially case agnostic joining, according to this
> > >> > // implementation. Though this can be overridden by end users.
> > >> >
> > >> > The API has one core interface: Case, which has format and parse methods.
> > >> > There is a single abstract implementation of it, AbstractConfigurableCase,
> > >> > which is a configuration driven way to create a case pattern. It has
> > >> > enough options to accommodate the 4 popular cases, and thus the subclasses
> > >> > just have to configure these options rather than implement them directly.
> > >> > Any further extensions can override or extend the API as necessary.
> > >> >
> > >> > There are five core concrete implementations:
> > >> >
> > >> > PascalCase
> > >> > CamelCase (extends PascalCase)
> > >> > DelimitedCase
> > >> > KebabCase (extends DelimitedCase)
> > >> > SnakeCase (extends DelimitedCase)
> > >> >
> > >> > Each has a static INSTANCE field to avoid redundant instantiation.
> > >> >
> > >> > Some of my reasoning / concerns...
> > >> >
> > >> > * I considered bundling all of this logic into static methods, similar to
> > >> > CaseUtils, but that prevents the user from truly customizing or extending
> > >> > the code for odd cases. This approach is, in my opinion, far easier to
> > >> > understand, extend, and debug.
> > >> > * I believe the parsing side should potentially have a loose / strict mode,
> > >> > in that the logic can ignore non-critical rules on the parsing side. e.g.
> > >> > the command CamelCase.parse("MyString") should work, even though the input
> > >> > is not strictly camel case. Strict parsing would ensure (if possible) that
> > >> > the input abides by all elements of the format.
> > >> > * I'm still unsure about how best to handle reserved characters when
> > >> > translating. e.g. How should
> > >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the hyphen?
> > >> > Should the kebab case strip the reserved character from the token values?
> > >> >
> > >> > Long story short - is this worth pursuing in the form of a pull request for
> > >> > review? Or is it out of scope for commons-text?
> > >> >
> > >> > Dan
> > >>
> > >>
> > >>
> > >> --
> > >> Elliotte Rusty Harold
> > >> [email protected]
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [email protected]
> > >> For additional commands, e-mail: [email protected]
> > >>
> > >>
>
>
>
> --
> Elliotte Rusty Harold
> [email protected]
>
