Re: [commons-text] Additional CaseUtils type functionality that can handle snake, kebab, camel, pascal, and others

Gary Gregory Wed, 09 Aug 2023 16:42:58 -0700

IMO, these can all be replaced by IllegalArgumentException (IAE), because
there is nothing I would do differently at a call site if I caught one of
these custom exceptions vs. another. It's all the same issue, probably bad
user input. The only reason to create a custom exception would be to wrap
additional information like a location (line number, column number), but
that's not what you describe here. You can imagine an editor catching a
syntax error exception, extracting a line and column number, and
changing the style for that area of the text.
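
A minimal sketch of that distinction, for illustration only. The names
CaseErrors, TokenSyntaxException, and requireAlphanumeric are hypothetical
and do not come from commons-text or from the proposal. The custom type is
justified here only because it carries an index the caller can act on:

```java
// Hypothetical sketch: none of these names exist in commons-text.
final class CaseErrors {

    /** A custom exception that is worth having because it carries a position. */
    static class TokenSyntaxException extends IllegalArgumentException {
        private final int index;

        TokenSyntaxException(String message, int index) {
            super(message + " at index " + index);
            this.index = index;
        }

        int getIndex() {
            return index;
        }
    }

    /** Rejects any character that is not a letter or digit and reports where it was found. */
    static void requireAlphanumeric(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                throw new TokenSyntaxException("Invalid character '" + c + "'", i);
            }
        }
    }

    public static void main(String[] args) {
        try {
            requireAlphanumeric("my-token");
        } catch (TokenSyntaxException e) {
            // An editor could use the index to restyle the offending character.
            System.out.println(e.getMessage() + ", highlight column " + e.getIndex());
        }
    }
}
```

A caller that only wants to reject bad input can still catch
IllegalArgumentException and ignore the index.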


Gary

On Wed, Aug 9, 2023 at 7:30 PM Daniel Watson <[email protected]> wrote:
>
> Currently I'm planning a set of exceptions that are thrown for various
> reasons. I created multiple classes to allow for clearer testing.
>
> ReservedCharacterException (extends InvalidCharacterException below) -
> thrown specifically when a reserved character is encountered within a token.
>
> InvalidCharacterException (extends IllegalArgumentException) thrown
> directly any time an illegal character is encountered.
>
> ZeroLengthTokenException (extends IllegalArgumentException) - thrown when a
> zero-length token is encountered and the Case does not support it.
>
> There are a few other error cases I believe. I'm not looking at the code
> right this moment but I'm fairly certain about the need for the above 3.
>
>
> On Wed, Aug 9, 2023, 6:08 PM Elliotte Rusty Harold <[email protected]>
> wrote:
>
> > What happens when a token contains an unpermitted character?
> >
> > On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <[email protected]> wrote:
> > >
> > > Here's my stab at a spec. Wanted to clarify some parts of the Case
> > > interface first before jumping into the implementations. Wondering what a
> > > good package name for this stuff is, given that "case" is a reserved
> > > word?
> > >
> > > Case (interface)
> > > The Case interface defines two methods:
> > > * String format(Iterable<String> tokens)
> > > The format method accepts an Iterable of String tokens and returns a
> > > single String formatted according to the implementation. The format
> > > method is intended to handle transforming between cases, thus tokens
> > > passed to the format() method need not be properly formatted for the
> > > given Case instance, though they must still respect any reserved
> > > character restrictions.
> > > * List<String> parse(String string)
> > > The parse method accepts a single string and returns a List of string
> > > tokens that abide by the Case implementation.
> > > Note: format() and parse() methods must be fully reciprocal, i.e. on a
> > > single Case instance, when calling parse() with a valid string and
> > > passing the resulting tokens into format(), a matching string should be
> > > returned.
> > >
> > > DelimitedCase (base class for kebab and snake)
> > > Defines a Case where all tokens are separated by a single character
> > > delimiter. The delimiter is considered a reserved character and is not
> > > allowed to appear within tokens when formatting. No further restrictions
> > > are placed on token contents by this base implementation. Tokens can
> > > contain any valid Java String character. DelimitedCases can support
> > > zero-length tokens, which can occur if there are no characters between
> > > two instances of the delimiter or if the parsed string begins or ends
> > > with the delimiter.
> > > Note: Other Case implementations may not support zero-length tokens, and
> > > attempts to call format(...) with empty tokens may fail.
> > >
> > > KebabCase
> > > Extends DelimitedCase and initializes the delimiter as the hyphen '-'
> > > character. This case allows only alphanumeric characters within tokens.
> > >
> > > SnakeCase
> > > Extends DelimitedCase and initializes the delimiter as the underscore '_'
> > > character. This case allows only alphanumeric characters within tokens.
> > >
> > > PascalCase
> > > Defines a Case where tokens begin with an uppercase alpha character. All
> > > subsequent token characters must be lowercase alpha or numeric
> > > characters. Whenever an uppercase alpha character is encountered, the
> > > previous token is considered complete and a new token begins, with the
> > > uppercase character being the first character of the new token.
> > > PascalCase does not allow zero-length tokens when formatting, as it
> > > would violate the reciprocal contract of format() and parse().
> > >
> > > CamelCase
> > > Extends PascalCase and sets one additional restriction - that the first
> > > character of the first token (i.e. the first character of the full
> > > string) must be a lowercase alpha character (rather than the uppercase
> > > requirement of PascalCase). All other restrictions of PascalCase apply.
> > >
> > >
> > > On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <[email protected]>
> > > wrote:
> > >
> > > > Kebab case is extremely common for web identifiers, e.g. HTML element
> > > > ids, classes, attributes, etc.
> > > >
> > > > In regards to PascalCase, I agree that most people won't understand the
> > > > reasoning behind the name, but it is nevertheless a widely accepted
> > > > term for that case style. If an alternative is deemed necessary then
> > > > "ProperCase" might work - since that is also how English proper nouns
> > > > are cased. Understanding that name just depends on your knowledge of
> > > > English grammar.
> > > >
> > > > A spec can definitely be written for the 4 provided concrete
> > > > implementations. And... I may eat these words but... the spec should
> > > > not be all that complex. I will take a stab at it.
> > > >
> > > > Thanks for the feedback!
> > > > Any other thoughts or comments are welcome!
> > > >
> > > > Dan
> > > >
> > > >
> > > > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <[email protected]>
> > > > wrote:
> > > >
> > > >> This is a good idea and seems like useful functionality. In order to
> > > >> accept it into commons, it needs solid documentation and excellent
> > > >> test coverage. I've worked on code like this in another language (not
> > > >> Java) and the production bugs were bad. E.g. what happens when a
> > > >> string contains numbers as well as letters?
> > > >>
> > > >> I'd like to see a full spec that unambiguously defines how every
> > > >> Unicode string is converted into camel/snake/kebab case. The spec
> > > >> should be independent of the code. That's not easy to write but it's
> > > >> essential.
> > > >>
> > > >> I don't want any loose/strict modes. It should all be strict
> > > >> according to spec.
> > > >>
> > > >> I've never heard of kebab cases before. Is that a common name? I'd
> > > >> also like to rename Pascal case. How many programmers under 40 have
> > > >> even heard of Pascal, much less are familiar with its case
> > > >> conventions?
> > > >>
> > > >> Long story short - a PR is premature until there's an agreed upon
> > > >> spec.
> > > >>
> > > >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <[email protected]>
> > > >> wrote:
> > > >> >
> > > >> > I have a bit of code that adds the ability to parse and format
> > > >> > strings into various case patterns. Wanted to check if it's of worth
> > > >> > and in-scope for commons-text...
> > > >> >
> > > >> > It's a bit broader than the existing CaseUtils.toCamelCase(...)
> > > >> > method. Rather than simply formatting tokens into the case, this API
> > > >> > adds the additional goal of being able to transform one case to
> > > >> > another. e.g.
> > > >> >
> > > >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
> > > >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
> > > >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
> > > >> > // Note that kebab and snake do not alter the alphabetic case of the
> > > >> > // tokens, as they are essentially case agnostic joining, according to
> > > >> > // this implementation. Though this can be overridden by end users.
> > > >> >
> > > >> > The API has one core interface: Case, which has format and parse
> > > >> > methods. There is a single abstract implementation of it -
> > > >> > AbstractConfigurableCase - which is a configuration driven way to
> > > >> > create a case pattern. It has enough options to accommodate the 4
> > > >> > popular cases, and thus the subclasses just have to configure these
> > > >> > options rather than implement them directly.
> > > >> > Any further extensions can override or extend the api as necessary.
> > > >> >
> > > >> > There are five core concrete implementations:
> > > >> >
> > > >> > PascalCase
> > > >> > CamelCase (extends PascalCase)
> > > >> > DelimitedCase
> > > >> > KebabCase (extends DelimitedCase)
> > > >> > SnakeCase (extends DelimitedCase)
> > > >> >
> > > >> > Each has a static INSTANCE field to avoid redundant instantiation.
> > > >> >
> > > >> > Some of my reasoning / concerns...
> > > >> >
> > > >> > * I considered bundling all of this logic into static methods,
> > > >> > similar to CaseUtils, but that prevents the user from truly
> > > >> > customizing or extending the code for odd cases. This approach is,
> > > >> > in my opinion, far easier to understand, extend, and debug.
> > > >> > * I believe the parsing side should potentially have a loose /
> > > >> > strict mode, in that the logic can ignore non-critical rules on the
> > > >> > parsing side. e.g. the command CamelCase.parse("MyString") should
> > > >> > work, even though the input is not strictly camel case. Strict
> > > >> > parsing would ensure (if possible) that the input abides by all
> > > >> > elements of the format.
> > > >> > * I'm still unsure about how best to handle reserved characters when
> > > >> > translating. e.g. How should
> > > >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the
> > > >> > hyphen? Should the kebab case strip the reserved character from the
> > > >> > token values?
> > > >> >
> > > >> > Long story short - is this worth pursuing in the form of a pull
> > > >> > request for review? Or is it out of scope for commons-text?
> > > >> >
> > > >> > Dan
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Elliotte Rusty Harold
> > > >> [email protected]
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: [email protected]
> > > >> For additional commands, e-mail: [email protected]
> > > >>
> > > >>
> >
> >
> >
> > --
> > Elliotte Rusty Harold
> > [email protected]
> >
> >
> >
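
For readers following the thread, the Case contract quoted above could be
sketched roughly as follows. This is an illustration under stated
assumptions, not the code Daniel proposed: the class bodies are guesses
based only on the prose spec, instances are created with plain constructors
rather than a static INSTANCE field, PascalCase is restricted to ASCII
letters and digits to keep the rules unambiguous, and every failure is
reported as a plain IllegalArgumentException, as suggested in the top reply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the spec quoted above; not the proposed commons-text code.
interface Case {
    String format(Iterable<String> tokens);
    List<String> parse(String string);
}

/** Tokens joined by one reserved delimiter character; zero-length tokens are allowed. */
class DelimitedCase implements Case {
    private final char delimiter;

    DelimitedCase(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException("Reserved character in token: " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    @Override
    public List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < string.length(); i++) {
            if (string.charAt(i) == delimiter) {
                tokens.add(string.substring(start, i)); // may be a zero-length token
                start = i + 1;
            }
        }
        tokens.add(string.substring(start));
        return tokens;
    }
}

/** Each token is one uppercase letter followed by lowercase letters or digits (ASCII only here). */
class PascalCase implements Case {
    private static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    @Override
    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (token.isEmpty()) {
                throw new IllegalArgumentException("Zero-length token not supported");
            }
            for (int i = 0; i < token.length(); i++) {
                char c = token.charAt(i);
                boolean letter = isUpper(c) || isLower(c);
                if (!letter && (i == 0 || !isDigit(c))) {
                    throw new IllegalArgumentException("Invalid character at index " + i + " of token: " + token);
                }
                // Re-case so tokens parsed by another Case come out as valid Pascal tokens.
                sb.append(i == 0 ? Character.toUpperCase(c) : Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    @Override
    public List<String> parse(String string) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            boolean valid = i == 0 ? isUpper(c) : (isUpper(c) || isLower(c) || isDigit(c));
            if (!valid) {
                throw new IllegalArgumentException("Invalid character at index " + i + ": " + string);
            }
            if (i > 0 && isUpper(c)) {
                tokens.add(string.substring(start, i)); // an uppercase letter starts a new token
                start = i;
            }
        }
        if (!string.isEmpty()) {
            tokens.add(string.substring(start));
        }
        return tokens;
    }
}
```

Under these assumptions,
new DelimitedCase('_').format(new PascalCase().parse("MyPascalString"))
gives My_Pascal_String, the same result as the first example in the
original proposal, and format(parse(s)) returns s for any string that
parse() accepts.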

