Probably should be an IAE...?

Gary
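
A minimal sketch of what the IAE (IllegalArgumentException) suggestion could look like for the DelimitedCase described in the quoted spec. The method signatures come from the spec; the method bodies, the char-based constructor, and the demo class are assumptions made for illustration, not the proposed commons-text code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Interface as described in the quoted spec.
interface Case {
    String format(Iterable<String> tokens);
    List<String> parse(String string);
}

// Hypothetical implementation: the constructor and method bodies are assumptions.
class DelimitedCase implements Case {
    private final char delimiter;

    DelimitedCase(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public String format(Iterable<String> tokens) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            if (token == null) {
                throw new IllegalArgumentException("Token must not be null");
            }
            // The delimiter is reserved: reject it instead of silently stripping it.
            if (token.indexOf(delimiter) >= 0) {
                throw new IllegalArgumentException(
                        "Token contains reserved character '" + delimiter + "': " + token);
            }
            if (!first) {
                sb.append(delimiter);
            }
            sb.append(token);
            first = false;
        }
        return sb.toString();
    }

    @Override
    public List<String> parse(String string) {
        if (string == null) {
            throw new IllegalArgumentException("String must not be null");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        // Keep zero-length tokens so that format(parse(s)) reproduces s.
        while ((end = string.indexOf(delimiter, start)) >= 0) {
            tokens.add(string.substring(start, end));
            start = end + 1;
        }
        tokens.add(string.substring(start));
        return tokens;
    }
}

class CaseSketchDemo {
    public static void main(String[] args) {
        DelimitedCase kebab = new DelimitedCase('-');
        System.out.println(kebab.format(Arrays.asList("my", "Camel", "String"))); // my-Camel-String
        System.out.println(kebab.parse("my-Camel-String")); // [my, Camel, String]
        try {
            kebab.format(Arrays.asList("My", "Pascal-String"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Throwing keeps format() and parse() reciprocal: a token that already contains the delimiter would otherwise parse back into two tokens.
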
On Wed, Aug 9, 2023, 6:07 PM Elliotte Rusty Harold <elh...@ibiblio.org> wrote:
> What happens when a token contains an unpermitted character?
>
> On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <dcwatso...@gmail.com> wrote:
> >
> > Here's my stab at a spec. Wanted to clarify some parts of the Case
> > interface first before jumping into the implementations. Wondering what a
> > good package name for this stuff is, given that "case" is a reserved word?
> >
> > Case (interface)
> > The Case interface defines two methods:
> > * String format(Iterable<String> tokens)
> > The format method accepts an Iterable of String tokens and returns a single
> > String formatted according to the implementation. The format method is
> > intended to handle transforming between cases, thus tokens passed to the
> > format() method need not be properly formatted for the given Case instance,
> > though they must still respect any reserved-character restrictions.
> > * List<String> parse(String string)
> > The parse method accepts a single string and returns a List of string
> > tokens that abide by the Case implementation.
> > Note: format() and parse() methods must be fully reciprocal, i.e. on a
> > single Case instance, when calling parse() with a valid string and passing
> > the resulting tokens into format(), a matching string should be returned.
> >
> > DelimitedCase (base class for kebab and snake)
> > Defines a Case where all tokens are separated by a single character
> > delimiter. The delimiter is considered a reserved character and is not
> > allowed to appear within tokens when formatting. No further restrictions
> > are placed on token contents by this base implementation. Tokens can
> > contain any valid Java String character. DelimitedCases can support
> > zero-length tokens, which can occur if there are no characters between two
> > instances of the delimiter or if the parsed string begins or ends with the
> > delimiter.
> > Note: Other Case implementations may not support zero-length tokens, and
> > attempts to call format(...) with empty tokens may fail.
> >
> > KebabCase
> > Extends DelimitedCase and initializes the delimiter as the hyphen '-'
> > character. This case allows only alphanumeric characters within tokens.
> >
> > SnakeCase
> > Extends DelimitedCase and initializes the delimiter as the underscore '_'
> > character. This case allows only alphanumeric characters within tokens.
> >
> > PascalCase
> > Defines a Case where tokens begin with an uppercase alpha character. All
> > subsequent token characters must be lowercase alpha or numeric characters.
> > Whenever an uppercase alpha character is encountered, the previous token is
> > considered complete and a new token begins, with the uppercase character
> > being the first character of the new token. PascalCase does not allow
> > zero-length tokens when formatting, as it would violate the reciprocal
> > contract of format() and parse().
> >
> > CamelCase
> > Extends PascalCase and sets one additional restriction - that the first
> > character of the first token (i.e. the first character of the full string)
> > must be a lowercase alpha character (rather than the uppercase requirement
> > of PascalCase). All other restrictions of PascalCase apply.
> >
> >
> > On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <dcwatso...@gmail.com> wrote:
> >
> > > Kebab case is extremely common for web identifiers, e.g. HTML element ids,
> > > classes, attributes, etc.
> > >
> > > With regard to PascalCase, I agree that most people won't understand the
> > > reasoning behind the name, but it is nevertheless a widely accepted term
> > > for that case style. If an alternative is deemed necessary then
> > > "ProperCase" might work - since that is also how English proper nouns are
> > > cased. Understanding that name just depends on your knowledge of English
> > > grammar.
> > >
> > > A spec can definitely be written for the 4 provided concrete
> > > implementations. And... I may eat these words but... the spec should not be
> > > all that complex. I will take a stab at it.
> > >
> > > Thanks for the feedback!
> > > Any other thoughts or comments are welcome!
> > >
> > > Dan
> > >
> > >
> > > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <elh...@ibiblio.org> wrote:
> > >
> > >> This is a good idea and seems like useful functionality. In order to
> > >> accept it into commons, it needs solid documentation and excellent
> > >> test coverage. I've worked on code like this in another language (not
> > >> Java) and the production bugs were bad. E.g. what happens when a
> > >> string contains numbers as well as letters?
> > >>
> > >> I'd like to see a full spec that unambiguously defines how every
> > >> Unicode string is converted into camel/snake/kebab case. The spec
> > >> should be independent of the code. That's not easy to write but it's
> > >> essential.
> > >>
> > >> I don't want any loose/strict modes. It should all be strict according to
> > >> spec.
> > >>
> > >> I've never heard of kebab case before. Is that a common name? I'd
> > >> also like to rename Pascal case. How many programmers under 40 have
> > >> even heard of Pascal, much less are familiar with its case
> > >> conventions?
> > >>
> > >> Long story short - a PR is premature until there's an agreed-upon spec.
> > >>
> > >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <dcwatso...@gmail.com> wrote:
> > >> >
> > >> > I have a bit of code that adds the ability to parse and format strings into
> > >> > various case patterns. Wanted to check if it's of worth and in-scope for
> > >> > commons-text...
> > >> >
> > >> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
> > >> > Rather than simply formatting tokens into the case, this API adds the
> > >> > additional goal of being able to transform one case to another. e.g.
> > >> >
> > >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
> > >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
> > >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
> > >> > // Note that kebab and snake do not alter the alphabetic case of the tokens,
> > >> > // as they are essentially case-agnostic joining, according to this
> > >> > // implementation. Though this can be overridden by end users.
> > >> >
> > >> > The API has one core interface: Case, which has format and parse methods.
> > >> > There is a single abstract implementation of it - AbstractConfigurableCase
> > >> > - which is a configuration-driven way to create a case pattern. It has
> > >> > enough options to accommodate the 4 popular cases, and thus the subclasses
> > >> > just have to configure these options rather than implement them directly.
> > >> > Any further extensions can override or extend the API as necessary.
> > >> >
> > >> > There are five core concrete implementations:
> > >> >
> > >> > PascalCase
> > >> > CamelCase (extends PascalCase)
> > >> > DelimitedCase
> > >> > KebabCase (extends DelimitedCase)
> > >> > SnakeCase (extends DelimitedCase)
> > >> >
> > >> > Each has a static INSTANCE field to avoid redundant instantiation.
> > >> >
> > >> > Some of my reasoning / concerns...
> > >> >
> > >> > * I considered bundling all of this logic into static methods, similar to
> > >> > CaseUtils, but that prevents the user from truly customizing or extending
> > >> > the code for odd cases. This approach is, in my opinion, far easier to
> > >> > understand, extend, and debug.
> > >> > * I believe the parsing side should potentially have a loose / strict mode,
> > >> > in that the logic can ignore non-critical rules on the parsing side. e.g.
> > >> > the command CamelCase.parse("MyString") should work, even though the input
> > >> > is not strictly camel case. Strict parsing would ensure (if possible) that
> > >> > the input abides by all elements of the format.
> > >> > * I'm still unsure about how best to handle reserved characters when
> > >> > translating. e.g. How should
> > >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the hyphen?
> > >> > Should the kebab case strip the reserved character from the token values?
> > >> >
> > >> > Long story short - is this worth pursuing in the form of a pull request for
> > >> > review? Or is it out of scope for commons-text?
> > >> >
> > >> > Dan
> > >>
> > >>
> > >>
> > >> --
> > >> Elliotte Rusty Harold
> > >> elh...@ibiblio.org
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > >> For additional commands, e-mail: dev-h...@commons.apache.org
> > >>
> > >>
> >
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org