Meant to add... The reason I would favor exceptions is that the underlying implementation can be easily customized. If the user needs to allow non-alphanumeric characters, there is a boolean flag in the underlying abstract class (AbstractConfigurableCase) that simply turns that validation off. I don't think we need to make any specific implementation significantly error tolerant.
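To make that concrete, here is a rough sketch of how the three exceptions from my earlier message (quoted below) and the validation flag could fit together. Treat it as an illustration under assumptions, not the actual code: DelimitedCaseSketch, LenientSnakeSketch, CaseSketchDemo and the allowZeroLength flag are hypothetical names made up for this example, and only the exception names and the idea of an alphanumeric flag come from this thread.

```java
import java.util.Arrays;

// Exception names come from this thread; constructors and bodies are assumptions.
class InvalidCharacterException extends IllegalArgumentException {
    InvalidCharacterException(String message) {
        super(message);
    }
}

class ReservedCharacterException extends InvalidCharacterException {
    ReservedCharacterException(String message) {
        super(message);
    }
}

class ZeroLengthTokenException extends IllegalArgumentException {
    ZeroLengthTokenException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a delimiter-based case; not the real class.
class DelimitedCaseSketch {
    private final char delimiter;
    // When true, tokens may contain only letters and digits.
    protected boolean alphanumeric = true;
    // Hypothetical flag: whether format() accepts empty tokens.
    protected boolean allowZeroLength = true;

    DelimitedCaseSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    // Validates every token, then joins the tokens with the delimiter.
    String format(Iterable<String> tokens) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (String token : tokens) {
            validate(token);
            if (!first) {
                out.append(delimiter);
            }
            out.append(token);
            first = false;
        }
        return out.toString();
    }

    private void validate(String token) {
        if (token.isEmpty() && !allowZeroLength) {
            throw new ZeroLengthTokenException("empty token not supported");
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // The delimiter is always reserved, regardless of the flag.
            if (c == delimiter) {
                throw new ReservedCharacterException(
                        "reserved character '" + c + "' in token: " + token);
            }
            if (alphanumeric && !Character.isLetterOrDigit(c)) {
                throw new InvalidCharacterException(
                        "illegal character '" + c + "' in token: " + token);
            }
        }
    }
}

// Turning the flag off in a subclass disables the alphanumeric check only.
class LenientSnakeSketch extends DelimitedCaseSketch {
    LenientSnakeSketch() {
        super('_');
        this.alphanumeric = false;
    }
}

class CaseSketchDemo {
    public static void main(String[] args) {
        DelimitedCaseSketch strict = new DelimitedCaseSketch('_');
        System.out.println(strict.format(Arrays.asList("my", "snake", "string")));
        try {
            strict.format(Arrays.asList("my-token"));
        } catch (InvalidCharacterException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(new LenientSnakeSketch().format(Arrays.asList("my-token", "x")));
    }
}
```

Because ReservedCharacterException extends InvalidCharacterException, a caller can catch the broader type and still handle both, while tests can assert on the specific one. The delimiter check stays on even when the alphanumeric flag is off, which is the behavior I would expect from the spec.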
An extension of snake case to allow all characters should look like:

class MySnakeCase extends SnakeCase {
    MySnakeCase() {
        super();
        this.alphanumeric = false;
    }
}

On Wed, Aug 9, 2023, 7:29 PM Daniel Watson <dcwatso...@gmail.com> wrote:

> Currently I'm planning a set of exceptions that are thrown for various
> reasons. I created multiple classes to allow for clearer testing.
>
> ReservedCharacterException (extends InvalidCharacterException below) -
> thrown specifically when a reserved character is encountered within a token.
>
> InvalidCharacterException (extends IllegalArgumentException) - thrown
> directly any time an illegal character is encountered.
>
> ZeroLengthTokenException (extends IllegalArgumentException) - thrown when a
> zero-length token is encountered and the Case does not support it.
>
> There are a few other error cases I believe. I'm not looking at the code
> right this moment but I'm fairly certain about the need for the above 3.
>
>
> On Wed, Aug 9, 2023, 6:08 PM Elliotte Rusty Harold <elh...@ibiblio.org>
> wrote:
>
>> What happens when a token contains an unpermitted character?
>>
>> On Wed, Aug 9, 2023 at 8:30 PM Daniel Watson <dcwatso...@gmail.com>
>> wrote:
>> >
>> > Here's my stab at a spec. Wanted to clarify some parts of the Case
>> > interface first before jumping into the implementations. Wondering what a
>> > good package name for this stuff is, given that "case" is a reserved word?
>> >
>> > Case (interface)
>> > The Case interface defines two methods:
>> > * String format(Iterable<String> tokens)
>> > The format method accepts an Iterable of String tokens and returns a single
>> > String formatted according to the implementation. The format method is
>> > intended to handle transforming between cases, thus tokens passed to the
>> > format() method need not be properly formatted for the given Case instance,
>> > though they must still respect any reserved character restrictions.
>> > * List<String> parse(String string)
>> > The parse method accepts a single string and returns a List of string
>> > tokens that abide by the Case implementation.
>> > Note: format() and parse() methods must be fully reciprocal, i.e. on a
>> > single Case instance, when calling parse() with a valid string, and passing
>> > the resulting tokens into format(), a matching string should be returned.
>> >
>> > DelimitedCase (base class for kebab and snake)
>> > Defines a Case where all tokens are separated by a single character
>> > delimiter. The delimiter is considered a reserved character and is not
>> > allowed to appear within tokens when formatting. No further restrictions
>> > are placed on token contents by this base implementation. Tokens can
>> > contain any valid Java String character. DelimitedCases can support
>> > zero-length tokens, which can occur if there are no characters between two
>> > instances of the delimiter or if the parsed string begins or ends with the
>> > delimiter.
>> > Note: Other Case implementations may not support zero-length tokens, and
>> > attempts to call format(...) with empty tokens may fail.
>> >
>> > KebabCase
>> > Extends DelimitedCase and initializes the delimiter as the hyphen '-'
>> > character. This case allows only alphanumeric characters within tokens.
>> >
>> > SnakeCase
>> > Extends DelimitedCase and initializes the delimiter as the underscore '_'
>> > character. This case allows only alphanumeric characters within tokens.
>> >
>> > PascalCase
>> > Defines a Case where tokens begin with an uppercase alpha character. All
>> > subsequent token characters must be lowercase alpha or numeric characters.
>> > Whenever an uppercase alpha character is encountered, the previous token is
>> > considered complete and a new token begins, with the uppercase character
>> > being the first character of the new token.
>> > PascalCase does not allow zero-length tokens when formatting, as it
>> > would violate the reciprocal contract of format() and parse().
>> >
>> > CamelCase
>> > Extends PascalCase and sets one additional restriction - that the first
>> > character of the first token (i.e. the first character of the full string)
>> > must be a lowercase alpha character (rather than the uppercase requirement
>> > of PascalCase). All other restrictions of PascalCase apply.
>> >
>> >
>> > On Tue, Aug 8, 2023 at 8:55 PM Daniel Watson <dcwatso...@gmail.com> wrote:
>> >
>> > > Kebab case is extremely common for web identifiers, e.g. HTML element ids,
>> > > classes, attributes, etc.
>> > >
>> > > In regards to PascalCase, I agree that most people won't understand the
>> > > reasoning behind the name, but it is nevertheless a widely accepted term
>> > > for that case style. If an alternative is deemed necessary then
>> > > "ProperCase" might work - since that is also how English proper nouns are
>> > > cased. Understanding that name just depends on your knowledge of English
>> > > grammar.
>> > >
>> > > A spec can definitely be written for the 4 provided concrete
>> > > implementations. And... I may eat these words but... the spec should not be
>> > > all that complex. I will take a stab at it.
>> > >
>> > > Thanks for the feedback!
>> > > Any other thoughts or comments are welcome!
>> > >
>> > > Dan
>> > >
>> > >
>> > > On Tue, Aug 8, 2023, 7:45 PM Elliotte Rusty Harold <elh...@ibiblio.org>
>> > > wrote:
>> > >
>> > >> This is a good idea and seems like useful functionality. In order to
>> > >> accept it into commons, it needs solid documentation and excellent
>> > >> test coverage. I've worked on code like this in another language (not
>> > >> Java) and the production bugs were bad. E.g. what happens when a
>> > >> string contains numbers as well as letters?
>> > >>
>> > >> I'd like to see a full spec that unambiguously defines how every
>> > >> Unicode string is converted into camel/snake/kebab case. The spec
>> > >> should be independent of the code. That's not easy to write but it's
>> > >> essential.
>> > >>
>> > >> I don't want any loose/strict modes. It should all be strict according to
>> > >> spec.
>> > >>
>> > >> I've never heard of kebab cases before. Is that a common name? I'd
>> > >> also like to rename Pascal case. How many programmers under 40 have
>> > >> even heard of Pascal, much less are familiar with its case
>> > >> conventions?
>> > >>
>> > >> Long story short - a PR is premature until there's an agreed upon spec.
>> > >>
>> > >> On Tue, Aug 8, 2023 at 8:04 PM Daniel Watson <dcwatso...@gmail.com>
>> > >> wrote:
>> > >> >
>> > >> > I have a bit of code that adds the ability to parse and format strings into
>> > >> > various case patterns. Wanted to check if it's of worth and in-scope for
>> > >> > commons-text...
>> > >> >
>> > >> > It's a bit broader than the existing CaseUtils.toCamelCase(...) method.
>> > >> > Rather than simply formatting tokens into the case, this API adds the
>> > >> > additional goal of being able to transform one case to another. e.g.
>> > >> >
>> > >> > SnakeCase.format(PascalCase.parse("MyPascalString")); // returns My_Pascal_String
>> > >> > CamelCase.format(SnakeCase.parse("my_snake_string")); // returns mySnakeString
>> > >> > KebabCase.format(CamelCase.parse("myCamelString")); // returns my-Camel-String
>> > >> > // Note that kebab and snake do not alter the alphabetic case of the tokens,
>> > >> > as they are essentially case agnostic joining, according to this
>> > >> > implementation. Though this can be overridden by end users.
>> > >> >
>> > >> > The API has one core interface: Case, which has format and parse methods.
>> > >> > There is a single abstract implementation of it - AbstractConfigurableCase
>> > >> > - which is a configuration driven way to create a case pattern. It has
>> > >> > enough options to accommodate the 4 popular cases, and thus the subclasses
>> > >> > just have to configure these options rather than implement them directly.
>> > >> > Any further extensions can override or extend the API as necessary.
>> > >> >
>> > >> > There are five core concrete implementations:
>> > >> >
>> > >> > PascalCase
>> > >> > CamelCase (extends PascalCase)
>> > >> > DelimitedCase
>> > >> > KebabCase (extends DelimitedCase)
>> > >> > SnakeCase (extends DelimitedCase)
>> > >> >
>> > >> > Each has a static INSTANCE field to avoid redundant instantiation.
>> > >> >
>> > >> > Some of my reasoning / concerns...
>> > >> >
>> > >> > * I considered bundling all of this logic into static methods, similar to
>> > >> > CaseUtils, but that prevents the user from truly customizing or extending
>> > >> > the code for odd cases. This approach is, in my opinion, far easier to
>> > >> > understand, extend, and debug.
>> > >> > * I believe the parsing side should potentially have a loose / strict mode,
>> > >> > in that the logic can ignore non-critical rules on the parsing side. e.g.
>> > >> > the command CamelCase.parse("MyString") should work, even though the input
>> > >> > is not strictly camel case. Strict parsing would ensure (if possible) that
>> > >> > the input abides by all elements of the format.
>> > >> > * I'm still unsure about how best to handle reserved characters when
>> > >> > translating. e.g. How should
>> > >> > KebabCase.format(PascalCase.parse("MyPascal-String")) handle the hyphen?
>> > >> > Should the kebab case strip the reserved character from the token values?
>> > >> >
>> > >> > Long story short - is this worth pursuing in the form of a pull request for
>> > >> > review? Or is it out of scope for commons-text?
>> > >> >
>> > >> > Dan
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Elliotte Rusty Harold
>> > >> elh...@ibiblio.org
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> > >> For additional commands, e-mail: dev-h...@commons.apache.org