On 4/12/2016 1:25 AM, Sara Golemon wrote: > On Mon, Apr 11, 2016 at 4:09 PM, Stanislav Malyshev <smalys...@gmail.com> > wrote: >> The API looks a bit strange - new IntlCharsetDetector($text) and then >> detect(). Can't we just have detect([$text])? >> > I went with a direct wrapping of the underlying API because it always > feels like we regret adding magic eventually. It's trivial for some > composer installable library to wrap this into something nicer. In > fact, one probably WOULD use a userspace library in order to provide > fallback to mb_detect_encoding() on PHP < 7.1 > > That said, how do you feel about compromising by adding this function > in addition to the raw API? > > function ucsdet_detect_encoding(string $text, string $hint = null, > bool $filter = false) { > $det = new IntlCharsetDetector($text); > if ($hint !== null) { > $det->setDeclaredEncoding($hint); > } > $det->enableInputFiltering($filter); > return $det->detect()['name']; > } > > That'll give simplicity for the 80% case (I have a string, I want a > best guess) but still provide the true API for consideration of > confidence and/or other guesses. > > -Sara >
I agree with Stanislav here. I think that the HHVM implementation is much better because it enables dependency injection and does not fall back to arbitrary native arrays. However, the detect method should take an optional text argument directly, as Stanislav proposed. I understand the argument that extending native interfaces sometimes led to problems but simply copying stuff without thinking of better ways is also not really solving all problems. In our native language: class IntlCharsetDetector { fn __construct(string $text = null); fn setText(string $text): void; fn setDeclaredEncoding(string $encoding): void; // I incorporated your proposal here because I think it is // extremely useful. fn detect( string $text = null, string $hint = null, bool $filter = false ): IntlEncodingMatch; fn detectAll(string $text = null): array<IntlEncodingMatch>; fn inputFilterEnabled(): bool; fn enableInputFilter(): void; fn disableInputFilter(): void; fn getDetectableCharsets(): array<string>; fn enableDetectableCharset(string $charset): void; fn disableDetectableCharset(string $charset): void; static fn getAllDetectableCharsets(): array<string>; } NOTE how I changed the methods that take bool arguments int he Java implementation to direct calls. This removes the necessity to validate the arguments of the caller and results in a very clean API that does not require extensive documentation to be easily comprehensible. class IntlEncodingMatch { fn isValid(): bool; fn getEncoding(): string; fn getConfidence(): int; fn getLanguage(): string; fn getUtf8(): string; } NOTE that I also use *getUtf8* as does HHVM and not simply *getString* as it is used in the Java implementation because again it reduces the amount of documentation reading because it is clear what this method is supposed to do (return). I did not write *getUTF8* though because I think it violates the /camelCase/ rule but that is not really clarified in the PHP Coding Standards and thus hard to argue about. Having the *IntlEncodingMatch* object as a result has the advantage that one does not need to look up the documentation every time they want to handle the result, no creation of the strings to access the offsets at runtime, no error prone mistyping issues plus the added value of type hinting against it. That being said, the procedural functions could still return native arrays. I do not see a reason why the procedural versions have to be 1:1 mapping to the OO versions. -- Richard "Fleshgrinder" Fussenegger
signature.asc
Description: OpenPGP digital signature