On 4/12/2016 1:25 AM, Sara Golemon wrote:
> On Mon, Apr 11, 2016 at 4:09 PM, Stanislav Malyshev <smalys...@gmail.com> 
> wrote:
>> The API looks a bit strange - new IntlCharsetDetector($text) and then
>> detect(). Can't we just have detect([$text])?
>>
> I went with a direct wrapping of the underlying API because it always
> feels like we regret adding magic eventually.  It's trivial for some
> composer installable library to wrap this into something nicer.  In
> fact, one probably WOULD use a userspace library in order to provide
> fallback to mb_detect_encoding() on PHP < 7.1
> 
> That said, how do you feel about compromising by adding this function
> in addition to the raw API?
> 
> function ucsdet_detect_encoding(string $text, string $hint = null,
> bool $filter = false) {
>   $det = new IntlCharsetDetector($text);
>   if ($hint !== null) {
>     $det->setDeclaredEncoding($hint);
>   }
>   $det->enableInputFiltering($filter);
>   return $det->detect()['name'];
> }
> 
> That'll give simplicity for the 80% case (I have a string, I want a
> best guess) but still provide the true API for consideration of
> confidence and/or other guesses.
> 
> -Sara
> 

I agree with Stanislav here. I think that the HHVM implementation is
much better because it enables dependency injection and does not fall
back to arbitrary native arrays. However, the detect method should take
an optional text argument directly, as Stanislav proposed. I understand
the argument that extending native interfaces sometimes led to problems
but simply copying stuff without thinking of better ways is also not
really solving all problems.

In our native language:

    class IntlCharsetDetector {

        fn __construct(string $text = null);

        fn setText(string $text): void;

        fn setDeclaredEncoding(string $encoding): void;

        // I incorporated your proposal here because I think it is
        // extremely useful.
        fn detect(
            string $text = null,
            string $hint = null,
            bool $filter = false
        ): IntlEncodingMatch;

        fn detectAll(string $text = null): array<IntlEncodingMatch>;

        fn inputFilterEnabled(): bool;

        fn enableInputFilter(): void;

        fn disableInputFilter(): void;

        fn getDetectableCharsets(): array<string>;

        fn enableDetectableCharset(string $charset): void;

        fn disableDetectableCharset(string $charset): void;

        static fn getAllDetectableCharsets(): array<string>;

    }

NOTE how I changed the methods that take bool arguments int he Java
implementation to direct calls. This removes the necessity to validate
the arguments of the caller and results in a very clean API that does
not require extensive documentation to be easily comprehensible.

    class IntlEncodingMatch {

        fn isValid(): bool;

        fn getEncoding(): string;

        fn getConfidence(): int;

        fn getLanguage(): string;

        fn getUtf8(): string;

    }

NOTE that I also use *getUtf8* as does HHVM and not simply *getString*
as it is used in the Java implementation because again it reduces the
amount of documentation reading because it is clear what this method is
supposed to do (return). I did not write *getUTF8* though because I
think it violates the /camelCase/ rule but that is not really clarified
in the PHP Coding Standards and thus hard to argue about.

Having the *IntlEncodingMatch* object as a result has the advantage that
one does not need to look up the documentation every time they want to
handle the result, no creation of the strings to access the offsets at
runtime, no error prone mistyping issues plus the added value of type
hinting against it.

That being said, the procedural functions could still return native
arrays. I do not see a reason why the procedural versions have to be 1:1
mapping to the OO versions.

-- 
Richard "Fleshgrinder" Fussenegger

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to