Hi Tony,

On Thu, Sep 7, 2017 at 5:40 PM, Tony Marston <tonymars...@hotmail.com>
wrote:

> "Dan Ackroyd"  wrote in message news:CA+kxMuSL1kEW60S7DFJb06+r
> 2q3rc1ueewu1jap78fy65aj...@mail.gmail.com...
>
>>
>> On 6 September 2017 at 13:31, Rowan Collins <rowan.coll...@gmail.com>
>> wrote:
>>
>>> I'm going to assume that the code you posted was something of a straw
>>> man, and you're not actually advocating people copy 20 lines of code for
>>> every variable they want to validate.
>>>
>>
>> You assume wrong. No it's not, and yes I am.
>>
>> I can point a junior developer at the function and they can understand it.
>>
>> If I ask that junior developer to add an extra rule that doesn't
>> currently exist, they can without having to dive into a full library
>> of validation code.
>>
>> If I need to modify the validation based on extra input (e.g. whether
>> the user has already made several purchases, or whether they're a
>> brand new signup), it's trivial to add that to the function.
>>
>> This is one of the times where code re-use through copying and pasting
>> is far superior to trying to make stuff "simple" by going through an
>> array based 'specification'. It turns out that that doesn't save much
>> time to begin with, and then becomes hard to manage when your
>> requirements get more complicated.
>>
>
> As a person who has been developing database applications for several
> decades, and with PHP since 2003, I'd like to chip in with my two cents' worth.
> Firstly I agree with Dan's statement:
>
> This type of library should be done in PHP, not in C.
>
> Secondly, there is absolutely no way that you can construct a standard
> library which can execute all the possible validation rules that may exist.
> In my not inconsiderable experience there are two types of validation:
> 1) Primary validation, where each field is validated against the column
> specifications in the database to ensure that the value can be written to
> that column without causing an error. For example this checks that a number
> is a number, a date is a date, a required field is not null, etc.
> 2) Secondary validation, where additional validation/business rules are
> applied such as comparing the values from several fields. For example, to
> check that START_DATE is not later than END_DATE.
>
> Primary validation is easy to automate. I have a separate class for each
> database table, and each class contains an array of field specifications.
> This is never written by hand as it is produced by my Data Dictionary which
> imports data from the database schema then exports that data in the form of
> table class files and table structure files. When data is sent to a table
> class for inserting or updating in the database I have written a standard
> validation procedure which takes two arrays - an array of field=value pairs
> and an array of field=specifications - and then checks that each field
> conforms to its specifications. This validation procedure is built into the
> framework and executed automatically before any data is written to the
> database, so requires absolutely no intervention by the developer.
>
> Secondary validation cannot be automated, so it requires additional code
> to be inserted into the relevant validation method. There are several of
> these which are defined in my abstract table class and which are executed
> automatically at a predetermined point in the processing cycle. These
> methods are defined in the abstract class but are empty. If specific code
> is required then the empty method can be copied from the abstract class to
> the concrete class where it can be filled with the necessary code.
>
> If there are any developers out there who are still writing code to
> perform primary validation then you may learn something from my
> implementation.
>
> If there are any developers out there who think that secondary validation
> can be automated I can only say "dream on".
>
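
Tony's split between primary and secondary validation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not his framework's code: the `FIELD_SPECS` dictionary is a hypothetical stand-in for the specifications a data dictionary would generate from the schema, and all function and field names are made up. The primary checks are driven entirely by the spec array, while the START_DATE/END_DATE rule has to be written by hand.

```python
from datetime import date

# Hypothetical column specs, standing in for what a data dictionary
# would generate from the database schema.
FIELD_SPECS = {
    "id": {"type": "int", "required": True},
    "start_date": {"type": "date", "required": True},
    "end_date": {"type": "date", "required": True},
    "note": {"type": "str", "required": False, "max_len": 255},
}


def primary_validate(values, specs):
    """Check every field against its column spec; return {field: error}."""
    errors = {}
    for field, spec in specs.items():
        value = values.get(field)
        if value is None or value == "":
            if spec["required"]:
                errors[field] = "required"
            continue
        if spec["type"] == "int":
            try:
                int(value)
            except (TypeError, ValueError):
                errors[field] = "not an integer"
        elif spec["type"] == "date":
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                errors[field] = "not a date"
        elif spec["type"] == "str":
            if len(str(value)) > spec.get("max_len", float("inf")):
                errors[field] = "too long"
    return errors


def secondary_validate(values):
    """Business rule that cannot be derived from any column spec."""
    start = date.fromisoformat(values["start_date"])
    end = date.fromisoformat(values["end_date"])
    if start > end:
        return {"start_date": "START_DATE is later than END_DATE"}
    return {}


def validate(values):
    # Secondary rules only run once every field has passed primary validation.
    errors = primary_validate(values, FIELD_SPECS)
    return errors or secondary_validate(values)
```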

Please let me explain the rationale behind input validation at the outermost
trust boundary.
There are 3 reasons why I would like to propose the validation feature, and
all 3 require validation at the outermost trust boundary.

1. Security reasons
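Input validation should be done in a fail-fast manner: invalid input is
rejected as soon as it enters the application, before any other code
processes it.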

2. Design by Contract (DbC or Contract Programming)
For DbC to work, validation at the outermost boundary is mandatory.
With DbC, all inputs are validated inside functions/methods to ensure
correct program execution.

However, almost all checks (in fact, all checks done by DbC support)
are disabled in production. How, then, do we make sure the program works
correctly? Once DbC checks are disabled, nothing inside the functions
catches invalid values any more, so all input data must be validated at
the outermost boundary. Otherwise, DbC does not work. (DbC is supposed to
achieve both secure and efficient code execution.)

3. Native PHP Types
Although my validate module is designed not to do unwanted conversions,
it converts basic types to native PHP types by default (this can be
disabled). With this conversion done at the outermost trust boundary,
the rest of the code can work with native PHP types smoothly.
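
The three points above can be illustrated together in a minimal sketch. It is written in Python only to keep it short and self-contained; it is not the API of the proposed validate module, and `validate_request()`, `validate_int()`, `offset_for()` and the `page`/`per_page` parameters are hypothetical. The boundary function fails fast on the first invalid field and converts accepted values to native ints. `offset_for()` states its preconditions with `assert`, which, much like PHP assertions under `zend.assertions=-1`, is stripped in optimized runs (`python -O`), so correctness in production relies on the boundary check.

```python
import re


class ValidationError(ValueError):
    """Raised as soon as one input fails; nothing after it is processed."""


def validate_int(name, raw, min_value, max_value):
    # Format check: optional sign followed by ASCII digits only
    # (rejects "", " 1", "1e3", "0x10", None, ...).
    if not isinstance(raw, str) or re.fullmatch(r"[+-]?[0-9]+", raw) is None:
        raise ValidationError(f"{name}: not an integer")
    value = int(raw)  # convert to a native type once, at the boundary
    if not (min_value <= value <= max_value):
        raise ValidationError(f"{name}: out of range")
    return value


def validate_request(params):
    """Outermost trust boundary: fail fast on the first invalid field."""
    return {
        "page": validate_int("page", params.get("page"), 1, 1000),
        "per_page": validate_int("per_page", params.get("per_page"), 1, 100),
    }


def offset_for(page, per_page):
    # Contract-style preconditions; removed when Python runs with -O,
    # so they only hold if the boundary already validated the input.
    assert isinstance(page, int) and page >= 1
    assert isinstance(per_page, int) and 1 <= per_page <= 100
    return (page - 1) * per_page
```

For example, `offset_for(**validate_request({"page": "3", "per_page": "20"}))` yields 40, while `{"page": "0x10", ...}` is rejected before any application code runs.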

My current primary goal is 1, but 2 and 3 are important as well.

2 is especially important. Providing DbC without a proper basic validation
feature does not make much sense, and could be a disaster.
Users may validate input with their own validation library, but I am
pessimistic about that: users tend not to do proper validation because
many validation libraries and rules are too loose. Very few validators
perform true validation that meets the requirements for 1 and 2. IMHO,
even if good enough validators exist, PHP should provide a usable
validator for core features. (DbC is not implemented yet, though.)

I hope you understand my intentions and will accept the feature in core.
IMO, a feature that core features depend on should be in core.

> 1) Primary validation, where each field is validated against the column
> specifications in the database to ensure that the value can be written to
> that column without causing an error. For example this checks that a number
> is a number, a date is a date, a required field is not null, etc.
> 2) Secondary validation, where additional validation/business rules are
> applied such as comparing the values from several fields. For example, to
> check that START_DATE is not later than END_DATE.

Validation rules for input, logic and database may differ.
Suppose you validate "user comment" data:

  Input:     0 -  10240 bytes  Input may have to allow a larger size than
                               logic, e.g. when there is no client-side
                               validation.
  Logic:    10 -   1024 bytes  Logic may require a smaller range for the
                               data to be correct.
  Database:  0 - 102400 bytes  Database may allow a much larger size for
                               future extension.

Ideally these would all be the same, but in the real world they are not.
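
The "user comment" limits above can be expressed as a single check with per-layer ranges. A minimal Python sketch, assuming sizes are measured as UTF-8 encoded bytes; `COMMENT_LIMITS` and `comment_ok()` are hypothetical names.

```python
# Byte-length limits for the "user comment" example; (min, max) inclusive.
COMMENT_LIMITS = {
    "input": (0, 10240),      # outermost boundary, may lack client-side checks
    "logic": (10, 1024),      # what the application considers correct data
    "database": (0, 102400),  # column size, kept larger for future extension
}


def comment_ok(comment, layer):
    """Check a comment's UTF-8 byte length against one layer's limits."""
    low, high = COMMENT_LIMITS[layer]
    return low <= len(comment.encode("utf-8")) <= high
```

A 5-byte comment such as "hello" passes the input and database checks but fails the logic check, which is the kind of mismatch the three layers have to tolerate.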

I do not aim to consolidate all validations, but I would like to avoid
unnecessary incompatibilities so that the different validations can
cooperate where possible.

I'm very interested in PDO-level validation because SQLite3 can be very
dangerous: its type affinity allows strings to be stored in
int/float/date/etc. columns. It may be useful if PDO could simply use the
"validate" module's rules or API.
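
The type affinity problem can be reproduced with Python's built-in `sqlite3` module (used here only as a short, self-contained demonstration; PDO is not involved). With ordinary (non-STRICT) tables, a column declared INTEGER or DATE accepts text that cannot be converted, instead of raising an error. The table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (n INTEGER, d DATE)")

# Text that is not a well-formed number is stored as TEXT; no error is raised.
conn.execute("INSERT INTO comments (n, d) VALUES (?, ?)", ("not a number", "not a date"))
# A well-formed integer string is converted by INTEGER affinity.
# "2017-09-07" is not a numeric literal, so the DATE column keeps it as TEXT.
conn.execute("INSERT INTO comments (n, d) VALUES (?, ?)", ("42", "2017-09-07"))

rows = conn.execute("SELECT typeof(n), typeof(d) FROM comments ORDER BY rowid").fetchall()
conn.close()
```

`rows` ends up as `[('text', 'text'), ('integer', 'text')]`: only "42" is converted, and the DATE column (which has NUMERIC affinity) stores both values as text.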

BTW, if we follow the single responsibility principle, input validation
should only validate format (characters used, length, range, encoding).
Logical correctness is up to the logic, i.e. the Model in MVC.
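
A format-only validator in that spirit might check encoding, length and allowed characters, and nothing else. A minimal Python sketch; `validate_text_format()` and its `multiline` parameter are hypothetical, and rejecting Unicode control characters (category Cc) other than tab, CR and LF is an assumed character policy.

```python
import unicodedata


def validate_text_format(raw, max_bytes, multiline=True):
    """Format-only check of raw input bytes.

    Returns the decoded text, or None if the length, encoding or
    characters are invalid. Whether the text makes sense is left to
    the model layer.
    """
    if len(raw) > max_bytes:  # length, measured in bytes
        return None
    try:
        text = raw.decode("utf-8")  # strict decoding: invalid sequences fail
    except UnicodeDecodeError:
        return None
    allowed_controls = "\t\r\n" if multiline else ""
    for ch in text:
        # Reject control characters (Unicode category Cc), such as NUL,
        # except the whitespace controls allowed for multi-line input.
        if unicodedata.category(ch) == "Cc" and ch not in allowed_controls:
            return None
    return text
```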

Anyway, the goal is to provide a usable basic validator for core features
and security. Necessary trade-offs are acceptable.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net
