Hi Tony,

On Sat, Sep 9, 2017 at 6:26 PM, Tony Marston <tonymars...@hotmail.com>
wrote:

> "Yasuo Ohgaki"  wrote in message news:CAGa2bXa4UvkL-ZsLAB2bF05L
> 4q_oduixszvvyzu9nddksvt...@mail.gmail.com...
>
>>
>> Hi Tony,
>>
>> <snip>
>
>>
>>> As a person who has been developing database applications for several
>>> decades and with PHP since 2003 I'd like to chip in with my 2 cent's
>>> worth.
>>> Firstly I agree with Dan's statement:
>>>
>>> This type of library should be done in PHP, not in C.
>>>
>>> Secondly, there is absolutely no way that you can construct a standard
>>> library which can execute all the possible validation rules that may
>>> exist.
>>> In my not inconsiderable experience there are two types of validation:
>>> 1) Primary validation, where each field is validated against the column
>>> specifications in the database to ensure that the value can be written to
>>> that column without causing an error. For example this checks that a
>>> number is a number, a date is a date, a required field is not null, etc.
>>> 2) Secondary validation, where additional validation/business rules are
>>> applied such as comparing the values from several fields. For example, to
>>> check that START_DATE is not later than END_DATE.
>>>
>>>
>>> Primary validation is easy to automate. I have a separate class for each
>>> database table, and each class contains an array of field specifications.
>>> This is never written by hand as it is produced by my Data Dictionary
>>> which
>>> imports data from the database schema then exports that data in the form
>>> of
>>> table class files and table structure files. When data is sent to a table
>>> class for inserting or updating in the database I have written a standard
>>> validation procedure which takes two arrays - an array of field=value
>>> pairs
>>> and an array of field=specifications - and then checks that each field
>>> conforms to its specifications. This validation procedure is built into
>>> the
>>> framework and executed automatically before any data is written to the
>>> database, so requires absolutely no intervention by the developer.
>>>
>>> Secondary validation cannot be automated, so it requires additional code
>>> to be inserted into the relevant validation method. There are several of
>>> these which are defined in my abstract table class and which are executed
>>> automatically at a predetermined point in the processing cycle. These
>>> methods are defined in the abstract class but are empty. If specific code
>>> is required then the empty method can be copied from the abstract class to
>>> the concrete class where it can be filled with the necessary code.
>>>
>>> If there are any developers out there who are still writing code to
>>> perform primary validation then you may learn something from my
>>> implementation.
>>>
>>> If there are any developers out there who think that secondary validation
>>> can be automated I can only say "dream on".
>>>
>>>
>> Please let me explain the rationale behind input validation at the
>> outermost trust boundary. There are 3 reasons why I would like to propose
>> the validation. All 3 require validation at the outermost trust boundary.
>>
>> 1. Security reasons
>> Input validation should be done in a Fail Fast manner.
>>
>
> The language should only provide the basic features which allow values to
> be validated. That is what the filter functions are for. All that is
> necessary is for user input to be validated before any attempt is made to
> write it to the database.


The reason why data should be validated at the outermost trust boundary has
already been explained by me and others. Validation at the database level is
simply too late for security purposes.

Input validation must be done at the outermost boundary for the best
security. This is a secure coding best practice.

>> 2. Design by Contract (DbC or Contract Programming)
>> In order for DbC to work, validation at the outermost boundary is mandatory.
>> With DbC, all inputs are validated inside functions/methods to ensure
>> correct program execution.
>>
>
> Irrelevant. DbC is a methodology which PHP was never designed to support,
> and I see no reason why it should. If you really want DbC then switch to a
> language which supports it, or use a third-party extension which provides
> support.


DbC is ad hoc. It causes no BC break and has no shortcomings.
Almost all languages, including PHP, support it in some form.
Whether or not PHP was designed for DbC is irrelevant.

One can totally ignore DbC support, just as some D users do, yet
DbC can achieve both better security and better performance with proper
design and usage. DbC is _extremely_ useful for building solid and faster
apps when it is used properly.
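
A minimal sketch of what "support in some form" can look like in PHP, assuming PHP 7's assert() and the zend.assertions ini setting. The function average() is a hypothetical example, not part of any proposal:

```php
<?php
// Hypothetical function with a precondition written as a contract.
function average(array $numbers): float
{
    // Checked only while assertions are enabled (zend.assertions = 1).
    // With zend.assertions = -1, as in php.ini-production, this line
    // costs nothing, so an empty array must already have been rejected
    // by validation at the outermost boundary.
    assert(count($numbers) > 0, 'average() requires a non-empty array');

    return array_sum($numbers) / count($numbers);
}

var_dump(average([1, 2, 3]));
```

The contract documents what the function expects, while the boundary validation is what keeps that expectation true once the checks are switched off.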


>> However, almost all checks (in fact, all checks done by DbC support)
>> are disabled for production. How do we make sure the program works
>> correctly? All input data must be validated at the outermost boundary when
>> DbC is disabled. Otherwise, DbC may not work. (DbC is supposed to achieve
>> both secure and efficient code execution.)
>>
>
>> 3. Native PHP Types
>> Although my validate module is designed not to do unwanted conversions,
>> it converts basic types to PHP native types by default. (This can be
>> disabled.) With this conversion at the outermost trust boundary, native PHP
>> types work fluently.
>>
>
> What is the difference between a basic type and a PHP native type?


PHP native types are NULL/BOOL/INT/FLOAT/STRING/ARRAY/OBJECT.

Almost all inputs are "text" in web apps. Data that comes from clients is
"text", so it arrives as "STRING", while PHP native types are
zend_bool/zend_long/double/zend_string/hash/object internally.

A basic type in string form is almost the same as a PHP native type, but
there is a small difference. E.g. 't' is TRUE, and
'999999999999999999999999' is valid as an integer, but not as a PHP int
type.
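
A minimal sketch of that difference, using only the existing filter extension and PCRE (this is not the API of the proposed validate module). The variable names are illustrative:

```php
<?php
// A digits-only string is a well-formed integer "as text"...
$big = '999999999999999999999999';
$looksLikeInteger = preg_match('/\A[0-9]+\z/', $big) === 1;

// ...but it does not fit PHP's native int, so the filter extension
// rejects it and returns false.
$asNativeInt = filter_var($big, FILTER_VALIDATE_INT);

// A small value passes and comes back as a native int.
$small = filter_var('42', FILTER_VALIDATE_INT);

// 't' is not one of the boolean spellings the filter extension accepts
// ("1", "true", "on", "yes" and their negative counterparts), so with
// FILTER_NULL_ON_FAILURE it is reported as invalid.
$t = filter_var('t', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

var_dump($looksLikeInteger, $asNativeInt, $small, $t);
```

Whether 't' should count as TRUE is a rule choice. The point is only that "valid as text" and "valid as a native type" are separate questions.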


>> Although my current primary goal is 1, 2 and 3 are important as well.
>>
>> 2 is especially important. Providing DbC without a proper basic validation
>> feature does not make much sense, and could be a disaster.
>> Users may validate input with their own validation library, but my guess
>> is pessimistic. Users wouldn't do proper validation because validation
>> libraries and rules are too loose. There are too few validators that do
>> true validation that meets the requirements for 1 and 2. IMHO, even if
>> there are good enough validators, PHP should provide a usable validator
>> for core features. (DbC is not implemented, though.)
>>
>
> It does, in the form of the filter functions.


It seems you haven't tried to use the filter module seriously.
It simply does not have enough features for input validation.
E.g. you cannot validate "strings".
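
To illustrate what validating a "string" can involve, here is a minimal userland sketch. It is not the API from the RFC: validate_string() and its parameters are hypothetical names, and the chosen rules (byte length, UTF-8, control characters) are assumptions for the example:

```php
<?php
/**
 * Hypothetical helper: accept a value only if it is a string whose byte
 * length is in range, that is valid UTF-8, and that contains no control
 * characters other than tab, LF and CR. Returns the string, or null.
 */
function validate_string($value, int $minBytes, int $maxBytes): ?string
{
    if (!is_string($value)) {
        return null; // e.g. an array sent as comment[]=x
    }
    $len = strlen($value);
    if ($len < $minBytes || $len > $maxBytes) {
        return null;
    }
    // With the /u modifier PCRE refuses subjects that are not valid UTF-8.
    if (preg_match('//u', $value) !== 1) {
        return null;
    }
    // Reject NUL and other control characters (tab, LF, CR are allowed).
    if (preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $value) === 1) {
        return null;
    }
    return $value;
}

var_dump(validate_string('hello', 1, 10));
var_dump(validate_string("a\0b", 1, 10));
```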


>> I hope you understand my intentions and accept the feature in core.
>> Feature for core should be in core. IMO.
>>
>
> The filter functions are already in core. How these functions are used is
> down to userland code.


I suppose the filter module is not used much for validation, since
it cannot validate strings without my RFC for filter.

>>> 1) Primary validation, where each field is validated against the column
>>> specifications in the database to ensure that the value can be written to
>>> that column without causing an error. For example this checks that a
>>> number is a number, a date is a date, a required field is not null, etc.
>>> 2) Secondary validation, where additional validation/business rules are
>>> applied such as comparing the values from several fields. For example, to
>>> check that START_DATE is not later than END_DATE.
>>
>> Validation rules for input, logic and database may differ.
>> Suppose you validate "user comment" data.
>>
>> Input:     0 -  10240 bytes - Input might have to allow a larger size
>>            than logic, i.e. when client-side validation is lacking.
>> Logic:    10 -   1024 bytes - Logic may require a smaller range as
>>            correct data.
>> Database:  0 - 102400 bytes - Database may allow a much larger size for
>>            future extension.
>>
>> In an ideal situation all of these would be the same, but they are not in
>> the real world.
>>
>> I don't aim to consolidate all validations, but I would like to avoid
>> unnecessary incompatibilities so that different validations can cooperate
>> where possible.
>>
>
> What exactly are these "unnecessary incompatibilities"?


I don't know yet either, but there will be some.


>> I'm very interested in PDO level validation because SQLite3 could be very
>> dangerous.
>>
>
> Anything which is misused can be dangerous. It is almost impossible to
> provide a function and prevent stupid people from misusing it.


Correct. However, too many users ignore the fact that SQLite3 has type
affinity, which allows strings to be stored in columns of any type. This is
just one example where validation gives better security.
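
A minimal sketch of the type affinity behaviour, assuming the pdo_sqlite extension is available and an ordinary (non-STRICT) table. The table and column names are made up for the example:

```php
<?php
// In-memory SQLite database; nothing is written to disk.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The column is declared INTEGER, but that is only an affinity.
$db->exec('CREATE TABLE t (n INTEGER)');

// A non-numeric string is accepted without any error.
$stmt = $db->prepare('INSERT INTO t (n) VALUES (?)');
$stmt->execute(['not a number']);

// typeof() reports how SQLite actually stored the value.
$row = $db->query('SELECT n, typeof(n) AS type FROM t')
          ->fetch(PDO::FETCH_ASSOC);

var_dump($row);
```

The insert succeeds and the value is stored as text, so the database does not catch the bad value. Validation has to happen before the data reaches it.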


>> (i.e. Type affinity allows storing strings in int/float/date/etc. columns.)
>> It may be useful if PDO can simply use the "validate" module's rules or API.
>>
>> BTW, input validation should only validate format (characters used, length,
>> range, encoding) if we follow the single responsibility principle. Logical
>> correctness is up to the logic, i.e. the Model in MVC.
>>
>> Anyway, the goal is to provide a usable basic validator for core features
>> and security.
>>
>
> If you wish to improve the filter functions then go ahead. Anything more
> than this would be a step too far.


I already did, in an RFC with a PoC patch.
https://wiki.php.net/rfc/add_validate_functions_to_filter


>> Required trade-offs may be allowed.
>>
>
> Do not waste time by trying to add into core what should be done in
> userland code.


Proper input validation is the most important task in secure coding.
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices
Nonetheless, I rarely see an app that has proper input validation. It would
be nice to have a module for it with proper documentation.

"All that is necessary is for user input to be validated before any attempt
 is made to write it to the database."

This is fine for the database, but not for the app. There is too much code
that doesn't even require a database. Even when a database is used, there
are too many cases where database level validation is too late.

BTW, a validator implemented in PHP script cannot be faster than a native C
module function. As you know, function call overhead is not cheap. We have a
number of array functions for this reason. Why not for validation, which
must always be called?

Regards,

P.S. Many of us are confused about what application level validation is.
Application level input validation is validation of
$_GET/$_POST/$_COOKIE/$_SERVER/$_FILES _before_ they are used by app code.
The "Validate" module is intended for this.
Logic (Model in MVC) and DB level validations are separate input
validations. With proper design, one cannot be replaced by the others,
i.e. Fail Fast and the Single Responsibility Principle.
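
A minimal sketch of that separation, reusing the byte limits from the "user comment" example quoted earlier in this mail. The function and constant names are hypothetical, and this is plain userland PHP, not the proposed module:

```php
<?php
// Hypothetical limits: the input boundary is looser than the logic.
const INPUT_MAX_BYTES = 10240;
const LOGIC_MIN_BYTES = 10;
const LOGIC_MAX_BYTES = 1024;

/** Input boundary: format only (type, length, encoding). Fail fast. */
function validate_input(array $request): string
{
    $comment = $request['comment'] ?? null;
    if (!is_string($comment)
        || strlen($comment) > INPUT_MAX_BYTES
        || preg_match('//u', $comment) !== 1) { // not valid UTF-8
        throw new InvalidArgumentException('Malformed input: comment');
    }
    return $comment;
}

/** Logic (Model): business rule. Returns an error message or null. */
function check_comment(string $comment): ?string
{
    $len = strlen($comment);
    if ($len < LOGIC_MIN_BYTES || $len > LOGIC_MAX_BYTES) {
        return 'Comment must be 10 to 1024 bytes long.';
    }
    return null;
}

// In a real script the array would be $_POST or $_GET.
$comment = validate_input(['comment' => 'A perfectly ordinary comment.']);
var_dump(check_comment($comment));
```

A malformed request is rejected outright at the boundary, while a well-formed but too short comment is an ordinary user error that the logic reports back.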

-- 
Yasuo Ohgaki
yohg...@ohgaki.net
