Re: Does Null == ""?

Bruce Feist Wed, 17 Sep 2003 19:00:00 -0700

Bob Hall wrote:

Bruce Feist has initiated a discussion with me off the list,

Off the list by accident, by the way. I sometimes forget that in this list I need to do a REPLY ALL. I generally don't go private unless I want to avoid embarassing someone or need to discuss something genuinely of no interest to others in the group; in this case, the conversation was of general interest and reasonably civil <g>.

The conventional definition of NULL, whether or not it is included in the SQL standard, is "Not Known" or "Not Applicable". This is both precise and ambiguous: Ambiguous because it has two possible meanings, and precise because it has only those two meanings.

I'll agree with that.

an unambiguous definition of NULL, found in the SQL standard, is the value that always causes a comparison operator to return FALSE in any known DBMS.

Interesting approach. I haven't thought it through completely enough to decide whether or not I can agree that this is an unambiguous definition. For instance, I'm not sure what SQL is supposed to evaluate 0/0 as -- is it NULL, or does it invalidate the statement computing it, or is it some non-NULL null-like thing? (Rhetorical question; I don't necessarily expect an answer, although one would be nice.)

I do *not* think that it's a useful definition, though, because of the weakness you point out. It doesn't tell the designer how to use the feature. I cannot think of any case where it is useful to store a NULL if you don't know what it means other than that it's a value which when compared to other things returns UNKNOWN! In many ways, I see this as analagous to identifying the units that a numeric field is identified in; the field won't tell you whether it's in fortnights or seconds, so the documentation must.

the distinction between unknown and inapplicable can be important, in theory.

a bank may keep a record of credit limits for its customers. A database designer may be tempted to set aside a special number, say -1, to place in the credit limit column for customers who have no credit limit.

the attribute of having a credit limit is different information from the attribute of the credit limit size. These two types of data should not be in the same column. There should be a boolean column for credit limit existence (HasCreditLimit Boolean) and a currency column for the amount of the credit limit.

If you don't mind a brief red herring, consider the fact that not all RDBMSs support CHECK constraints, and in those which don't this approach gives rise to the possibility of inconsistent data: what happens if HasCreditLimit is FALSE and CreditLimit is $10000?

It's a red herring, of course, because we're letting the real world intrude upon our theoretical discussion. In any real DBMS, experienced DBDs know that there are compromises which must be made; these don't invalidate what we'd *like* to do, but sometimes are things that we must do to have a well-functioning, or at least usable, system. Lack of a CHECK constraint in the target DBMS doesn't impact what the correct logical design is.

This mistake is so consistent that I think of it as diagnostic; if someone is trying to create two types of NULLs,

I wasn't suggesting creating two types of NULLs... I was suggesting that if NULLs are permitted for a column, the DBD should indicate which meaning of NULL is in use for that column.

My disagreement with Bruce Feist is not over whether NULL is ambiguous; the definition that database designers use is explicitly ambiguous. We disagree over how to deal with the ambiguity. My contention is that it is rarely a problem in a well designed database. But I acknowledge that it sometimes may be a problem. Bruce advocates restricting the meaning of NULL and documenting the restriction.

I'm not sure what you mean by "restricting". If you mean "permitting only one of the two possible uses of NULL for a given column", then agree... and I think that *you* think that this is a good idea as well, given your above comments on permitting both being a violation of normalization.

To rephrase, I'm confused about your viewpoint. On one hand you're saying that if a field can contain both I-mark and A-mark NULLs that's a bad design choice; on the other, you're saying that since most current RDBMSs don't distinguish betwen I-mark and A-mark NULLs, trying to restrict the meaning to just one is a bad design choice. You're clearly knowlegable and have given thought to the subject, so I believe that there's something about your point of view that I'm just not getting. Please explain!

My problem with that is that the RDBMS will continue to accept NULLs and process them properly, even when they violate the restriction.

When I've needed to specify that data in a text column is unknown, I've used "Unknown" or "Not Known".

What if "Unknown" and "Not Known" are valid values for the column? I think that in most cases it would be best to have a separate boolean flag column (or two of them) if NULLs are not an option, especially if CHECK constraints are available to ensure consistency.

Bruce Feist


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Does Null == ""?

Reply via email to