I’ve been busy with a project, I’ve finally got round to
writing this a week later than I intended…
In a recent conversation, someone pointed out that people
sometimes remove “constraints” from a database in order to improve performance.
This made me ask myself:
Is this a good thing, or a bad thing?
I have to admit that this is a technical change that I have
considered in the past. Never-the-less, I have mixed feelings about it.
After some thought, my opinion is:
- For many situations a constraint is redundant. The fundamental structure of many applications means they are unlikely to create orphan rows.
- The cost of the constraint is in the extra processing it causes during update operations. This cost is incurred every time a value in the constrained column is updated.
- The benefit of a constraint is that it absolutely protects the constrained column from rogue values. This may be particularly relevant if the system has components (such as load utilities or interfaces with other systems) which by-pass the normal business transactions.
- Other benefits of constraints are that they unequivocally state the “intention” of a relationship between tables and they allow diagramming tools which navigate the relationships to “do their thing”. Constraints provide good documentation, which is securely integrated with the database itself.
In short:
- The costs of constraints are small, but constant and in the immediate term.
- The benefits of constraints are avoiding a potentially large cost, but all in the future.
It’s the old “insurance” argument. Make the decision
honestly based on a proper assessment of the real risk and your attitude to
taking risks. Be lucky!
More Detailed Argument
For those who don’t just want to take my word for it. Here
is a more detailed argument.
Let’s take the “business data model” of a pretty normal “selling”
application.
When we perform the activities “Take Order” (maybe that
should be “Take ORDER”), or “Update Order”
- we create or update the ORDER and ORDER_LINE entities, and
- in addition we refer to PRODUCT (to get availability and Price) and presumably to the CUSTOMER entity which isn’t shown on the diagram.
When I translate this into a Logical data model, I impose an
additional rule “Every ORDER must contain at least 1 ORDER_LINE”. The
original business model doesn’t impose this restriction.
Remember some people do allow ORDERs with no ORDER_LINES.
They usually do it as part of a “reservation” or “priority process” which we
are not going to try and have here.
When the transaction which creates the ORDER and ORDER_LINE
makes it’s updates, then it will have read CUSTOMER and ORDER, so it is
unlikely to produce orphan records, with or without constraints.
On the other hand, by having the constraints we can document
the relationships in the database (so that a diagramming tool can produce the ERD
diagram (really I suppose that should be “Table Relationship Diagram”)).
I am left wondering whether it would be possible or
desirable to enforce my “Every ORDER
must contain at least 1 ORDER_LINE” rule. I’ll think about that further. (Note
to self: Can this be represented as a constraint which does not impose
unnecessary and unintended restrictions on creating an ORDER?)
If we don’t have constraints and we have something other
than our transaction which is allowed to create ORDERs and/or ORDER_LINEs (As I
said, typically this would be an interface with another system or some kind of
bulk load), we have no way of knowing how reliably it does it’s checking, and
we might be allowing things we really do not want into our system. Constraints
would reject faulty records and the errors they created (or “threw”) could be
trapped by the interface.
No comments:
Post a Comment