A fellow by the name of Edgar F. Codd came up with the concept of normalization in the early '70s. From there, additional contributions were made by Ronald Fagin, Raymond Boyce (with Codd), and C. Zaniolo. From 1981 to 2002 the normalization front was pretty quiet, until C. J. Date, Hugh Darwen, and Nikos Lorentzos added the sixth normal form. All of this mainly pertains to relational databases that support transactional data. Why do this? There are lots of good reasons to apply some degree of normalization. One of the primary ones is data integrity: by reducing duplicate objects within the database, normalization enforces data integrity by controlling the objects that can be introduced. An object can be anything from a single field value to the whole database. Another reason is ease of data updates.
Say you have a record that describes a house. Some attributes of the record could include roof type, number of bathrooms, kitchen counter type, room floor type, and so on. Of these, roof type, kitchen counter type, and room floor type can be problematic for updates and consistency.
You’ve written the application so that it only allows certain values. That will keep it consistent, right? Sure. That works. But what happens when you decide that “Shake” is not what the value should be? You need to update it to “Cedar Shake”. OK, that’s simple: just replace all instances of Shake with Cedar Shake. You’re done now. No problem, right? Well, what happens if you have a million records and each record has to be inspected for the value of roof type? Does it contain “Shake”? If yes, replace it; if no, skip to the next record. A bit labor intensive, but doable. I’ll skip the point about record and table locks and other nastiness. If you had applied some normalization to the database and decided that anything with the word “type” in it is a natural fit for a separate table, you’d be well on your way.
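To make the pain concrete, here is a minimal sketch of the denormalized case, using SQLite through Python's sqlite3 module (the table and column names here are my own, made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: each house row stores the roof type as a literal string.
cur.execute("CREATE TABLE house (id INTEGER PRIMARY KEY, roof_type TEXT)")
cur.executemany(
    "INSERT INTO house (roof_type) VALUES (?)",
    [("Shake",), ("Tile",), ("Shake",), ("Metal",)],
)

# Renaming "Shake" means inspecting and rewriting every matching row.
cur.execute("UPDATE house SET roof_type = 'Cedar Shake' WHERE roof_type = 'Shake'")
rows_touched = cur.rowcount  # one write per house that carried the old value
```

On four rows that's harmless; on a million rows, the same statement has to visit every record and rewrite each one that matches.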
A simple answer
With the same example, but with a “Roof Type” table, you simply have to update the description in the type table. How does this work? A simple type table has two items: the key and the value. The important thing to remember is that keys are immutable; once created, they cannot be changed. The value, on the other hand, can be modified. In your main table, the “Roof Type” key is stored in the house record. When you display the record, you join the main table with the type table and display the value.
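Sketched in SQLite via Python's sqlite3 (table and column names are mine, for illustration), the key/value type table and the join look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The type table: an immutable integer key plus an updatable value.
cur.execute("CREATE TABLE roof_type (id INTEGER PRIMARY KEY, value TEXT)")
cur.executemany("INSERT INTO roof_type (id, value) VALUES (?, ?)",
                [(1, "Shake"), (2, "Tile")])

# The main table stores only the key, not the descriptive text.
cur.execute("""CREATE TABLE house (
    id INTEGER PRIMARY KEY,
    roof_type_id INTEGER REFERENCES roof_type (id)
)""")
cur.executemany("INSERT INTO house (roof_type_id) VALUES (?)", [(1,), (2,), (1,)])

# Displaying a record means joining the main table to the type table.
display = cur.execute("""
    SELECT house.id, roof_type.value
    FROM house
    JOIN roof_type ON roof_type.id = house.roof_type_id
    ORDER BY house.id
""").fetchall()
```

The descriptive text now lives in exactly one place; the houses only ever point at it by key.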
The data type of the key fields has been changed to int to match the field type in each of the type tables.
The final data model looks like:
This way, when you update the value in any of the “Type” tables, the change is reflected in the main table without having to update it at all. This means you only have to process one update to affect every matching record. So if your table had a million records and the roof type in question existed on a quarter of them, you update once instead of 250,000 times. Makes sense to me…
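A minimal demonstration of that one-update behavior, again using SQLite via Python's sqlite3 with made-up table names: one write against the type table, and every referencing house row shows the new value through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A type table plus a main table that references it by key.
cur.execute("CREATE TABLE roof_type (id INTEGER PRIMARY KEY, value TEXT)")
cur.execute("INSERT INTO roof_type (id, value) VALUES (1, 'Shake')")
cur.execute("CREATE TABLE house (id INTEGER PRIMARY KEY, roof_type_id INTEGER)")
cur.executemany("INSERT INTO house (roof_type_id) VALUES (?)", [(1,)] * 5)

# One write against the type table, no matter how many houses reference it.
cur.execute("UPDATE roof_type SET value = 'Cedar Shake' WHERE id = 1")
rows_touched = cur.rowcount  # 1

# Every house now displays the new value via the join.
values = [v for (v,) in cur.execute(
    "SELECT roof_type.value FROM house "
    "JOIN roof_type ON roof_type.id = house.roof_type_id")]
```

Five houses here, 250,000 in the article's scenario; either way the update itself touches a single row.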
This is just a quick snack on database design and why we normalize. Hopefully there’s enough here that, if you don’t know what normalization is, you’ll be able to read up more on the process.
A good place to start is this overview of normalization.