In the previous article we looked at the different types of DBMS architecture, that is, how a complete solution is designed around an entire database management system (database + DBMS application + UI). Here we look at the architecture of the database itself.
A database model provides a blueprint for how data should be processed and stored in a DBMS. It serves as a reference for both developers and the DBMS application.
Keep in mind that it is only a logical concept. We have already discussed how data is stored physically in a database.
Let’s take the example of building a house. Before construction starts, the architect presents a small model to both the builder and the masons. The builder does not care about the exact measurements. He only looks at the overall design and imagines that the real house will look something like the small model. The masons and labourers, on the other hand, do not care much about the design. They use the model as a reference for the measurements and start laying bricks.
Similarly, a developer does not need to know exactly how data is processed and stored inside a database. He or she uses the database model as a reference and assumes that data is stored and processed roughly the way the model describes.
On the other side, like the masons and labourers, the DBMS application takes the model as a reference and applies the logic and algorithms associated with that model to process and store data in the backend.
Types of Database Models
There are many types of database models. Some are outdated and some are heavily used. Today the most common one is the relational model.
- Relational model
- Hierarchical database model
- Network model
- Object-oriented database model
- Entity-relationship model
- Document model
- Entity-attribute-value model
- Star schema
- Object-relational model
You may choose to describe a database with any one of these depending on several factors. The biggest factor is whether the database management system application you are using supports a particular model. Most database management system applications are built with a particular data model in mind and require their users to adopt that model, although some do support multiple models.
First things first: why is the relational model called “relational”? Because it is based on the mathematical concept of a “relation” from set theory. The model was proposed by E. F. Codd, an English computer scientist who, while working for IBM, realized that the discipline of mathematics could be used to inject some solid principles and rigor into the field of database management.
Note: Before going into the theory of the relational model, you should be familiar with some terms that come from set theory and are used in this model, e.g. relation, attribute and tuple.
By definition, the fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which is often associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power.
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of unique, non-duplicated items. An n-tuple is a sequence (or ordered list) of n elements, where n is a non-negative integer.
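The definition above can be tried out directly with Python sets. This is only a minimal sketch: the two domains and the names in them are made up for illustration and do not come from any real database.

```python
from itertools import product

# Hypothetical domains, chosen only for illustration.
student_ids = {1, 2, 3}
names = {"Asha", "Ravi"}

# The Cartesian product of the two domains: every possible (id, name) pair.
all_pairs = set(product(student_ids, names))

# A binary relation is any subset of that product; each element is a 2-tuple.
student = {(1, "Asha"), (2, "Ravi"), (3, "Asha")}
is_subset = student <= all_pairs

# A set holds no duplicates, so adding an existing tuple changes nothing.
student.add((1, "Asha"))
relation_size = len(student)

# Projecting onto the name attribute gives a set again, so duplicates collapse.
names_in_use = {name for (_sid, name) in student}

print(len(all_pairs), is_subset, relation_size, sorted(names_in_use))
# 6 True 3 ['Asha', 'Ravi']
```

Notice that the relation itself is unordered, while each tuple inside it keeps its elements in a fixed order.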
Later, SQL was introduced as a way to interact with relational databases. In SQL, the terms “relation”, “attribute” and “tuple” were replaced with the more familiar “table”, “column” and “row”.
A common misconception is that the name “relational” has to do with relationships between tables. In fact, a relation in the relational model corresponds to what SQL calls a table, although the two are not strictly synonymous. You could say that a table is SQL’s attempt to represent a relation in a form that is easy for developers to visualise. However, SQL deviates from the pure relational model given by Codd in several ways; for example, an SQL table may contain duplicate rows and NULLs. To guard against such deviations, Codd came up with a set of rules, known as his twelve rules, which according to him a database must obey in order to be regarded as a true relational database.
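One well-known difference between a table and a relation is that an SQL table without a key can hold the same row twice, while a relation, being a set, cannot. Here is a small sketch using Python's built-in sqlite3 module; the table name and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with no key declared, so nothing prevents duplicates.
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Asha", 20), ("Asha", 20)],
)

# The table keeps both copies of the row.
table_rows = conn.execute("SELECT name, age FROM person").fetchall()

# Treating the same rows as a relation (a set of tuples) removes the duplicate.
relation = set(table_rows)

print(len(table_rows), len(relation))  # 2 1
```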
Codd’s 12 rules:
Rule 0: The foundation rule:
- For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage data bases entirely through its relational capabilities.
Rule 1: The information rule:
- All information in a relational database is represented explicitly at the logical level and in exactly one way – by values in tables.
Rule 2: The guaranteed access rule:
- Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
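As a rough illustration of Rule 2, the sketch below reaches one single value using nothing but a table name, a column name and a primary key value. It uses Python's built-in sqlite3 module, and the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; student_id is the primary key.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "555-0101"), (2, "Ravi", "555-0102")],
)

# Table name (student) + column name (phone) + primary key value (2)
# is enough to address exactly one atomic value.
value = conn.execute(
    "SELECT phone FROM student WHERE student_id = ?", (2,)
).fetchone()[0]

print(value)  # 555-0102
```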
Rule 3: Systematic treatment of null values:
- Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
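To see what “distinct from the empty string and from zero” means in practice, here is a small sketch with Python's built-in sqlite3 module. The contact table is hypothetical, and the behaviour shown is that of SQLite; other systems may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, phone TEXT, age INTEGER)"
)
# Row 1 stores an empty string and a zero; row 2 stores NULLs (None in Python).
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [(1, "", 0), (2, None, None)],
)

# The empty string and zero are ordinary values and match ordinary comparisons.
empty_or_zero = conn.execute(
    "SELECT id FROM contact WHERE phone = '' AND age = 0"
).fetchall()

# Missing information is found with IS NULL, whatever the column's data type.
missing = conn.execute(
    "SELECT id FROM contact WHERE phone IS NULL AND age IS NULL"
).fetchall()

# Comparing with "= NULL" never evaluates to true, so no row is returned.
null_equals = conn.execute(
    "SELECT id FROM contact WHERE phone = NULL"
).fetchall()

print(empty_or_zero, missing, null_equals)  # [(1,)] [(2,)] []
```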
Rule 4: Dynamic online catalog based on the relational model:
- The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
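SQLite gives a simple way to see the idea of a catalog that is queried like ordinary data: it keeps the description of every table and view in a table called sqlite_master, which can be read with a normal SELECT. The sketch below uses Python's built-in sqlite3 module; the student table and the view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema objects, created only so the catalog has something to show.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

# The catalog is itself a table, so the same query language works on it.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

print(catalog)  # [('table', 'student'), ('view', 'student_names')]
```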
Rule 5: The comprehensive data sublanguage rule:
- A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items:
  - Data definition.
  - View definition.
  - Data manipulation (interactive and by program).
  - Integrity constraints.
  - Authorization.
  - Transaction boundaries (begin, commit and rollback).
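SQL is the usual example of such a language. The sketch below sends plain character strings to SQLite through Python's built-in sqlite3 module and covers data definition, view definition, data manipulation, an integrity constraint and transaction boundaries. The account table and its values are hypothetical, and this is only an illustration, not a check of full compliance with the rule.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# BEGIN / ROLLBACK statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Data definition, including an integrity constraint (CHECK).
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# View definition.
conn.execute("CREATE VIEW rich AS SELECT id FROM account WHERE balance >= 100")

# Data manipulation.
conn.execute("INSERT INTO account VALUES (1, 150), (2, 20)")

# Transaction boundaries: the update is undone by the rollback.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
conn.execute("ROLLBACK")

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
rich_ids = conn.execute("SELECT id FROM rich").fetchall()

print(balance, rich_ids)  # 150 [(1,)]
```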
Rule 6: The view updating rule:
- All views that are theoretically updatable are also updatable by the system.
Rule 7: High-level insert, update, and delete:
- The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.
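In SQL terms, this means that one statement can change a whole set of rows at once, with no row-by-row loop in the application. Here is a small sketch with Python's built-in sqlite3 module; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 19), (2, "Ravi", 20), (3, "Meena", 21)],
)

# One UPDATE statement operates on every row that matches the condition.
updated = conn.execute(
    "UPDATE student SET age = age + 1 WHERE age >= 20"
).rowcount

# One DELETE statement likewise removes a whole set of rows.
deleted = conn.execute("DELETE FROM student WHERE age < 20").rowcount

remaining = conn.execute(
    "SELECT student_id, age FROM student ORDER BY student_id"
).fetchall()

print(updated, deleted, remaining)  # 2 1 [(2, 21), (3, 22)]
```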
Rule 8: Physical data independence:
- Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
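A familiar example of a change in access method is adding an index. In the sketch below (Python's built-in sqlite3 module, hypothetical student table and index name), the application's query text and its result stay exactly the same before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "555-0101"), (2, "Ravi", "555-0102")],
)

# The application's query, written once and never changed.
query = "SELECT student_id FROM student WHERE phone = ?"

before = conn.execute(query, ("555-0102",)).fetchall()

# Physical change: a new access path for the phone column.
conn.execute("CREATE INDEX idx_student_phone ON student (phone)")

after = conn.execute(query, ("555-0102",)).fetchall()

print(before == after, after)  # True [(2,)]
```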
Rule 9: Logical data independence:
- Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Rule 10: Integrity independence:
- Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
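The sketch below shows the idea with Python's built-in sqlite3 module: the constraints are declared once in the table definition, the database rejects bad rows by itself, and the constraint text is stored in SQLite's catalog table sqlite_master. The student table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rules live in the schema, not in application code.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " phone TEXT NOT NULL UNIQUE,"
    " age INTEGER CHECK (age >= 0))"
)
conn.execute("INSERT INTO student VALUES (1, '555-0101', 20)")

# Both rows break a declared constraint: a duplicate phone, then a negative age.
rejected = []
for row in [(2, "555-0101", 21), (3, "555-0103", -5)]:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# The constraint definition can be read back from the catalog.
stored_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'student'"
).fetchone()[0]

print(rejected, count, "CHECK" in stored_sql)  # [2, 3] 1 True
```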
Rule 11: Distribution independence:
- The end-user must not be able to see that the data is distributed over various locations. Users should always get the impression that the data is located at one site only.
Rule 12: The non-subversion rule:
- If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).
Importance of “Keys” in Relational model
Keys are a very important part of the relational model. They are used to establish and identify relationships between tables and to uniquely identify any record (row) inside a table. A key can be a single attribute or a group of attributes. Let’s discuss some of the important kinds of keys below.
Super Key: A superkey is a set of attributes that can uniquely identify a row in a table. E.g. for a student table with the attributes student_id, name, phone and age, some superkeys are:
[ student_id ]
[ student_id, name ]
[ student_id, name, phone ]
[ student_id, name, phone, age]
[ name, phone ]
[ phone, age ]
You can see that every set given above can uniquely identify a row. You may wonder why we add seemingly unnecessary attributes when a single attribute such as “student_id” or “phone” is enough to uniquely identify a row. You could indeed use just one, but adding other attributes does no harm. The only question is whether the set as a whole can uniquely identify a row. If it can, it is a superkey by definition.
Candidate Key: A candidate key is a minimal set of attributes that can uniquely identify each row in a table, i.e. a superkey from which no attribute can be removed without losing uniqueness. This answers the question raised under superkeys: a candidate key consists of only the necessary attributes. E.g. some candidate keys for the same table are:
[ student_id ]
[ phone ]
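The difference between the two kinds of keys can be checked mechanically on sample data. The sketch below is plain Python; the four rows are made up, and the helper names is_superkey and candidate_keys are hypothetical. Keep in mind that a key is really a rule about every row the table may ever hold, so a test on sample rows can only show that a set of attributes is not a key; it cannot prove that it is one.

```python
from itertools import combinations

ATTRS = ("student_id", "name", "phone", "age")

# Hypothetical sample rows, for illustration only.
rows = [
    {"student_id": 1, "name": "Asha", "phone": "555-0101", "age": 20},
    {"student_id": 2, "name": "Ravi", "phone": "555-0102", "age": 20},
    {"student_id": 3, "name": "Asha", "phone": "555-0103", "age": 21},
    {"student_id": 4, "name": "Asha", "phone": "555-0104", "age": 20},
]


def is_superkey(attrs, rows):
    """True if no two rows share the same values for the given attributes."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)


def candidate_keys(all_attrs, rows):
    """Superkeys that contain no smaller superkey, found from small to large."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if not contains_smaller_key and is_superkey(combo, rows):
                keys.append(combo)
    return keys


print(is_superkey(("name", "phone"), rows))  # True
print(is_superkey(("name", "age"), rows))    # False
print(candidate_keys(ATTRS, rows))           # [('student_id',), ('phone',)]
```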
Entity-relationship model (ER model)
As we already discussed, a data model is just a small design that gives us a conceptual idea of how data should be stored or processed. The ER model describes the data in a database as entities with relationships among them. It is usually represented graphically as boxes (entities) connected by lines (relationships), which express the associations and dependencies between entities. I suggest watching a video on how to represent a database using the ER model and how to convert an ER model to a relational model.
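To give a rough idea of the conversion from an ER model to a relational model, here is a sketch with Python's built-in sqlite3 module. It assumes a hypothetical ER diagram with two entities, Student and Course, joined by a many-to-many relationship “enrolls in”. A common convention, used here, is to turn each entity into a table and the relationship into a third table that references both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    -- One table per entity (hypothetical entities).
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- The relationship becomes a table that references both entities.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        course_id INTEGER NOT NULL REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    );
    """
)

conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10)")

# Following the relationship means joining through the enrollment table.
pairs = conn.execute(
    "SELECT s.name, c.title FROM enrollment e"
    " JOIN student s ON s.student_id = e.student_id"
    " JOIN course c ON c.course_id = e.course_id"
).fetchall()

# A relationship that points to a course which does not exist is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 99)")
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False

print(pairs, dangling_allowed)  # [('Asha', 'Databases')] False
```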
Strong and weak entity