Let’s discuss some important points before start learning DBMS.
What is Data ?
Data is a plural form of Datum. Datum means a piece of information. So by definition data is a collection of datum or information. Important thing to understand is information can be anything. It can be bill of your favourite take-out from a restaurant or text of Game of Thrones book or in the modern world with the concepts of Big Data or internet of things; everything around us can be converted into data. Calories you consume, time you spent on social media, number of texts you sent to your loved one, idle time of oven and it’s usage time. Everything is data and can be stored in a computer.
Where does data get stored in a computer system ?
If you are a computer student then you might have known the answer. Basically data get stored in two physical storage locations, RAM and Hard disk. RAM is volatile and mostly holds some temporary data of different computer programs until a computer is switched off . Hard disk is a permanent storage and it holds both user’s and different program’s data. However there are more physical storage area like ROM, keyboard, mouse etc. You can do a Google search if you want to know more on it. Apart from it there is one more concept i.e logical storage. It has no physical existence. People often append the word storage but it is only an imaginable concept. E.g say on a map we call different area or locations with different name to easily identify an area. In computing, this is called Filesystems or File systems or File Management System and something like imagining boundaries between different storage areas. Without a file system, information placed in a storage medium would be one large body of data with no way to tell where one piece of information stops and the next begins. By separating the data into pieces and giving each piece a name, the information is easily isolated and identified. There are many filesystems defined by the OS or Application programs according to their different structure and logic, properties of speed, flexibility, security, size and more. Both Hard disk and RAM have different file system. E.g in Hard disk it is NTFS, FAT, HFS etc and in RAM it is TMPFS etc. Also note that different OS has different file systems E.g Windows os defines NTFS where Mac os defines HFS. However Some file systems implement filename extensions as a feature of the file system itself to distinguish files more easily. E.g .txt, .jpeg etc. Filename extensions may be considered a type of metadata. They are commonly used to imply information about the way data might be stored, read and written by application . E.g Notepad in windows automatically opens the file with .txt extension.
Types of Data files
- Text files : A text file (also called ASCII files) stores information in ASCII characters. A text file contains human-readable characters. A user can read the contents of a text file or edit it using a text editor. In text files, each line of text is terminated, (delimited) with a special character known as EOL (End of Line) character. In text files some internal translations take place when this EOL character is read or written. E.g A text document
- Binary files : A binary file is a file that contains information in the same format in which the information is held in memory i.e. in the binary form. In binary file, there is no delimiter for a line. Also no translations occur in binary files. As a result,binary files are faster and easier for a program to read and write than the text files. As long as the file doesn’t need to be read or need to be ported to a different type of system, binary files are the best way to store program information. E.g A JPEG image
What is Database Management System?
As said above, different OS defines different logical concepts while storing data on physical devices for ease of indexing or retrieving of data which we call file management system (FMS), similarly some computer applications usually called as database engine like MYSQL, ORACLE etc define their own logical concepts, algorithms and boundaries for ease of indexing, retrieving and searching like more complex operations of data which we call Database Management System (DBMS).
What is Database ?
Like file is a small unit in Filesystem and acts as a container of structured or unstructured data, similarly database is a small unit in Database management system and acts as container of structured data. So file and database both are just a logical concept and somehow analogous to each other. More precisely a database is made up of two components mainly, data and a meaningful method for accessing and manipulating data. Without these two, a database is just a random set of data. A more precise example of a database can be a dictionary, which stores a large quantity of data as Key-Value pairs. At the same time, it also has a meaningful method to access data using the Key.
Where does data in database get stored and in which format?
I often see some questions around the web like ” Where does DBMS application stores its data? How is data stored in a database? Which format it follows? “. To answer those particular questions you may say DBMS engines stores their data in a physical devices like hard disk and RAM while implementing different logical concepts and algorithms for better sorting, organizing and retrieving of data. So now question arises; Can we see those databases in our computer like we see files? Yes you can. Different DBMS engines follow different ways to store their data. For example some database engines like “Microsoft access” stores entire data in single plain text file under filesystem, some engines like Mysql, Mariadb etc split those data into different plain text and binary files named as tables, indexes, log etc under filesystem, some engines like Oracle can even entirely bypass the filesystem and store their data in raw partitions of hard disk.
How DBMS is different from File system ?
You can rephrase the above question to something like ” What are the advantages of DBMS over FMS? Why we should read DBMS? File Management System vs Database Management System “. So below given some points that needs to be considered while comparing DBMS and FMS.
- Concurrency : Difficulties in different data operations like read, write etc simultaneously by several users. Say you created a simple text file in your drive and shared it with 2000 employees of your company over a LAN network. If everyone will write some data simultaneously at a particular time imagine how much difficult it is.
- Security : File system imposes very less security on data. E.g It doesn’t have any role based protection. It means every employee in you company can access to entire file and modify it. You can’t restrict to some particular part of that file. Lets say you are the CEO and you shared a file with all of your employees. You want your advisor teams to give read, write and delete access to entire file, you want your manager to read, write access to only the first page of the file. You want other employees to give only read access to last page of the file.
- Redundancy : In filesystem there may be chance of duplicate information. E.g Say you are preparing a list all of available items in your shop, you may write same item twice which will take more storage than needed unnecessarily.
- Inconsistency : Duplicate value of same data. Data redundancy leads to data inconsistency, lets take the same example that we have taken above. If you don’t update the availability of the item in every place then you may become confuse later which one is the correct availability value of that item.
- Atomicity : It states that every Operation/transaction should be atomic in nature. A Transaction can contain either a single SQL statement or multiple SQL statements. Thus by Atomic Transaction means “all or none”. Either all SQL statements/steps execute successfully in a transaction, or fail as a single unit and none of them should be treated as executed and the system should be returned to its original state. E.g If account-A & account-B both having $2000 balance, you have to transfer $1000 from account-A to account-B, this will involves 2 steps. First withdrawal from account-A, and Second deposit in account-B. Thus, both the steps should be treated as single or atomic unit and at the end account-A should have $1000 & account-B should have $3000 balance. If in case after First step the system fails or any error occurs then first step should also be rolled-back and $1000 withdrawn from account-A should be re-deposited to it, maintaining $2000 back in both the accounts. Thus there should be no intermediate state where account-A has $1000 and account-B still has $2000 balance.
Infact DBMS applications follow several mechanisms or data structure algorithms to manage the storage of data and performing various operations on those data to overcome the aforesaid limitations of file system. Though we can’t generalise their internal architecture and algorithms of storing data because it varies program to program. The most common mechanisms include B-Tree, Hash etc. More you can find here.