What is transaction in DBMS ?
Transaction is a collection of one or more operations (READ/WRITE) that satisfy following four standard properties. These properties are often acronymed as ACID properties. These property was also discussed previously with examples. You may have a look there.
- Atomicity − ensures that all operations within a transaction are completed successfully. If an operation fails within the transaction, the whole transaction is aborted at that point and all the previous operations are rolled back to their former state. It is managed by Transaction Management Component of DBMS.
- Consistency − In database systems, a consistent transaction is one that does not violate any integrity constraints during its execution. If a transaction leaves the database in an illegal state, the whole transaction is aborted and an error is reported. E.g if a field-type in database has been defined as Integer type, throughout the transaction an operation may store a value of string type in that particular field. So database may go into an invalid state due to mismatch of field type which results in violation of consistency. At this condition transaction will be rolled back. It simply means any data written to the database during a transaction must be valid according to all defined rules. Another example, a bank transaction might involve the transfer of funds from a savings account to a checking account. After the application subtracts an amount from the savings account, the two accounts are inconsistent, and remain so until the amount is added to the checking account. When both steps are completed, a point of consistency is reached. The changes can be committed and made available to other applications. Consistency is often managed by the programer. There are some advantages and disadvantages in Concurency
- Isolation − Transactions are often executed concurrently (e.g., reading and writing to multiple tables at the same time). Isolation enables transactions to operate independently to each other. It ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. It is managed by Concurrency Control Component of DBMS.
- Durability − ensures that the result or effect of a committed transaction persists even in case of a system failure (power outage or crash ) i.e once executed successfully, the changes in the transaction are permanent. Recovery Management Component of DBMS takes care of durability. It is managed by Recovery Management Component.
How to define a transaction ?
Defining a transaction purely depends upon the DBMS softwares we are using. E.g in mysql, desired operations are encapsulated within two statements like START TRANSACTION
and COMMIT
. Don’t be worry if you are not familiar with SQL. Later on we will discuss these transactional statements in details.
Example : The group of operations given below is called as a transaction.
- START TRANSACTION;
- UPDATE accounts SET balance = balance + 100 WHERE account_id = 98;
- UPDATE accounts SET balance = balance – 100 WHERE account_id = 42;
- COMMIT;
You can see that within the above transaction there are two UPDATE
queries. Either both will be executed successfully (effectively moving a value of 100 from account 42 to account 98) or neither will. The database cannot be left in a state where only one UPDATE
was completed (which would cause an obvious and undesirable inconsistency). So on failure of an operation within the transaction, the complete group of operations are rolled back to its previous state.
Note : In most RDMS (though not all) , unless you explicitly define a transaction, each query in itself will be transactionally complete i.e single statements are auto-committed internally. If any failure happens while executing a single statement or operation, it will be rolled back to previous state.
Example : The single statement given below is a transaction.
- UPDATE accounts SET balance = balance + 100 WHERE account_id = 98
Now let’s discuss an example that may make you confused. Is the group of statements given below a transaction? Answer is NO
. Because you have to explicitly wrap it within the vendor specific transactional statement. So even if on failure of second UPDATE
the first one executed successfully.
Example : The group of statements given below is not a transaction. But individually each statement is a transaction.
- UPDATE accounts SET balance = balance + 100 WHERE account_id = 98;
- UPDATE accounts SET balance = balance – 100 WHERE account_id = 42;
Transaction states
- Active, the initial state; the transaction stays in this state while it is executing
- Partially committed, after the final statement has been executed
- Failed, after the discovery that normal execution can no longer proceed
- Aborted, after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction
- Committed, after successful completion
Concurrency in transactions
Concurrency control is the procedure in DBMS for managing simultaneous operations without conflicting with each other. Concurrent access is quite easy if all users are just reading data. There is no way they can interfere with each other. Though for any practical database, would have a mix of READ and WRITE operations and hence the concurrency is a challenge. Concurrency control is used to address such conflicts which mostly occur with a multi-user system. It helps you to make sure that database transactions are performed concurrently without violating the data integrity of respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or multiple database transactions that require access to the same data, are executed simultaneously.
Potential problems in Concurrency Control
Here, are some issues which you will likely to face while using the Concurrency Control method:
- Lost Updates (Write-Write problem) It occurs when multiple transactions select the same row and update the row based on the value selected. Watch the video here.
- Dirty Read Problem This issue occur when the second transaction selects a row which is recently updated by another uncommitted transaction. Watch this video here.
- Non-Repeatable Read occurs when a second transaction is trying to access the same row several times and reads different data each time. Watch this video here.
- Phantom Read Problem Watch this video here
Concurrency Control
Different concurrency control protocols are there to overcome above stated problems which offer different benefits between the amount of concurrency they allow and the amount of overhead that they impose.
- Lock-Based Protocols
- Two Phase
- Timestamp-Based Protocols
- Validation-Based Protocols