Implementing Repeatable Read and Serializable Transaction Isolation
This article is part of a series. You do not have to read them in order but I will be referring to topics and explanations in previous articles:
Repeatable read and serializable are usually higher than most databases use by default (if they are available at all). You can read a full explanation of the differences in the levels in the previous article, SQL Transaction Isolation Levels Explained.
This article will focus on a real Python implementation of all four levels but mainly focused on repeatable read and serializable.
I’m going to use the code from the original article on MVCC which implemented read-committed and refactor it to allow us to set the transaction isolation. You should understand how MVCC works an how it is implemented for read committedin that article before proceeding.
You can view the entire program here.
Implementing the Isolation Levels as Classes
The majority of the logic of a transaction will remain in the
Transaction class. There have been some new methods added:
We also introduce a new type of error, the
RollbackException. We will use this to indicate that some race condition has occurred, that if allowed to continue would break the requirements of the chosen isolation level. This type of error becomes more common as the isolation level increases and is caught by the DBMS so that the transaction can be safely rolled back and the client notified.
It is really just an alias for
Exception. However, we want to differentiate this type of error so that other errors are not mistakenly caught:
It’s not easy to demonstrate interfaces or abstract classes in Python. Two methods that are missing from the
Transaction class above are
record_is_visible(record) and are implemented by the child class. As long as we implement these two methods we can control the isolation behaviour of the transaction.
For example the most simple isolation level is read uncommitted:
It may seem like an anti-transaction pattern to read data that is not yet confirmed by another transaction. Read uncommitted is useful for large reporting queries where you really don’t want to impose any locking that would affect in-flight or new transactions. In exchange for this your totals may be a little off. Sometimes this is insignificant enough (such as processing millions of rows) that it makes sense to trade accuracy for concurrency.
Here is the implementation for read committed (if you have read the previous article this should look very familiar):
I won’t explain this now as there was already a whole article dedicated to it.
Now we can move onto the higher isolation levels that most databases do not use by default because concurrency and performance start to suffer in exchange for greater accuracy and isolations of the transactions:
To achieve repeatable read we use the same visibility (and locking) checks as read committed (
ReadCommittedTransaction), but we add a read lock on every record that is read (and hence, visible) by us. It’s important we don’t add read locks to records that would otherwise not be visible to us, so that another transaction cannot remove it if we need to rescan the data.
Here is a very crude lock manager, it simply keeps track of which transactions hold a read lock on any particular row. How this lock manager performs isn’t important, and there many ways to make this mechanism work better and faster. For the purpose of this demonstration just know that there is something that remembers when transactions have seen a row:
Finally, we can implement serializable:
Once again it uses the logic of
ReadCommittedTransaction (but not
RepeatableReadTransaction as you might originally suspect). Serializable primarily enforces one main new restriction to prevent phantom reads. That is, we must ignore any record that is added or updated in a transaction that was created after the current transaction.
Bring On the Tests
To really demonstrate the isolation levels we need to create test cases. All of the tests start with the same fixture data:
id | name
1 | Joe
3 | Jill
A transaction test is implemented as a class. It will setup the fixture data (in the table above) and clients on which the test will be performed. It also runs the test and returns a pretty tick or cross for the outcome (we will use this later).
Individual error scenarios (dirty reads, non-repeatable reads and phantom reads) are implemented by extending the
TransactionTest and providing the
Running the complete program prints a table of results. The ✔ means that it could replicate that kind of error (a little counterintuitive, I know):
Dirty Repeat Phantom
read uncommitted ✔ ✔ ✔
read committed ✘ ✔ ✔
repeatable read ✘ ✘ ✔
serializable ✘ ✘ ✘
Originally published at http://elliot.land on February 19, 2017.