Implementing Repeatable Read and Serializable Transaction Isolation

This article is part of a series. You do not have to read them in order but I will be referring to topics and explanations in previous articles:

  1. Implementing Your Own Transactions With MVCC
  2. SQL Transaction Isolation Levels Explained
  3. Implementing Repeatable Read and Serializable Transaction Isolation

Repeatable read and serializable are usually higher than most databases use by default (if they are available at all). You can read a full explanation of the differences in the levels in the previous article, SQL Transaction Isolation Levels Explained.

This article will focus on a real Python implementation of all four levels but mainly focused on repeatable read and serializable.

I’m going to use the code from the original article on MVCC which implemented read-committed and refactor it to allow us to set the transaction isolation. You should understand how MVCC works an how it is implemented for read committedin that article before proceeding.

You can view the entire program here.

Implementing the Isolation Levels as Classes

The majority of the logic of a transaction will remain in the Transaction class. There have been some new methods added:

We also introduce a new type of error, the RollbackException. We will use this to indicate that some race condition has occurred, that if allowed to continue would break the requirements of the chosen isolation level. This type of error becomes more common as the isolation level increases and is caught by the DBMS so that the transaction can be safely rolled back and the client notified.

It is really just an alias for Exception. However, we want to differentiate this type of error so that other errors are not mistakenly caught:

It’s not easy to demonstrate interfaces or abstract classes in Python. Two methods that are missing from the Transaction class above are record_is_locked(record) and record_is_visible(record) and are implemented by the child class. As long as we implement these two methods we can control the isolation behaviour of the transaction.

For example the most simple isolation level is read uncommitted:

It may seem like an anti-transaction pattern to read data that is not yet confirmed by another transaction. Read uncommitted is useful for large reporting queries where you really don’t want to impose any locking that would affect in-flight or new transactions. In exchange for this your totals may be a little off. Sometimes this is insignificant enough (such as processing millions of rows) that it makes sense to trade accuracy for concurrency.

Here is the implementation for read committed (if you have read the previous article this should look very familiar):

I won’t explain this now as there was already a whole article dedicated to it.

Now we can move onto the higher isolation levels that most databases do not use by default because concurrency and performance start to suffer in exchange for greater accuracy and isolations of the transactions:

To achieve repeatable read we use the same visibility (and locking) checks as read committed (ReadCommittedTransaction), but we add a read lock on every record that is read (and hence, visible) by us. It’s important we don’t add read locks to records that would otherwise not be visible to us, so that another transaction cannot remove it if we need to rescan the data.

Here is a very crude lock manager, it simply keeps track of which transactions hold a read lock on any particular row. How this lock manager performs isn’t important, and there many ways to make this mechanism work better and faster. For the purpose of this demonstration just know that there is something that remembers when transactions have seen a row:

Finally, we can implement serializable:

Once again it uses the logic of ReadCommittedTransaction (but not RepeatableReadTransaction as you might originally suspect). Serializable primarily enforces one main new restriction to prevent phantom reads. That is, we must ignore any record that is added or updated in a transaction that was created after the current transaction.

Bring On the Tests

To really demonstrate the isolation levels we need to create test cases. All of the tests start with the same fixture data:

id | name
---+-----
1 | Joe
3 | Jill

A transaction test is implemented as a class. It will setup the fixture data (in the table above) and clients on which the test will be performed. It also runs the test and returns a pretty tick or cross for the outcome (we will use this later).

Individual error scenarios (dirty reads, non-repeatable reads and phantom reads) are implemented by extending the TransactionTest and providing the run() method:

Running the complete program prints a table of results. The ✔ means that it could replicate that kind of error (a little counterintuitive, I know):

                  Dirty Repeat Phantom
read uncommitted ✔ ✔ ✔
read committed ✘ ✔ ✔
repeatable read ✘ ✘ ✔
serializable ✘ ✘ ✘

Originally published at http://elliot.land on February 19, 2017.

Written by

I’m a data nerd and TDD enthusiast originally from Sydney. Currently working for Uber in New York. My thoughts here are my own. 🤓 elliotchance@gmail.com

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store