Understanding Transaction Isolation in DynamoDB: From Concepts to Conflicts
Isolation is an often-overlooked property of DynamoDB transactions, and of data systems in general, but it is important to understand because concurrent transactions are almost unavoidable in distributed systems.
But what is isolation in the context of a database system? Isolation is the property that specifies how and when a change made by one operation becomes visible to other operations running concurrently.
Knowing a database's isolation guarantees helps answer some critical questions:
Can a transaction read uncommitted changes made by another transaction?
Can a value read by a transaction be changed by another transaction before the first one completes?
How will the system handle conflicting read and write operations between concurrent transactions?
To answer these questions, databases define isolation levels, which are guarantees given to the user or application about how transactions behave when they act on the same item concurrently. These levels define the extent to which the operations in one transaction are isolated from those in another.
DynamoDB provides two transaction isolation levels: serializable and read-committed.
NOTE: These isolation levels are enforced by DynamoDB, not chosen by the caller, unlike in some SQL databases where you can select the isolation level of a transaction. Another important distinction is that all the operations in a DynamoDB transaction are defined up front and sent as a single request (TransactWriteItems or TransactGetItems) over HTTP, whereas in SQL databases a transaction is a multi-step process: it starts with BEGIN, executes statements, and then ends with either COMMIT or ROLLBACK.
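To make the single-request shape concrete, here is a minimal sketch in Python. It assumes a boto3 DynamoDB client is passed in; the table name Accounts, the attributes AccountId and Balance, and both helper functions are hypothetical and exist only for illustration. The point is that every operation is declared up front and submitted in one call.

```python
def build_transfer_request(from_id, to_id, amount):
    """Build the complete TransactWriteItems payload before anything is sent.

    The table and attribute names are hypothetical.
    """
    amt = {":amt": {"N": str(amount)}}
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": "Accounts",
                    "Key": {"AccountId": {"S": from_id}},
                    "UpdateExpression": "SET #b = #b - :amt",
                    # The debit only applies if the balance covers it.
                    "ConditionExpression": "#b >= :amt",
                    "ExpressionAttributeNames": {"#b": "Balance"},
                    "ExpressionAttributeValues": amt,
                }
            },
            {
                "Update": {
                    "TableName": "Accounts",
                    "Key": {"AccountId": {"S": to_id}},
                    "UpdateExpression": "SET #b = #b + :amt",
                    "ExpressionAttributeNames": {"#b": "Balance"},
                    "ExpressionAttributeValues": amt,
                }
            },
        ]
    }


def transfer(client, from_id, to_id, amount):
    # `client` is assumed to be a boto3 DynamoDB client,
    # for example boto3.client("dynamodb").
    # There is no BEGIN or COMMIT step: the whole transaction is this one call.
    return client.transact_write_items(
        **build_transfer_request(from_id, to_id, amount)
    )
```

Either both updates are applied or neither is; there is no point at which the application holds an open transaction and decides what to do next.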
Serializable isolation:
Serializable isolation ensures that the results of multiple concurrent operations are the same as if no operation begins until the previous one has finished. From the application's point of view, any set of transactions that succeed leaves the data in an end state that looks as if the operations had been executed one after another.
Serializable isolation exists between these operations:
Any transactional operation and any standard write operation (PutItem, UpdateItem, or DeleteItem).
Any transactional operation and any standard read operation (GetItem).
A TransactWriteItems operation and a TransactGetItems operation.
Note that serializable isolation holds between transactional operations themselves, and between a transaction and each individual standard write inside a BatchWriteItem operation. It does not extend to the BatchWriteItem operation as a unit: a transaction is not isolated from the batch as a whole.
This raises more questions: what happens when concurrent item-level requests touch an item that a transaction is also working on? How does the database preserve consistency and serializability? In DynamoDB, such conflicts can occur in the following scenarios:
A PutItem, UpdateItem, or DeleteItem request for an item conflicts with an ongoing TransactWriteItems request that includes the same item.
An item within a TransactWriteItems request is part of another ongoing TransactWriteItems request.
An item within a TransactGetItems request is part of an ongoing TransactWriteItems, BatchWriteItem, PutItem, UpdateItem, or DeleteItem request.
DynamoDB handles these conflicts by rejecting the request. When a PutItem, UpdateItem, or DeleteItem request is rejected, it fails with a TransactionConflictException. If any item-level request within TransactWriteItems or TransactGetItems is rejected, the whole request fails with a TransactionCanceledException. In either case, it is up to the programmer to decide how to handle the failure gracefully, for example by retrying.
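One way to handle these rejections is to retry only when the failure really was a conflict. The sketch below is a minimal illustration, not a definitive implementation. It assumes the exception is a botocore ClientError-style object whose response dictionary carries the error code under response["Error"]["Code"] and, for a canceled transaction, a CancellationReasons list. The helper names and the backoff parameters are hypothetical.

```python
import random
import time


def error_code(exc):
    """Return the service error code from a ClientError-style exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def is_conflict(exc):
    """True if the request was rejected because of a conflicting in-flight request."""
    code = error_code(exc)
    if code == "TransactionConflictException":
        return True
    if code == "TransactionCanceledException":
        # A transaction can also be canceled for other reasons, such as a
        # failed condition check; retrying those would not help.
        reasons = exc.response.get("CancellationReasons", [])
        return any(r.get("Code") == "TransactionConflict" for r in reasons)
    return False


def call_with_retry(operation, max_attempts=4, base_delay=0.05, sleep=time.sleep):
    """Run `operation`, retrying with jittered exponential backoff on conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_conflict(exc) or attempt == max_attempts:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

A caller would wrap the request in a function, for example call_with_retry(lambda: client.put_item(**request)), where client and request are whatever the application already uses. Whether a blind retry is safe depends on the operation, so idempotency is worth checking before adopting a pattern like this.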
Read-committed isolation:
Read-committed isolation ensures that read operations always return committed values. A read never presents a view of an item that reflects a transactional write which did not ultimately succeed, so we never end up with a value that is uncommitted or was rolled back. The value may still be an older committed one if the read is eventually consistent.
The isolation level is read-committed between:
Any transactional operation and any read operation that involves multiple standard reads (BatchGetItem, Query, or Scan).
If a transactional write updates an item in the middle of a BatchGetItem, Query, or Scan operation, the subsequent part of the read operation returns the newly committed value (with ConsistentRead) or possibly a prior committed value (with eventually consistent reads).
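A paginated Query makes this window easy to picture: each page is fetched by a separate call, so the result set as a whole is not a snapshot. The following is a minimal sketch that assumes a boto3 DynamoDB client is passed in; the function name and its parameters are hypothetical.

```python
def query_all(client, table_name, key_condition, values, consistent=True):
    """Collect every page of a Query into one list.

    Each page is a separate read. A transaction that commits between two
    pages can be reflected in the later page but not in the earlier one.
    """
    items = []
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": key_condition,
        "ExpressionAttributeValues": values,
        # True requests strongly consistent reads; False allows a page to
        # return an older, but still committed, value.
        "ConsistentRead": consistent,
    }
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Every item returned is a committed value, but the list as a whole may mix states from before and after a concurrent transaction. If the application needs a consistent view across several items, TransactGetItems is the operation that provides it.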
Note, however, that read-committed isolation does not prevent an item from being modified immediately after it is read. Once we have read a value, another transaction may update it, and if we then write back a result based on that earlier read, we get anomalies such as lost updates. To guard against lost updates, we have to rely on either updates with a condition check or application-level logic. Implementing that properly is a topic of its own, so that's for another day.
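As a brief taste of what a condition check looks like, here is a minimal sketch of a versioned update, a common form of optimistic locking. It assumes each item carries a numeric Version attribute that the application increments on every write. That attribute, the Balance attribute, and the helper names are hypothetical, and the client is assumed to be a boto3 DynamoDB client.

```python
def build_versioned_update(table_name, key, new_balance, expected_version):
    """Build an UpdateItem request that only applies if Version is unchanged."""
    return {
        "TableName": table_name,
        "Key": key,
        "UpdateExpression": "SET #b = :bal, #v = :next",
        # Reject the write if someone else bumped the version since our read.
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": {"#b": "Balance", "#v": "Version"},
        "ExpressionAttributeValues": {
            ":bal": {"N": str(new_balance)},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }


def save_if_unchanged(client, table_name, key, new_balance, expected_version):
    """Return True if the write applied, False if the item changed in between."""
    request = build_versioned_update(table_name, key, new_balance, expected_version)
    try:
        client.update_item(**request)
        return True
    except Exception as exc:
        response = getattr(exc, "response", None) or {}
        if response.get("Error", {}).get("Code") == "ConditionalCheckFailedException":
            return False
        raise
```

When save_if_unchanged returns False, the usual response is to re-read the item, recompute the new value from the fresh state, and try again.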
Grasping the isolation levels in DynamoDB extends beyond mere technical proficiency; it enriches our understanding of how databases manage concurrent transactions. It also piques our interest, encouraging us to delve further into data systems, and instills a sense of appreciation for the remarkable systems that we often take for granted.