November 2, 2010

New Approach for a Cloud Based OLTP System

In this blog entry, I want to summarize some emerging ideas about how OLTP can be performed on distributed systems by weakening consistency.
Scaling out is a common approach to achieve higher performance or throughput: more servers are used to handle the workload. Using more servers often entails partitioning the data and using distributed transactions. However, distributed ACID transactions, typically implemented with two-phase commit, are expensive: they increase transaction latency and weaken availability and resilience to network partitions.

CAP Theorem and ACID 2.0

According to the CAP theorem, consistency, availability, and partition tolerance cannot all be achieved at the same time. Since availability is a crucial business factor and a system should also tolerate network partitions, it is common for key-value stores to offer reduced consistency guarantees in order to achieve full availability.
According to Helland (Helland 2007), a transaction should only involve a single entity stored on one instance. If a transaction needs to span more than one entity, the business logic developer has to deal with potential inconsistencies. Between groups of entities, the system can provide store-and-forward messaging.
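The single-entity rule and store-and-forward messaging can be illustrated with a minimal sketch (all names are hypothetical, not from Helland's paper): each update is atomic only within one entity, and cross-entity effects travel as queued messages that the receiving entity applies later.

```python
# Hypothetical sketch of Helland-style entity groups: a transaction
# touches only one entity; cross-entity effects travel as queued
# messages that the receiver applies later (store-and-forward).

class Entity:
    def __init__(self, key):
        self.key = key
        self.balance = 0
        self.inbox = []          # durable message queue (simplified)

    def apply_local(self, amount):
        """Atomic update within a single entity -- the only ACID scope."""
        self.balance += amount

def transfer(sender, receiver, amount):
    """Debit the sender atomically, then forward a message to the receiver.
    The credit happens later, so the two entities are briefly inconsistent."""
    sender.apply_local(-amount)
    receiver.inbox.append(("credit", amount))   # store-and-forward

def drain_inbox(entity):
    """The receiver processes queued messages in its own local transactions."""
    while entity.inbox:
        op, amount = entity.inbox.pop(0)
        if op == "credit":
            entity.apply_local(amount)

a, b = Entity("a"), Entity("b")
a.apply_local(100)
transfer(a, b, 30)
assert a.balance + b.balance == 70   # inconsistent until the inbox is drained
drain_inbox(b)
assert (a.balance, b.balance) == (70, 30)
```

The window between the debit and the drained inbox is exactly the inconsistency the business logic developer must be prepared to handle.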
Helland also defines a new kind of consistency called ACID 2.0, which stands for Associative, Commutative, Idempotent, and Distributed (Helland 2009). Its main goal is for a piece of work to succeed if it happens at least once, anywhere in the system, and in any order.
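A textbook operation with these properties is set union: it is associative, commutative, and idempotent, so replicas converge regardless of delivery order or duplicate delivery. A minimal illustrative sketch (my own example, not from the paper):

```python
# Illustrative sketch: set union is associative, commutative, and
# idempotent, so replicas converge no matter how often or in what
# order updates arrive -- the ACID 2.0 properties in miniature.

def merge(a, b):
    """Set union: associative, commutative, idempotent."""
    return a | b

replica1 = set()
replica2 = set()

# The same updates arrive in different orders and with duplicates:
for item in ["x", "y", "x"]:
    replica1 = merge(replica1, {item})
for item in ["y", "x", "y"]:
    replica2 = merge(replica2, {item})

assert replica1 == replica2 == {"x", "y"}
```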

Fault Tolerance

Another issue is fault resilience. A common approach to increase fault resilience and availability is replication, i.e. storing multiple copies of the data. However, writing a second copy synchronously increases latency. Helland therefore proposes to replicate updates asynchronously to a second server and to let the business application developer cope with the possibility that data is lost in case of a failure. For transactions of low business value, it may be acceptable to live with inconsistencies or lost data. In contrast, for transactions of high business value, the system should guarantee the consistency and durability of a transaction.
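The trade-off can be sketched as follows (a simplified toy model, not Helland's implementation): the primary acknowledges the write immediately and ships the update to the replica later, so updates still in the replication queue are lost if the primary fails.

```python
# Simplified sketch of asynchronous replication: the primary
# acknowledges writes before shipping them, trading a window of
# possible data loss for lower write latency.

class Primary:
    def __init__(self):
        self.data = {}
        self.pending = []        # updates not yet shipped to the replica

    def write_async(self, key, value):
        self.data[key] = value               # acknowledged immediately
        self.pending.append((key, value))    # replicated later

class Replica:
    def __init__(self):
        self.data = {}

def ship(primary, replica):
    """Background replication step; runs after the write was acknowledged."""
    while primary.pending:
        k, v = primary.pending.pop(0)
        replica.data[k] = v

p, r = Primary(), Replica()
p.write_async("order-1", "paid")
# If the primary crashed here, "order-1" would never reach the replica.
assert "order-1" not in r.data
ship(p, r)
assert r.data["order-1"] == "paid"
```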
To provide fault tolerance, individual transactions need to be idempotent: executing a transaction twice must not lead to a substantive change of an entity, where the application defines what "substantive" means. To achieve idempotent transactions, each transaction has to be versioned, or a hash of the request has to be stored, so that transactions that have already been executed can be rejected. For this purpose, Helland proposes to store all activities of related entities, e.g. the received messages.
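The hash-based variant can be sketched as follows (a hypothetical minimal example; the class and field names are my own): the store remembers a digest of every request it has applied and treats a repeated delivery as a no-op.

```python
# Hypothetical sketch: make transactions idempotent by remembering a
# hash of every request already applied, so a retried delivery of the
# same transaction is rejected instead of re-executed.

import hashlib

class IdempotentStore:
    def __init__(self):
        self.value = 0
        self.seen = set()        # digests of processed requests

    def apply(self, request_id, amount):
        digest = hashlib.sha256(f"{request_id}:{amount}".encode()).hexdigest()
        if digest in self.seen:  # retry of an already-executed transaction
            return False
        self.seen.add(digest)
        self.value += amount
        return True

store = IdempotentStore()
assert store.apply("tx-1", 10) is True
assert store.apply("tx-1", 10) is False   # duplicate delivery is a no-op
assert store.value == 10
```

In a real system the set of seen digests would itself have to be stored durably with the entity, which is why Helland suggests keeping the received messages alongside the entity's state.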
The ideas proposed by Helland are very interesting. However, further research has to show whether it is really feasible to let the business application developer cope with data inconsistencies.

References

(Helland 2007) Pat Helland: Life beyond Distributed Transactions: an Apostate's Opinion. CIDR 2007: 132-141
(Helland 2009) Pat Helland, David Campbell: Building on Quicksand. CIDR 2009

Author: Benjamin Eckart