Transaction Manager, which provides APIs to begin/commit/abort transactions. More...

Detailed Description

Transaction Manager, which provides APIs to begin/commit/abort transactions.

This package is the implementation of the commit protocol, the gut of concurrency control.

Get Started

First thing first. Here's a minimal example to start one transaction in the engine.

// Example to start and commit one transaction
foedus::ErrorStack run_my_task(foedus::thread::Thread* context, ...) {
  foedus::Engine *engine = context->get_engine();
  foedus::xct::XctManager* xct_manager = engine->get_xct_manager();
  WRAP_ERROR_CODE(xct_manager->begin_xct(context, foedus::xct::kSerializable));
  ... // read/modify data. See storage module's document for examples.
  foedus::Epoch commit_epoch;
  WRAP_ERROR_CODE(xct_manager->precommit_xct(context, &commit_epoch));
  WRAP_ERROR_CODE(xct_manager->wait_for_commit(commit_epoch));
  return foedus::kRetOk;
}

Notice the wait_for_commit(commit_epoch) call. Without invoking the method, you should not consider that your transactions are committed. That's why the name of the method invoked above is "precommit_xct".

Here's a minimal example to start several transactions and commit them together, or group-commit, which is the primary usecase our engine is optimized for.

// Example to start and commit several transactions
foedus::Epoch highest_commit_epoch;
for (int i = 0; i < 1000; ++i) {
  WRAP_ERROR_CODE(xct_manager->begin_xct(context, foedus::xct::kSerializable));
  ... // read/modify data. See storage module's document for examples.
  foedus::Epoch commit_epoch;
  WRAP_ERROR_CODE(xct_manager->precommit_xct(context, &commit_epoch));
  highest_commit_epoch.store_max(commit_epoch);
}
CHECK_ERROR(xct_manager->wait_for_commit(highest_commit_epoch));

In this case, we invoke wait_for_commit() for the largest commit epoch just once at the end. This dramatically improves the throughput at the cost of latency of individual transactions.

Optimistic Concurrency Control

Our commit protocol is based on [TU13], except that we have handling of non-volatile pages. [TU13]'s commit protocol completely avoids writes to shared memory by read operations. [LARSON11] is also an optimistic concurrency control, but it still has "read-lock" bits which has to be written by read operations. In many-core NUMA environment, this might cause scalability issues, thus we employ [TU13]'s approach.

Epoch-based commit protocol

The engine maintains two global foedus::Epoch; current Epoch and durable Epoch. foedus::xct::XctManager keeps advancing current epoch periodically while the log module advances durable epoch when it confirms that all log entries up to the epoch becomes durable and also that the log module durably writes a savepoint ( Savepoint Manager ) file.

Epoch Chime

Epoch Chime advances the current global epoch when a configured interval elapses or the user explicitly requests it. The chime checks whether it can safely advance an epoch so that the following invariant always holds.

A newly started transaction will always commit with current global epoch or larger.
All running transactions will always commit with at least current global epoch - 1, called grace-period epoch, or larger.

In many cases, the invariants are trivially achieved. However, there are a few tricky cases.

There is a long running transaction that already acquired a commit-epoch but not yet exit from the pre-commit stage.
There is a worker thread that has been idle for a while.

Whenever the chime advances the epoch, we have to safely detect whether there is any transaction that might violate the invariant without causing expensive synchronization. This is done via the in-commit epoch guard. For more details, see the following class.

See also: foedus::xct::InCommitEpochGuard

Isolation Levels

See foedus::xct::IsolationLevel

Read-Only Transactions

bluh

References

[LARSON11] Perake Larson, Spyros Blanas, Cristian Diaconu, Craig Freedman, Jignesh M. Patel, and Mike Zwilling. "High-Performance Concurrency Control Mechanisms for Main-Memory Databases." VLDB, 2011.

[TU13] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. "Speedy transactions in multicore in-memory databases.", SOSP, 2013.

Collaboration diagram for Transaction Manager:

Modules
	Retrospective Lock List
	Retrospective Lock List (RLL) to avoid deadlocks.

	Physical-only, short-living system transactions.

Files
file	fwd.hpp
	Forward declarations of classes in transaction package.

file	xct_id.hpp
	Definitions of IDs in this package and a few related constant values.

Classes
struct	foedus::xct::InCommitEpochGuard
	Automatically sets in-commit-epoch with appropriate fence during pre-commit protocol. More...

class	foedus::xct::Xct
	Represents a user transaction. More...

struct	foedus::xct::PointerAccess
	Represents a record of following a page pointer during a transaction. More...

struct	foedus::xct::PageVersionAccess
	Represents a record of reading a page during a transaction. More...

struct	foedus::xct::ReadXctAccess
	Represents a record of read-access during a transaction. More...

struct	foedus::xct::WriteXctAccess
	Represents a record of write-access during a transaction. More...

struct	foedus::xct::LockFreeReadXctAccess
	Represents a record of special read-access during a transaction without any need for locking. More...

struct	foedus::xct::LockFreeWriteXctAccess
	Represents a record of special write-access during a transaction without any need for locking. More...

struct	foedus::xct::McsWwLock
	An exclusive-only (WW) MCS lock data structure. More...

struct	foedus::xct::McsRwLock
	An MCS reader-writer lock data structure. More...

struct	foedus::xct::XctId
	Persistent status part of Transaction ID. More...

struct	foedus::xct::LockableXctId
	Transaction ID, a 128-bit data to manage record versions and provide locking mechanism. More...

class	foedus::xct::XctManager
	Xct Manager class that provides API to begin/abort/commit transaction. More...

class	foedus::xct::XctManagerPimpl
	Pimpl object of XctManager. More...

class	foedus::xct::McsAdaptorConcept< RW_BLOCK >
	Defines an adapter template interface for our MCS lock classes. More...

class	foedus::xct::McsImpl< ADAPTOR, RW_BLOCK >
	Implements an MCS-locking Algorithm. More...

class	foedus::xct::McsWwImpl< ADAPTOR >
	A specialized/simplified implementation of an MCS-locking Algorithm for exclusive-only (WW) locks. More...

class	foedus::xct::McsWwOwnerlessImpl
	A ownerless (contextless) interface for McsWwImpl. More...

struct	foedus::xct::XctOptions
	Set of options for xct manager. More...

Typedefs
typedef uintptr_t	foedus::xct::UniversalLockId
	Universally ordered identifier of each lock. More...

typedef uint32_t	foedus::xct::LockListPosition
	Index in a lock-list, either RLL or CLL. More...

Enumerations
enum	foedus::xct::IsolationLevel { foedus::xct::kDirtyRead, foedus::xct::kSnapshot, foedus::xct::kSerializable }
	Specifies the level of isolation during transaction processing. More...

enum	foedus::xct::LockMode { foedus::xct::kNoLock = 0, foedus::xct::kReadLock, foedus::xct::kWriteLock }
	Represents a mode of lock. More...

Variables
const uint64_t	foedus::xct::kMaxXctOrdinal = (1ULL << 24) - 1U
	Maximum value of in-epoch ordinal. More...

Typedef Documentation

typedef uint32_t foedus::xct::LockListPosition

Index in a lock-list, either RLL or CLL.

The value zero is guaranteed to be invalid. So, lock lists using this type must reserve index-0 to be either a dummy entry or some sentinel entry. Thanks to this contract, it's easy to initialize structs holding this type.

See also: kLockListPositionNull

Definition at line 148 of file xct_id.hpp.

typedef uintptr_t foedus::xct::UniversalLockId

Universally ordered identifier of each lock.

This must follow a universally consistent order even across processes.

NodeId: ID of the NUMA node where this lock is allocated. Offset: the lock VA's offset based off of its owner node's virtual pool's start VA.

We attach the same shmem in fresh new processes in the same order and in the same machine.. most likely we get the same VA-mapping. // ASLR? Turn it off. I don't care security.

So far a unitptr_t, revisit later if we need a struct over a uint64_t. Now we just compare the whole uintptr, which might cause some socket's data are always later; ideally, we can compare only the offset part. Revisit later.

Definition at line 134 of file xct_id.hpp.

Enumeration Type Documentation

enum foedus::xct::IsolationLevel

Specifies the level of isolation during transaction processing.

May add:

COMMITTED_READ: see-epoch and read data -> fence -> check-epoch, then forget the read set
REPEATABLE_READ: assuming no-repeated-access (which we do assume), same as COMMITTED_READ

but probably they are superseded either by kDirtyRead or kSnapshot.

Enumerator

kDirtyRead

No guarantee at all for reads, for the sake of best performance and scalability.

This avoids checking and even storing read set, thus provides the best performance. However, concurrent transactions might be modifying the data the transaction is now reading. So, this has a chance of reading half-changed data. This mode prefers volatile pages if both a snapshot page and a volatile page is available. In other words, more recent but more inconsistent reads compared to kSnapshot.

kSnapshot

Snapshot isolation (SI), meaning the transaction reads a consistent and complete image of the database as of the previous snapshot.

Writes are same as kSerializable, but all reads simply follow snapshot-pointer from the root, so there is no race, no abort, no verification. Hence, higher scalability than kSerializable. However, this level can result in write skews. Choose this level if you want highly consistent reads and very high performance. TASK(Hideaki): Allow specifying which snapshot we should be based on. Low priority.

kSerializable

Protects against all anomalies in all situations.

This is the most expensive level, but everything good has a price. Choose this level if you want full correctness.

Definition at line 55 of file xct_id.hpp.

                     {
   kDirtyRead,
 
   kSnapshot,
 
   kSerializable,
 };

enum foedus::xct::LockMode

Represents a mode of lock.

The order is important. The larger lock mode includes the smaller.

Enumerator

kNoLock

taken_mode_: Not taken the lock yet.

preferred_mode_: Implies that we shouldn't take any lock on this entry in next run.

kReadLock

taken_mode_: we took a read-lock, not write-lock yet.

preferred_mode_: Implies that we should take a read-lock on this entry in next run.

kWriteLock

taken_mode_: we took a write-lock.

preferred_mode_: Implies that we should take a write-lock on this entry in next run.

Definition at line 95 of file xct_id.hpp.

               {
   kNoLock = 0,
   kReadLock,
   kWriteLock,
 };

Variable Documentation

const uint64_t foedus::xct::kMaxXctOrdinal = (1ULL << 24) - 1U

Maximum value of in-epoch ordinal.

We reserve 4 bytes in XctId, but in reality 3 bytes are more than enough. By restricting it to within 3 bytes, we can pack more information in a few places.

Definition at line 898 of file xct_id.hpp.

Referenced by foedus::xct::Xct::issue_next_id().

Detailed Description

Get Started

Optimistic Concurrency Control

Epoch-based commit protocol

Epoch Chime

Isolation Levels

Read-Only Transactions

References

Modules

Files

Classes

Typedefs

Enumerations

Variables

Typedef Documentation

Enumeration Type Documentation

Variable Documentation