libfoedus-core
FOEDUS Core Library
Transaction Manager

Transaction Manager, which provides APIs to begin/commit/abort transactions. More...

Detailed Description

Transaction Manager, which provides APIs to begin/commit/abort transactions.

This package is the implementation of the commit protocol, the gut of concurrency control.

Get Started

First thing first. Here's a minimal example to start one transaction in the engine.

// Example to start and commit one transaction
foedus::ErrorStack run_my_task(foedus::thread::Thread* context, ...) {
foedus::Engine *engine = context->get_engine();
foedus::xct::XctManager* xct_manager = engine->get_xct_manager();
... // read/modify data. See storage module's document for examples.
foedus::Epoch commit_epoch;
WRAP_ERROR_CODE(xct_manager->precommit_xct(context, &commit_epoch));
WRAP_ERROR_CODE(xct_manager->wait_for_commit(commit_epoch));
}

Notice the wait_for_commit(commit_epoch) call. Without invoking the method, you should not consider that your transactions are committed. That's why the name of the method invoked above is "precommit_xct".

Here's a minimal example to start several transactions and commit them together, or group-commit, which is the primary usecase our engine is optimized for.

// Example to start and commit several transactions
foedus::Epoch highest_commit_epoch;
for (int i = 0; i < 1000; ++i) {
... // read/modify data. See storage module's document for examples.
foedus::Epoch commit_epoch;
WRAP_ERROR_CODE(xct_manager->precommit_xct(context, &commit_epoch));
highest_commit_epoch.store_max(commit_epoch);
}
CHECK_ERROR(xct_manager->wait_for_commit(highest_commit_epoch));

In this case, we invoke wait_for_commit() for the largest commit epoch just once at the end. This dramatically improves the throughput at the cost of latency of individual transactions.

Optimistic Concurrency Control

Our commit protocol is based on [TU13], except that we have handling of non-volatile pages. [TU13]'s commit protocol completely avoids writes to shared memory by read operations. [LARSON11] is also an optimistic concurrency control, but it still has "read-lock" bits which has to be written by read operations. In many-core NUMA environment, this might cause scalability issues, thus we employ [TU13]'s approach.

Epoch-based commit protocol

The engine maintains two global foedus::Epoch; current Epoch and durable Epoch. foedus::xct::XctManager keeps advancing current epoch periodically while the log module advances durable epoch when it confirms that all log entries up to the epoch becomes durable and also that the log module durably writes a savepoint ( Savepoint Manager ) file.

Epoch Chime

Epoch Chime advances the current global epoch when a configured interval elapses or the user explicitly requests it. The chime checks whether it can safely advance an epoch so that the following invariant always holds.

In many cases, the invariants are trivially achieved. However, there are a few tricky cases.

Whenever the chime advances the epoch, we have to safely detect whether there is any transaction that might violate the invariant without causing expensive synchronization. This is done via the in-commit epoch guard. For more details, see the following class.

See also
foedus::xct::InCommitEpochGuard

Isolation Levels

See foedus::xct::IsolationLevel

Read-Only Transactions

bluh

References

Collaboration diagram for Transaction Manager:

Modules

 Retrospective Lock List
 Retrospective Lock List (RLL) to avoid deadlocks.
 
 Physical-only, short-living system transactions.
 

Files

file  fwd.hpp
 Forward declarations of classes in transaction package.
 
file  xct_id.hpp
 Definitions of IDs in this package and a few related constant values.
 

Classes

struct  foedus::xct::InCommitEpochGuard
 Automatically sets in-commit-epoch with appropriate fence during pre-commit protocol. More...
 
class  foedus::xct::Xct
 Represents a user transaction. More...
 
struct  foedus::xct::PointerAccess
 Represents a record of following a page pointer during a transaction. More...
 
struct  foedus::xct::PageVersionAccess
 Represents a record of reading a page during a transaction. More...
 
struct  foedus::xct::ReadXctAccess
 Represents a record of read-access during a transaction. More...
 
struct  foedus::xct::WriteXctAccess
 Represents a record of write-access during a transaction. More...
 
struct  foedus::xct::LockFreeReadXctAccess
 Represents a record of special read-access during a transaction without any need for locking. More...
 
struct  foedus::xct::LockFreeWriteXctAccess
 Represents a record of special write-access during a transaction without any need for locking. More...
 
struct  foedus::xct::McsWwLock
 An exclusive-only (WW) MCS lock data structure. More...
 
struct  foedus::xct::McsRwLock
 An MCS reader-writer lock data structure. More...
 
struct  foedus::xct::XctId
 Persistent status part of Transaction ID. More...
 
struct  foedus::xct::LockableXctId
 Transaction ID, a 128-bit data to manage record versions and provide locking mechanism. More...
 
class  foedus::xct::XctManager
 Xct Manager class that provides API to begin/abort/commit transaction. More...
 
class  foedus::xct::XctManagerPimpl
 Pimpl object of XctManager. More...
 
class  foedus::xct::McsAdaptorConcept< RW_BLOCK >
 Defines an adapter template interface for our MCS lock classes. More...
 
class  foedus::xct::McsImpl< ADAPTOR, RW_BLOCK >
 Implements an MCS-locking Algorithm. More...
 
class  foedus::xct::McsWwImpl< ADAPTOR >
 A specialized/simplified implementation of an MCS-locking Algorithm for exclusive-only (WW) locks. More...
 
class  foedus::xct::McsWwOwnerlessImpl
 A ownerless (contextless) interface for McsWwImpl. More...
 
struct  foedus::xct::XctOptions
 Set of options for xct manager. More...
 

Typedefs

typedef uintptr_t foedus::xct::UniversalLockId
 Universally ordered identifier of each lock. More...
 
typedef uint32_t foedus::xct::LockListPosition
 Index in a lock-list, either RLL or CLL. More...
 

Enumerations

enum  foedus::xct::IsolationLevel { foedus::xct::kDirtyRead, foedus::xct::kSnapshot, foedus::xct::kSerializable }
 Specifies the level of isolation during transaction processing. More...
 
enum  foedus::xct::LockMode { foedus::xct::kNoLock = 0, foedus::xct::kReadLock, foedus::xct::kWriteLock }
 Represents a mode of lock. More...
 

Variables

const uint64_t foedus::xct::kMaxXctOrdinal = (1ULL << 24) - 1U
 Maximum value of in-epoch ordinal. More...
 

Typedef Documentation

Index in a lock-list, either RLL or CLL.

The value zero is guaranteed to be invalid. So, lock lists using this type must reserve index-0 to be either a dummy entry or some sentinel entry. Thanks to this contract, it's easy to initialize structs holding this type.

See also
kLockListPositionNull

Definition at line 148 of file xct_id.hpp.

typedef uintptr_t foedus::xct::UniversalLockId

Universally ordered identifier of each lock.

This must follow a universally consistent order even across processes.

Currently we assume all locks come from volatile pages. The bits are used as: |–63–48–|–47—0–| |–NodeId–|–Offset–|

NodeId: ID of the NUMA node where this lock is allocated. Offset: the lock VA's offset based off of its owner node's virtual pool's start VA.

We attach the same shmem in fresh new processes in the same order and in the same machine.. most likely we get the same VA-mapping. // ASLR? Turn it off. I don't care security.

So far a unitptr_t, revisit later if we need a struct over a uint64_t. Now we just compare the whole uintptr, which might cause some socket's data are always later; ideally, we can compare only the offset part. Revisit later.

Definition at line 134 of file xct_id.hpp.

Enumeration Type Documentation

Specifies the level of isolation during transaction processing.

May add:

  • COMMITTED_READ: see-epoch and read data -> fence -> check-epoch, then forget the read set
  • REPEATABLE_READ: assuming no-repeated-access (which we do assume), same as COMMITTED_READ

but probably they are superseded either by kDirtyRead or kSnapshot.

Enumerator
kDirtyRead 

No guarantee at all for reads, for the sake of best performance and scalability.

This avoids checking and even storing read set, thus provides the best performance. However, concurrent transactions might be modifying the data the transaction is now reading. So, this has a chance of reading half-changed data. This mode prefers volatile pages if both a snapshot page and a volatile page is available. In other words, more recent but more inconsistent reads compared to kSnapshot.

kSnapshot 

Snapshot isolation (SI), meaning the transaction reads a consistent and complete image of the database as of the previous snapshot.

Writes are same as kSerializable, but all reads simply follow snapshot-pointer from the root, so there is no race, no abort, no verification. Hence, higher scalability than kSerializable. However, this level can result in write skews. Choose this level if you want highly consistent reads and very high performance. TASK(Hideaki): Allow specifying which snapshot we should be based on. Low priority.

kSerializable 

Protects against all anomalies in all situations.

This is the most expensive level, but everything good has a price. Choose this level if you want full correctness.

Definition at line 55 of file xct_id.hpp.

55  {
65  kDirtyRead,
66 
78  kSnapshot,
79 
87 };
Snapshot isolation (SI), meaning the transaction reads a consistent and complete image of the databas...
Definition: xct_id.hpp:78
No guarantee at all for reads, for the sake of best performance and scalability.
Definition: xct_id.hpp:65
Protects against all anomalies in all situations.
Definition: xct_id.hpp:86

Represents a mode of lock.

The order is important. The larger lock mode includes the smaller.

Enumerator
kNoLock 

taken_mode_: Not taken the lock yet.

preferred_mode_: Implies that we shouldn't take any lock on this entry in next run.

kReadLock 

taken_mode_: we took a read-lock, not write-lock yet.

preferred_mode_: Implies that we should take a read-lock on this entry in next run.

kWriteLock 

taken_mode_: we took a write-lock.

preferred_mode_: Implies that we should take a write-lock on this entry in next run.

Definition at line 95 of file xct_id.hpp.

95  {
100  kNoLock = 0,
105  kReadLock,
110  kWriteLock,
111 };
taken_mode_: we took a read-lock, not write-lock yet.
Definition: xct_id.hpp:105
taken_mode_: Not taken the lock yet.
Definition: xct_id.hpp:100
taken_mode_: we took a write-lock.
Definition: xct_id.hpp:110

Variable Documentation

const uint64_t foedus::xct::kMaxXctOrdinal = (1ULL << 24) - 1U

Maximum value of in-epoch ordinal.

We reserve 4 bytes in XctId, but in reality 3 bytes are more than enough. By restricting it to within 3 bytes, we can pack more information in a few places.

Definition at line 898 of file xct_id.hpp.

Referenced by foedus::xct::Xct::issue_next_id().