libfoedus-core
FOEDUS Core Library
Storage Manager

Storage Manager, which implements a couple of key/value stores. More...

Detailed Description

Storage Manager, which implements a couple of key/value stores.

This package contains core classes that store key/value pairs. This layer has no idea about what is stored, thus no notion of columns either.

Types of Storage

So far we provide four types of storages; Array Storage, Hashtable Storage, Sequential Storage, and Masstree Storage. Choose the type of storage based on the data to store and access patterns.

General Recommendation
In general, you should use Hashtable Storage or Masstree Storage for most tables/indexes. If the access pattern contains no range accesses, equality on prefix, nor non-equality, then pick Hashtable Storage. Otherwise, Masstree Storage. Use Array Storage and Sequential Storage where they shine; when you literally needs arrays and append-only data.
Collaboration diagram for Storage Manager:

Modules

 Array Storage
 Array Storage, a dense and regular array.
 
 Hashtable Storage
 Hashtable Storage, a concurrent hashtable.
 
 Masstree Storage
 Masstree Storage, 64-bit B-tries with internal B-trees.
 
 Sequential Storage
 Sequential Storage, an append/scan only data structure.
 

Files

file  fwd.hpp
 Forward declarations of classes in storage package.
 
file  storage_id.hpp
 Definitions of IDs in this package and a few related constant values.
 
file  storage_log_types.hpp
 Declares common log types for all (or at least multiple) storage types.
 

Classes

struct  foedus::storage::Composer::ComposeArguments
 Arguments for compose() More...
 
struct  foedus::storage::Composer::ConstructRootArguments
 Arguments for construct_root() More...
 
struct  foedus::storage::Composer::DropVolatilesArguments
 Arguments for drop_volatiles() More...
 
struct  foedus::storage::Composer::DropResult
 Retrun value of drop_volatiles() More...
 
class  foedus::storage::Composer
 Represents a logic to compose a new version of data pages for one storage. More...
 
struct  foedus::storage::Metadata::SnapshotThresholds
 Tuning parameters related to snapshotting. More...
 
struct  foedus::storage::Metadata
 Metadata of one storage. More...
 
struct  foedus::storage::PageVersion
 Just a synonym of XctId to be used as a page lock mechanism. More...
 
struct  foedus::storage::PageHeader
 Just a marker to denote that a memory region represents a data page. More...
 
struct  foedus::storage::Page
 Just a marker to denote that the memory region represents a data page. More...
 
struct  foedus::storage::VolatilePageInitArguments
 Set of arguments, both inputs and outputs, given to each volatile page initializer. More...
 
struct  foedus::storage::Partitioner::DesignPartitionArguments
 Arguments for design_partition() More...
 
struct  foedus::storage::Partitioner::PartitionBatchArguments
 Arguments for partition_batch() More...
 
struct  foedus::storage::Partitioner::SortBatchArguments
 Arguments for sort_batch() More...
 
class  foedus::storage::Partitioner
 Partitioning and sorting logic for one storage. More...
 
struct  foedus::storage::PartitionerMetadata
 Tiny metadata of partitioner for every storage used while log gleaning. More...
 
struct  foedus::storage::Record
 Represents one record in our key-value store. More...
 
struct  foedus::storage::StorageControlBlock
 A base layout of shared data for all storage types. More...
 
class  foedus::storage::Storage< CONTROL_BLOCK >
 Represents one key-value store. More...
 
struct  foedus::storage::VolatilePagePointer
 Represents a pointer to a volatile page with modification count for preventing ABA. More...
 
struct  foedus::storage::DualPagePointer
 Represents a pointer to another page (usually a child page). More...
 
struct  foedus::storage::DropLogType
 Log type of DROP STORAGE operation. More...
 
struct  foedus::storage::CreateLogType
 Base type for CREATE STORAGE operation. More...
 
class  foedus::storage::StorageManager
 Storage Manager class that provides API to create/open/close/drop key-value stores. More...
 
class  foedus::storage::StorageManagerPimpl
 Pimpl object of StorageManager. More...
 
struct  foedus::storage::StorageOptions
 Set of options for storage manager. More...
 

Typedefs

typedef void(* foedus::storage::VolatilePageInit) (const VolatilePageInitArguments &args)
 A function pointer to initialize a volatile page. More...
 
typedef uint32_t foedus::storage::StorageId
 Unique ID for storage. More...
 
typedef thread::ThreadGroupId foedus::storage::PartitionId
 As partition=NUMA node, this is just a synonym of foedus::thread::ThreadGroupId. More...
 
typedef uint64_t foedus::storage::SnapshotPagePointer
 Page ID of a snapshot page. More...
 
typedef uint64_t foedus::storage::SnapshotLocalPageId
 Represents a local page ID in each one snapshot file in some NUMA node. More...
 
typedef uint32_t foedus::storage::Checksum
 Checksum of a snapshot page. More...
 

Enumerations

enum  foedus::storage::PageType {
  foedus::storage::kUnknownPageType = 0, foedus::storage::kArrayPageType = 1, foedus::storage::kMasstreeIntermediatePageType = 2, foedus::storage::kMasstreeBorderPageType = 3,
  foedus::storage::kSequentialPageType = 4, foedus::storage::kSequentialRootPageType = 5, foedus::storage::kHashIntermediatePageType = 6, foedus::storage::kHashDataPageType = 7,
  foedus::storage::kHashComposedBinsPageType = 8, foedus::storage::kDummyLastPageType
}
 The following 1-byte value is stored in the common page header. More...
 
enum  foedus::storage::StorageType {
  foedus::storage::kInvalidStorage = 0, foedus::storage::kArrayStorage, foedus::storage::kHashStorage, foedus::storage::kMasstreeStorage,
  foedus::storage::kSequentialStorage
}
 Type of the storage, such as hash. More...
 
enum  foedus::storage::StorageStatus { foedus::storage::kNotExists = 0, foedus::storage::kExists, foedus::storage::kMarkedForDeath }
 Status of a storage. More...
 

Functions

Page * foedus::storage::to_page (const void *address)
 super-dirty way to obtain Page the address belongs to. More...
 
void foedus::storage::prefetch_page_l2 (const void *page)
 Prefetch a page to L2/L3 cache. More...
 
const char * foedus::storage::to_storage_type_name (StorageType type)
 Gives a string representation of StorageType. More...
 

Variables

const uint16_t foedus::storage::kPageSize = 1 << 12
 A constant defining the page size (in bytes) of both snapshot pages and volatile pages. More...
 

Class Documentation

struct foedus::storage::Composer::ComposeArguments

Arguments for compose()

Definition at line 95 of file composer.hpp.

Collaboration diagram for foedus::storage::Composer::ComposeArguments:
Class Members
Epoch base_epoch_ All log entries in this inputs are assured to be after this epoch.

Also, it is assured to be within 2^16 from this epoch.

SortedBuffer *const * log_streams_ Sorted runs.
uint32_t log_streams_count_ Number of sorted runs.
SnapshotFileSet * previous_snapshot_files_ To read existing snapshots.
Page * root_info_page_ [OUT] Returns pointers and related information that is required to construct the root page.

The data format depends on the composer. In all implementations, the information must fit in one page (should be, otherwise we can't have a root page)

SnapshotWriter * snapshot_writer_ Writes out composed pages.
AlignedMemory * work_memory_ Working memory to be used in this method.

Automatically expand if needed.

struct foedus::storage::Composer::ConstructRootArguments

Arguments for construct_root()

Definition at line 124 of file composer.hpp.

Collaboration diagram for foedus::storage::Composer::ConstructRootArguments:
Class Members
LogGleanerResource * gleaner_resource_ All pre-allocated resouces to help run construct_root(), such as memory buffers.
SnapshotPagePointer * new_root_page_pointer_ [OUT] Returns pointer to new root snapshot page/
SnapshotFileSet * previous_snapshot_files_ To read existing snapshots.
const Page *const * root_info_pages_ Root info pages output by compose()
uint32_t root_info_pages_count_ Number of root info pages.
SnapshotWriter * snapshot_writer_ Writes out composed pages.
struct foedus::storage::VolatilePageInitArguments

Set of arguments, both inputs and outputs, given to each volatile page initializer.

Definition at line 365 of file page.hpp.

Collaboration diagram for foedus::storage::VolatilePageInitArguments:
Class Members
Thread * context_ [IN] Thread on which the procedure is running
uint16_t index_in_parent_ [IN] Some index (meaning depends on page type) of pointer in parent page to the new page.
Page * page_ [IN, OUT] The new page to initialize.
VolatilePagePointer page_id [IN] New page ID
const Page * parent_ [IN] Parent of the new page.
struct foedus::storage::Partitioner::DesignPartitionArguments

Arguments for design_partition()

Definition at line 101 of file partitioner.hpp.

Collaboration diagram for foedus::storage::Partitioner::DesignPartitionArguments:
Class Members
SnapshotFileSet * snapshot_files_
AlignedMemory * work_memory_ Working memory to be used in this method.

Automatically expand if needed.

struct foedus::storage::Partitioner::PartitionBatchArguments

Arguments for partition_batch()

Definition at line 116 of file partitioner.hpp.

Collaboration diagram for foedus::storage::Partitioner::PartitionBatchArguments:
Class Members
PartitionId local_partition_ The node the caller (mapper) resides in.
const LogBuffer & log_buffer_ Converts from positions to physical pointers.
const BufferPosition * log_positions_ positions of log records.

All of them must be logs of this storage.

uint32_t logs_count_ number of entries to process.
PartitionId * results_ [OUT] this method will set the partition of logs[i] to results[i].
struct foedus::storage::Partitioner::SortBatchArguments

Arguments for sort_batch()

Definition at line 140 of file partitioner.hpp.

Collaboration diagram for foedus::storage::Partitioner::SortBatchArguments:
Class Members
Epoch base_epoch_ All log entries in this inputs are assured to be after this epoch.

Also, it is assured to be within 2^16 from this epoch. Even with 10 milliseconds per epoch, this corresponds to more than 10 hours. Snapshot surely happens more often than that.

const LogBuffer & log_buffer_ Converts from positions to physical pointers.
const BufferPosition * log_positions_ positions of log records.

All of them must be logs of this storage.

uint32_t logs_count_ number of entries to process.
uint32_t longest_key_length_ [masstree/hash] longest key length in the log entries.
BufferPosition * output_buffer_ sorted results are written to this variable.

the buffer size is at least of log_positions_count_.

uint32_t shortest_key_length_ [masstree/hash] shortest key length in the log entries.
AlignedMemory * work_memory_ Working memory to be used in this method.

Automatically expand if needed.

uint32_t * written_count_ [OUT] how many logs written to output_buffer.

If there was no compaction, this will be same as log_positions_count_.

Typedef Documentation

typedef uint32_t foedus::storage::Checksum

Checksum of a snapshot page.

Each snapshot page contains this checksum to be verified when we read it from media or occasionally.

Definition at line 174 of file storage_id.hpp.

typedef thread::ThreadGroupId foedus::storage::PartitionId

As partition=NUMA node, this is just a synonym of foedus::thread::ThreadGroupId.

Definition at line 65 of file storage_id.hpp.

Represents a local page ID in each one snapshot file in some NUMA node.

This is the low 40 bits of SnapshotPagePointer. Just an alias for readability. This value must not exceed 2^40. 0 means null pointer (which is a valid value).

Definition at line 89 of file storage_id.hpp.

Page ID of a snapshot page.

Snapshot Page ID is a 64-bit integer. The high 16 bits indicate which snapshot it is in (foedus::snapshot::SnapshotId). As we periodically merge all snapshots, we won't have 2^16 snapshots at one time. The mid 8 bits indicate the NUMA node ID, or which file it is in. We have one snapshot file per NUMA node, so it won't be more than 2^8. The last 40 bits indicate page offset in the file. So, one file in one snapshot must be within 4kb * 2^40 = 4PB, which is surely the case.

Definition at line 79 of file storage_id.hpp.

typedef uint32_t foedus::storage::StorageId

Unique ID for storage.

StorageId is an unsigned integer starting from 1. Value 0 is an always invalid ID (empty). Name of storage is also unique, but it's an auxiliary information for convenience. The primary way to identify storages is this StorageId.

Definition at line 55 of file storage_id.hpp.

typedef void(* foedus::storage::VolatilePageInit) (const VolatilePageInitArguments &args)

A function pointer to initialize a volatile page.

Used as a callback argument to follow_page_pointer. This is used when a method might initialize a volatile page (eg following a page pointer). Page initialization depends on page type and many of them need custom logic, so we made it a callback.

Definition at line 387 of file page.hpp.

Enumeration Type Documentation

The following 1-byte value is stored in the common page header.

These values are stored in snapshot pages too, so we must keep values' compatibility. Specify values for that reason.

Enumerator
kUnknownPageType 
kArrayPageType 
kMasstreeIntermediatePageType 
kMasstreeBorderPageType 
kSequentialPageType 
kSequentialRootPageType 
kHashIntermediatePageType 
kHashDataPageType 
kHashComposedBinsPageType 
kDummyLastPageType 

Definition at line 46 of file page.hpp.

Status of a storage.

Enumerator
kNotExists 

Initial state, which means the storage has not been created yet.

kExists 

The storage has been created and ready for use.

kMarkedForDeath 

The storage has been marked for drop and can't be used.

The status becomes kNotExists at next restart. the system. So far there is no path from this state to kNotExists. No reuse of StorageId.

Definition at line 154 of file storage_id.hpp.

154  {
156  kNotExists = 0,
158  kExists,
165 };
The storage has been marked for drop and can't be used.
Definition: storage_id.hpp:164
The storage has been created and ready for use.
Definition: storage_id.hpp:158
Initial state, which means the storage has not been created yet.
Definition: storage_id.hpp:156

Type of the storage, such as hash.

Enumerator
kInvalidStorage 

0 indicates invalid type.

kArrayStorage 

Array Storage

kHashStorage 

Hashtable Storage

kMasstreeStorage 

Masstree Storage

kSequentialStorage 

Sequential Storage

Definition at line 122 of file storage_id.hpp.

122  {
124  kInvalidStorage = 0,
128  kHashStorage,
133 };
0 indicates invalid type.
Definition: storage_id.hpp:124

Function Documentation

void foedus::storage::prefetch_page_l2 ( const void *  page)
inline

Prefetch a page to L2/L3 cache.

Parameters
[in]pagepage to prefetch.

In order to prefetch a specific part of the page to L1, directly use foedus::assorted::prefetch_cacheline().

Definition at line 34 of file page_prefetch.hpp.

References foedus::assorted::kCachelineSize, foedus::storage::kPageSize, and foedus::assorted::prefetch_l2().

Referenced by foedus::storage::array::ArrayStoragePimpl::prefetch_pages(), foedus::storage::masstree::MasstreeStoragePimpl::prefetch_pages_follow(), foedus::storage::masstree::MasstreeStoragePimpl::prefetch_pages_normalized(), and foedus::storage::array::ArrayStoragePimpl::prefetch_pages_recurse().

34  {
36 }
const uint16_t kCachelineSize
Byte count of one cache line.
Definition: cacheline.hpp:42
const uint16_t kPageSize
A constant defining the page size (in bytes) of both snapshot pages and volatile pages.
Definition: storage_id.hpp:45
void prefetch_l2(const void *address, int cacheline_count)
Prefetch multiple contiguous cachelines to L2/L3 cache.
Definition: cacheline.hpp:79

Here is the call graph for this function:

Here is the caller graph for this function:

Page* foedus::storage::to_page ( const void *  address)
inline

super-dirty way to obtain Page the address belongs to.

because all pages are 4kb aligned, we can just divide and multiply.

Definition at line 395 of file page.hpp.

References foedus::storage::kPageSize.

Referenced by foedus::storage::masstree::MasstreeCommonLogType::apply_record_prepare(), foedus::xct::SysxctLockList::assert_sorted_impl(), foedus::storage::assert_within_valid_volatile_page_impl(), foedus::xct::RetrospectiveLockList::construct(), foedus::storage::masstree::extract_page_layer(), foedus::xct::RwLockableXctId::hotter(), foedus::xct::RwLockableXctId::is_hot(), foedus::xct::lock_assert_sorted(), foedus::xct::Xct::on_record_read(), foedus::storage::masstree::MasstreeStoragePimpl::prefetch_pages_follow(), foedus::xct::to_universal_lock_id(), foedus::storage::hash::HashStoragePimpl::track_moved_record(), and foedus::storage::masstree::MasstreeStoragePimpl::track_moved_record().

395  {
396  uintptr_t int_address = reinterpret_cast<uintptr_t>(address);
397  uint64_t aligned_address = static_cast<uint64_t>(int_address) / kPageSize * kPageSize;
398  return reinterpret_cast<Page*>(
399  reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(aligned_address)));
400 }
const uint16_t kPageSize
A constant defining the page size (in bytes) of both snapshot pages and volatile pages.
Definition: storage_id.hpp:45

Here is the caller graph for this function:

const char* foedus::storage::to_storage_type_name ( StorageType  type)
inline

Gives a string representation of StorageType.

Definition at line 139 of file storage_id.hpp.

References foedus::storage::kArrayStorage, foedus::storage::kHashStorage, foedus::storage::kInvalidStorage, foedus::storage::kMasstreeStorage, and foedus::storage::kSequentialStorage.

Referenced by foedus::storage::operator<<().

139  {
140  switch (type) {
141  case kInvalidStorage: return "Invalid";
142  case kArrayStorage: return "Array";
143  case kHashStorage: return "Hash";
144  case kMasstreeStorage: return "Masstree";
145  case kSequentialStorage: return "Sequential";
146  default: return "Unknown";
147  }
148 }
0 indicates invalid type.
Definition: storage_id.hpp:124

Here is the caller graph for this function:

Variable Documentation

const uint16_t foedus::storage::kPageSize = 1 << 12

A constant defining the page size (in bytes) of both snapshot pages and volatile pages.

This number must be at least 4kb (2^12) because that's Linux's page alignment.

Definition at line 45 of file storage_id.hpp.

Referenced by foedus::storage::masstree::MasstreeCommonLogType::apply_record_prepare(), foedus::storage::assert_aligned_page(), foedus::storage::hash::HashDataPage::assert_entries_impl(), foedus::storage::hash::ComposedBinsBuffer::assure_read_buffer_size(), foedus::memory::PagePoolPimpl::attach(), foedus::storage::hash::HashComposer::construct_root(), foedus::storage::masstree::MasstreeComposer::construct_root(), foedus::storage::hash::HashTmpBin::create_memory(), foedus::storage::hash::HashDataPage::create_record_in_snapshot(), foedus::storage::masstree::MasstreePartitioner::design_partition(), foedus::storage::array::ArrayComposeContext::execute(), foedus::storage::hash::HashComposeContext::execute(), foedus::thread::ThreadPimpl::find_or_read_a_snapshot_page(), foedus::thread::ThreadPimpl::find_or_read_snapshot_pages_batch(), foedus::storage::hash::HashStoragePimpl::follow_page_bin_head(), foedus::snapshot::SnapshotWriter::get_intermediate_size(), foedus::snapshot::SnapshotWriter::get_page_size(), foedus::storage::hash::HashComposeContext::HashComposeContext(), foedus::storage::hash::ComposedBinsMergedStream::init(), foedus::storage::sequential::SequentialStoragePimpl::initialize_head_tail_pages(), foedus::memory::NumaNodeMemory::initialize_once(), foedus::snapshot::MergeSort::initialize_once(), foedus::storage::array::ArrayPage::initialize_snapshot_page(), foedus::storage::hash::HashIntermediatePage::initialize_snapshot_page(), foedus::storage::hash::HashDataPage::initialize_snapshot_page(), foedus::storage::array::ArrayPage::initialize_volatile_page(), foedus::storage::hash::HashIntermediatePage::initialize_volatile_page(), foedus::storage::hash::HashDataPage::initialize_volatile_page(), foedus::thread::ThreadPimpl::install_a_volatile_page(), foedus::storage::prefetch_page_l2(), foedus::storage::sequential::SequentialCursor::SequentialCursor(), and foedus::storage::to_page().