libfoedus-core
FOEDUS Core Library

Array Storage, a dense and regular array. More...

Detailed Description

Array Storage, a dense and regular array.

Basic Structure

Our array-storage is an extremely simple data structure at the cost of limited applicability (fix-sized, regular, dense arrays). The page layout has little per-page information that is dynamic, meaning that most data never changes after the array storage is created. Further, as everything is pre-allocated, no phantoms, no insertion/splits, nor anything complex. Thus, little synchronization hassles.

Supported Operations

Array storage allows very few data operations.

In other words, the following operations are NOT supported.

If these limitations do not cause an issue, array storage is a best choice as it's extremely efficient and scalable thanks to the simplicity.

Page Hierarchy

Despite the name of this storage type, we do have a page hierarchy, which is required to handle switches between volatile/snapshot pages. Hence, a more precise description of this storage type is a fix-sized pre-allocated tree that has only integer-keys from 0 to array_size-1.

All data (payload) are stored in leaf pages and interior pages contain only pointers to its children. There is only one root page per array, which may or may not be a leaf page. Just like B-trees in many sense.

Page Layout

Header and Data
Fix-Sized HEADER (kHeaderSize bytes)DATA
Interior Node Data
InteriorRecordInteriorRecord...
See foedus::storage::array::ArrayPage::InteriorRecord.
Leaf Node Data
RecordPaddingRecordPadding...
We simply puts foedus::storage::Record contiguously, but with a padding (0-7 bytes). to make sure Record are 8-byte aligned because we do atomic operations on Record's owner_id_.

Concurrency Control

We do a bit special concurrency control for this storage type. Because everything is pre-allocated and no split/physical-delete/merge whatsoever, we can do the ModCount concurrency control per Record, not per page. We store ModCount for each record, instead stores no ModCount in page header. Thus, individual transactions that update records in same page have no contention except they update the exact same record. We even don't track node-set; only record-set is checked at commit.

This simplifies and increases concurrency for most data accesses, but we have to be careful when the page eviction thread drops the volatile page. We need to know the largest Epoch of records in the page at the point, but concurrent transactions might be modifying records at the same time. Thus, we do two-path optimistic concurrency control similar to our commit protocol for Masstree Storage.

References

Collaboration diagram for Array Storage:

Files

file  array_id.hpp
 Definitions of IDs in this package and a few related constant values.
 
file  array_log_types.hpp
 Declares all log types used in this storage type.
 
file  fwd.hpp
 Forward declarations of classes in array storage package.
 

Classes

class  foedus::storage::array::ArrayComposer
 Composer for an array storage. More...
 
struct  foedus::storage::array::ArrayRange
 Represents an offset range in an array storage. More...
 
struct  foedus::storage::array::ArrayCreateLogType
 Log type of CREATE ARRAY STORAGE operation. More...
 
struct  foedus::storage::array::ArrayCommonUpdateLogType
 A base class for ArrayOverwriteLogType/ArrayIncrementLogType. More...
 
struct  foedus::storage::array::ArrayOverwriteLogType
 Log type of array-storage's overwrite operation. More...
 
struct  foedus::storage::array::ArrayIncrementLogType
 Log type of array-storage's increment operation. More...
 
struct  foedus::storage::array::ArrayMetadata
 Metadata of an array storage. More...
 
union  foedus::storage::array::ArrayPage::Data
 Data union for either leaf (dynamic size) or interior (fixed size). More...
 
class  foedus::storage::array::ArrayPage
 Represents one data page in Array Storage. More...
 
class  foedus::storage::array::ArrayPartitioner
 Partitioner for an array storage. More...
 
union  foedus::storage::array::LookupRoute
 Compactly represents the route to reach the given offset. More...
 
class  foedus::storage::array::LookupRouteFinder
 Packages logic and required properties to calculate LookupRoute in array storage from offset. More...
 
class  foedus::storage::array::ArrayStorage
 Represents a key-value store based on a dense and regular array. More...
 
class  foedus::storage::array::ArrayStoragePimpl
 Pimpl object of ArrayStorage. More...
 
struct  foedus::storage::sequential::SequentialAppendLogType
 Log type of sequential-storage's append operation. More...
 

Typedefs

typedef uint64_t foedus::storage::array::ArrayOffset
 The only key type in array storage. More...
 

Functions

void foedus::storage::array::array_volatile_page_init (const VolatilePageInitArguments &args)
 volatile page initialize callback for ArrayPage. More...
 
void foedus::storage::hash::hash_intermediate_volatile_page_init (const VolatilePageInitArguments &args)
 volatile page initialize callback for HashIntermediatePage. More...
 
void foedus::storage::hash::hash_data_volatile_page_init (const VolatilePageInitArguments &args)
 volatile page initialize callback for HashDataPage. More...
 

Variables

const ArrayOffset foedus::storage::array::kMaxArrayOffset = (1ULL << 48) - 1ULL
 The maximum value allowed for ArrayOffset. More...
 
const uint16_t foedus::storage::array::kHeaderSize = 64
 Byte size of header in each page of array storage. More...
 
const uint16_t foedus::storage::array::kDataSize = foedus::storage::kPageSize - kHeaderSize
 Byte size of data region in each page of array storage. More...
 
const uint16_t foedus::storage::array::kInteriorSize = 16
 Byte size of an entry in interior page of array storage. More...
 
const uint16_t foedus::storage::array::kInteriorFanout = (foedus::storage::kPageSize - kHeaderSize) / kInteriorSize
 Max number of entries in an interior page of array storage. More...
 
const uint8_t foedus::storage::array::kMaxLevels = 8
 Code in array storage assumes this number as the maximum number of levels. More...
 
const uint16_t foedus::storage::kRecordOverhead = sizeof(xct::RwLockableXctId)
 Byte size of system-managed region per each record. More...
 

Class Documentation

union foedus::storage::array::ArrayPage::Data

Data union for either leaf (dynamic size) or interior (fixed size).

Definition at line 51 of file array_page_impl.hpp.

Collaboration diagram for foedus::storage::array::ArrayPage::Data:
Class Members
DualPagePointer interior_data[kInteriorFanout]
char leaf_data[kInteriorFanout *sizeof(DualPagePointer)]

Typedef Documentation

The only key type in array storage.

The key in array storage is offset, or an integer starting from zero. This means we don't support multi-dimensional, dynamic, sparse, nor any other fancy arrays. However, those arrays can be provided by the relational layer based on this array storage. The offset-conversion is fairly straightforward.

Note
Although it is an 8-byte integer, The valid value range of ArrayOffset is 0 to 2^48 - 1. Creating an array of size 2^48 or more will fail. This won't cause any issue in reality yet allows the implementation to pack more information.
See also
kMaxArrayOffset

Definition at line 48 of file array_id.hpp.

Function Documentation

void foedus::storage::array::array_volatile_page_init ( const VolatilePageInitArguments args)

volatile page initialize callback for ArrayPage.

See also
foedus::storage::VolatilePageInit

Definition at line 78 of file array_page_impl.cpp.

References ASSERT_ND, foedus::storage::array::ArrayRange::begin_, foedus::storage::VolatilePageInitArguments::context_, foedus::storage::array::ArrayRange::end_, foedus::storage::Storage< CONTROL_BLOCK >::exists(), foedus::storage::array::ArrayPage::get_array_range(), foedus::storage::array::ArrayStorage::get_array_size(), foedus::Attachable< CONTROL_BLOCK >::get_control_block(), foedus::thread::Thread::get_engine(), foedus::savepoint::SavepointManager::get_initial_current_epoch(), foedus::storage::array::ArrayPage::get_level(), foedus::storage::array::ArrayStorage::get_payload_size(), foedus::Engine::get_savepoint_manager(), foedus::storage::array::ArrayPage::get_storage_id(), foedus::storage::VolatilePageInitArguments::index_in_parent_, foedus::storage::array::ArrayPage::initialize_volatile_page(), foedus::storage::array::kInteriorFanout, foedus::storage::VolatilePageInitArguments::page_, foedus::storage::VolatilePageInitArguments::page_id, and foedus::storage::VolatilePageInitArguments::parent_.

Referenced by foedus::storage::array::ArrayStoragePimpl::follow_pointer(), foedus::storage::array::ArrayStoragePimpl::follow_pointers_for_read_batch(), and foedus::storage::array::ArrayStoragePimpl::follow_pointers_for_write_batch().

78  {
79  ASSERT_ND(args.parent_);
80  ASSERT_ND(args.page_);
81  ASSERT_ND(args.index_in_parent_ < kInteriorFanout);
82  const ArrayPage* parent = reinterpret_cast<const ArrayPage*>(args.parent_);
83  ArrayPage* page = reinterpret_cast<ArrayPage*>(args.page_);
84 
85  Engine* engine = args.context_->get_engine();
86  StorageId storage_id = parent->get_storage_id();
87  ArrayStorage storage(engine, storage_id);
88  ASSERT_ND(storage.exists());
89  const ArrayStorageControlBlock* cb = storage.get_control_block();
90 
91  ArrayRange parent_range = parent->get_array_range();
92  uint8_t parent_level = parent->get_level();
93  ASSERT_ND(parent_level > 0);
94  ASSERT_ND(parent_range.end_ - parent_range.begin_ == cb->intervals_[parent_level]
95  || parent_range.end_ == storage.get_array_size());
96 
97  uint8_t child_level = parent_level - 1U;
98  uint64_t interval = cb->intervals_[child_level];
99  ASSERT_ND(interval > 0);
100  ArrayRange child_range;
101  child_range.begin_ = parent_range.begin_ + args.index_in_parent_ * interval;
102  child_range.end_ = child_range.begin_ + interval;
103  if (child_range.end_ > storage.get_array_size()) {
104  child_range.end_ = storage.get_array_size();
105  }
106  ASSERT_ND(child_range.end_ > 0);
107 
108  // Because this is a new page, records in it are in the earliest epoch of the system
109  Epoch initial_epoch = engine->get_savepoint_manager()->get_initial_current_epoch();
110  page->initialize_volatile_page(
111  initial_epoch,
112  storage_id,
113  args.page_id,
114  storage.get_payload_size(),
115  child_level,
116  child_range);
117 }
uint32_t StorageId
Unique ID for storage.
Definition: storage_id.hpp:55
const uint16_t kInteriorFanout
Max number of entries in an interior page of array storage.
Definition: array_id.hpp:110
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72

Here is the call graph for this function:

Here is the caller graph for this function:

void foedus::storage::hash::hash_data_volatile_page_init ( const VolatilePageInitArguments args)

volatile page initialize callback for HashDataPage.

See also
foedus::storage::VolatilePageInit

Definition at line 232 of file hash_page_impl.cpp.

References ASSERT_ND, foedus::storage::hash::HashBinRange::begin_, foedus::storage::VolatilePageInitArguments::context_, foedus::storage::hash::HashDataPage::get_bin(), foedus::storage::hash::HashIntermediatePage::get_bin_range(), foedus::storage::hash::HashStorage::get_bin_shifts(), foedus::thread::Thread::get_engine(), foedus::storage::Page::get_header(), foedus::storage::hash::HashIntermediatePage::get_level(), foedus::storage::PageHeader::get_page_type(), foedus::storage::VolatilePageInitArguments::index_in_parent_, foedus::storage::hash::HashDataPage::initialize_volatile_page(), foedus::storage::kHashDataPageType, foedus::storage::hash::kHashIntermediatePageFanout, foedus::storage::kHashIntermediatePageType, foedus::storage::hash::HashBinRange::length(), foedus::storage::VolatilePageInitArguments::page_, foedus::storage::VolatilePageInitArguments::page_id, foedus::storage::VolatilePageInitArguments::parent_, and foedus::storage::PageHeader::storage_id_.

Referenced by foedus::storage::hash::HashStoragePimpl::follow_page_bin_head().

232  {
233  ASSERT_ND(args.parent_);
234  ASSERT_ND(args.page_);
235  StorageId storage_id = args.parent_->get_header().storage_id_;
236  HashDataPage* page = reinterpret_cast<HashDataPage*>(args.page_);
237  PageType parent_type = args.parent_->get_header().get_page_type();
238  uint64_t bin;
239  if (parent_type == kHashIntermediatePageType) {
240  const HashIntermediatePage* parent
241  = reinterpret_cast<const HashIntermediatePage*>(args.parent_);
242 
243  ASSERT_ND(args.index_in_parent_ < kHashIntermediatePageFanout);
244  ASSERT_ND(parent->get_level() == 0);
245  ASSERT_ND(parent->get_bin_range().length() == kHashIntermediatePageFanout);
246  bin = parent->get_bin_range().begin_ + args.index_in_parent_;
247  } else {
248  ASSERT_ND(parent_type == kHashDataPageType);
249  ASSERT_ND(args.index_in_parent_ == 0);
250  const HashDataPage* parent = reinterpret_cast<const HashDataPage*>(args.parent_);
251  bin = parent->get_bin();
252  }
253  HashStorage storage(args.context_->get_engine(), storage_id);
254  uint8_t bin_shifts = storage.get_bin_shifts();
255  page->initialize_volatile_page(storage_id, args.page_id, args.parent_, bin, bin_shifts);
256 }
uint32_t StorageId
Unique ID for storage.
Definition: storage_id.hpp:55
PageType
The following 1-byte value is stored in the common page header.
Definition: page.hpp:46
const uint8_t kHashIntermediatePageFanout
Number of pointers in an intermediate page of hash storage.
Definition: hash_id.hpp:49
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72

Here is the call graph for this function:

Here is the caller graph for this function:

void foedus::storage::hash::hash_intermediate_volatile_page_init ( const VolatilePageInitArguments args)

volatile page initialize callback for HashIntermediatePage.

See also
foedus::storage::VolatilePageInit

Definition at line 213 of file hash_page_impl.cpp.

References ASSERT_ND, foedus::storage::Page::get_header(), foedus::storage::PageHeader::get_page_type(), foedus::storage::VolatilePageInitArguments::index_in_parent_, foedus::storage::hash::HashIntermediatePage::initialize_volatile_page(), foedus::storage::hash::kHashIntermediatePageFanout, foedus::storage::kHashIntermediatePageType, foedus::storage::hash::kHashMaxBins, foedus::storage::VolatilePageInitArguments::page_, foedus::storage::VolatilePageInitArguments::page_id, foedus::storage::VolatilePageInitArguments::parent_, and foedus::storage::PageHeader::storage_id_.

Referenced by foedus::storage::hash::HashStoragePimpl::follow_page().

213  {
214  ASSERT_ND(args.parent_); // because this is always called for non-root pages.
215  ASSERT_ND(args.page_);
216  ASSERT_ND(args.index_in_parent_ < kHashIntermediatePageFanout);
217  StorageId storage_id = args.parent_->get_header().storage_id_;
218  HashIntermediatePage* page = reinterpret_cast<HashIntermediatePage*>(args.page_);
219 
220  ASSERT_ND(args.parent_->get_header().get_page_type() == kHashIntermediatePageType);
221  const HashIntermediatePage* parent = reinterpret_cast<const HashIntermediatePage*>(args.parent_);
222 
223  ASSERT_ND(parent->get_level() > 0);
224  parent->assert_range();
225  HashBin interval = kHashMaxBins[parent->get_level()];
226  HashBin parent_begin = parent->get_bin_range().begin_;
227  HashBin begin = parent_begin + args.index_in_parent_ * interval;
228  uint8_t level = parent->get_level() - 1U;
229  page->initialize_volatile_page(storage_id, args.page_id, parent, level, begin);
230 }
uint32_t StorageId
Unique ID for storage.
Definition: storage_id.hpp:55
uint64_t HashBin
Represents a bin of a hash value.
Definition: hash_id.hpp:142
const uint8_t kHashIntermediatePageFanout
Number of pointers in an intermediate page of hash storage.
Definition: hash_id.hpp:49
const uint64_t kHashMaxBins[]
kHashTotalBins[n] gives the maximum number of hash bins n-level hash can hold.
Definition: hash_id.hpp:74
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72

Here is the call graph for this function:

Here is the caller graph for this function:

Variable Documentation

const uint16_t foedus::storage::array::kHeaderSize = 64

Byte size of header in each page of array storage.

Definition at line 95 of file array_id.hpp.

const uint16_t foedus::storage::array::kInteriorSize = 16

Byte size of an entry in interior page of array storage.

Definition at line 105 of file array_id.hpp.

const ArrayOffset foedus::storage::array::kMaxArrayOffset = (1ULL << 48) - 1ULL

The maximum value allowed for ArrayOffset.

Definition at line 54 of file array_id.hpp.

Referenced by foedus::storage::array::ArrayStoragePimpl::load_empty(), and foedus::storage::array::SortEntry::set().

const uint8_t foedus::storage::array::kMaxLevels = 8

Code in array storage assumes this number as the maximum number of levels.

Interior page always has a big fanout close to 256, so 8 levels are more than enough.

Definition at line 118 of file array_id.hpp.

Referenced by foedus::storage::array::ArrayComposer::compose().