libfoedus-core
FOEDUS Core Library
foedus::snapshot Namespace Reference

Snapshot Manager, which manages snapshot files of the database. More...

Detailed Description

Snapshot Manager, which manages snapshot files of the database.

This package contains classes to handle snapshot files.

Snapshot

One snapshot consists of a set of snapshot files and a snapshot metadata file (snapshot.xml). Every snapshot is tagged with base epoch and valid-until epoch. The snapshot logically contains all information of the database upto valid-until epoch, meaning the previous snapshot (whose valid-until should be equal to base epoch of the snapshot) as well as all log entries in the transactional log from a base epoch until valid-until epoch.

Snapshot Files

However, the snapshot does not necessarily physically contains all information from the previous snapshot because it makes each snapshot-ing too expensive. One common approach is LSM-Tree (Log Structured Merge Tree), but it is not a good fit for serializable transactional processing. Even a trivial primary key constraint would be too expensive on top of LSM-Tree.

Instead, snapshot files in FOEDUS are overlays of the database image. Each snapshot file contains new version of data pages that overwrite a required portion of storages. They are not incremental new-tuples/tombstones data as in LSM. They are a complete representation of the storages, but it might contain pointers to old snapshot files if pages under it had no change.

In the worst case, transactions in one epoch updates just one tuple in every page, resulting in a snapshot that physically contains all data. However, it is rare and such a workload is fundamentally expensive if data size does not fit DRAM (if it does, this approach is also fine).

Making a new Snapshot

Snapshot Manager creates a new set of snapshot files as well as its metadata file occasionally. The frequency is a tuning knob. The mechanism to create snapshot files is called Log-Gleaner (foedus::snapshot::LogGleaner). See its documentation below.

Log Gleaner Overview

LogGleaner is the main class that manages most mechanisms to construct a new set of snapshot files. Snapshot procedure constructs and calls this object during snapshot. It receives partitioning policy (which snapshot partitions to send ranges of keys) per storage and beginning/ending epoch of logs to glean while log-gleaning.

Log-gleaning consists of two components; mapper (foedus::snapshot::LogMapper) and reducer (foedus::snapshot::LogReducer), obviously named after the well-known map-reduce concepts.

Mapper

LogGleaner launches a set of mapper threads (foedus::snapshot::LogMapper) to read log files. Each LogMapper corresponds to foedus::log::Logger, the NUMA-local log writer which simply writes out log entries produced by local worker threads. Thus, the log files contain log entries that might be sent to any partitions. LogMapper maps each log entry to some partition and send it to a reducer corresponding to the partition. For more details, see foedus::snapshot::LogMapper.

Reducer

LogGleaner also launches a set of reducer threads (foedus::snapshot::LogReducer), one for each NUMA node. LogReducer sorts log entries sent from LogMapper. The log entries are sorted by key and ordinal (*), then processed just like usual APPLY at the end of transaction, but on top of snapshot files.

(*) otherwise correct result is not guaranteed. For example, imagine the following case:

  • UPDATE rec-1 to A. Log-ordinal 1.
  • UPDATE rec-1 to B. Log-ordinal 2.

Ordinal-1 must be processed before ordinal 2.

Synchronization

LogGleaner coordinates the synchronization between mappers and reducers during snapshotting. At the beginning of snapshotting, gleaner wakes up reducers and mappers. Mappers go in to sleep when they process all logs. When all mappers went to sleep, reducers start to also go into sleep when they process all logs they receive. When all of them are done, gleaner initiates the last wrap-up phase. Additionally, LogGleaner is in charge of receiving termination request from the engine if the user invokes Engine::uninitialize() and requesting reducers/mappers to stop.

Reducers/mappers occasionally check if they are requested to stop when they get idle or complete all work. They do the check at least once for a while so that the latency to stop can not be catastrophic.

Constructing Root Pages

After all mappers and reducers complete, the last phase of log gleaning is to construct root pages for the storages modified in this snapshotting. Gleaner collects root-page-info from each reducer and combines them to create the root page(s). When all set, gleaner produces maps from storage ID to a new root page ID. This will be written out in a snapshot metadata file by snapshot manager.

Note
This is a private implementation-details of Snapshot Manager, thus file name ends with _impl. Do not include this header from a client program. There is no case client program needs to access this internal class.

Classes

struct  BlockHeaderBase
 All log blocks in mapper/reducers start with this header. More...
 
class  DumpFileSortedBuffer
 Implementation of SortedBuffer that is backed by a dumped file. More...
 
struct  FillerBlockHeader
 A header for a dummy storage block that fills the gap between the end of previous storage block and the beginning of next storage block. More...
 
struct  FullBlockHeader
 All blocks that have content start with this header. More...
 
class  InMemorySortedBuffer
 Implementation of SortedBuffer that is backed by fully in-memory buffer. More...
 
struct  LogBuffer
 Packages handling of 4-bytes representation of position in log buffers. More...
 
class  LogGleaner
 A log-gleaner, which constructs a new set of snapshot files during snapshotting. More...
 
struct  LogGleanerControlBlock
 Shared data for LogGleaner. More...
 
class  LogGleanerRef
 A remote view of LogGleaner from all engines. More...
 
struct  LogGleanerResource
 Local resource for the log gleaner, which runs only in the master node. More...
 
class  LogMapper
 A log mapper, which reads log files from one logger and sends them to corresponding log reducers. More...
 
class  LogReducer
 A log reducer, which receives log entries sent from mappers and applies them to construct new snapshot files. More...
 
struct  LogReducerControlBlock
 Shared data for LogReducer. More...
 
class  LogReducerRef
 A remote view of LogReducer from all engines. More...
 
class  MapReduceBase
 Base class for LogMapper and LogReducer to share common code. More...
 
class  MergeSort
 Receives an arbitrary number of sorted buffers and emits one fully sorted stream of logs. More...
 
union  ReducerBufferStatus
 Compactly represents important status informations of a reducer buffer. More...
 
struct  Snapshot
 Represents one snapshot that converts all logs from base epoch to valid_until epoch into snapshot file(s). More...
 
class  SnapshotManager
 Snapshot manager that atomically and durably writes out a snapshot file. More...
 
struct  SnapshotManagerControlBlock
 Shared data in SnapshotManagerPimpl. More...
 
class  SnapshotManagerPimpl
 Pimpl object of SnapshotManager. More...
 
struct  SnapshotMetadata
 Represents the data in one snapshot metadata file. More...
 
struct  SnapshotOptions
 Set of options for snapshot manager. More...
 
class  SnapshotWriter
 Writes out one snapshot file for all data pages in one reducer. More...
 
class  SortedBuffer
 Represents one input stream of sorted log entries. More...
 

Typedefs

typedef uint16_t SnapshotId
 Unique ID of Snapshot. More...
 
typedef uint32_t BufferPosition
 Represents a position in some buffer. More...
 

Enumerations

enum  ReducerConstants { kFlagNoMoreWriters = 0x0001 }
 

Functions

bool is_array_log_type (uint16_t log_type)
 
bool is_hash_log_type (uint16_t log_type)
 
bool is_masstree_log_type (uint16_t log_type)
 
SnapshotId increment (SnapshotId id)
 Increment SnapshotId. More...
 
BufferPosition to_buffer_position (uint64_t byte_position)
 
uint64_t from_buffer_position (BufferPosition buffer_position)
 
std::ostream & operator<< (std::ostream &o, const SortedBuffer &v)
 
std::ostream & operator<< (std::ostream &o, const LogGleaner &v)
 
uint16_t calculate_logger_id (Engine *engine, uint16_t local_ordinal)
 Unique ID of this log mapper. More...
 
uint64_t align_io_floor (uint64_t offset)
 
uint64_t align_io_ceil (uint64_t offset)
 
void update_key_lengthes (const log::LogHeader *header, storage::StorageType storage_type, uint32_t *shortest_key_length, uint32_t *longest_key_length)
 
std::ostream & operator<< (std::ostream &o, const LogMapper &v)
 
std::ostream & operator<< (std::ostream &o, const LogReducer &v)
 
std::ostream & operator<< (std::ostream &o, const BlockHeaderBase &v)
 
uint16_t extract_shortest_key_length (SortedBuffer *const *inputs, uint16_t inputs_count)
 
uint16_t extract_longest_key_length (SortedBuffer *const *inputs, uint16_t inputs_count)
 
template<typename T >
int compare_logs_as (const log::RecordLogType *lhs, const log::RecordLogType *rhs)
 
std::ostream & operator<< (std::ostream &o, const Snapshot &v)
 
std::ostream & operator<< (std::ostream &o, const SnapshotWriter &v)
 

Variables

const SnapshotId kNullSnapshotId = 0
 
const uint64_t kIoAlignment = 0x1000
 
const MergeSort::InputIndex kInvalidInput = static_cast<MergeSort::InputIndex>(-1U)
 Represents null. More...
 
const float kWindowMoveThreshold = 0.95
 Also, when the input consumed more than this fraction of current window, we move the window. More...
 
const char * kStoragesTagName = "storages"
 

Enumeration Type Documentation

Enumerator
kFlagNoMoreWriters 

A bit-wise flag in ReducerBufferStatus's flags_.

If this bit is on, no more mappers can enter the buffer as a new writer.

Definition at line 55 of file log_reducer_impl.hpp.

55  {
60  kFlagNoMoreWriters = 0x0001,
61 };
A bit-wise flag in ReducerBufferStatus's flags_.

Function Documentation

uint64_t foedus::snapshot::align_io_ceil ( uint64_t  offset)

Definition at line 138 of file log_mapper_impl.cpp.

References align_io_floor().

Referenced by foedus::snapshot::LogMapper::handle_process().

138 { return align_io_floor(offset + kIoAlignment - 1U); }
const uint64_t kIoAlignment
uint64_t align_io_floor(uint64_t offset)

Here is the call graph for this function:

Here is the caller graph for this function:

uint64_t foedus::snapshot::align_io_floor ( uint64_t  offset)

Definition at line 137 of file log_mapper_impl.cpp.

References kIoAlignment.

Referenced by align_io_ceil(), and foedus::snapshot::LogMapper::handle_process().

137 { return (offset / kIoAlignment) * kIoAlignment; }
const uint64_t kIoAlignment

Here is the caller graph for this function:

uint16_t foedus::snapshot::calculate_logger_id ( Engine engine,
uint16_t  local_ordinal 
)

Unique ID of this log mapper.

One log mapper corresponds to one logger, so this ID is also the corresponding logger's ID (log::LoggerId).

Definition at line 54 of file log_mapper_impl.cpp.

References foedus::Engine::get_options(), foedus::Engine::get_soc_id(), foedus::EngineOptions::log_, and foedus::log::LogOptions::loggers_per_node_.

54  {
55  return engine->get_options().log_.loggers_per_node_ * engine->get_soc_id() + local_ordinal;
56 }

Here is the call graph for this function:

template<typename T >
int foedus::snapshot::compare_logs_as ( const log::RecordLogType lhs,
const log::RecordLogType rhs 
)

Definition at line 548 of file merge_sort.cpp.

548  {
549  const T* lhs_log = reinterpret_cast<const T*>(lhs);
550  const T* rhs_log = reinterpret_cast<const T*>(rhs);
551  return T::compare_logs(lhs_log, rhs_log);
552 }
uint16_t foedus::snapshot::extract_longest_key_length ( SortedBuffer *const *  inputs,
uint16_t  inputs_count 
)

Definition at line 55 of file merge_sort.cpp.

References foedus::snapshot::SortedBuffer::get_cur_block_longest_key_length().

55  {
56  uint16_t ret = inputs[0]->get_cur_block_longest_key_length();
57  for (uint16_t i = 1; i < inputs_count; ++i) {
58  ret = std::max<uint16_t>(ret, inputs[i]->get_cur_block_longest_key_length());
59  }
60  return ret;
61 }

Here is the call graph for this function:

uint16_t foedus::snapshot::extract_shortest_key_length ( SortedBuffer *const *  inputs,
uint16_t  inputs_count 
)

Definition at line 48 of file merge_sort.cpp.

References foedus::snapshot::SortedBuffer::get_cur_block_shortest_key_length().

48  {
49  uint16_t ret = inputs[0]->get_cur_block_shortest_key_length();
50  for (uint16_t i = 1; i < inputs_count; ++i) {
51  ret = std::min<uint16_t>(ret, inputs[i]->get_cur_block_shortest_key_length());
52  }
53  return ret;
54 }

Here is the call graph for this function:

uint64_t foedus::snapshot::from_buffer_position ( BufferPosition  buffer_position)
inline

Definition at line 78 of file snapshot_id.hpp.

Referenced by foedus::snapshot::LogReducerRef::append_log_chunk(), foedus::snapshot::ReducerBufferStatus::get_tail_bytes(), and foedus::snapshot::LogBuffer::resolve().

78  {
79  return static_cast<uint64_t>(buffer_position) << 3;
80 }

Here is the caller graph for this function:

bool foedus::snapshot::is_array_log_type ( uint16_t  log_type)
inline

Definition at line 516 of file merge_sort.hpp.

References foedus::log::kLogCodeArrayIncrement, and foedus::log::kLogCodeArrayOverwrite.

516  {
517  return log_type == log::kLogCodeArrayOverwrite || log_type == log::kLogCodeArrayIncrement;
518 }
0x0023 : foedus::storage::array::ArrayIncrementLogType .
Definition: log_type.hpp:116
0x0022 : foedus::storage::array::ArrayOverwriteLogType .
Definition: log_type.hpp:115
bool foedus::snapshot::is_hash_log_type ( uint16_t  log_type)
inline

Definition at line 519 of file merge_sort.hpp.

References foedus::log::kLogCodeHashDelete, foedus::log::kLogCodeHashInsert, foedus::log::kLogCodeHashOverwrite, and foedus::log::kLogCodeHashUpdate.

519  {
520  return
521  log_type == log::kLogCodeHashOverwrite
522  || log_type == log::kLogCodeHashInsert
523  || log_type == log::kLogCodeHashDelete
524  || log_type == log::kLogCodeHashUpdate;
525 }
0x002A : foedus::storage::hash::HashDeleteLogType .
Definition: log_type.hpp:123
0x0029 : foedus::storage::hash::HashInsertLogType .
Definition: log_type.hpp:122
0x0028 : foedus::storage::hash::HashOverwriteLogType .
Definition: log_type.hpp:121
0x002B : foedus::storage::hash::HashUpdateLogType .
Definition: log_type.hpp:124
bool foedus::snapshot::is_masstree_log_type ( uint16_t  log_type)
inline

Definition at line 526 of file merge_sort.hpp.

References foedus::log::kLogCodeMasstreeDelete, foedus::log::kLogCodeMasstreeInsert, foedus::log::kLogCodeMasstreeOverwrite, and foedus::log::kLogCodeMasstreeUpdate.

526  {
527  return
528  log_type == log::kLogCodeMasstreeInsert
529  || log_type == log::kLogCodeMasstreeDelete
530  || log_type == log::kLogCodeMasstreeUpdate
531  || log_type == log::kLogCodeMasstreeOverwrite;
532 }
0x0033 : foedus::storage::masstree::MasstreeInsertLogType .
Definition: log_type.hpp:127
0x0032 : foedus::storage::masstree::MasstreeOverwriteLogType .
Definition: log_type.hpp:126
0x0034 : foedus::storage::masstree::MasstreeDeleteLogType .
Definition: log_type.hpp:128
0x0035 : foedus::storage::masstree::MasstreeUpdateLogType .
Definition: log_type.hpp:129
std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const Snapshot v 
)

Definition at line 24 of file snapshot.cpp.

References foedus::snapshot::Snapshot::base_epoch_, foedus::snapshot::Snapshot::id_, foedus::snapshot::Snapshot::max_storage_id_, and foedus::snapshot::Snapshot::valid_until_epoch_.

24  {
25  o << "<Snapshot>"
26  << "<id_>" << v.id_ << "</id_>"
27  << "<base_epoch_>" << v.base_epoch_ << "</base_epoch_>"
28  << "<valid_until_epoch_>" << v.valid_until_epoch_ << "</valid_until_epoch_>"
29  << "<max_storage_id_>" << v.max_storage_id_ << "</max_storage_id_>"
30  << "</Snapshot>";
31  return o;
32 }
std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const SortedBuffer v 
)

Definition at line 32 of file log_buffer.cpp.

References foedus::snapshot::SortedBuffer::describe().

32  {
33  v.describe(&o);
34  return o;
35 }

Here is the call graph for this function:

std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const SnapshotWriter v 
)

Definition at line 174 of file snapshot_writer_impl.cpp.

174  {
175  o << "<SnapshotWriter>"
176  << "<numa_node_>" << v.numa_node_ << "</numa_node_>"
177  << "<append_>" << v.append_ << "</append_>"
178  << "<snapshot_id_>" << v.snapshot_id_ << "</snapshot_id_>"
179  << "<pool_memory_>" << *v.pool_memory_ << "</pool_memory_>"
180  << "<intermediate_memory_>" << *v.intermediate_memory_ << "</intermediate_memory_>"
181  << "<next_page_id_>" << v.next_page_id_ << "</next_page_id_>"
182  << "</SnapshotWriter>";
183  return o;
184 }
std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const LogGleaner v 
)

Definition at line 312 of file log_gleaner_impl.cpp.

References foedus::snapshot::LogGleanerControlBlock::completed_count_, foedus::snapshot::LogGleanerControlBlock::completed_mapper_count_, foedus::Attachable< CONTROL_BLOCK >::control_block_, foedus::snapshot::LogGleanerControlBlock::error_count_, and foedus::snapshot::LogGleanerControlBlock::exit_count_.

312  {
313  o << "<LogGleaner>"
314  << v.new_snapshot_
315  << "<completed_count_>" << v.control_block_->completed_count_ << "</completed_count_>"
316  << "<completed_mapper_count_>"
317  << v.control_block_->completed_mapper_count_ << "</completed_mapper_count_>"
318  << "<error_count_>" << v.control_block_->error_count_ << "</error_count_>"
319  << "<exit_count_>" << v.control_block_->exit_count_ << "</exit_count_>";
320  o << "</LogGleaner>";
321  return o;
322 }
std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const LogMapper v 
)

Definition at line 700 of file log_mapper_impl.cpp.

References foedus::snapshot::MapReduceBase::id_, and foedus::snapshot::MapReduceBase::numa_node_.

700  {
701  o << "<LogMapper>"
702  << "<id_>" << v.id_ << "</id_>"
703  << "<numa_node_>" << static_cast<int>(v.numa_node_) << "</numa_node_>"
704  << "<buckets_allocated_count_>" << v.buckets_allocated_count_ << "</buckets_allocated_count_>"
705  << "<hashlist_allocated_count>" << v.hashlist_allocated_count_ << "</hashlist_allocated_count>"
706  << "<processed_log_count_>" << v.processed_log_count_ << "</processed_log_count_>"
707  << "</LogMapper>";
708  return o;
709 }
std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const LogReducer v 
)

Definition at line 912 of file log_reducer_impl.cpp.

References foedus::snapshot::LogReducerControlBlock::current_buffer_, foedus::snapshot::MapReduceBase::get_id(), foedus::snapshot::MapReduceBase::get_numa_node(), and foedus::snapshot::LogReducerControlBlock::total_storage_count_.

912  {
913  o << "<LogReducer>"
914  << "<id_>" << v.get_id() << "</id_>"
915  << "<numa_node>" << v.get_numa_node() << "</numa_node>"
916  << "<total_storage_count>" << v.control_block_->total_storage_count_ << "</total_storage_count>"
917  << "<sort_buffer_>" << v.sort_buffer_ << "</sort_buffer_>"
918  << "<positions_buffers_>" << v.positions_buffers_ << "</positions_buffers_>"
919  << "<current_buffer_>" << v.control_block_->current_buffer_ << "</current_buffer_>"
920  << "<sorted_runs_>" << v.sorted_runs_ << "</sorted_runs_>"
921  << "</LogReducer>";
922  return o;
923 }

Here is the call graph for this function:

std::ostream& foedus::snapshot::operator<< ( std::ostream &  o,
const BlockHeaderBase v 
)

Definition at line 925 of file log_reducer_impl.cpp.

References foedus::snapshot::BlockHeaderBase::block_length_, and foedus::snapshot::BlockHeaderBase::magic_word_.

925  {
926  o << "<BlockHeader>"
927  << "<magic_word_>" << assorted::Hex(v.magic_word_) << "</magic_word_>"
928  << "<block_length_>" << v.block_length_ << "</block_length_>"
929  << "</BlockHeader>";
930  return o;
931 }
BufferPosition foedus::snapshot::to_buffer_position ( uint64_t  byte_position)
inline

Definition at line 74 of file snapshot_id.hpp.

References ASSERT_ND.

Referenced by foedus::snapshot::LogReducerRef::append_log_chunk(), and foedus::snapshot::LogBuffer::compact().

74  {
75  ASSERT_ND(byte_position % 8 == 0);
76  return byte_position >> 3;
77 }
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72

Here is the caller graph for this function:

void foedus::snapshot::update_key_lengthes ( const log::LogHeader header,
storage::StorageType  storage_type,
uint32_t *  shortest_key_length,
uint32_t *  longest_key_length 
)
inline

Definition at line 540 of file log_mapper_impl.cpp.

References ASSERT_ND, foedus::storage::kArrayStorage, foedus::storage::hash::HashCommonLogType::key_length_, foedus::storage::masstree::MasstreeCommonLogType::key_length_, foedus::storage::kHashStorage, foedus::storage::kMasstreeStorage, and foedus::storage::kSequentialStorage.

544  {
545  if (storage_type == storage::kArrayStorage) {
546  *shortest_key_length = sizeof(storage::array::ArrayOffset);
547  *longest_key_length = sizeof(storage::array::ArrayOffset);
548  } else if (storage_type == storage::kMasstreeStorage) {
549  const storage::masstree::MasstreeCommonLogType* the_log =
550  reinterpret_cast<const storage::masstree::MasstreeCommonLogType*>(header);
551  uint16_t key_length = the_log->key_length_;
552  ASSERT_ND(key_length > 0);
553  *shortest_key_length = std::min<uint32_t>(*shortest_key_length, key_length);
554  *longest_key_length = std::max<uint32_t>(*longest_key_length, key_length);
555  } else if (storage_type == storage::kHashStorage) {
556  const storage::hash::HashCommonLogType* the_log =
557  reinterpret_cast<const storage::hash::HashCommonLogType*>(header);
558  uint16_t key_length = the_log->key_length_;
559  ASSERT_ND(key_length > 0);
560  *shortest_key_length = std::min<uint32_t>(*shortest_key_length, key_length);
561  *longest_key_length = std::max<uint32_t>(*longest_key_length, key_length);
562  } else if (storage_type == storage::kSequentialStorage) {
563  // this has no meaning for sequential storage. just put some number.
564  *shortest_key_length = 8U;
565  *longest_key_length = 8U;
566  }
567 }
uint64_t ArrayOffset
The only key type in array storage.
Definition: array_id.hpp:48
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72

Variable Documentation

const MergeSort::InputIndex foedus::snapshot::kInvalidInput = static_cast<MergeSort::InputIndex>(-1U)

Represents null.

Definition at line 40 of file merge_sort.cpp.

const uint64_t foedus::snapshot::kIoAlignment = 0x1000

Definition at line 136 of file log_mapper_impl.cpp.

Referenced by align_io_floor().

const char* foedus::snapshot::kStoragesTagName = "storages"

Definition at line 32 of file snapshot_metadata.cpp.

const float foedus::snapshot::kWindowMoveThreshold = 0.95

Also, when the input consumed more than this fraction of current window, we move the window.

This means we have to memmove 5% everytime, but instead we can avoid many-small batches.

Definition at line 45 of file merge_sort.cpp.