libfoedus-core
FOEDUS Core Library
foedus::storage::masstree::MasstreeComposeContext Class Reference

MasstreeComposer's compose() implementation separated from the class itself. More...

Detailed Description

MasstreeComposer's compose() implementation separated from the class itself.

It's a complicated method, so worth being its own class. This defines all the variables maintained during one compose() call.

Algorithm Overview
As described above, the biggest advantage to separate composer from transaction execution is that it simplifies the logic to construct a page. The sorted log stream provides log entries in key order. This class follows the same order when it constructs a pages. For every key, we do:
  • Make sure this object maintains pages in the path from root to the page that contains the key.
  • Each maintained page is initialized by previous snapshot page, empty if not exists.
  • The way we initialize with snapshot page is a bit special. We keep the image of the snapshot page as original page and append them to the maintained page up to the current key. The original page is stored in work memory (because we will not write them out), and PathLevel points to it. More details below (Original page pointer).
  • Assuming the original page trick, each maintained page always appends a new record to the end or update the record in the end, thus it's trivial to achieve 100% fill factor without any record movement.
  • Each maintained page might have a next-page pointer in foster_twin[1], which is filled when the page becomes full (note: because keys are ordered, we never have to do the split with records. it just puts a next-page pointer).
Page path and open/close
We maintain one path of pages (route) from the root to a border page that contains the current key. Each level consists of exactly one page with optional next page chain and original page. As far as we keep appending a key in the page of last level, we keep the same path forever upto the end of logs. The following cases trigger a change in the path.
  • Creation of next layer. We just deepen the path for one level. This is the easiest change.
  • Nexy key is beyond high-fence of the page of last level. We then close the last level, insert to the parent, repeat until the ancestor that logically contains the key (potentially root of the first layer), then open the path for next key.
  • Full page buffer. When we can't add any more page in writer's buffer, we flush out the last level page chain upto the tail page (the tail page is kept because it's currently modified). This keeps the current path except that last level's head/low-fence are moved forward.
Original page pointer
The active page in each level has a non-null pointer with kOriginalPage flag in foster_twin[0] so that we can append remaining records/links when next key is same or above the keys. The original page image is created when the page in the path is opened, and kept until it's closed. Thus, there are at most #-of-levels original pages, which should be pretty small.

Definition at line 161 of file masstree_composer_impl.hpp.

#include <masstree_composer_impl.hpp>

Classes

struct  FenceAndPointer
 
struct  PageBoundaryInfo
 Represents a minimal information to install a new snapshot page pointer. More...
 
struct  PageBoundarySort
 Points to PageBoundaryInfo with a sorting information. More...
 
struct  PathLevel
 One level in the path. More...
 

Public Types

enum  Constants { kMaxLevels = 32, kMaxLayers = 16, kMaxLogGroupSize = 1 << 14, kTmpBoundaryArraySize = kMaxLogGroupSize }
 

Public Member Functions

 MasstreeComposeContext (Engine *engine, snapshot::MergeSort *merge_sort, const Composer::ComposeArguments &args)
 MasstreeComposeContext methods. More...
 
ErrorStack execute ()
 MasstreeComposeContext::execute() and subroutines for log groups. More...
 

Member Enumeration Documentation

Enumerator
kMaxLevels 

We assume the path wouldn't be this deep.

kMaxLayers 

We assume B-trie depth is at most this (prefix is at most 8*this value)

kMaxLogGroupSize 

Maximum number of logs execute_xxx_group methods process in one shot.

kTmpBoundaryArraySize 

Size of the tmp_boundary_array_.

Most likely we don't need this much, but this memory consumtion is negligible anyways.

Definition at line 163 of file masstree_composer_impl.hpp.

163  {
165  kMaxLevels = 32,
167  kMaxLayers = 16,
169  kMaxLogGroupSize = 1 << 14,
175  };
Maximum number of logs execute_xxx_group methods process in one shot.
We assume B-trie depth is at most this (prefix is at most 8*this value)

Constructor & Destructor Documentation

foedus::storage::masstree::MasstreeComposeContext::MasstreeComposeContext ( Engine engine,
snapshot::MergeSort merge_sort,
const Composer::ComposeArguments args 
)

MasstreeComposeContext methods.

Definition at line 291 of file masstree_composer_impl.cpp.

References foedus::snapshot::SnapshotWriter::get_next_page_id().

295  : engine_(engine),
296  merge_sort_(merge_sort),
297  id_(merge_sort->get_storage_id()),
298  storage_(engine, id_),
299  args_(args),
300  snapshot_id_(args.snapshot_writer_->get_snapshot_id()),
301  numa_node_(get_writer()->get_numa_node()),
302  max_pages_(get_writer()->get_page_size()),
303  root_(reinterpret_cast<MasstreeIntermediatePage*>(args.root_info_page_)),
304  page_base_(reinterpret_cast<Page*>(get_writer()->get_page_base())),
305  original_base_(merge_sort->get_original_pages()) {
306  page_id_base_ = get_writer()->get_next_page_id();
307  refresh_page_boundary_info_variables();
308  root_index_ = 0;
309  root_index_mini_ = 0;
310  allocated_pages_ = 0;
311  cur_path_levels_ = 0;
312  page_boundary_info_cur_pos_ = 0;
313  page_boundary_elements_ = 0;
314  std::memset(cur_path_, 0, sizeof(cur_path_));
315  std::memset(cur_prefix_be_, 0, sizeof(cur_prefix_be_));
316  std::memset(cur_prefix_slices_, 0, sizeof(cur_prefix_slices_));
317  std::memset(tmp_boundary_array_, 0, sizeof(tmp_boundary_array_));
318 }
memory::PagePoolOffset get_page_size() const __attribute__((always_inline))
storage::SnapshotPagePointer get_next_page_id() const

Here is the call graph for this function:

Member Function Documentation

ErrorStack foedus::storage::masstree::MasstreeComposeContext::execute ( )

MasstreeComposeContext::execute() and subroutines for log groups.

Definition at line 371 of file masstree_composer_impl.cpp.

References ASSERT_ND, CHECK_ERROR, foedus::snapshot::MergeSort::GroupifyResult::count_, foedus::snapshot::MergeSort::get_current_count(), foedus::storage::masstree::MasstreeCommonLogType::get_key(), foedus::snapshot::MergeSort::GroupifyResult::get_log_type(), foedus::snapshot::MergeSort::groupify(), foedus::snapshot::MergeSort::GroupifyResult::has_common_key_, foedus::snapshot::MergeSort::GroupifyResult::has_common_log_code_, foedus::snapshot::MergeSort::is_ended_all(), foedus::storage::masstree::MasstreeCommonLogType::key_length_, foedus::log::kLogCodeMasstreeDelete, foedus::log::kLogCodeMasstreeInsert, foedus::log::kLogCodeMasstreeOverwrite, foedus::log::kLogCodeMasstreeUpdate, kMaxLogGroupSize, foedus::kRetOk, LIKELY, foedus::snapshot::MergeSort::next_batch(), and foedus::snapshot::MergeSort::resolve_sort_position().

371  {
372  write_dummy_page_zero();
373  CHECK_ERROR(init_root());
374 
375 #ifndef NDEBUG
376  char debug_prev[8192];
377  uint16_t debug_prev_length = 0;
378 #endif // NDEBUG
379 
380  uint64_t processed = 0;
381  while (true) {
382  CHECK_ERROR(merge_sort_->next_batch());
383  uint64_t count = merge_sort_->get_current_count();
384  if (count == 0 && merge_sort_->is_ended_all()) {
385  break;
386  }
387 
388 #ifndef NDEBUG
389  // confirm that it's fully sorted
390  for (uint64_t i = 0; i < count; ++i) {
391  const MasstreeCommonLogType* entry =
392  reinterpret_cast<const MasstreeCommonLogType*>(merge_sort_->resolve_sort_position(i));
393  const char* key = entry->get_key();
394  KeyLength key_length = entry->key_length_;
395  ASSERT_ND(key_length > 0);
396  if (key_length < debug_prev_length) {
397  ASSERT_ND(std::memcmp(debug_prev, key, key_length) <= 0);
398  } else {
399  ASSERT_ND(std::memcmp(debug_prev, key, debug_prev_length) <= 0);
400  }
401  std::memcpy(debug_prev, key, key_length);
402  debug_prev_length = key_length;
403  }
404 #endif // NDEBUG
405 
406  // within the batch, we further split them into "groups" that share the same key or log types
407  // so that we can process a number of logs in a tight loop.
408  uint64_t cur = 0;
409  uint32_t group_count = 0;
410  while (cur < count) {
411  snapshot::MergeSort::GroupifyResult group = merge_sort_->groupify(cur, kMaxLogGroupSize);
412  ASSERT_ND(group.count_ <= kMaxLogGroupSize);
413  if (LIKELY(group.count_ == 1U)) { // misprediction is amortized by group, so LIKELY is better
414  // no grouping. slowest case. has to be reasonably fast even in this case.
415  ASSERT_ND(!group.has_common_key_);
416  ASSERT_ND(!group.has_common_log_code_);
417  CHECK_ERROR(execute_a_log(cur));
418  } else if (group.has_common_key_) {
419  CHECK_ERROR(execute_same_key_group(cur, cur + group.count_));
420  } else {
421  ASSERT_ND(group.has_common_log_code_);
422  log::LogCode log_type = group.get_log_type();
423  if (log_type == log::kLogCodeMasstreeInsert) {
424  CHECK_ERROR(execute_insert_group(cur, cur + group.count_));
425  } else if (log_type == log::kLogCodeMasstreeDelete) {
426  CHECK_ERROR(execute_delete_group(cur, cur + group.count_));
427  } else if (log_type == log::kLogCodeMasstreeUpdate) {
428  CHECK_ERROR(execute_update_group(cur, cur + group.count_));
429  } else {
431  CHECK_ERROR(execute_overwrite_group(cur, cur + group.count_));
432  }
433  }
434  cur += group.count_;
435  processed += group.count_;
436  ++group_count;
437  }
438  ASSERT_ND(cur == count);
439  VLOG(1) << storage_ << " processed a batch of " << cur << " logs that has " << group_count
440  << " groups";
441  }
442 
443  VLOG(0) << storage_ << " processed " << processed << " logs. now pages=" << allocated_pages_;
444  CHECK_ERROR(flush_buffer());
445 
446  // at the end, install pointers to the created snapshot pages except root page.
447  uint64_t installed_count = 0;
448  CHECK_ERROR(install_snapshot_pointers(&installed_count));
449  return kRetOk;
450 }
LogCode
A unique identifier of all log types.
Definition: log_type.hpp:87
0x0033 : foedus::storage::masstree::MasstreeInsertLogType .
Definition: log_type.hpp:127
const log::RecordLogType * resolve_sort_position(uint32_t sort_pos) const __attribute__((always_inline))
Definition: merge_sort.hpp:384
ErrorStack next_batch()
Executes merge-sort on several thousands of logs and provides the result as a batch.
Definition: merge_sort.cpp:143
uint16_t KeyLength
Represents a byte-length of a key in this package.
Definition: masstree_id.hpp:69
Maximum number of logs execute_xxx_group methods process in one shot.
#define LIKELY(x)
Hints that x is highly likely true.
Definition: compiler.hpp:103
0x0032 : foedus::storage::masstree::MasstreeOverwriteLogType .
Definition: log_type.hpp:126
MergedPosition get_current_count() const __attribute__((always_inline))
Definition: merge_sort.hpp:369
0x0034 : foedus::storage::masstree::MasstreeDeleteLogType .
Definition: log_type.hpp:128
#define CHECK_ERROR(x)
This macro calls x and checks its returned value.
const ErrorStack kRetOk
Normal return value for no-error case.
0x0035 : foedus::storage::masstree::MasstreeUpdateLogType .
Definition: log_type.hpp:129
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72
GroupifyResult groupify(uint32_t begin, uint32_t limit=0) const __attribute__((always_inline))
Find a group of consecutive logs from the given position that have either a common log type or a comm...
Definition: merge_sort.hpp:534

Here is the call graph for this function:


The documentation for this class was generated from the following files: