libfoedus-core
FOEDUS Core Library
foedus::storage::masstree::SplitBorder Struct Reference [final]

A system transaction to split a border page in Master-Tree.

Detailed Description

A system transaction to split a border page in Master-Tree.

See also
Physical-only, short-living system transactions.

When a border page becomes full or close to full, we split the page into two border pages. The new pages are placed as tentative foster twins of the page.

This does nothing and returns kErrorCodeOk in the following cases:

  • The page turns out to be already split.

Locks taken in this sysxct (in order of taking):

  • Page-lock of the target page.
  • Record-lock of all records in the target page (in canonical order).

Provided it is invoked after the enclosing user transaction has released its own locks, there is no chance of deadlock or even of any conditional lock acquisition. max_retries=2 should be enough in run_nested_sysxct().
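For context, a caller typically wraps this functor in the thread's nested-sysxct interface. The following is a minimal sketch, not taken from the FOEDUS sources; the variables context (a thread::Thread*), full_page, and slice are assumed to be in scope, and the exact signature of run_nested_sysxct() should be checked against thread::Thread.

// Hedged sketch: launching SplitBorder as a nested system transaction when a
// border page has no room left. `context`, `full_page`, and `slice` are
// assumed to exist in the caller; error handling is abbreviated.
foedus::storage::masstree::SplitBorder split_functor(context, full_page, slice);
foedus::ErrorCode code = context->run_nested_sysxct(&split_functor, 2);  // max_retries=2
if (code != foedus::kErrorCodeOk) {
  return code;  // give up; the caller usually aborts/retries the user transaction
}
// Afterwards the caller re-traverses the tree: the old page is now "moved" and
// the foster twins hold the records.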

Definition at line 51 of file masstree_split_impl.hpp.

#include <masstree_split_impl.hpp>


Classes

struct  SplitStrategy
 

Public Member Functions

 SplitBorder (thread::Thread *context, MasstreeBorderPage *target, KeySlice trigger, bool disable_no_record_split=false, bool piggyback_reserve=false, KeyLength piggyback_remainder_length=0, PayloadLength piggyback_payload_count=0, const void *piggyback_suffix=nullptr)
 
virtual ErrorCode run (xct::SysxctWorkspace *sysxct_workspace) override
 Border node's Split.
 
void decide_strategy (SplitStrategy *out) const
 Subroutine to decide how we will split this page.
 
ErrorCode lock_existing_records (xct::SysxctWorkspace *sysxct_workspace)
 Subroutine to lock existing records in target_.
 
void migrate_records (KeySlice inclusive_from, KeySlice inclusive_to, MasstreeBorderPage *dest) const
 Subroutine to construct a new page.
 

Public Attributes

thread::Thread *const context_
 Thread context.
 
MasstreeBorderPage *const target_
 The page to split.
 
const KeySlice trigger_
 The key that triggered this split.
 
const bool disable_no_record_split_
 If true, we never do no-record-split (NRS).
 
const bool piggyback_reserve_
 An optimization to also make room for a record.
 
const KeyLength piggyback_remainder_length_
 
const PayloadLength piggyback_payload_count_
 
const void * piggyback_suffix_
 

Constructor & Destructor Documentation

foedus::storage::masstree::SplitBorder::SplitBorder ( thread::Thread *  context,
MasstreeBorderPage *  target,
KeySlice  trigger,
bool  disable_no_record_split = false,
bool  piggyback_reserve = false,
KeyLength  piggyback_remainder_length = 0,
PayloadLength  piggyback_payload_count = 0,
const void *  piggyback_suffix = nullptr 
)
inline

Definition at line 82 of file masstree_split_impl.hpp.

  : xct::SysxctFunctor(),
    context_(context),
    target_(target),
    trigger_(trigger),
    disable_no_record_split_(disable_no_record_split),
    piggyback_reserve_(piggyback_reserve),
    piggyback_remainder_length_(piggyback_remainder_length),
    piggyback_payload_count_(piggyback_payload_count),
    piggyback_suffix_(piggyback_suffix) {
  }

Member Function Documentation

void foedus::storage::masstree::SplitBorder::decide_strategy ( SplitBorder::SplitStrategy *  out) const

Subroutine to decide how we will split this page.

Definition at line 146 of file masstree_split_impl.cpp.

References ASSERT_ND, disable_no_record_split_, foedus::storage::masstree::MasstreePage::get_key_count(), foedus::storage::masstree::MasstreeBorderPage::get_slice(), foedus::storage::masstree::MasstreeBorderPage::is_consecutive_inserts(), foedus::storage::masstree::MasstreePage::is_locked(), foedus::storage::masstree::kInfimumSlice, foedus::storage::masstree::SplitBorder::SplitStrategy::largest_slice_, foedus::storage::masstree::SplitBorder::SplitStrategy::mid_slice_, foedus::storage::masstree::SplitBorder::SplitStrategy::no_record_split_, foedus::storage::masstree::SplitBorder::SplitStrategy::original_key_count_, foedus::storage::masstree::SplitBorder::SplitStrategy::smallest_slice_, target_, trigger_, and foedus::assorted::UniformRandom::uniform_within().

Referenced by run().

{
  ASSERT_ND(target_->is_locked());
  const SlotIndex key_count = target_->get_key_count();
  ASSERT_ND(key_count > 0);
  out->original_key_count_ = key_count;
  out->no_record_split_ = false;
  out->smallest_slice_ = target_->get_slice(0);
  out->largest_slice_ = target_->get_slice(0);

  // if consecutive_inserts_, we are already sure about the key distributions, so easy.
  if (target_->is_consecutive_inserts()) {
    out->largest_slice_ = target_->get_slice(key_count - 1);
    if (!disable_no_record_split_ && trigger_ > out->largest_slice_) {
      out->no_record_split_ = true;
      DVLOG(1) << "Obviously no record split. key_count=" << static_cast<int>(key_count);
      out->mid_slice_ = out->largest_slice_ + 1;
    } else {
      if (disable_no_record_split_ && trigger_ > out->largest_slice_) {
        DVLOG(1) << "No-record split was possible, but disable_no_record_split specified."
          << " simply splitting in half...";
      }
      DVLOG(1) << "Breaks a sequential page. key_count=" << static_cast<int>(key_count);
      out->mid_slice_ = target_->get_slice(key_count / 2);
    }
    return;
  }

  for (SlotIndex i = 1; i < key_count; ++i) {
    const KeySlice this_slice = target_->get_slice(i);
    out->smallest_slice_ = std::min<KeySlice>(this_slice, out->smallest_slice_);
    out->largest_slice_ = std::max<KeySlice>(this_slice, out->largest_slice_);
  }

  ASSERT_ND(key_count >= 2U);  // because it's not consecutive, there must be at least 2 records.

  {
    // even if not, there is another easy case where two "tides" mix in this page;
    // one tide from left sequentially inserts keys while another tide from right also sequentially
    // inserts keys that are larger than left tide. This usually happens at the boundary of
    // two largely independent partitions (eg multiple threads inserting keys of their partition).
    // In that case, we should cleanly separate the two tides by picking the smallest key from
    // right-tide as the separator.
    KeySlice tides_max[2];
    KeySlice second_tide_min = kInfimumSlice;
    bool first_tide_broken = false;
    bool both_tides_broken = false;
    tides_max[0] = target_->get_slice(0);
    // for example, consider the following case:
    //   1 2 32 33 3 4 34 x
    // There are two tides 1- and 32-. We detect them as follows.
    // We initially consider 1,2,32,33 as the first tide because they are sequential.
    // Then, "3" breaks the first tide. We then consider 1- and 32- as the two tides.
    // If x breaks the tide again, we give up.
    for (SlotIndex i = 1; i < key_count; ++i) {
      // look for "tide breaker" that is smaller than the max of the tide.
      // as soon as we found two of them (meaning 3 tides or more), we give up.
      KeySlice slice = target_->get_slice(i);
      if (!first_tide_broken) {
        if (slice >= tides_max[0]) {
          tides_max[0] = slice;
          continue;  // ok!
        } else {
          // let's find where a second tide starts.
          first_tide_broken = true;
          SlotIndex first_breaker;
          for (first_breaker = 0; first_breaker < i; ++first_breaker) {
            const KeySlice breaker_slice = target_->get_slice(first_breaker);
            if (breaker_slice > slice) {
              break;
            }
          }
          ASSERT_ND(first_breaker < i);
          tides_max[0] = slice;
          ASSERT_ND(second_tide_min == kInfimumSlice);
          second_tide_min = target_->get_slice(first_breaker);
          tides_max[1] = target_->get_slice(i - 1);
          ASSERT_ND(tides_max[0] < tides_max[1]);
          ASSERT_ND(tides_max[0] < second_tide_min);
          ASSERT_ND(second_tide_min <= tides_max[1]);
        }
      } else {
        if (slice < second_tide_min && slice >= tides_max[0]) {
          tides_max[0] = slice;
          continue;  // fine, in the first tide
        } else if (slice >= tides_max[1]) {
          tides_max[1] = slice;  // okay, in the second tide
        } else {
          DVLOG(2) << "Oops, third tide. not the easy case";
          both_tides_broken = true;
          break;
        }
      }
    }

    // Already sorted? (seems consecutive_inserts_ has some false positives)
    if (!first_tide_broken) {
      if (!disable_no_record_split_ && trigger_ > out->largest_slice_) {
        out->no_record_split_ = true;
        DVLOG(1) << "Obviously no record split. key_count=" << static_cast<int>(key_count);
        out->mid_slice_ = out->largest_slice_ + 1;
      } else {
        if (disable_no_record_split_ && trigger_ > out->largest_slice_) {
          DVLOG(1) << "No-record split was possible, but disable_no_record_split specified."
            << " simply splitting in half...";
        }
        DVLOG(1) << "Breaks a sequential page. key_count=" << static_cast<int>(key_count);
        out->mid_slice_ = target_->get_slice(key_count / 2);
      }
      return;
    }

    ASSERT_ND(first_tide_broken);
    if (!both_tides_broken) {
      DVLOG(0) << "Yay, figured out two-tides meeting in a page.";
      out->mid_slice_ = second_tide_min;
      return;
    }
  }

  // now we have to pick separator. as we don't sort in-page, this is approximate median selection.
  // there are a few smart algorithms out there, but we don't need that much accuracy.
  // just randomly pick a few. good enough.
  assorted::UniformRandom uniform_random(12345);
  const SlotIndex kSamples = 7;
  KeySlice choices[kSamples];
  for (uint8_t i = 0; i < kSamples; ++i) {
    choices[i] = target_->get_slice(uniform_random.uniform_within(0, key_count - 1));
  }
  std::sort(choices, choices + kSamples);
  out->mid_slice_ = choices[kSamples / 2];

  // scan through again to make sure the new separator is not used multiple times as key.
  // this is required for the invariant "same slices must be in same page"
  while (true) {
    bool observed = false;
    bool retry = false;
    for (SlotIndex i = 0; i < key_count; ++i) {
      const KeySlice this_slice = target_->get_slice(i);
      if (this_slice == out->mid_slice_) {
        if (observed) {
          // the key appeared twice! let's try another slice.
          ++out->mid_slice_;
          retry = true;
          break;
        } else {
          observed = true;
        }
      }
    }
    if (retry) {
      continue;
    } else {
      break;
    }
  }
}
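To illustrate the fallback path above in isolation: when neither the sequential-insert case nor the two-tides case applies, the separator is chosen as an approximate median by sorting a handful of randomly sampled slices. The snippet below is a self-contained sketch of that idea in plain C++, not FOEDUS code; it assumes a non-empty input vector.

#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Standalone sketch of the sampling-based separator choice: sample a few
// slices, sort the samples, and take their median as an approximate median.
uint64_t approximate_median(const std::vector<uint64_t>& slices) {
  std::mt19937_64 rng(12345);  // fixed seed, mirroring decide_strategy()
  std::uniform_int_distribution<size_t> pick(0, slices.size() - 1);
  constexpr size_t kSamples = 7;
  uint64_t choices[kSamples];
  for (size_t i = 0; i < kSamples; ++i) {
    choices[i] = slices[pick(rng)];
  }
  std::sort(choices, choices + kSamples);
  return choices[kSamples / 2];
}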

ErrorCode foedus::storage::masstree::SplitBorder::lock_existing_records ( xct::SysxctWorkspace *  sysxct_workspace)

Subroutine to lock existing records in target_.

Definition at line 304 of file masstree_split_impl.cpp.

References ASSERT_ND, CHECK_ERROR_CODE, context_, foedus::debugging::RdtscWatch::elapsed(), foedus::storage::masstree::MasstreePage::get_key_count(), foedus::storage::masstree::MasstreeBorderPage::get_owner_id(), foedus::thread::Thread::get_thread_id(), foedus::storage::masstree::MasstreePage::header(), foedus::xct::RwLockableXctId::is_keylocked(), foedus::storage::masstree::MasstreePage::is_locked(), foedus::storage::masstree::kBorderPageMaxSlots, foedus::kErrorCodeOk, foedus::storage::PageHeader::page_id_, foedus::debugging::RdtscWatch::stop(), foedus::storage::PageHeader::storage_id_, foedus::thread::Thread::sysxct_batch_record_locks(), and target_.

Referenced by run().

{
  debugging::RdtscWatch watch;  // check how expensive this is
  ASSERT_ND(target_->is_locked());
  const SlotIndex key_count = target_->get_key_count();
  ASSERT_ND(key_count > 0);

  // We use the batched interface. It internally sorts, but has better performance if
  // we provide an already-sorted input. Remember that slots grow backwards,
  // so larger indexes have smaller lock IDs.
  xct::RwLockableXctId* record_locks[kBorderPageMaxSlots];
  for (SlotIndex i = 0; i < key_count; ++i) {
    record_locks[i] = target_->get_owner_id(key_count - 1U - i);  // larger indexes first
  }

  VolatilePagePointer page_id(target_->header().page_id_);
  CHECK_ERROR_CODE(context_->sysxct_batch_record_locks(
    sysxct_workspace,
    page_id,
    key_count,
    record_locks));

#ifndef NDEBUG
  for (SlotIndex i = 0; i < key_count; ++i) {
    xct::RwLockableXctId* owner_id = target_->get_owner_id(i);
    ASSERT_ND(owner_id->is_keylocked());
  }
#endif  // NDEBUG
  watch.stop();
  DVLOG(1) << "Costed " << watch.elapsed() << " cycles to lock all of "
    << static_cast<int>(key_count) << " records while splitting";
  if (watch.elapsed() > (1ULL << 26)) {
    // if we see this often, we have to optimize this somehow.
    LOG(WARNING) << "wait, wait, it costed " << watch.elapsed() << " cycles to lock all of "
      << static_cast<int>(key_count) << " records while splitting!! that's a lot! storage="
      << target_->header().storage_id_
      << ", thread ID=" << context_->get_thread_id();
  }

  return kErrorCodeOk;
}
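The reversal key_count - 1U - i deserves a small illustration. Because slots grow backwards from the end of the page, a larger slot index corresponds to a smaller address, hence a smaller lock ID. The toy layout below is an assumption made purely for illustration, not the actual MasstreeBorderPage layout; it only shows why iterating indexes in reverse yields locks in ascending address order, which is what the batched interface prefers.

#include <cstddef>
#include <cstdint>

// Toy page whose slots grow backwards from the end of a fixed-size data area.
struct ToySlot { uint64_t tid; };
struct ToyPage {
  alignas(8) char data_[4096];
  ToySlot* slot(size_t index) {  // slot 0 sits at the very end, slot 1 just before it, ...
    return reinterpret_cast<ToySlot*>(data_ + sizeof(data_)) - 1 - index;
  }
};

// Enumerate lock addresses in ascending order by walking indexes in reverse,
// analogous to record_locks[i] = target_->get_owner_id(key_count - 1U - i).
void collect_sorted(ToyPage* page, size_t key_count, ToySlot** out) {
  for (size_t i = 0; i < key_count; ++i) {
    out[i] = page->slot(key_count - 1U - i);  // addresses strictly increase with i
  }
}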

void foedus::storage::masstree::SplitBorder::migrate_records ( KeySlice  inclusive_from,
KeySlice  inclusive_to,
MasstreeBorderPage *  dest 
) const

Subroutine to construct a new page.

Definition at line 345 of file masstree_split_impl.cpp.

References foedus::assorted::align8(), ASSERT_ND, foedus::storage::masstree::calculate_suffix_length(), foedus::storage::masstree::MasstreePage::get_key_count(), foedus::storage::masstree::MasstreeBorderPage::get_new_slot(), foedus::storage::masstree::MasstreeBorderPage::get_record_from_offset(), foedus::storage::masstree::MasstreePage::is_locked(), foedus::storage::masstree::kInitiallyNextLayer, foedus::storage::masstree::kMaxKeyLength, foedus::storage::masstree::kSupremumSlice, foedus::storage::masstree::MasstreePage::set_key_count(), foedus::storage::masstree::MasstreeBorderPage::set_slice(), target_, and foedus::storage::masstree::MasstreeBorderPage::to_record_length().

Referenced by run().

{
  ASSERT_ND(target_->is_locked());
  const auto& copy_from = *target_;
  const SlotIndex key_count = target_->get_key_count();
  ASSERT_ND(dest->get_key_count() == 0);
  dest->next_offset_ = 0;
  SlotIndex migrated_count = 0;
  DataOffset unused_space = sizeof(dest->data_);
  bool sofar_consecutive = true;
  KeySlice prev_slice = kSupremumSlice;
  KeyLength prev_remainder = kMaxKeyLength;

  // Simply iterate over and memcpy one-by-one.
  // We previously did a bit more complex thing to copy as many records as
  // possible in one memcpy, but not worth it with the new page layout.
  // We will keep an eye on the cost of this method, and optimize when it becomes bottleneck.
  for (SlotIndex i = 0; i < key_count; ++i) {
    const KeySlice from_slice = copy_from.get_slice(i);
    if (from_slice >= inclusive_from && from_slice <= inclusive_to) {
      // move this record.
      auto* to_slot = dest->get_new_slot(migrated_count);
      const auto* from_slot = copy_from.get_slot(i);
      ASSERT_ND(from_slot->tid_.is_keylocked());
      const KeyLength from_remainder = from_slot->remainder_length_;
      const KeyLength from_suffix = calculate_suffix_length(from_remainder);
      const PayloadLength payload = from_slot->lengthes_.components.payload_length_;
      const KeyLength to_remainder
        = to_slot->tid_.xct_id_.is_next_layer() ? kInitiallyNextLayer : from_remainder;
      const KeyLength to_suffix = calculate_suffix_length(to_remainder);
      if (to_remainder != from_remainder) {
        ASSERT_ND(to_remainder == kInitiallyNextLayer);
        ASSERT_ND(from_remainder != kInitiallyNextLayer && from_remainder <= kMaxKeyLength);
        DVLOG(2) << "the old record is now a next-layer record, this new record can be initially"
          " a next-layer, saving space for suffixes. from_remainder=" << from_remainder;
      }

      dest->set_slice(migrated_count, from_slice);
      to_slot->tid_.xct_id_ = from_slot->tid_.xct_id_;
      to_slot->tid_.lock_.reset();
      to_slot->remainder_length_ = to_remainder;
      to_slot->lengthes_.components.payload_length_ = payload;
      // offset/physical_length set later

      if (sofar_consecutive && migrated_count > 0) {
        if (prev_slice > from_slice
          || (prev_slice == from_slice && prev_remainder > from_remainder)) {
          sofar_consecutive = false;
        }
      }
      prev_slice = from_slice;
      prev_remainder = to_remainder;

      // we might shrink the physical record size.
      const DataOffset record_length = MasstreeBorderPage::to_record_length(to_remainder, payload);
      ASSERT_ND(record_length % 8 == 0);
      ASSERT_ND(record_length <= from_slot->lengthes_.components.physical_record_length_);
      to_slot->lengthes_.components.physical_record_length_ = record_length;
      to_slot->lengthes_.components.offset_ = dest->next_offset_;
      to_slot->original_physical_record_length_ = record_length;
      to_slot->original_offset_ = dest->next_offset_;
      dest->next_offset_ += record_length;
      unused_space -= record_length - sizeof(*to_slot);

      // Copy the record. We want to do it in one memcpy if possible.
      // Be careful on the case where suffix length has changed (kInitiallyNextLayer case)
      if (record_length > 0) {
        char* to_record = dest->get_record_from_offset(to_slot->lengthes_.components.offset_);
        if (from_suffix != to_suffix) {
          ASSERT_ND(to_remainder == kInitiallyNextLayer);
          ASSERT_ND(from_remainder != kInitiallyNextLayer && from_remainder <= kMaxKeyLength);
          ASSERT_ND(to_suffix == 0);
          // Skip suffix part and copy only the payload.
          std::memcpy(
            to_record,
            copy_from.get_record_payload(i),
            assorted::align8(payload));
        } else {
          // Copy suffix (if exists) and payload together.
          std::memcpy(to_record, copy_from.get_record(i), record_length);
        }
      }

      ++migrated_count;
      dest->set_key_count(migrated_count);
    }
  }

  dest->consecutive_inserts_ = sofar_consecutive;
}
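A small aside on the sizes involved: the physical record length is always a multiple of 8 bytes (see the ASSERT_ND above), presumably because the key-suffix and payload areas are each rounded up to 8-byte boundaries. The helper below is a hedged sketch of that arithmetic only; it is not the actual MasstreeBorderPage::to_record_length() implementation, which takes the remainder length and derives the suffix internally.

#include <cstdint>

// 8-byte round-up, the same idea as foedus::assorted::align8().
inline uint16_t round_up_8(uint16_t value) {
  return static_cast<uint16_t>((value + 7U) & ~7U);
}

// Hedged sketch of sizing a physical record from suffix and payload lengths.
inline uint16_t toy_record_length(uint16_t suffix_length, uint16_t payload_length) {
  return round_up_8(suffix_length) + round_up_8(payload_length);  // always a multiple of 8
}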

ErrorCode foedus::storage::masstree::SplitBorder::run ( xct::SysxctWorkspace *  sysxct_workspace)
override virtual

Border node's Split.

Implements foedus::xct::SysxctFunctor.

Definition at line 39 of file masstree_split_impl.cpp.

References ASSERT_ND, CHECK_ERROR_CODE, context_, decide_strategy(), disable_no_record_split_, foedus::thread::GrabFreeVolatilePagesScope::dispatch(), foedus::storage::masstree::MasstreePage::get_high_fence(), foedus::storage::masstree::MasstreePage::get_key_count(), foedus::xct::RwLockableXctId::get_key_lock(), foedus::storage::masstree::MasstreePage::get_layer(), foedus::thread::Thread::get_local_volatile_page_resolver(), foedus::storage::masstree::MasstreePage::get_low_fence(), foedus::storage::masstree::MasstreeBorderPage::get_next_offset(), foedus::thread::Thread::get_numa_node(), foedus::storage::masstree::MasstreeBorderPage::get_owner_id(), foedus::storage::masstree::MasstreePage::get_version_address(), foedus::thread::GrabFreeVolatilePagesScope::grab(), foedus::storage::masstree::MasstreePage::has_foster_child(), foedus::storage::masstree::MasstreePage::header(), foedus::storage::masstree::MasstreePage::install_foster_twin(), foedus::storage::masstree::MasstreeBorderPage::is_consecutive_inserts(), foedus::storage::masstree::MasstreePage::is_empty_range(), foedus::xct::RwLockableXctId::is_keylocked(), foedus::kErrorCodeOk, foedus::storage::masstree::SplitBorder::SplitStrategy::largest_slice_, lock_existing_records(), foedus::assorted::memory_fence_release(), foedus::storage::masstree::SplitBorder::SplitStrategy::mid_slice_, migrate_records(), foedus::storage::masstree::SplitBorder::SplitStrategy::no_record_split_, foedus::xct::McsRwLock::reset(), foedus::storage::VolatilePagePointer::set(), foedus::storage::PageVersion::set_moved(), foedus::xct::XctId::set_moved(), foedus::storage::masstree::SplitBorder::SplitStrategy::smallest_slice_, foedus::storage::PageHeader::snapshot_, foedus::storage::PageHeader::storage_id_, foedus::thread::Thread::sysxct_page_lock(), target_, and foedus::xct::RwLockableXctId::xct_id_.

{
  ASSERT_ND(!target_->header().snapshot_);
  ASSERT_ND(!target_->is_empty_range());

  debugging::RdtscWatch watch;
  DVLOG(1) << "Splitting a page... ";

  // First, lock the page. The page's lock state is before all the records in the page,
  // so we can simply lock it first.
  CHECK_ERROR_CODE(context_->sysxct_page_lock(sysxct_workspace, reinterpret_cast<Page*>(target_)));

  // The lock involves atomic operation, so now all we see are finalized.
  if (target_->has_foster_child()) {
    DVLOG(0) << "Interesting. the page has been already split";
    return kErrorCodeOk;
  }
  if (target_->get_key_count() <= 1U) {
    DVLOG(0) << "This page has too few records. Can't split it";
    return kErrorCodeOk;
  }

  const SlotIndex key_count = target_->get_key_count();

  // 2 free volatile pages needed.
  // foster-minor/major (will be placed in successful case)
  memory::PagePoolOffset offsets[2];
  thread::GrabFreeVolatilePagesScope free_pages_scope(context_, offsets);
  CHECK_ERROR_CODE(free_pages_scope.grab(2));
  const auto& resolver = context_->get_local_volatile_page_resolver();

  SplitStrategy strategy;  // small. just place it on stack
  decide_strategy(&strategy);
  ASSERT_ND(target_->get_low_fence() <= strategy.mid_slice_);
  ASSERT_ND(strategy.mid_slice_ <= target_->get_high_fence());
  MasstreeBorderPage* twin[2];
  VolatilePagePointer new_page_ids[2];
  for (int i = 0; i < 2; ++i) {
    twin[i] = reinterpret_cast<MasstreeBorderPage*>(resolver.resolve_offset_newpage(offsets[i]));
    new_page_ids[i].set(context_->get_numa_node(), offsets[i]);
    twin[i]->initialize_volatile_page(
      target_->header().storage_id_,
      new_page_ids[i],
      target_->get_layer(),
      i == 0 ? target_->get_low_fence() : strategy.mid_slice_,    // low-fence
      i == 0 ? strategy.mid_slice_ : target_->get_high_fence());  // high-fence
  }

  // lock all records
  CHECK_ERROR_CODE(lock_existing_records(sysxct_workspace));

  if (strategy.no_record_split_) {
    ASSERT_ND(!disable_no_record_split_);
    // in this case, we can move all records in one memcpy.
    // well, actually two : one for slices and another for data.
    std::memcpy(twin[0]->slices_, target_->slices_, sizeof(KeySlice) * key_count);
    std::memcpy(twin[0]->data_, target_->data_, sizeof(target_->data_));
    twin[0]->set_key_count(key_count);
    twin[1]->set_key_count(0);
    twin[0]->consecutive_inserts_ = target_->is_consecutive_inserts();
    twin[1]->consecutive_inserts_ = true;
    twin[0]->next_offset_ = target_->get_next_offset();
    twin[1]->next_offset_ = 0;
    for (SlotIndex i = 0; i < key_count; ++i) {
      xct::RwLockableXctId* owner_id = twin[0]->get_owner_id(i);
      ASSERT_ND(owner_id->is_keylocked());
      owner_id->get_key_lock()->reset();  // no race
    }
  } else {
    migrate_records(
      strategy.smallest_slice_,
      strategy.mid_slice_ - 1,  // to make it inclusive
      twin[0]);
    migrate_records(
      strategy.mid_slice_,
      strategy.largest_slice_,  // this is inclusive (to avoid supremum hassles)
      twin[1]);
  }

  // Now we will install the new pages. **From now on no error-return allowed**
  assorted::memory_fence_release();
  // We install pointers to the pages AFTER we initialize the pages.
  target_->install_foster_twin(new_page_ids[0], new_page_ids[1], strategy.mid_slice_);
  free_pages_scope.dispatch(0);
  free_pages_scope.dispatch(1);
  assorted::memory_fence_release();

  // invoking set_moved is the point we announce all of these changes. take fence to make it right
  target_->get_version_address()->set_moved();

  // set the "moved" bit so that concurrent transactions
  // check foster-twin for read-set/write-set checks.
  for (SlotIndex i = 0; i < key_count; ++i) {
    xct::RwLockableXctId* owner_id = target_->get_owner_id(i);
    owner_id->xct_id_.set_moved();
  }

  assorted::memory_fence_release();

  watch.stop();
  DVLOG(1) << "Costed " << watch.elapsed() << " cycles to split a page. original page physical"
    << " record count: " << static_cast<int>(key_count)
    << "->" << static_cast<int>(twin[0]->get_key_count())
    << " + " << static_cast<int>(twin[1]->get_key_count());
  return kErrorCodeOk;
}
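The fence/install ordering near the end of run() follows a standard publish pattern: fully initialize the foster twins, issue a release fence, install the pointers, and only then set the "moved" bits that concurrent transactions check. A generic, self-contained sketch of that pattern in standard C++ (not FOEDUS code) looks like this:

#include <atomic>

struct Node { int payload; };
std::atomic<Node*> published{nullptr};

// Writer: initialize first, release-fence, then announce.
void publish(Node* fresh) {
  fresh->payload = 42;                                  // initialize the new object
  std::atomic_thread_fence(std::memory_order_release);  // like memory_fence_release()
  published.store(fresh, std::memory_order_relaxed);    // announce to readers
}

// Reader: observe the announcement, acquire-fence, then dereference safely.
int consume() {
  Node* node = published.load(std::memory_order_relaxed);
  if (node == nullptr) { return -1; }
  std::atomic_thread_fence(std::memory_order_acquire);
  return node->payload;
}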

Member Data Documentation

thread::Thread* const foedus::storage::masstree::SplitBorder::context_

Thread context.

Definition at line 53 of file masstree_split_impl.hpp.

Referenced by lock_existing_records(), and run().

const bool foedus::storage::masstree::SplitBorder::disable_no_record_split_

If true, we never do no-record-split (NRS).

This is useful for example when we want to make room for record-expansion. Otherwise, we get stuck when the record-expansion causes a page-split that is eligible for NRS.

Definition at line 66 of file masstree_split_impl.hpp.

Referenced by decide_strategy(), and run().

const PayloadLength foedus::storage::masstree::SplitBorder::piggyback_payload_count_

Definition at line 79 of file masstree_split_impl.hpp.

const KeyLength foedus::storage::masstree::SplitBorder::piggyback_remainder_length_

Definition at line 78 of file masstree_split_impl.hpp.

const bool foedus::storage::masstree::SplitBorder::piggyback_reserve_

An optimization to also make room for a record.

The piggyback_* members give whether to do so, the remainder key length, the payload length, and the key suffix. In this case, trigger_ implicitly serves as the slice for piggyback_reserve_.

This optimization is best-effort. The caller must check afterwards whether the space is actually reserved. For example, a concurrent thread might have newly reserved a (might be too small) space for the key right before the call.
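For illustration, a hedged sketch of constructing the sysxct with the piggyback reservation enabled. The variable names are assumptions, the run_nested_sysxct() signature should be checked against thread::Thread, and the caller must still verify the reservation afterwards.

// `context`, `page`, `slice`, `remainder_length`, `payload_length`, and `suffix`
// are assumed to be in scope in the caller.
foedus::storage::masstree::SplitBorder split(
  context,
  page,
  slice,              // trigger_: implicitly also the slice of the record to reserve
  false,              // disable_no_record_split (default)
  true,               // piggyback_reserve
  remainder_length,   // piggyback_remainder_length
  payload_length,     // piggyback_payload_count
  suffix);            // piggyback_suffix
foedus::ErrorCode code = context->run_nested_sysxct(&split, 2);
// Best-effort: even on kErrorCodeOk, re-search the foster twins to confirm the
// record slot actually exists with enough space before using it.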

Definition at line 77 of file masstree_split_impl.hpp.

const void* foedus::storage::masstree::SplitBorder::piggyback_suffix_

Definition at line 80 of file masstree_split_impl.hpp.

MasstreeBorderPage* const foedus::storage::masstree::SplitBorder::target_

The page to split.

Precondition
!header_.snapshot_ (splits happen only to volatile pages)

Definition at line 58 of file masstree_split_impl.hpp.

Referenced by decide_strategy(), lock_existing_records(), migrate_records(), and run().

const KeySlice foedus::storage::masstree::SplitBorder::trigger_

The key that triggered this split.

A hint for NRS

Definition at line 60 of file masstree_split_impl.hpp.

Referenced by decide_strategy().


The documentation for this struct was generated from the following files:
  • masstree_split_impl.hpp
  • masstree_split_impl.cpp