libfoedus-core
FOEDUS Core Library
foedus::storage::sequential::SequentialCursor Class Reference

A cursor interface to read tuples from a sequential storage. More...

Detailed Description

A cursor interface to read tuples from a sequential storage.

Unlike other storages, the only read-access to sequential storages is, as the name implies, a full sequential scan. This cursor interface is thus optimized for cases where we scan millions of records. This implies that, unlike masstree's cursor, we don't have to worry about infrequent overheads, such as new/delete in initialization.

Example first
Use it as follows.
memory::AlignedMemory buffer(1 << 16, 1 << 12, kNumaAllocOnnode, 0);
SequentialCursor cursor(context, storage, buffer.get_block(), 1 << 16);
SequentialRecordIterator it;
while (cursor.is_valid()) {
CHECK_ERROR(cursor.next_batch(&it));
while (it.is_valid()) {
std::cout << std::string(it.get_cur_record_raw(), it.get_cur_record_length());
...
it.next();
}
}
Safe Epoch and Unsafe Epoch
Safe epochs are epochs before the current grace epoch (current global epoch -1). There will be no more transactions in such epochs that might insert new records. Thus, thanks to the append-only nature of sequential storage, reading records in safe epochs does not need any concurrency control. Unsafe epochs, OTOH, are the currrent grace epoch and later. Some transaction in grace-epoch might be now in apply-phase to insert records, and furthermore some transaction might newly start in current-epoch. This cursor might do expensive synchronization if the user requests to read records from unsafe epochs.
Optimistic vs pessimistic
Reading unsafe epochs should be protected by lock (pessimistic) because 1) this happens rarely, and 2) quite likely that OCC will abort because all accesses are at the tail. So far, the implemention is OCC, just taking page version of tail page and read-set of last record in tail page. Frankly speaking to save coding. We should measure OCC vs lock in this case and most likely implement lock. The lock must be a bit more complicated than usual because insertion threads should not take locks frequently (too expensive then).

Definition at line 91 of file sequential_cursor.hpp.

#include <sequential_cursor.hpp>

Public Types

enum  OrderMode { kNodeFirstMode, kLooseEpochSortMode }
 The order this cursor returns tuples. More...
 

Public Member Functions

 SequentialCursor (thread::Thread *context, const sequential::SequentialStorage &storage, void *buffer, uint64_t buffer_size, OrderMode order_mode=kNodeFirstMode, Epoch from_epoch=INVALID_EPOCH, Epoch to_epoch=INVALID_EPOCH, int32_t node_filter=-1)
 Constructs a cursor to read tuples from this storage. More...
 
 ~SequentialCursor ()
 
thread::Threadget_context () const
 
const sequential::SequentialStorageget_storage () const
 
Epoch get_from_epoch () const
 
Epoch get_to_epoch () const
 
ErrorCode next_batch (SequentialRecordIterator *out)
 Returns a batch of records as an iterator. More...
 
bool is_valid () const
 
bool is_finished_snapshots () const
 Followings are rather implementation details. Used only from testcases. More...
 
bool is_finished_safe_volatiles () const
 
bool is_finished_unsafe_volatiles () const
 

Friends

std::ostream & operator<< (std::ostream &o, const SequentialCursor &v)
 

Member Enumeration Documentation

The order this cursor returns tuples.

Enumerator
kNodeFirstMode 

Returns as many records as possible from node-0's core-0, core-1, do the same from node-1,...

Note that even this mode might return unsafe epoch at last because we delay reading unsafe epochs as much as possible.

kLooseEpochSortMode 

Returns records loosely ordered by epochs.

We don't guarantee true ordering even in this case, which is too expensive. TASK(Hideaki) Not implemented yet.

Definition at line 94 of file sequential_cursor.hpp.

94  {
107  };
Returns as many records as possible from node-0's core-0, core-1, do the same from node-1...

Constructor & Destructor Documentation

foedus::storage::sequential::SequentialCursor::SequentialCursor ( thread::Thread context,
const sequential::SequentialStorage storage,
void *  buffer,
uint64_t  buffer_size,
OrderMode  order_mode = kNodeFirstMode,
Epoch  from_epoch = INVALID_EPOCH,
Epoch  to_epoch = INVALID_EPOCH,
int32_t  node_filter = -1 
)

Constructs a cursor to read tuples from this storage.

Parameters
[in]contextThread context of the transaction
[in]storageThe sequential storage to read from
[in,out]bufferThe buffer to read a number of snapshot pages in a batch. This buffer must be aligned for direct-IO.
[in]buffer_sizeByte size of buffer. Must be at least 4kb.
[in]order_modeThe order this cursor returns tuples
[in]from_epochInclusive beginning of epochs to read. If not specified, all epochs.
[in]to_epochExclusive end of epochs to read. To read records in unsafe epochs, specify a future epoch, larger than the current grace epoch (remember, it's exclusive end). If not specified, all safe epochs (fast, but does not return records being added).
[in]node_filterIf specified, returns records only in the given node. negative for reading from all nodes. This is especially useful for parallelizing a scan on a large sequential storage.

Default parameter: the system-initial epoch for from_epoch and current-global epoch for to_epoch (thus safe_epoch_only_). Assuming this storage is used for log/archive data, this should be a quite common usecase. order_mode is defaulted to kNodeFirstMode.

Definition at line 50 of file sequential_cursor.cpp.

References ASSERT_ND, foedus::xct::XctManager::get_current_grace_epoch(), foedus::xct::Xct::get_isolation_level(), foedus::Engine::get_xct_manager(), foedus::Epoch::is_valid(), foedus::storage::kPageSize, and foedus::xct::kSnapshot.

59  : context_(context),
60  xct_(&context->get_current_xct()),
61  engine_(context->get_engine()),
63  storage_(storage),
64  from_epoch_(
65  from_epoch.is_valid() ? from_epoch : engine_->get_savepoint_manager()->get_earliest_epoch()),
66  to_epoch_(
67  to_epoch.is_valid() ? to_epoch : engine_->get_xct_manager()->get_current_grace_epoch()),
68  latest_snapshot_epoch_(engine_->get_snapshot_manager()->get_snapshot_epoch()),
69  from_epoch_volatile_(max_from_epoch_snapshot_epoch(from_epoch_, latest_snapshot_epoch_)),
70  node_filter_(node_filter),
71  node_count_(engine_->get_soc_count()),
72  order_mode_(order_mode),
73  buffer_(reinterpret_cast<SequentialRecordBatch*>(buffer)),
74  buffer_size_(buffer_size),
75  buffer_pages_(buffer_size / kPageSize) {
76  ASSERT_ND(buffer_size >= kPageSize);
77  current_node_ = 0;
78  finished_snapshots_ = false;
79  finished_safe_volatiles_ = false;
80  finished_unsafe_volatiles_ = false;
81  states_.clear();
82 
83  grace_epoch_ = engine_->get_xct_manager()->get_current_grace_epoch();
84  ASSERT_ND(from_epoch_.is_valid());
85  ASSERT_ND(to_epoch_.is_valid());
86  ASSERT_ND(from_epoch_ <= to_epoch_);
87 
88  if (xct_->get_isolation_level() == xct::kSnapshot
89  || (latest_snapshot_epoch_.is_valid() && to_epoch_ <= latest_snapshot_epoch_)) {
90  snapshot_only_ = true;
91  safe_epoch_only_ = true;
92  finished_safe_volatiles_ = true;
93  finished_unsafe_volatiles_ = true;
94  } else {
95  snapshot_only_ = false;
96  if (to_epoch_ <= grace_epoch_) {
97  safe_epoch_only_ = true;
98  // We do NOT rule out reading unsafe pages yet.
99  // Even a safe page might be conservatively deemed as unsafe in our logic,
100  // so we must make sure we go on to unsafe-volatile phase too.
101  // Real-check happens in next_batch_unsafe_volatiles().
102  // finished_unsafe_volatiles_ = true;
103  } else {
104  // only in this case, we have to take a lock
105  safe_epoch_only_ = false;
106  }
107  }
108 
109  if (!latest_snapshot_epoch_.is_valid() || latest_snapshot_epoch_ < from_epoch_) {
110  finished_snapshots_ = true;
111  }
112 }
Epoch get_current_grace_epoch() const
Returns the current grace-period epoch (global epoch - 1), the epoch some transaction might be still ...
const GlobalVolatilePageResolver & get_global_volatile_page_resolver() const
Returns the page resolver to convert volatile page ID to page pointer.
Snapshot isolation (SI), meaning the transaction reads a consistent and complete image of the databas...
Definition: xct_id.hpp:78
savepoint::SavepointManager * get_savepoint_manager() const
See Savepoint Manager.
Definition: engine.cpp:53
soc::SocId get_soc_count() const
Shorthand for get_options().thread_.group_count_.
Definition: engine.cpp:74
Epoch max_from_epoch_snapshot_epoch(Epoch from_epoch, Epoch latest_snapshot_epoch)
xct::XctManager * get_xct_manager() const
See Transaction Manager.
Definition: engine.cpp:61
bool is_valid() const
Definition: epoch.hpp:96
IsolationLevel get_isolation_level() const
Returns the level of isolation for this transaction.
Definition: xct.hpp:149
snapshot::SnapshotManager * get_snapshot_manager() const
See Snapshot Manager.
Definition: engine.cpp:56
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72
Epoch get_snapshot_epoch() const
Returns the most recently snapshot-ed epoch, all logs upto this epoch is safe to delete.
memory::EngineMemory * get_memory_manager() const
See Memory Manager.
Definition: engine.cpp:50
const uint16_t kPageSize
A constant defining the page size (in bytes) of both snapshot pages and volatile pages.
Definition: storage_id.hpp:45

Here is the call graph for this function:

foedus::storage::sequential::SequentialCursor::~SequentialCursor ( )

Definition at line 114 of file sequential_cursor.cpp.

114  {
115  states_.clear();
116 }

Member Function Documentation

thread::Thread* foedus::storage::sequential::SequentialCursor::get_context ( ) const
inline

Definition at line 142 of file sequential_cursor.hpp.

142 { return context_;}
Epoch foedus::storage::sequential::SequentialCursor::get_from_epoch ( ) const
inline
Returns
Inclusive beginning of epochs to read.

Definition at line 146 of file sequential_cursor.hpp.

Referenced by foedus::storage::sequential::operator<<().

146 { return from_epoch_; }

Here is the caller graph for this function:

const sequential::SequentialStorage& foedus::storage::sequential::SequentialCursor::get_storage ( ) const
inline

Definition at line 143 of file sequential_cursor.hpp.

Referenced by foedus::storage::sequential::operator<<().

143 { return storage_; }

Here is the caller graph for this function:

Epoch foedus::storage::sequential::SequentialCursor::get_to_epoch ( ) const
inline
Returns
Exclusive end of epochs to read.

Definition at line 148 of file sequential_cursor.hpp.

Referenced by foedus::storage::sequential::operator<<().

148 { return to_epoch_; }

Here is the caller graph for this function:

bool foedus::storage::sequential::SequentialCursor::is_finished_safe_volatiles ( ) const
inline

Definition at line 170 of file sequential_cursor.hpp.

170 { return finished_safe_volatiles_; }
bool foedus::storage::sequential::SequentialCursor::is_finished_snapshots ( ) const
inline

Followings are rather implementation details. Used only from testcases.

Definition at line 169 of file sequential_cursor.hpp.

169 { return finished_snapshots_; }
bool foedus::storage::sequential::SequentialCursor::is_finished_unsafe_volatiles ( ) const
inline

Definition at line 171 of file sequential_cursor.hpp.

171 { return finished_unsafe_volatiles_; }
bool foedus::storage::sequential::SequentialCursor::is_valid ( ) const
inline
Returns
false if there is no chance that this cursor returns any more record. As a very rare case, this might return true though there is no more matching record.

Definition at line 164 of file sequential_cursor.hpp.

Referenced by next_batch().

164  {
165  return !(finished_snapshots_ && finished_safe_volatiles_ && finished_unsafe_volatiles_);
166  }

Here is the caller graph for this function:

ErrorCode foedus::storage::sequential::SequentialCursor::next_batch ( SequentialRecordIterator out)

Returns a batch of records as an iterator.

Parameters
[out]outan iterator over returned records.

It might return an empty batch even when this cursor has more records to return. Invoke is_valid() to check it. This method does nothing if is_valid() is already false. Each batch is guaranteed to be from one node, and actually from one page.

Definition at line 159 of file sequential_cursor.cpp.

References ASSERT_ND, CHECK_ERROR_CODE, is_valid(), foedus::kErrorCodeOk, and foedus::storage::sequential::SequentialRecordIterator::reset().

159  {
160  out->reset();
161  if (states_.empty()) {
162  CHECK_ERROR_CODE(init_states());
163  }
164 
165  bool found = false;
166  if (!finished_snapshots_) {
167  CHECK_ERROR_CODE(next_batch_snapshot(out, &found));
168  if (found) {
169  return kErrorCodeOk;
170  } else {
171  DVLOG(1) << "Finished reading snapshot pages:";
172  DVLOG(2) << *this;
173  ASSERT_ND(finished_snapshots_);
174  refresh_grace_epoch();
175  }
176  }
177 
178  if (!finished_safe_volatiles_) {
179  CHECK_ERROR_CODE(next_batch_safe_volatiles(out, &found));
180  if (found) {
181  return kErrorCodeOk;
182  } else {
183  DVLOG(1) << "Finished reading safe volatile pages:";
184  DVLOG(2) << *this;
185  ASSERT_ND(finished_safe_volatiles_);
186  refresh_grace_epoch();
187  }
188  }
189 
190  if (!finished_unsafe_volatiles_) {
191  CHECK_ERROR_CODE(next_batch_unsafe_volatiles(out, &found));
192  if (found) {
193  return kErrorCodeOk;
194  } else {
195  DVLOG(1) << "Finished reading unsafe volatile pages:";
196  DVLOG(2) << *this;
197  ASSERT_ND(finished_unsafe_volatiles_);
198  }
199  }
200 
201  ASSERT_ND(!is_valid());
202  return kErrorCodeOk;
203 }
0 means no-error.
Definition: error_code.hpp:87
#define CHECK_ERROR_CODE(x)
This macro calls x and checks its returned error code.
Definition: error_code.hpp:155
#define ASSERT_ND(x)
A warning-free wrapper macro of assert() that has no performance effect in release mode even when 'x'...
Definition: assert_nd.hpp:72

Here is the call graph for this function:

Friends And Related Function Documentation

std::ostream& operator<< ( std::ostream &  o,
const SequentialCursor v 
)
friend

Definition at line 652 of file sequential_cursor.cpp.

652  {
653  o << "<SequentialCursor>" << std::endl;
654  o << " " << v.get_storage() << std::endl;
655  o << " <from_epoch>" << v.get_from_epoch() << "</from_epoch>" << std::endl;
656  o << " <to_epoch>" << v.get_to_epoch() << "</to_epoch>" << std::endl;
657  o << " <order_mode>" << v.order_mode_ << "</order_mode>" << std::endl;
658  o << " <node_filter>" << v.node_filter_ << "</node_filter>" << std::endl;
659  o << " <snapshot_only_>" << v.snapshot_only_ << "</snapshot_only_>" << std::endl;
660  o << " <safe_epoch_only_>" << v.safe_epoch_only_ << "</safe_epoch_only_>" << std::endl;
661  o << " <buffer_>" << v.buffer_ << "</buffer_>" << std::endl;
662  o << " <buffer_size>" << v.buffer_size_ << "</buffer_size>" << std::endl;
663  o << " <buffer_pages_>" << v.buffer_pages_ << "</buffer_pages_>" << std::endl;
664  o << " <current_node_>" << v.current_node_ << "</current_node_>" << std::endl;
665  o << " <finished_snapshots_>" << v.finished_snapshots_ << "</finished_snapshots_>" << std::endl;
666  o << " <finished_safe_volatiles_>" << v.finished_safe_volatiles_
667  << "</finished_safe_volatiles_>" << std::endl;
668  o << " <finished_unsafe_volatiles_>" << v.finished_unsafe_volatiles_
669  << "</finished_unsafe_volatiles_>" << std::endl;
670  o << "</SequentialCursor>";
671  return o;
672 }

The documentation for this class was generated from the following files: