libfoedus-core
FOEDUS Core Library

Hashtable Storage, a concurrent hashtable. More...

Detailed Description

Hashtable Storage, a concurrent hashtable.

What we have here is a quite simple hash storage. Allthemore simple, more robust and scalable.

Hash Interemediate page and Data page

The hash storage has two types of pages: HashIntermediatePage and HashDataPage. HashIntermediatePage, starting from the root page, points down to other intermediate pages, and the level-0 intermediate pages point to HashDataPage. HashDataPage has a next-pointer. One entry (pointer) in a level-0 intermediate page corresponds to a hash bin. Usually one hash bin corresponds to a very small number of data pages, zero, one or rarely two.

foedus::storage::hash::HashIntermediatePage foedus::storage::hash::HashDataPage

Singly-linked data pages, hopefully rarely used

When a hash bin receives many records or very long key or values, a data page contains a link to next page that forms a singly-linked list of pages. Although this is a dual page pointer, we use only one of them. Snapshot hash pages of course point to only snapshot hash pages. Volatile hash pages, to simplify, also point to only volatile hash pages.

In other words, when we make a volatile version of any page in a hash bin, we instantiate volatile pages for all pages in the hash bin. This might sound costly, but there should be only 1 or only occasionaly a few pages in the bin, assuming 1) bin-bits that is large enough, and 2) good hash function.

Locking Protocol

Intermediate pages are completely lock-free. Data pages do have per-page and per-tuple locks. Hopefully most locks are per-tuple, meaning just taking the TID of the tuple as read-set or write-set. However, there are a few cases the thread must take an exclusive lock in page-header.

All of the above are quite similar to masstree package's insertion/splits/miss-search. Like masstree package, the full-hash/key of a physical tuple is immutable once installed. We only use logical deletion flags and "moved" bits to allow record expansion. Also like masstree, other operations proceed without locks or in many cases even atomic operations, just a few memory barriers. For example, deletion just puts a logical deletion flag to the record without page-lock. Reading is completely atomics-free as far as you find the exact record (miss-search is a bit more complicated as above).

Flags in TID of Hash storages

To implement the above protocol, the moved bit in TID has a bit different meaning from masstree package. It means that the corresponding record for the full-hash/key is somewhere later in the list. When we need to expand a record space that is not at the end, we insert a new physical tuple for the record with large space, then put the "moved" bit in the original physical record. If threads see the flag while processing, it skips such a record. If pre-commit sees the flag because of concurrent migration, it tracks the moved record just like Master-Tree protocol. The idea is almost the same, but here the tracking is much simpler. Just follow the linked list.

History note, or a tombstone
We initially considered a bit more fancy hash storages (namely Cuckoo Hashing), but we didn't see enough benefits to justify its limitations (eg, how to structure snapshot pages if a tuple might be placed in two bins!). The only case it is useful is when the hash bins are really close to full, but we'll just have larger hash storages to avoid such a case. On huge NVRAM (whose density will also quickly grow), 100% fill-factor doesn't buy us anything significant over 50%. Hence discarded that plan. A sad war story, but ain't gonna win all wars. Keep it simple stupid, men.
Collaboration diagram for Hashtable Storage:

Files

file  fwd.hpp
 Forward declarations of classes in hash storage package.
 
file  hash_hashinate.hpp
 Independent utility methods/classes for hashination, or hash functions.
 
file  hash_id.hpp
 Definitions of IDs in this package and a few related constant values.
 
file  hash_log_types.hpp
 Declares all log types used in this storage type.
 

Classes

struct  foedus::storage::hash::HashCombo
 A set of information that are used in many places, extracted from the given key. More...
 
struct  foedus::storage::hash::ComposedBin
 Represents an output of composer on one bin. More...
 
struct  foedus::storage::hash::HashComposedBinsPage
 A page to pack many ComposedBin as an output of composer. More...
 
struct  foedus::storage::hash::ComposedBinsBuffer
 Abstracts how we batch-read several HashComposedBinsPage emit from individual composers. More...
 
struct  foedus::storage::hash::ComposedBinsMergedStream
 Packages all ComposedBinsBuffer to easily extract bins of current interest. More...
 
class  foedus::storage::hash::HashComposer
 Composer for a hash storage. More...
 
struct  foedus::storage::hash::BloomFilterFingerprint
 A fingerprint for bloom filter in each HashDataPage. More...
 
struct  foedus::storage::hash::DataPageBloomFilter
 To quickly check whether a HashDataPage might contain a specific hash value, we maintain a non-counting bloom filter in each page. More...
 
union  foedus::storage::hash::IntermediateRoute
 Compactly represents the route of intermediate pages to reach the given hash bin. More...
 
struct  foedus::storage::hash::HashBinRange
 Represents a range of hash bins in a hash storage, such as what an intermediate page is responsible for. More...
 
struct  foedus::storage::hash::HashCreateLogType
 Log type of CREATE HASH STORAGE operation. More...
 
struct  foedus::storage::hash::HashCommonLogType
 A base class for HashInsertLogType/HashDeleteLogType/HashOverwriteLogType. More...
 
struct  foedus::storage::hash::HashInsertLogType
 Log type of hash-storage's insert operation. More...
 
struct  foedus::storage::hash::HashDeleteLogType
 Log type of hash-storage's delete operation. More...
 
struct  foedus::storage::hash::HashUpdateLogType
 Log type of hash-storage's update operation. More...
 
struct  foedus::storage::hash::HashOverwriteLogType
 Log type of hash-storage's overwrite operation. More...
 
struct  foedus::storage::hash::HashMetadata
 Metadata of an hash storage. More...
 
class  foedus::storage::hash::HashIntermediatePage
 Represents an intermediate page in Hashtable Storage. More...
 
struct  foedus::storage::hash::HashDataPage::Slot
 Fix-sized slot for each record, which is placed at the end of data region. More...
 
class  foedus::storage::hash::HashDataPage
 Represents an individual data page in Hashtable Storage. More...
 
class  foedus::storage::hash::HashPartitioner
 Partitioner for a hash storage. More...
 
struct  foedus::storage::hash::ReserveRecords
 A system transaction to reserve a physical record(s) in a hash data page. More...
 
class  foedus::storage::hash::HashStorage
 Represents a key-value store based on a dense and regular hash. More...
 
class  foedus::storage::hash::HashStoragePimpl
 Pimpl object of HashStorage. More...
 
struct  foedus::storage::hash::HashTmpBin::Record
 Represents a record with a unique key. More...
 
class  foedus::storage::hash::HashTmpBin
 An in-memory single-threaded data structure to compose tuples in a hash bin. More...
 

Typedefs

typedef HashIntermediatePage foedus::storage::hash::HashRootInfoPage
 Output of one compose() call, which are then combined in construct_root(). More...
 
typedef uint64_t foedus::storage::hash::HashValue
 Represents a full 64-bit hash value calculated from a key. More...
 
typedef uint64_t foedus::storage::hash::HashBin
 Represents a bin of a hash value. More...
 
typedef uint16_t foedus::storage::hash::KeyLength
 Represents a byte-length of a key in this package. More...
 
typedef uint16_t foedus::storage::hash::PayloadLength
 Represents a byte-length of a payload in this package. More...
 

Functions

HashValue foedus::storage::hash::hashinate (const void *key, uint16_t key_length)
 Calculates hash value for general input. More...
 
template<typename T >
HashValue foedus::storage::hash::hashinate (T key)
 Calculates hash value for a primitive type. More...
 
uint64_t foedus::storage::hash::fanout_power (uint8_t exponent)
 
uint8_t foedus::storage::hash::bins_to_level (uint64_t bins)
 

Variables

const uint64_t foedus::storage::hash::kXxhashKeySeed = 0x3f119e0435262a17ULL
 Default seed value used for xxhash's xxh32/64 to hashinate keys. More...
 
const uint16_t foedus::storage::hash::kHashDataPageBloomFilterBytes = 64
 Byte size of bloom filter in each HashDataPage. More...
 
const uint16_t foedus::storage::hash::kHashDataPageBloomFilterBits = kHashDataPageBloomFilterBytes * 8
 Bit size of bloom filter in each HashDataPage. More...
 
const uint8_t foedus::storage::hash::kHashDataPageBloomFilterIndexSize = 9U
 How many bits we need in order to represent an index in bloom filter in each HashDataPage. More...
 
const uint8_t foedus::storage::hash::kHashDataPageBloomFilterHashes = 3
 Number of hash functions (k) of bloom filter in each HashDataPage. More...
 
const uint16_t foedus::storage::hash::kHashIntermediatePageHeaderSize = 64
 Byte size of header in an intermediate page of hash storage. More...
 
const uint8_t foedus::storage::hash::kHashIntermediatePageFanout
 Number of pointers in an intermediate page of hash storage. More...
 
const uint64_t foedus::storage::hash::kHashMaxBins []
 kHashTotalBins[n] gives the maximum number of hash bins n-level hash can hold. More...
 
const uint8_t foedus::storage::hash::kHashMaxLevels = 8
 Max level of intermediate pages. More...
 
const uint16_t foedus::storage::hash::kHashDataPageHeaderSize = 128
 Byte size of header in data page of hash storage. More...
 
const uint16_t foedus::storage::hash::kHashDataPageDataSize = kPageSize - kHashDataPageHeaderSize
 Body data byte size in data page of hash storage. More...
 
const uint8_t foedus::storage::hash::kHashMinBinBits = 7U
 Minimum number allowed for bin-bits. More...
 
const uint8_t foedus::storage::hash::kHashMaxBinBits = 48U
 Maximum number allowed for bin-bits. More...
 

Class Documentation

struct foedus::storage::hash::ComposedBin

Represents an output of composer on one bin.

Definition at line 85 of file hash_composed_bins_impl.hpp.

Class Members
HashBin bin_
SnapshotPagePointer page_id_
struct foedus::storage::hash::HashComposedBinsPage

A page to pack many ComposedBin as an output of composer.

Output of a composer consists of many of this pages. Each HashComposedBinsPage contains lots of snapshotted-bins. This is simply a linked list connected by the next-page pointer. All bins stored here are sorted by bin.

Definition at line 101 of file hash_composed_bins_impl.hpp.

Collaboration diagram for foedus::storage::hash::HashComposedBinsPage:
Class Members
uint32_t bin_count_
HashBinRange bin_range_
ComposedBin bins_[kHashComposedBinsPageMaxBins]
uint32_t dummy_
PageHeader header_
SnapshotPagePointer next_page_
uint64_t padding_

Typedef Documentation

Represents a bin of a hash value.

As a primitive type, this is same as HashValue, but this type represents an extracted hash bin. For example, if the storage uses 20 bin-bits, bin = hash >> (64-20). Each bin, even empty one, consumes a 16-byte space. So, 32 bin-bits consumes 64 GB even without tuple data. Most usecase would be within 32 bits. However, there might be a biiiiigggggggg hash storage (NVRAM! PBs!), so let's reserve 64 bits. But, this is guaranteed to be within 48 bits.

Definition at line 142 of file hash_id.hpp.

typedef HashIntermediatePage foedus::storage::hash::HashRootInfoPage

Output of one compose() call, which are then combined in construct_root().

The format is exactly same as HashIntermediatePage, but typedef-ed for clarity. The only differences are:

  • Snapshot pointer points to HashComposedBinsPage instead of HashDataPage/HashIntermediatePage.
  • Volatile pointer (which of course is never used in snapshot page) instead represents the number of contiguously-written HashComposedBinsPage in the pointed sub-tree. This allows the combiner to read the entire linked-list in one-shot.
Linked list of HashComposedBinsPage
In hash storage, what compose() returns are just a list of bins. There essentially is no structure of them. The only reason we have this "root" page is to allow parallel combination by the gleaner. Each child-pointer from the root page is assigned to one thread on its own node, then written out in that node. So, after following the first child pointer, we just need a linked list, which is HashComposedBinsPage.
2-path composition
Each reducer returns a linked-list of HashComposedBinsPage for every direct-child of the root page. Then, the gleaner combines them in a parallel fashion, resulting in snapshot interemediate pages that will be installed to the root page. The second phase consists of reading the HashComposedBinsPage from each composer (reducer), collecting updated bins on each page, and creating a new interemediate page hierarchically. We have several classes decoupled from the composer class itself as listed below:
1-level hashtable
It works the same way even when the hash table has only one level (root directly points to data pages). Even in this case, HashRootInfoPage still points to HashComposedBinsPage, which contains only one entry per page. Kind of wasteful, but it has negligible overheads compared to the entire mapper/reducer. Unlike array package, we don't do special optimization for this case in order to keep the code simpler, and because the storage could benefit from parallelization even in this case (the root page still contains hundreds of hash bins).

Definition at line 79 of file hash_composed_bins_impl.hpp.

Represents a full 64-bit hash value calculated from a key.

In addition to use it as usual hash value, we extract its higher bits as bin. Each hash storage has a static configuration that determines how many bits are used for bins. Each bin represents a range of hash values, such as 0x1234560000000000 (inclusive) to 0x1234570000000000 (exclusive) where bins use the high 24 bits.

See also
HashBin
HashRange

Definition at line 129 of file hash_id.hpp.

Represents a byte-length of a key in this package.

Definition at line 203 of file hash_id.hpp.

Represents a byte-length of a payload in this package.

Definition at line 209 of file hash_id.hpp.

Function Documentation

uint8_t foedus::storage::hash::bins_to_level ( uint64_t  bins)
inline
Returns
levels appropriate to hold at least the given number of bins.

Definition at line 90 of file hash_id.hpp.

Referenced by foedus::storage::hash::HashStoragePimpl::create(), and foedus::storage::hash::HashStoragePimpl::load().

90  {
91  uint8_t level;
92  for (level = 1U; kHashMaxBins[level] < bins; ++level) {
93  continue;
94  }
95  return level;
96 }
const uint64_t kHashMaxBins[]
kHashTotalBins[n] gives the maximum number of hash bins n-level hash can hold.
Definition: hash_id.hpp:74

Here is the caller graph for this function:

uint64_t foedus::storage::hash::fanout_power ( uint8_t  exponent)
inline
Returns
kHashIntermediatePageFanout^exponent.

Definition at line 56 of file hash_id.hpp.

References foedus::storage::hash::kHashIntermediatePageFanout.

Referenced by foedus::storage::hash::HashStoragePimpl::create().

56  {
57  uint64_t ret = 1U;
58  for (uint8_t i = 0; i < exponent; ++i) {
60  }
61  return ret;
62 }
const uint8_t kHashIntermediatePageFanout
Number of pointers in an intermediate page of hash storage.
Definition: hash_id.hpp:49

Here is the caller graph for this function:

template<typename T >
HashValue foedus::storage::hash::hashinate ( key)

Calculates hash value for a primitive type.

Parameters
[in]keyPrimitive key to hash
Template Parameters
TPrimitive type. All primitive types are explicitly instantiated.
Parameters
[in]keyPrimitive key to hash
Template Parameters
TPrimitive type.

Definition at line 42 of file hash_hashinate.cpp.

References foedus::storage::hash::kXxhashKeySeed.

42  {
43  return ::XXH64(&key, sizeof(T), kXxhashKeySeed);
44 }
const uint64_t kXxhashKeySeed
Default seed value used for xxhash's xxh32/64 to hashinate keys.

Variable Documentation

const uint16_t foedus::storage::hash::kHashDataPageBloomFilterBits = kHashDataPageBloomFilterBytes * 8

Bit size of bloom filter in each HashDataPage.

Definition at line 75 of file hash_hashinate.hpp.

const uint16_t foedus::storage::hash::kHashDataPageBloomFilterBytes = 64

Byte size of bloom filter in each HashDataPage.

must be a power of 2 to slightly simplify the index calculation.

Definition at line 69 of file hash_hashinate.hpp.

Referenced by foedus::storage::hash::operator<<().

const uint8_t foedus::storage::hash::kHashDataPageBloomFilterHashes = 3

Number of hash functions (k) of bloom filter in each HashDataPage.

We consume kHashDataPageBloomFilterHashes * kHashDataPageBloomFilterIndexSize bits out of the full hash value as the fingerprint.

Definition at line 97 of file hash_hashinate.hpp.

Referenced by foedus::storage::hash::DataPageBloomFilter::add(), foedus::storage::hash::DataPageBloomFilter::contains(), foedus::storage::hash::DataPageBloomFilter::extract_fingerprint(), foedus::storage::hash::operator<<(), and foedus::storage::hash::BloomFilterFingerprint::operator==().

const uint8_t foedus::storage::hash::kHashDataPageBloomFilterIndexSize = 9U

How many bits we need in order to represent an index in bloom filter in each HashDataPage.

Invariant
2^kHashDataPageBloomFilterIndexSize == kHashDataPageBloomFilterBits

Definition at line 82 of file hash_hashinate.hpp.

Referenced by foedus::storage::hash::DataPageBloomFilter::extract_fingerprint().

const uint16_t foedus::storage::hash::kHashDataPageDataSize = kPageSize - kHashDataPageHeaderSize

Body data byte size in data page of hash storage.

Definition at line 116 of file hash_id.hpp.

const uint16_t foedus::storage::hash::kHashDataPageHeaderSize = 128

Byte size of header in data page of hash storage.

Definition at line 110 of file hash_id.hpp.

Referenced by foedus::storage::hash::HashDataPage::assert_entries_impl().

const uint8_t foedus::storage::hash::kHashIntermediatePageFanout
const uint16_t foedus::storage::hash::kHashIntermediatePageHeaderSize = 64

Byte size of header in an intermediate page of hash storage.

Definition at line 40 of file hash_id.hpp.

const uint8_t foedus::storage::hash::kHashMaxBinBits = 48U

Maximum number allowed for bin-bits.

Restricting to this size should be reasonable (2^48 is big!) and allows optimization for 128-bit sorting.

Definition at line 159 of file hash_id.hpp.

Referenced by foedus::storage::hash::SortEntry::set(), and foedus::storage::hash::HashMetadata::set_capacity().

const uint64_t foedus::storage::hash::kHashMaxBins[]
Initial value:
= {
1ULL,
kFanout64 * kFanout64,
kFanout64 * kFanout64 * kFanout64,
kFanout64 * kFanout64 * kFanout64 * kFanout64,
kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64,
kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64,
kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64,
kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64 * kFanout64,
}
const uint64_t kFanout64
just to write the following concisely
Definition: hash_id.hpp:65

kHashTotalBins[n] gives the maximum number of hash bins n-level hash can hold.

n is at most 7, which should be a reasonable assumption. almost 2^56. Depending on the bin-bits config of the storage, it uses a smaller number of bins. In other words, kHashMaxBins[level - 1] < 2^bin_bits <= kHashMaxBins[level].

Definition at line 74 of file hash_id.hpp.

Referenced by foedus::storage::hash::HashIntermediatePage::assert_range(), foedus::storage::hash::IntermediateRoute::convert_back(), foedus::storage::hash::HashPartitioner::design_partition(), foedus::storage::hash::HashPartitioner::design_partition_task(), foedus::storage::hash::HashStorageControlBlock::get_root_children(), foedus::storage::hash::hash_intermediate_volatile_page_init(), foedus::storage::hash::HashIntermediatePage::initialize_snapshot_page(), foedus::storage::hash::HashIntermediatePage::initialize_volatile_page(), foedus::storage::hash::ComposedBinsMergedStream::open_path(), foedus::storage::hash::operator<<(), and foedus::storage::hash::HashStoragePimpl::verify_single_thread_intermediate().

const uint8_t foedus::storage::hash::kHashMaxLevels = 8

Max level of intermediate pages.

In reality 7, but let's make it 8 so that it's good for alignments in several places.

Definition at line 104 of file hash_id.hpp.

Referenced by foedus::storage::hash::HashComposer::compose(), foedus::storage::hash::IntermediateRoute::convert_back(), and foedus::storage::hash::HashComposeContext::HashComposeContext().

const uint8_t foedus::storage::hash::kHashMinBinBits = 7U

Minimum number allowed for bin-bits.

128 pointers fit in one intermediate page, so there is no point to use less than 2^7 bins.

Definition at line 150 of file hash_id.hpp.

Referenced by foedus::storage::hash::HashMetadata::set_capacity().

const uint64_t foedus::storage::hash::kXxhashKeySeed = 0x3f119e0435262a17ULL

Default seed value used for xxhash's xxh32/64 to hashinate keys.

Definition at line 46 of file hash_hashinate.hpp.

Referenced by foedus::storage::hash::hashinate().