Data Types#

The genogrove::data_type namespace contains genomic data type definitions and utilities.

key_type_base Concept#

The key_type_base concept defines the requirements for custom key types used with the grove:

  • a < b, a > b, a == b — Comparison operators

  • T::overlaps(a, b) — Static overlap detection returning bool

  • T::aggregate(a, b) — Static pairwise aggregation returning T

  • a.to_string() — String representation

All built-in key types (interval, genomic_coordinate, numeric, kmer) satisfy this concept.

interval#

class interval#

Genomic interval representing a contiguous region with start and end positions.

This class represents basic genomic intervals without strand information, satisfying the key_type_base concept for use in grove structures. It provides simple range-based semantics for interval storage, overlap detection, and aggregation.

Public Functions

inline constexpr interval()#

Default constructor creating an uninitialized interval.

inline constexpr interval(size_t start, size_t end)#

Construct an interval with specified start and end positions.

Parameters:
  • start – Starting position (0-based, inclusive)

  • end – Ending position (0-based, inclusive)

Throws:

std::invalid_argument – if start > end

~interval() = default#
inline constexpr bool operator<(const interval &other) const#

Less-than comparison based on start position, then end position.

Intervals are ordered first by start position (ascending), then by end position (ascending) if start positions are equal.

Parameters:

other – Interval to compare against

Returns:

true if this interval is less than other

inline constexpr bool operator>(const interval &other) const#

Greater-than comparison based on start position, then end position.

Parameters:

other – Interval to compare against

Returns:

true if this interval is greater than other

inline constexpr bool operator==(const interval &other) const#

Equality comparison (both start and end must match).

Parameters:

other – Interval to compare against

Returns:

true if start and end positions are both equal

std::string to_string() const#

Convert interval to string representation.

Format: “[start,end]” (e.g., “[100,200]”)

Note

Required by key_type_base concept for debugging/display

Returns:

String representation of the interval

inline constexpr size_t get_start() const noexcept#

Get the start position (0-based, inclusive).

Returns:

Start position

inline constexpr void set_range(size_t start, size_t end)#

Set both start and end positions atomically.

Parameters:
  • start – Start position (0-based, inclusive)

  • end – End position (0-based, inclusive)

Throws:

std::invalid_argument – if start > end

inline constexpr size_t get_end() const noexcept#

Get the end position (0-based, inclusive).

Returns:

End position

void serialize(std::ostream &os) const#

Serialize the interval to an output stream.

Writes the interval in binary format for persistence.

Parameters:

os – Output stream to write to

Public Static Functions

static inline constexpr bool overlaps(const interval &a, const interval &b)#

Determine if two intervals overlap.

Two intervals overlap if they share any positions in their ranges. Uses the standard range intersection test.

static inline constexpr interval aggregate(const interval &a, const interval &b)#

Aggregate two intervals into a bounding interval.

Returns the minimal bounding interval encompassing both inputs.

Note

Required by key_type_base concept for internal node construction

Parameters:
  • a – First interval

  • b – Second interval

Returns:

Bounding interval with min start and max end
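A minimal sketch of the overlap test and the bounding aggregate for closed, 0-based ranges. It assumes interval uses the same closed-range test that is documented for genomic_coordinate (a.start <= b.end AND b.start <= a.end). The names closed_range, ranges_overlap, and bounding_range are hypothetical stand-ins, not library identifiers.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for interval: closed, 0-based [start, end].
struct closed_range {
    std::size_t start;
    std::size_t end;
};

// Closed ranges intersect when each one starts no later than the other ends.
constexpr bool ranges_overlap(const closed_range& a, const closed_range& b) {
    return a.start <= b.end && b.start <= a.end;
}

// Smallest closed range that covers both inputs.
constexpr closed_range bounding_range(const closed_range& a, const closed_range& b) {
    return {std::min(a.start, b.start), std::max(a.end, b.end)};
}

// Inclusive ends: ranges that touch at one position overlap.
static_assert(ranges_overlap({100, 200}, {200, 300}));
// Adjacent ranges share no position and do not overlap.
static_assert(!ranges_overlap({100, 200}, {201, 300}));
```

Because both ends are inclusive, [100,200] and [200,300] overlap at position 200. A half-open convention would treat them as disjoint.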

static interval deserialize(std::istream &is)#

Deserialize an interval from an input stream.

Reads the interval from binary format and returns it.

Parameters:

is – Input stream to read from

Returns:

Deserialized interval

Public Static Attributes

static constexpr size_t INVALID_POSITION = std::numeric_limits<size_t>::max()#
static constexpr bool is_interval = true#

Indicates this is an interval type (enables interval-aware operations).

genomic_coordinate#

class genomic_coordinate#

Stranded genomic interval representing a region on a specific strand.

This class represents genomic intervals with start/end positions and strand information, satisfying the key_type_base concept for use in grove structures. It extends the basic interval type with strand-awareness, enabling strand-specific queries and operations.

Public Functions

inline constexpr genomic_coordinate()#

Default constructor creating an invalid coordinate (strand=‘.’, start=0, end=0).

inline constexpr genomic_coordinate(char strand, std::size_t start, std::size_t end)#

Construct a genomic coordinate with specified strand and position.

Parameters:
  • strand – Strand indicator (‘+’, ‘-’, ‘.’, or ‘*’)

  • start – Starting position (0-based, inclusive)

  • end – Ending position (0-based, inclusive)

Throws:
  • std::invalid_argument – if strand is not one of ‘+’, ‘-’, ‘.’, ‘*’

  • std::invalid_argument – if start > end

~genomic_coordinate() = default#
inline constexpr bool operator<(const genomic_coordinate &other) const#

Less-than comparison using coordinate-first sorting.

Comparison order: start → end → strand (with strand order: * < . < + < -)

Parameters:

other – Coordinate to compare against

Returns:

true if this coordinate is less than other
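The documented comparison order can be sketched as a tuple comparison with an explicit strand rank. The names stranded_range, strand_rank, and coordinate_less are hypothetical, and the library may implement the ordering differently.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical stand-in for genomic_coordinate.
struct stranded_range {
    char strand;
    std::size_t start;
    std::size_t end;
};

// Rank strands in the documented order: '*' < '.' < '+' < '-'.
constexpr int strand_rank(char s) {
    switch (s) {
        case '*': return 0;
        case '.': return 1;
        case '+': return 2;
        default:  return 3;  // '-'
    }
}

// Coordinate-first comparison: start, then end, then strand rank.
constexpr bool coordinate_less(const stranded_range& a, const stranded_range& b) {
    return std::make_tuple(a.start, a.end, strand_rank(a.strand)) <
           std::make_tuple(b.start, b.end, strand_rank(b.strand));
}
```

An explicit rank is needed because comparing the raw characters would not give the documented order: in ASCII, '.' (46) sorts after both '+' (43) and '-' (45).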

inline constexpr bool operator>(const genomic_coordinate &other) const#

Greater-than comparison using coordinate-first sorting.

Parameters:

other – Coordinate to compare against

Returns:

true if this coordinate is greater than other

inline constexpr bool operator==(const genomic_coordinate &other) const#

Equality comparison (all three components must match).

Parameters:

other – Coordinate to compare against

Returns:

true if strand, start, and end are all equal

std::string to_string() const#

Convert coordinate to string representation.

Format: “strand:start-end” (e.g., “+:100-200”)

Note

Required by key_type_base concept for debugging/display

Returns:

String representation of the coordinate

inline constexpr char get_strand() const noexcept#

Get the strand indicator.

Returns:

Strand character (‘+’, ‘-’, ‘.’, or ‘*’)

inline constexpr std::size_t get_start() const noexcept#

Get the start position (0-based, inclusive).

Returns:

Start position

inline constexpr std::size_t get_end() const noexcept#

Get the end position (0-based, inclusive).

Returns:

End position

inline constexpr void set_strand(char strand)#

Set the strand indicator.

Parameters:

strand – Strand character (‘+’, ‘-’, ‘.’, or ‘*’)

Throws:

std::invalid_argument – if strand is not one of ‘+’, ‘-’, ‘.’, ‘*’

inline constexpr void set_range(std::size_t start, std::size_t end)#

Set both start and end positions atomically.

Parameters:
  • start – Start position (0-based, inclusive)

  • end – End position (0-based, inclusive)

Throws:

std::invalid_argument – if start > end

void serialize(std::ostream &os) const#

Serialize the genomic coordinate to an output stream.

Writes the coordinate in binary format for persistence.

Parameters:

os – Output stream to write to

Public Static Functions

static inline constexpr bool overlaps(const genomic_coordinate &a, const genomic_coordinate &b)#

Determine if two genomic coordinates overlap.

Overlap requires both spatial overlap AND strand compatibility:

  • Coordinates overlap if: a.start <= b.end AND b.start <= a.end

  • Strands must match exactly, EXCEPT wildcard ‘*’ matches any strand
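The two conditions above can be sketched as follows. The sketch assumes the ‘*’ wildcard is accepted on either argument. The names stranded_range, strands_compatible, and stranded_overlap are hypothetical.

```cpp
#include <cstddef>

// Hypothetical stand-in for genomic_coordinate.
struct stranded_range {
    char strand;
    std::size_t start;
    std::size_t end;
};

// Strands are compatible when equal, or when either side is the '*' wildcard
// (treating the wildcard symmetrically is an assumption of this sketch).
constexpr bool strands_compatible(char a, char b) {
    return a == b || a == '*' || b == '*';
}

// Spatial overlap on closed coordinates AND strand compatibility.
constexpr bool stranded_overlap(const stranded_range& a, const stranded_range& b) {
    return a.start <= b.end && b.start <= a.end &&
           strands_compatible(a.strand, b.strand);
}
```

Note that ‘.’ is not a wildcard under this rule: a ‘.’ coordinate overlaps only ‘.’ or ‘*’ coordinates.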

static inline constexpr genomic_coordinate aggregate(const genomic_coordinate &a, const genomic_coordinate &b)#

Aggregate two coordinates into a bounding coordinate.

Returns the minimal bounding coordinate encompassing both inputs:

  • Start: minimum start position

  • End: maximum end position

  • Strand: ‘*’ (wildcard) if strands differ, otherwise common strand

Note

Required by key_type_base concept for internal node construction

Parameters:
  • a – First coordinate

  • b – Second coordinate

Returns:

Bounding coordinate with min start, max end, and merged strand
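A minimal sketch of the aggregation rule described above, using hypothetical names (stranded_range, bounding_coordinate) rather than the library's own types.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for genomic_coordinate.
struct stranded_range {
    char strand;
    std::size_t start;
    std::size_t end;
};

// Min start, max end, and the common strand or '*' when the strands differ.
constexpr stranded_range bounding_coordinate(const stranded_range& a,
                                             const stranded_range& b) {
    return {a.strand == b.strand ? a.strand : '*',
            std::min(a.start, b.start),
            std::max(a.end, b.end)};
}
```

Falling back to ‘*’ keeps an internal node compatible with queries on either strand, so a mixed-strand subtree is not skipped during a strand-specific search.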

static genomic_coordinate deserialize(std::istream &is)#

Deserialize a genomic coordinate from an input stream.

Reads the coordinate from binary format and returns it.

Parameters:

is – Input stream to read from

Returns:

Deserialized genomic coordinate

Public Static Attributes

static constexpr bool is_interval = true#

Indicates this is an interval type (enables interval-aware operations).

key#

template<key_type_base key_t, typename data_t = void>
class key#

Wrapper class combining a key value with optional associated data.

This template class wraps a key_t (e.g., interval, genomic_coordinate, numeric) with an optional data_t payload. It serves as the fundamental storage unit in grove structures, enabling efficient indexing while maintaining arbitrary metadata.

Public Functions

inline key()#

Default constructor initializing both value and data with defaults.

Only available when both key_t and data_t are default-initializable.

Note

Constrained by requires clause - will not compile if types are not default-initializable

inline key()

Default constructor for the data-less form (data_t = void).

The requires clause on the general default constructor needs data_t to be default-initializable, which void is not, so key<T, void> needs its own default constructor. The data member is std::monostate in this form and is value-initialized implicitly.

inline explicit key(key_t kvalue)#

Construct a key with the specified key value.

When data_t is void: Creates a key with only the value. When data_t is non-void: Creates a key with value and default-constructed data.

Parameters:

kvalue – The key value (moved into the key)

template<typename D = data_t>
inline key(key_t key_value, D &&data_value)#

Construct a key with both key value and associated data.

Only available when data_t is not void (enforced by requires clause). Uses perfect forwarding to efficiently transfer the data value.

Note

This constructor only exists when data_t != void

Template Parameters:

D – data_t type (deduced, should match data_t)

Parameters:
  • key_value – The key value (moved into the key)

  • data_value – The associated data (forwarded)

key(const key&) = default#

Copy constructor (defaulted).

key(key&&) noexcept = default#

Move constructor (defaulted, noexcept).

key &operator=(const key&) = default#

Copy assignment operator (defaulted).

Returns:

Reference to this key

key &operator=(key&&) noexcept = default#

Move assignment operator (defaulted, noexcept).

Returns:

Reference to this key

~key() = default#

Destructor (defaulted).

inline const key_t &get_value() const noexcept#

Get the key value (const reference).

Returns:

Const reference to the underlying key_t value

inline void set_value(key_t new_value)#

Set the key value.

Parameters:

new_value – The new key value (moved)

template<typename D = data_t>
inline const D &get_data() const noexcept#

Get the associated data (const reference).

Only available when data_t is not void (enforced by requires clause). Provides read-only access to the associated data.

Note

This method only exists when data_t != void

Note

Returns by const reference for efficiency

Template Parameters:

D – data_t type (deduced, should match data_t)

Returns:

Const reference to the associated data

template<typename D = data_t>
inline D &get_data() noexcept#

Get mutable reference to associated data.

Only available when data_t is not void (enforced by requires clause). Allows in-place modification of the data without copying.

Note

This method only exists when data_t != void

Note

Useful for efficient in-place updates

Template Parameters:

D – data_t type (deduced, should match data_t)

Returns:

Mutable reference to the associated data

template<typename D = data_t>
inline void set_data(D new_data)#

Set the associated data.

Only available when data_t is not void (enforced by requires clause).

Note

This method only exists when data_t != void

Template Parameters:

D – data_t type (deduced, should match data_t)

Parameters:

new_data – The new data value (moved)

inline constexpr bool has_data() const noexcept#

Check if this key has associated data.

Compile-time constant determined by template parameter.

Returns:

true if data_t is not void, false otherwise

inline std::string to_string() const#

Convert key to string representation.

Delegates to the key_t’s to_string() method. Does not include data in the string representation.

Returns:

String representation of the key value

inline void serialize(std::ostream &os) const#

Serialize the key to an output stream.

Writes the key in binary format for persistence:

  • Always serializes the key_t value

  • Serializes data_t only when non-void

Uses type-specific serialization_traits for both key and data.

Note

Serialization format depends on serialization_traits specializations

Parameters:

os – Output stream to write to

inline bool operator==(const key &other) const#

Comparison operators.

Comparisons are delegated to the wrapped key_t value; data_t is treated as decoration and ignored. This matches the B+ tree’s notion of identity (the tree orders by value) and frees data_t from needing any comparison operators of its own. < and > are unconditionally available because the key_type_base concept already requires them on key_t.

Parameters:

other – Key to compare against

inline bool operator<(const key &other) const#
inline bool operator>(const key &other) const#
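A short sketch of what value-only comparison implies: two keys with different payloads still compare equal. The type labeled_key is a hypothetical stand-in for key<key_t, data_t>.

```cpp
#include <string>

// Hypothetical stand-in: a value used for ordering plus an ignored payload.
struct labeled_key {
    int value;
    std::string label;  // decoration only; never compared

    bool operator==(const labeled_key& o) const { return value == o.value; }
    bool operator<(const labeled_key& o) const { return value < o.value; }
    bool operator>(const labeled_key& o) const { return value > o.value; }
};
```

Code that needs to tell apart two entries with the same key value has to compare the payloads itself.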

Public Static Functions

static inline key deserialize(std::istream &is)#

Deserialize a key from an input stream.

Reads the key from binary format and reconstructs it:

  • Always deserializes the key_t value

  • Deserializes data_t only when non-void

Note

Must match the format written by serialize()

Note

Static method - creates and returns a new key

Parameters:

is – Input stream to read from

Returns:

Deserialized key object

query_result#

template<key_type_base key_t, typename data_t = void>
class query_result#

Container for query results holding matching keys and the original query.

This class stores the results of intersection/search operations performed on grove structures. It maintains both the original query and a collection of pointers to all keys that matched (overlapped with) the query.

Public Functions

inline explicit query_result(key_t query)#

Construct a query result with the specified query.

Initializes an empty result set for the given query. Keys are added later via add_key() as the search traverses the grove structure.

Parameters:

query – The query used for intersection (stored by value)

inline const key_t &get_query() const noexcept#

Get the original query that produced this result.

Returns a const reference to the query that was used to search the grove.

Returns:

Const reference to the query value

inline const std::vector<key<key_t, data_t>*> &get_keys() const#

Get all matching keys found by the query.

Returns a const reference to the vector of pointers to keys that overlapped with the query. The pointers reference keys owned by the grove and remain valid as long as the grove exists and the keys are not removed.

Note

Pointers remain valid as long as the grove is not modified

Note

Keys are stored in the order they were found during tree traversal

Returns:

Const reference to vector of pointers to matching keys (may be empty)
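The non-owning pointer model can be illustrated with a linear scan over a vector that plays the role of the grove. All names here (closed_range, scan_result, scan) are hypothetical; a real grove search traverses a tree rather than scanning.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an interval key.
struct closed_range {
    std::size_t start;
    std::size_t end;
};

// Hypothetical stand-in for query_result: the query plus non-owning pointers.
struct scan_result {
    closed_range query;
    std::vector<const closed_range*> hits;
};

// Linear scan standing in for a grove search; `store` owns every range.
inline scan_result scan(const std::vector<closed_range>& store, closed_range query) {
    scan_result result{query, {}};
    for (const closed_range& r : store) {
        if (r.start <= query.end && query.start <= r.end) {
            result.hits.push_back(&r);  // pointer into `store`, no copy
        }
    }
    return result;
}
```

As with the real result type, the pointers are only as stable as the container they point into: in this sketch, appending to store after the scan may reallocate and leave them dangling.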

inline void add_key(key<key_t, data_t> *key)#

Add a matching key to the result set.

Appends a pointer to a matching key to the internal collection. This method is typically called internally by grove search operations as they traverse the tree structure.

Note

This is primarily an internal method used during grove traversal

Note

No ownership is transferred; the pointer is stored as-is

Parameters:

key – Pointer to a matching key (must not be nullptr)

flanking_query_result#

template<key_type_base key_t, typename data_t = void>
class flanking_query_result#

Result of a flanking-key query — the predecessor and successor of a query in the grove’s sort order, restricted to keys that do not overlap the query.

Returned by grove::flanking(). Either field may be null:

  • predecessor == nullptr if no key K satisfies K < query AND !overlaps(K, query)

  • successor == nullptr if no key K satisfies K > query AND !overlaps(K, query)

For interval-like keys (interval, genomic_coordinate), this corresponds to the key with the smallest gap distance to the query on each side. For scalar key types (numeric, kmer), it is the closest key by sort order on either side, excluding any key that satisfies overlaps() with the query.

Distance to a returned key is type-specific and computed by the caller from the key values (e.g., query.start - predecessor.end - 1 for closed-coordinate intervals; query.value - predecessor.value for numeric).
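The closed-coordinate distance formula can be wrapped in small helpers. gap_before follows the formula given above. gap_after is the mirror-image case and is an assumption of this sketch, as are the names closed_range, gap_before, and gap_after.

```cpp
#include <cstddef>

// Hypothetical stand-in for an interval key with closed coordinates.
struct closed_range {
    std::size_t start;
    std::size_t end;
};

// Positions strictly between the predecessor and the query.
// Precondition: predecessor.end < query.start (guaranteed for non-overlapping keys).
constexpr std::size_t gap_before(const closed_range& query,
                                 const closed_range& predecessor) {
    return query.start - predecessor.end - 1;
}

// Positions strictly between the query and the successor (assumed by symmetry).
// Precondition: query.end < successor.start.
constexpr std::size_t gap_after(const closed_range& query,
                                const closed_range& successor) {
    return successor.start - query.end - 1;
}
```

For a query [100,200] and a predecessor [80,89], the gap is 100 - 89 - 1 = 10, which counts positions 90 through 99. Directly adjacent intervals have a gap of 0.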

Public Functions

flanking_query_result() = default#

Default-construct with both flanking keys null.

inline key<key_t, data_t> *get_predecessor() const noexcept#

Get the predecessor: largest non-overlapping key less than the query.

Returns:

Pointer to the predecessor key, or nullptr if none exists

inline key<key_t, data_t> *get_successor() const noexcept#

Get the successor: smallest non-overlapping key greater than the query.

Returns:

Pointer to the successor key, or nullptr if none exists

inline void set_predecessor(key<key_t, data_t> *k) noexcept#

Set the predecessor pointer.

Used internally during traversal as candidates are discovered and improved.

Parameters:

k – Pointer to a key, or nullptr

inline void set_successor(key<key_t, data_t> *k) noexcept#

Set the successor pointer.

Parameters:

k – Pointer to a key, or nullptr

numeric#

class numeric#

Simple numeric (integer) key type for basic B+ tree operations.

This class wraps an integer value and satisfies the key_type_base concept, enabling use in grove structures as a simple ordered key without range semantics. Unlike interval types that represent ranges, numeric represents a single point value.

Public Functions

inline constexpr numeric()#

Default constructor initializing to INT_MIN.

Uses the minimum representable value as a sentinel so that max-based aggregation in internal nodes works correctly: any real value will be greater than the default.

Warning

A default-constructed numeric and a numeric holding the real value INT_MIN are indistinguishable, so numeric{} compares equal to (and overlaps) numeric{INT_MIN}. Don’t rely on the default being distinct from stored data.
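The sentinel behaviour can be reproduced with a small stand-in type. int_key is hypothetical and only mirrors the documented default value and max-based aggregation.

```cpp
#include <algorithm>
#include <climits>

// Hypothetical stand-in for numeric.
struct int_key {
    int value = INT_MIN;  // default doubles as the identity for max()

    constexpr bool operator==(const int_key& o) const { return value == o.value; }

    // Max-based aggregation, as described for internal nodes.
    static constexpr int_key aggregate(const int_key& a, const int_key& b) {
        return int_key{std::max(a.value, b.value)};
    }
};
```

Aggregating a default-constructed key with any real value returns the real value, which is the purpose of the sentinel. The cost is the ambiguity noted in the warning: int_key{} and int_key{INT_MIN} compare equal.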

inline explicit constexpr numeric(int value)#

Construct a numeric with the specified integer value.

Parameters:

value – Integer value to wrap

~numeric() = default#
inline constexpr bool operator<(const numeric &other) const#

Less-than comparison based on integer value.

Parameters:

other – Numeric to compare against

Returns:

true if this value is less than other’s value

inline constexpr bool operator>(const numeric &other) const#

Greater-than comparison based on integer value.

Parameters:

other – Numeric to compare against

Returns:

true if this value is greater than other’s value

inline constexpr bool operator==(const numeric &other) const#

Equality comparison based on integer value.

Parameters:

other – Numeric to compare against

Returns:

true if values are equal

std::string to_string() const#

Convert the numeric value to string representation.

Format: Simple integer string (e.g., “42”, “-7”)

Note

Required by key_type_base concept for debugging/display

Returns:

String representation of the value

inline constexpr int get_value() const noexcept#

Get the integer value.

Returns:

The wrapped integer value

inline constexpr void set_value(int value)#

Set the integer value.

Parameters:

value – New integer value

void serialize(std::ostream &os) const#

Serialize the numeric to an output stream.

Writes the value in binary format for persistence.

Parameters:

os – Output stream to write to

Public Static Functions

static inline constexpr bool overlaps(const numeric &a, const numeric &b)#

Determine if two numeric values overlap.

For point values, overlap occurs only when they are exactly equal. This differs from interval overlap which uses range intersection.

static inline constexpr numeric aggregate(const numeric &a, const numeric &b)#

Aggregate two numerics by returning the maximum.

Internal nodes store the maximum value in their subtree, allowing search operations to correctly traverse to child nodes.

Note

Required by key_type_base concept for internal node construction

Parameters:
  • a – First numeric

  • b – Second numeric

Returns:

The greater of the two values

static numeric deserialize(std::istream &is)#

Deserialize a numeric from an input stream.

Reads the value from binary format and returns it.

Parameters:

is – Input stream to read from

Returns:

Deserialized numeric

kmer#

class kmer#

K-mer key type for sequence-based B+ tree operations.

This class represents a k-mer (substring of length k from a DNA sequence) using a compact 2-bit encoding. It satisfies the key_type_base concept, enabling use in grove structures for k-mer indexing and membership queries.

Public Functions

inline constexpr kmer()#

Default constructor creating an empty k-mer (k=0).

explicit kmer(std::string_view sequence)#

Construct a k-mer from a DNA sequence string.

Converts the sequence to 2-bit encoding. Only A, C, G, T (case-insensitive) are valid characters.

Parameters:

sequence – DNA sequence (must contain only A, C, G, T)

Throws:
  • std::invalid_argument – if sequence contains invalid characters

  • std::invalid_argument – if sequence length exceeds 32
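A minimal sketch of 2-bit packing. The bit assignment (A=0, C=1, G=2, T=3) and the choice to place the first base in the most significant position are assumptions of this sketch; the documentation does not state the library's mapping. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Assumed mapping: A=0, C=1, G=2, T=3 (case-insensitive).
inline std::uint8_t base_to_bits(char base) {
    switch (base) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: throw std::invalid_argument("invalid nucleotide");
    }
}

// Packs up to 32 bases into 64 bits; the first base ends up most significant.
inline std::uint64_t pack_kmer(std::string_view seq) {
    if (seq.size() > 32) throw std::invalid_argument("k-mer longer than 32");
    std::uint64_t bits = 0;
    for (char c : seq) bits = (bits << 2) | base_to_bits(c);
    return bits;
}

// Reverses pack_kmer: reads 2 bits at a time, filling from the last base back.
inline std::string unpack_kmer(std::uint64_t bits, std::uint8_t k) {
    std::string seq(k, 'A');
    for (std::size_t i = k; i-- > 0;) {
        seq[i] = "ACGT"[bits & 0x03];
        bits >>= 2;
    }
    return seq;
}
```

Under these assumptions "ACGT" packs to binary 00 01 10 11 (decimal 27), and comparing two packed values of the same length matches alphabetical order of the sequences.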

inline constexpr kmer(uint64_t encoding, uint8_t k)#

Construct a k-mer from a pre-computed encoding.

Parameters:
  • encoding – 2-bit encoded k-mer value

  • k – Length of the k-mer (1-32)

~kmer() = default#
inline constexpr bool operator<(const kmer &other) const#

Less-than comparison based on encoding value.

K-mers of different lengths are compared by length first. K-mers of equal length are compared by their encoding, which gives lexicographic ordering.

Parameters:

other – K-mer to compare against

Returns:

true if this k-mer is less than other
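Length-first ordering can be sketched with std::tie. The names packed_kmer and kmer_less are hypothetical.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for kmer: packed bases plus the length k.
struct packed_kmer {
    std::uint64_t encoding;
    std::uint8_t k;
};

// Compare by length first, then by encoding.
constexpr bool kmer_less(const packed_kmer& a, const packed_kmer& b) {
    return std::tie(a.k, a.encoding) < std::tie(b.k, b.encoding);
}
```

Comparing lengths first matters because an encoding alone does not identify a sequence: if A maps to 0 (an assumption here), "A" and "AA" both have encoding 0 and differ only in k.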

inline constexpr bool operator>(const kmer &other) const#

Greater-than comparison based on encoding value.

Parameters:

other – K-mer to compare against

Returns:

true if this k-mer is greater than other

inline constexpr bool operator==(const kmer &other) const#

Equality comparison (encoding and k must both match).

Parameters:

other – K-mer to compare against

Returns:

true if both encoding and k are equal

std::string to_string() const#

Convert the k-mer to its DNA sequence string.

Decodes the 2-bit encoding back to A, C, G, T characters.

Note

Required by key_type_base concept for debugging/display

Returns:

DNA sequence string of length k

inline constexpr uint64_t get_encoding() const noexcept#

Get the 2-bit encoding value.

Returns:

The encoded k-mer as a 64-bit integer

inline constexpr uint8_t get_k() const noexcept#

Get the k-mer length.

Returns:

The value of k (1-32, or 0 for a default-constructed k-mer)

void serialize(std::ostream &os) const#

Serialize the k-mer to an output stream.

Writes encoding and k in binary format for persistence.

Parameters:

os – Output stream to write to

Public Static Functions

static inline constexpr bool overlaps(const kmer &a, const kmer &b)#

Determine if two k-mers overlap.

For k-mers, overlap occurs only when they are exactly equal (same encoding and same k value).

static inline constexpr kmer aggregate(const kmer &a, const kmer &b)#

Aggregate two k-mers by returning the maximum.

Internal nodes store the maximum k-mer in their subtree for proper B+ tree navigation.

Note

Required by key_type_base concept for internal node construction

Parameters:
  • a – First k-mer

  • b – Second k-mer

Returns:

The greater of the two k-mers

static kmer deserialize(std::istream &is)#

Deserialize a k-mer from an input stream.

Reads encoding and k from binary format.

Parameters:

is – Input stream to read from

Returns:

Deserialized k-mer

static inline constexpr uint8_t encode_base(char base)#

Encode a single nucleotide to its 2-bit representation.

Parameters:

base – Nucleotide character (A, C, G, T - case insensitive)

Throws:

std::invalid_argument – if base is not A, C, G, or T

Returns:

2-bit encoding (0-3)

static inline constexpr char decode_base(uint8_t encoding)#

Decode a 2-bit value to its nucleotide character.

Parameters:

encoding – 2-bit encoding (0-3)

Returns:

Nucleotide character (A, C, G, or T)

static inline constexpr bool is_valid(std::string_view sequence)#

Check if a sequence contains only valid nucleotides.

Parameters:

sequence – DNA sequence to validate

Returns:

true if sequence contains only A, C, G, T (case insensitive)

Public Static Attributes

static constexpr uint8_t BASE_MASK = 0x03#
static constexpr uint8_t max_k = 32#

Maximum supported k-mer length (32 for uint64_t storage).

registry#

template<registry_value Key, typename Tag = void, typename Payload = Key>
class registry#

Singleton registry that interns values into small integer IDs.

Every distinct key gets one stable ID; calling intern() with the same key always returns the same ID. The point is to collapse many references to the same identity down to a 4-byte ID stored elsewhere — useful when the same identity appears thousands of times across grove entries.

First-write-wins on payload. When Payload != Key and a caller invokes intern(k, p) with a key that is already present, the existing payload is preserved and the new p is silently dropped. This matches the typical “first source has the canonical record; later sources may carry placeholder fields” pattern (e.g. annotations sorted first, downstream entries reusing the id).

Each (Key, Tag, Payload) triple has its own singleton with an independent ID space. Use the Tag parameter when two unrelated pools share the same value type and must not collide:

using transcript_registry = registry<std::string, struct transcript_tag>;
using source_registry     = registry<std::string, struct source_tag>;

transcript_registry::instance().intern("ENST00000001"); // 0 in transcript pool
source_registry::instance().intern("HAVANA");           // 0 in source pool (separate)

Example (identity is the whole value):

auto& reg = registry<std::string>::instance();
uint32_t a = reg.intern("chr1");   // 0 (new)
uint32_t b = reg.intern("chr1");   // 0 (existing — deduplicated)
uint32_t c = reg.intern("chr2");   // 1 (new)
const std::string& s = reg.get(a); // "chr1"

Example (identity is a subset of the payload):

struct gene_info {
    std::string gene_name;
    std::string gene_biotype;
};
using gene_reg = registry<std::string, void, gene_info>;

auto id1 = gene_reg::instance().intern("ENSG001", {"FOO", "protein_coding"});
auto id2 = gene_reg::instance().intern("ENSG001", {"placeholder", ""});
// id1 == id2; the placeholder payload is dropped (first-write-wins).
const gene_info& g = gene_reg::instance().get(id1); // {"FOO", "protein_coding"}

Note

Thread safety: intern(), find(), clear(), serialize(), and deserialize() are protected by an internal mutex. get(), contains(), size(), empty() are unlocked fast paths. get(id) is safe under concurrent intern() iff the caller obtained id from a prior intern() that happens-before this thread (e.g. via thread join, mutex, atomic publication, queue). size()/empty()/contains() return best-effort snapshots under concurrent writes.

Note

Singleton lifetime: Data persists for program duration. Call reset() in tests to clear state between cases.

Template Parameters:
  • Key – The identity type used for deduplication. Must be hashable and equality-comparable.

  • Tag – Phantom type used only to discriminate singletons. Different Tag arguments produce distinct types with independent ID pools; the default void preserves the original “one singleton per (Key, Payload)” behavior. Tag never appears in the body: no storage, no serialization, no runtime cost.

  • Payload – The value type stored against each ID. Defaults to Key (the common case: identity is the whole value, like std::string intern pools). When Payload != Key, the registry stores Payload values keyed on Key — useful when identity is a subset of a larger record (e.g. gene_id keying a gene_info{ id, name, biotype } blob). No constraint on Payload itself; serialize() / deserialize() additionally require Payload to be readable/writable via serializer<Payload>.

Public Types

using id_type = uint32_t#

Type used for registry IDs.

Public Functions

registry(const registry&) = delete#
registry &operator=(const registry&) = delete#
registry(registry&&) = delete#
registry &operator=(registry&&) = delete#
~registry() = default#
inline id_type intern(const Key &key, const Payload &payload)#

Intern a (key, payload) pair, returning its stable ID.

Note

Idempotent on key: intern(k, _) always returns the same id for k.

Note

Thread-safe.

Parameters:
  • key – The identity used to deduplicate.

  • payload – The value to store under that identity.

Throws:

std::runtime_error – if the registry has reached maximum capacity.

Returns:

The ID for key. If key is already interned, returns the existing ID and silently drops payload (first-write-wins); otherwise allocates a new ID and stores payload.
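First-write-wins interning can be sketched with a hash map from key to ID plus a container of payloads indexed by ID. The class intern_pool below is a hypothetical, non-singleton illustration and not the registry's implementation: it omits the capacity check, and unlike the documented get() it locks on reads to keep the sketch simple.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical pool illustrating first-write-wins; not the library's code.
template <typename Key, typename Payload>
class intern_pool {
public:
    std::uint32_t intern(const Key& key, const Payload& payload) {
        std::lock_guard<std::mutex> lock(mutex_);
        // try_emplace inserts only if the key is absent.
        auto [it, inserted] =
            ids_.try_emplace(key, static_cast<std::uint32_t>(payloads_.size()));
        if (inserted) payloads_.push_back(payload);  // first write stores the payload
        return it->second;                           // later writes reuse the ID
    }

    // Throws std::out_of_range for an unknown ID.
    const Payload& get(std::uint32_t id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return payloads_.at(id);
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<Key, std::uint32_t> ids_;
    std::deque<Payload> payloads_;  // deque keeps references valid on push_back
};
```

IDs are assigned in insertion order starting at 0, so the ID doubles as the index into the payload container.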

inline id_type intern(const Key &value)#

Intern a value (single-arg form when key and payload are the same type).

Note

Only available when Key == Payload (the default). For Payload != Key, use the two-arg form.

Parameters:

value – The value to intern; used as both identity and payload.

Returns:

The ID for value.

inline std::optional<id_type> find(const Key &key) const#

Look up the ID for a key without inserting.

Note

Thread-safe.

Parameters:

key – The identity to look up.

Returns:

The ID if key is interned, std::nullopt otherwise.

inline const Payload &get(id_type id) const#

Get the payload for a given ID (const access).

Note

Unlocked. Safe under concurrent intern() iff id was obtained from an intern() that happens-before this call.

Parameters:

id – The ID returned from intern().

Throws:

std::out_of_range – if id is not a valid ID.

Returns:

Const reference to the stored payload.

inline bool contains(id_type id) const noexcept#

Check whether an ID refers to a valid entry.

Note

Unlocked best-effort read; size may be observed stale under concurrent writes.

Parameters:

id – The ID to check.

Returns:

true if valid, false otherwise.

inline std::size_t size() const noexcept#

Number of interned entries.

Note

Unlocked best-effort read under concurrent writes.

inline bool empty() const noexcept#

Whether the registry has any entries.

Note

Unlocked best-effort read under concurrent writes.

inline void clear()#

Clear all interned data.

Note

Primarily intended for testing; use with caution in production.

Note

Thread-safe.

Warning

Invalidates all previously returned IDs.

inline void serialize(std::ostream &os) const#

Serialize the registry to an output stream.

Note

Wire format depends on Key == Payload:

  • When Key == Payload (default): uint64_t count followed by each payload via serializer<Payload>. The lookup map is reconstructed on deserialize() by treating each payload as its own key. This matches the historical format.

  • When Key != Payload: uint64_t count followed by (key, payload) pairs written in ID order. Both serializer<Key> and serializer<Payload> are required.

Note

Thread-safe (acquires the mutex for a coherent snapshot).

Parameters:

os – Output stream to write to.

Public Static Functions

static inline registry &instance()#

Get the singleton instance for this type.

Note

Uses the Meyers singleton pattern (a function-local static) for thread-safe initialization

Returns:

Reference to the singleton registry instance
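A function-local static combined with a phantom tag parameter gives one independent instance per template instantiation. The sketch below shows the pattern with a hypothetical tagged_pool class; it is not the registry's code.

```cpp
#include <string>
#include <vector>

// Hypothetical singleton with a phantom Tag parameter.
template <typename Value, typename Tag = void>
class tagged_pool {
public:
    static tagged_pool& instance() {
        static tagged_pool pool;  // constructed once, on first use
        return pool;
    }

    tagged_pool(const tagged_pool&) = delete;
    tagged_pool& operator=(const tagged_pool&) = delete;

    std::vector<Value> items;

private:
    tagged_pool() = default;
};

// Tags are never defined; declaring them is enough to form distinct types.
struct transcript_tag;
struct source_tag;
```

Since C++11 the initialization of a function-local static is guaranteed to happen exactly once even when several threads call instance() concurrently. That guarantee covers construction only, not later accesses to the object.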

static inline void reset()#

Reset the singleton by clearing all data.

Note

Convenience for tests; equivalent to instance().clear().

static inline registry &deserialize(std::istream &is)#

Deserialize registry data from an input stream into the singleton.

Note

Replaces existing data on success; all previous IDs become invalid.

Note

Loaded entries keep their original IDs.

Note

Thread-safe.

Note

Strong exception guarantee: if the stream throws or contains truncated data, the singleton is left exactly as it was before the call. The new state is built into local containers and only move-assigned into the singleton after the read loop completes.

Parameters:

is – Input stream to read from.

Returns:

Reference to the singleton (now populated with deserialized data).
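The strong exception guarantee described above follows a build-then-commit pattern: read everything into local containers and replace the live state only after the last read succeeds. The sketch below shows that pattern with a hypothetical load_all function. The wire format used here (a 64-bit count followed by length-prefixed strings) is an assumption for illustration and is not the registry's documented format.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical loader: a uint64_t count, then that many length-prefixed strings.
inline void load_all(std::istream& is, std::vector<std::string>& target) {
    std::uint64_t count = 0;
    if (!is.read(reinterpret_cast<char*>(&count), sizeof count))
        throw std::runtime_error("truncated header");

    std::vector<std::string> staged;  // new state is built off to the side
    for (std::uint64_t i = 0; i < count; ++i) {
        std::uint64_t len = 0;
        if (!is.read(reinterpret_cast<char*>(&len), sizeof len))
            throw std::runtime_error("truncated entry");
        std::string s(len, '\0');
        if (!is.read(s.data(), static_cast<std::streamsize>(len)))
            throw std::runtime_error("truncated entry");
        staged.push_back(std::move(s));
    }
    target = std::move(staged);  // commit only after every read succeeded
}
```

If any read fails, the function exits by exception before the final assignment, so target still holds whatever it held before the call.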

Public Static Attributes

static constexpr id_type null_id = std::numeric_limits<id_type>::max()#

Sentinel value representing an invalid/unset ID.

static constexpr bool key_is_payload = std::is_same_v<Key, Payload>#

True iff the key type equals the payload type (single-arg intern() form).

Serialization Utilities#

serialization_traits#

template<typename T>
struct serialization_traits#

Public Static Functions

static inline void serialize(std::ostream &os, const T &value)#
static inline T deserialize(std::istream &is)#

serializer#

template<typename T>
struct serializer#

Trait-based serialization dispatcher.

Dispatches serialization calls based on type capabilities:

  1. If type has member serialize()/static deserialize() → use those

  2. Otherwise → fall back to serialization_traits<T>

Template Parameters:

T – The type to serialize/deserialize

Public Static Functions

static inline void write(std::ostream &os, const T &value)#
static inline T read(std::istream &is)#