Key#
The key class is a wrapper that combines a key value (e.g., interval, genomic_coordinate, kmer) with optional associated data. It serves as the fundamental storage unit in grove structures, enabling efficient indexing while maintaining arbitrary metadata.
Template Parameters#
The key class takes two template parameters:
key_type: The core key value type (must satisfykey_type_baseconcept)data_type: Optional associated data type (default:voidfor keys without data)
// genogrove
#include <genogrove/data_type/key.hpp>
#include <genogrove/data_type/interval.hpp>
// STL
#include <iostream>
#include <sstream>
namespace gdt = genogrove::data_type;
// Key without data (data_type = void)
gdt::key<gdt::interval> k1{gdt::interval{100, 200}};
// Key with data
struct GeneInfo {
std::string name;
double score;
};
gdt::key<gdt::interval, GeneInfo> k2{
gdt::interval{100, 200},
GeneInfo{"BRCA1", 0.95}
};
Basic Usage#
#include <genogrove/data_type/key.hpp>
#include <genogrove/data_type/interval.hpp>
namespace gdt = genogrove::data_type;
struct GeneInfo {
std::string name;
double expression;
};
int main() {
// 1. Create a key with value and data
gdt::key<gdt::interval, GeneInfo> gene_key{
gdt::interval{100, 200},
GeneInfo{"BRCA1", 45.3}
};
// 2. Access the key value
const auto& interval = gene_key.get_value();
std::cout << interval.to_string() << "\n"; // "[100, 200]"
// 3. Access associated data (const)
const auto& info = gene_key.get_data();
std::cout << info.name << "\n"; // "BRCA1"
// 4. Modify data in place (mutable access)
gene_key.get_data().expression = 50.0;
// 5. Replace data entirely
gene_key.set_data(GeneInfo{"TP53", 32.1});
// 6. Replace the key value
gene_key.set_value(gdt::interval{300, 400});
// 7. Check if key has data (compile-time constant)
if (gene_key.has_data()) {
std::cout << "Key has associated data\n";
}
// 8. String representation (delegates to key_type)
std::cout << gene_key.to_string() << "\n"; // "[300, 400]"
return 0;
}
Keys Without Data#
When data_type is void, the key contains only the value with zero memory overhead:
#include <genogrove/data_type/key.hpp>
#include <genogrove/data_type/interval.hpp>
namespace gdt = genogrove::data_type;
int main() {
// Key without data - just the interval
gdt::key<gdt::interval> simple_key{gdt::interval{100, 200}};
// Access value works the same way
const auto& interval = simple_key.get_value();
// has_data() returns false (compile-time)
static_assert(!gdt::key<gdt::interval>::has_data());
// get_data() and set_data() do not compile - disabled at compile time
// simple_key.get_data(); // Error: method doesn't exist
return 0;
}
Comparison Operators#
Keys support equality and ordering comparisons. All comparisons are
value-only — the data payload is ignored. This matches the B+ tree’s
notion of identity (the tree orders and searches by value) and means
data_type is never required to be equality- or order-comparable.
#include <genogrove/data_type/key.hpp>
#include <genogrove/data_type/interval.hpp>
namespace gdt = genogrove::data_type;
int main() {
// Keys without data: compare values
gdt::key<gdt::interval> k1{gdt::interval{100, 200}};
gdt::key<gdt::interval> k2{gdt::interval{100, 200}};
gdt::key<gdt::interval> k3{gdt::interval{100, 300}};
k1 == k2; // true: same interval
k1 == k3; // false: different end
k1 != k3; // true (auto-generated from operator==)
k1 < k3; // true: ordered by interval
// Keys with data: data is ignored — only the value matters
gdt::key<gdt::interval, std::string> kd1{gdt::interval{100, 200}, "gene1"};
gdt::key<gdt::interval, std::string> kd2{gdt::interval{100, 200}, "gene2"};
kd1 == kd2; // true: same interval, different data is ignored
kd1 < kd2; // false: same interval ⇒ neither is less than the other
return 0;
}
Available operators:
operator==— value-equality. Requireskey_typeto satisfystd::equality_comparable;data_typehas no equality requirement.operator!=— auto-generated by the compiler fromoperator==.operator</operator>— value-ordering. Unconditionally available because thekey_type_baseconcept already requires<and>onkey_type. C++20 does not auto-generate<=/>=from</>, so callers that need them should spell!(a > b)/!(a < b).
Serialization#
Keys support binary serialization for persistence:
#include <genogrove/data_type/key.hpp>
#include <genogrove/data_type/interval.hpp>
#include <sstream>
namespace gdt = genogrove::data_type;
int main() {
gdt::key<gdt::interval, std::string> original{
gdt::interval{100, 200},
"gene1"
};
// Serialize to binary stream
std::ostringstream oss(std::ios::binary);
original.serialize(oss);
// Deserialize from binary stream
std::istringstream iss(oss.str(), std::ios::binary);
auto restored = gdt::key<gdt::interval, std::string>::deserialize(iss);
// restored == original
std::cout << restored.get_value().to_string() << "\n"; // "[100, 200]"
std::cout << restored.get_data() << "\n"; // "gene1"
return 0;
}
Using Keys with Grove#
The key class is the internal storage type used by grove. When you insert data into a grove, it creates keys internally:
#include <genogrove/structure/grove/grove.hpp>
#include <genogrove/data_type/interval.hpp>
namespace gdt = genogrove::data_type;
namespace gst = genogrove::structure;
int main() {
// Grove stores key<interval, std::string> internally
gst::grove<gdt::interval, std::string> my_grove(100);
// insert_data returns a pointer to the internal key
auto* key_ptr = my_grove.insert_data("chr1",
gdt::interval{100, 200},
"gene1");
// Access via key pointer
std::cout << key_ptr->get_value().to_string() << "\n"; // "[100, 200]"
std::cout << key_ptr->get_data() << "\n"; // "gene1"
// Query results return key pointers
auto results = my_grove.intersect(gdt::interval{150, 175}, "chr1");
for (auto* k : results.get_keys()) {
std::cout << k->get_value().to_string() << ": "
<< k->get_data() << "\n";
}
return 0;
}
Memory Optimization#
The key class uses several C++ techniques to minimize memory overhead:
Zero-overhead for void data: When
data_typeisvoid, the key storesstd::monostatewith[[no_unique_address]], resulting in zero additional memoryMove semantics: Constructors and setters use move semantics to avoid unnecessary copies
Compile-time method elimination: Methods like
get_data()don’t exist whendata_typeisvoid(usingrequiresclauses)
#include <genogrove/data_type/key.hpp>
#include <genogrove/data_type/interval.hpp>
namespace gdt = genogrove::data_type;
// Key without data has same size as interval alone
static_assert(sizeof(gdt::key<gdt::interval>) == sizeof(gdt::interval));
// Key with data adds only the data size
static_assert(sizeof(gdt::key<gdt::interval, int>) ==
sizeof(gdt::interval) + sizeof(int));
Key Features Summary:
get_value(),set_value(): Access/modify the key valueget_data(),set_data(): Access/modify associated data (only whendata_type != void)has_data(): Compile-time check for data presenceto_string(): String representation (delegates to key_type)serialize(os),deserialize(is): Binary persistenceoperator==,operator!=,operator<,operator>: Value-only comparison (data ignored)