# Key

The `key` class is a wrapper that combines a key value (e.g., interval, genomic_coordinate, kmer) with optional associated data. It is the fundamental storage unit in grove structures: the value drives indexing, while the data carries arbitrary metadata alongside it.

## Template Parameters

The key class takes two template parameters:

- `key_type`: The core key value type (must satisfy the `key_type_base` concept)
- `data_type`: Optional associated data type (default: `void` for keys without data)

```cpp
// genogrove (header paths assumed from the namespace layout)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/data_type/key.hpp>

// STL
#include <string>

namespace gdt = genogrove::data_type;

// Key without data (data_type = void)
gdt::key<gdt::interval> k1{gdt::interval{100, 200}};

// Key with data
struct GeneInfo {
    std::string name;
    double score;
};

gdt::key<gdt::interval, GeneInfo> k2{
    gdt::interval{100, 200},
    GeneInfo{"BRCA1", 0.95}
};
```

## Basic Usage

```cpp
// genogrove headers (paths assumed)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/data_type/key.hpp>

#include <iostream>
#include <string>

namespace gdt = genogrove::data_type;

struct GeneInfo {
    std::string name;
    double expression;
};

int main() {
    // 1. Create a key with value and data
    gdt::key<gdt::interval, GeneInfo> gene_key{
        gdt::interval{100, 200},
        GeneInfo{"BRCA1", 45.3}
    };

    // 2. Access the key value
    const auto& interval = gene_key.get_value();
    std::cout << interval.to_string() << "\n";  // "[100, 200]"

    // 3. Access associated data (const)
    const auto& info = gene_key.get_data();
    std::cout << info.name << "\n";  // "BRCA1"

    // 4. Modify data in place (mutable access)
    gene_key.get_data().expression = 50.0;

    // 5. Replace data entirely
    gene_key.set_data(GeneInfo{"TP53", 32.1});

    // 6. Replace the key value
    gene_key.set_value(gdt::interval{300, 400});

    // 7. Check if key has data (compile-time constant)
    if (gene_key.has_data()) {
        std::cout << "Key has associated data\n";
    }

    // 8. String representation (delegates to key_type)
    std::cout << gene_key.to_string() << "\n";  // "[300, 400]"

    return 0;
}
```

## Keys Without Data

When `data_type` is `void`, the key contains only the value, with zero memory overhead:

```cpp
// genogrove headers (paths assumed)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/data_type/key.hpp>

namespace gdt = genogrove::data_type;

int main() {
    // Key without data - just the interval
    gdt::key<gdt::interval> simple_key{gdt::interval{100, 200}};

    // Access value works the same way
    const auto& interval = simple_key.get_value();

    // has_data() returns false (compile-time)
    static_assert(!gdt::key<gdt::interval>::has_data());

    // get_data() and set_data() do not compile - disabled at compile time
    // simple_key.get_data();  // Error: method doesn't exist

    return 0;
}
```

## Comparison Operators

Keys support equality and ordering comparisons. **All comparisons are value-only: the `data` payload is ignored.** This matches the B+ tree's notion of identity (the tree orders and searches by `value`) and means `data_type` is never required to be equality- or order-comparable.

```cpp
// genogrove headers (paths assumed)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/data_type/key.hpp>

#include <string>

namespace gdt = genogrove::data_type;

int main() {
    // Keys without data: compare values
    gdt::key<gdt::interval> k1{gdt::interval{100, 200}};
    gdt::key<gdt::interval> k2{gdt::interval{100, 200}};
    gdt::key<gdt::interval> k3{gdt::interval{100, 300}};

    k1 == k2;  // true: same interval
    k1 == k3;  // false: different end
    k1 != k3;  // true (auto-generated from operator==)
    k1 < k3;   // true: ordered by interval

    // Keys with data: the data is ignored; only the value matters
    gdt::key<gdt::interval, std::string> kd1{gdt::interval{100, 200}, "gene1"};
    gdt::key<gdt::interval, std::string> kd2{gdt::interval{100, 200}, "gene2"};

    kd1 == kd2;  // true: same interval, different data is ignored
    kd1 < kd2;   // false: same interval, so neither is less than the other

    return 0;
}
```

Available operators:

- `operator==`: value equality. Requires `key_type` to satisfy `std::equality_comparable`; `data_type` has no equality requirement.
- `operator!=`: auto-generated by the compiler from `operator==`.
- `operator<` / `operator>`: value ordering. Unconditionally available because the `key_type_base` concept already requires `<` and `>` on `key_type`.

C++20 does **not** auto-generate `<=` / `>=` from `<` / `>`, so callers that need them should spell `!(a > b)` / `!(a < b)`.

## Serialization

Keys support binary serialization for persistence:

```cpp
// genogrove headers (paths assumed)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/data_type/key.hpp>

#include <iostream>
#include <sstream>
#include <string>

namespace gdt = genogrove::data_type;

int main() {
    gdt::key<gdt::interval, std::string> original{
        gdt::interval{100, 200},
        "gene1"
    };

    // Serialize to binary stream
    std::ostringstream oss(std::ios::binary);
    original.serialize(oss);

    // Deserialize from binary stream
    std::istringstream iss(oss.str(), std::ios::binary);
    auto restored = gdt::key<gdt::interval, std::string>::deserialize(iss);

    // restored == original
    std::cout << restored.get_value().to_string() << "\n";  // "[100, 200]"
    std::cout << restored.get_data() << "\n";               // "gene1"

    return 0;
}
```

## Using Keys with Grove

The key class is the internal storage type used by grove. When you insert data into a grove, it creates keys internally:

```cpp
// genogrove headers (paths assumed)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/structure/grove/grove.hpp>

#include <iostream>
#include <string>

namespace gdt = genogrove::data_type;
namespace gst = genogrove::structure;

int main() {
    // Grove stores key<interval, std::string> internally
    gst::grove<gdt::interval, std::string> my_grove(100);

    // insert_data returns a pointer to the internal key
    auto* key_ptr = my_grove.insert_data("chr1", gdt::interval{100, 200}, "gene1");

    // Access via key pointer
    std::cout << key_ptr->get_value().to_string() << "\n";  // "[100, 200]"
    std::cout << key_ptr->get_data() << "\n";               // "gene1"

    // Query results return key pointers
    auto results = my_grove.intersect(gdt::interval{150, 175}, "chr1");
    for (auto* k : results.get_keys()) {
        std::cout << k->get_value().to_string() << ": " << k->get_data() << "\n";
    }

    return 0;
}
```

## Memory Optimization

The key class uses several C++ techniques to minimize memory overhead:

- **Zero overhead for void data**: When `data_type` is `void`, the key stores `std::monostate` with `[[no_unique_address]]`, resulting in zero additional memory
- **Move semantics**: Constructors and setters use move semantics to avoid unnecessary copies
- **Compile-time method elimination**: Methods like `get_data()` don't exist when `data_type` is `void` (using `requires` clauses)

```cpp
// genogrove headers (paths assumed)
#include <genogrove/data_type/interval.hpp>
#include <genogrove/data_type/key.hpp>

namespace gdt = genogrove::data_type;

// Key without data has same size as interval alone
static_assert(sizeof(gdt::key<gdt::interval>) == sizeof(gdt::interval));

// Key with data adds only the data size
// (exact equality assumes the compiler inserts no alignment padding)
static_assert(sizeof(gdt::key<gdt::interval, int>) == sizeof(gdt::interval) + sizeof(int));
```

**Key Features Summary:**

- `get_value()`, `set_value()`: Access/modify the key value
- `get_data()`, `set_data()`: Access/modify associated data (only when `data_type != void`)
- `has_data()`: Compile-time check for data presence
- `to_string()`: String representation (delegates to key_type)
- `serialize(os)`, `deserialize(is)`: Binary persistence
- `operator==`, `operator!=`, `operator<`, `operator>`: Value-only comparison (data ignored)