sbepp
Schema representation

In this section I'll gradually describe the structure of the generated code. It's only a brief description; for detailed documentation, see the corresponding reference pages.

Representation types vs. traits

For each schema compiled by sbeppc, there will be two main sources of information:

  • representation types, responsible for the actual SBE data encoding/decoding and proper representation of schema entities. Except for constants, all the information they provide comes from binary data. These are described in more detail below on this page.
  • traits, which in combination with tags provide access to static/meta schema properties. Unlike representation types, their values always come from precomputed values obtained from the schema XML.

Sometimes traits and representation types have functionality with common names, but it's very different in nature. For example, sbepp::message_traits<msg_tag>::size_bytes() returns a precomputed value based on the message structure from the XML, and hence is guaranteed to be valid only for the current schema version. On the other hand, sbepp::size_bytes(msg) calculates the message size from the message buffer and the values it holds, so it returns a valid value even for newer schema versions because the message representation type correctly handles schema extension.

It's possible to get a representation type from a tag using the value_type member of the traits (e.g. sbepp::message_traits<msg_tag>::value_type), and vice versa, to get a tag from a representation type using sbepp::traits_tag_t. These helpers let you avoid naming both the tag and the representation type at the same time by deducing one from the other.
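The two-way mapping between tags and representation types can be illustrated with a small standalone sketch. Everything in the `sketch` namespace below is hypothetical and only mimics the shape of the idea; it is not sbepp's actual implementation, and the size value is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch
{
// hypothetical representation type, stands in for a generated message class
template<typename Byte>
class msg
{
};

// hypothetical tag type, stands in for a generated schema tag
struct msg_tag
{
};

// tag -> static properties and representation type
template<typename Tag>
struct message_traits;

template<>
struct message_traits<msg_tag>
{
    template<typename Byte>
    using value_type = msg<Byte>;

    // a real implementation would precompute this from the schema XML,
    // 16 is an arbitrary value used for illustration
    static constexpr std::size_t size_bytes() noexcept
    {
        return 16;
    }
};

// representation type -> tag
template<typename T>
struct traits_tag;

template<typename Byte>
struct traits_tag<msg<Byte>>
{
    using type = msg_tag;
};

template<typename T>
using traits_tag_t = typename traits_tag<T>::type;

// generic code receives only a view and deduces the tag from it
template<typename View>
constexpr std::size_t static_size(View) noexcept
{
    return message_traits<traits_tag_t<View>>::size_bytes();
}
} // namespace sketch
```

With such a mapping, generic code only needs one of the two names: `static_size` above takes a view and reaches the traits through the deduced tag.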


Namespaces

Here's the structure of generated code after compilation:

namespace schema_name{
struct schema{ // see "Traits" section
struct types;
struct messages;
};
namespace messages{
// messages are here...
class message_name;
}
namespace types{
// types are here...
class type_name;
}
namespace detail{} // implementation details, never use it directly
}

Here, detail, schema, messages and types are hardcoded and don't depend on schema names.


Schema names

sbepp preserves the names of all schema entities without any modification. It means that messages, types, fields, etc. will have the same class/function names in the public part of the generated code. Of course, standard C++ naming rules still apply, and you'll usually get an error from sbeppc if the schema uses an invalid name.

Names mangling

Note
Things described below are implementation details and are only provided to simplify understanding of compiler error messages. Don't rely on them directly.

Although original names are preserved in the public interface, sometimes the underlying implementation is located in detail and a public alias is provided for it. Names from detail should never be used explicitly, but compilers usually mention them in error messages, so it's useful to know how they are formed.

sbepp tries to preserve schema names for everything, but when that's not possible, the class name is mangled like <original_name>_<N> where N is a number. Group entries have class names like <group_name>_entry where group_name is a potentially mangled group name. Tag types can be mangled as well and their names always match those of the corresponding implementation types.

For example, public encoding User is always accessible as schema_name::types::User but can actually be an alias to schema_name::detail::types::User_0. Similarly, its tag is schema_name::schema::types::User but it can be an alias to schema_name::detail::schema::types::User_0. When a schema doesn't have a lot of repetitive names, looking at the trailing class name is enough to understand which schema entity it represents.


Semantics

Most types generated by sbepp have reference semantics. It means they are just pointers to the actual data which they don't own and don't manage in any way.

Note
It's the client's responsibility to manage the lifetime and size of the underlying buffer.

They usually contain a single pointer (with one additional pointer in Debug mode) and are cheap to pass by value. Creating such an object usually involves no actions/parsing except the pointer initialization. They are templates with a Byte template parameter which is a byte type (can be cv-qualified). Another consequence of reference semantics is that making an object const doesn't make the underlying data const, i.e., you can modify a message via a const msg<char> object. You need to use a const-qualified Byte type to make a view read-only.
There are helpers to access the raw underlying data which are available for all reference semantics types: sbepp::size_bytes, sbepp::size_bytes_checked, sbepp::addressof.
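The point about const objects versus const bytes is easy to miss, so here is a minimal standalone sketch of it. `byte_view` is a hypothetical type invented for this illustration and is not part of sbepp; it only assumes a view is a non-owning pointer plus a size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch
{
// hypothetical minimal view: a non-owning pointer plus the buffer size
template<typename Byte>
class byte_view
{
public:
    byte_view(Byte* ptr, const std::size_t size) noexcept
        : ptr_{ptr}, size_{size}
    {
    }

    // getter for the first byte
    std::uint8_t first() const noexcept
    {
        assert(size_ >= 1);
        return static_cast<std::uint8_t>(ptr_[0]);
    }

    // The setter is a const member function: it changes the buffer, not the
    // view object. It compiles only when `Byte` is not const-qualified
    // (in this sketch, calling it on a read-only view is a compile error).
    void first(const std::uint8_t v) const noexcept
    {
        assert(size_ >= 1);
        ptr_[0] = static_cast<Byte>(v);
    }

private:
    Byte* ptr_;
    std::size_t size_;
};
} // namespace sketch
```

A `const sketch::byte_view<unsigned char>` can still write through its setter, while a `sketch::byte_view<const unsigned char>` can only read. The view never copies or frees the buffer, so the buffer has to outlive it.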

Non-array types, enums and sets are the only schema entities represented with value semantics types (including constants). They are small types, 64 bits at most, which behave like int.

Safety checks

As was said before, the client is responsible for ensuring that the provided buffer is large enough to hold the corresponding SBE data. To provide some sort of safety, sbepp inserts assertions in many places. They check only the accessed data, not the whole SBE message or other entity. For example, if a message has 10 fields and the provided buffer can only hold 5 of them, an assertion fires only when one of the last 5 fields is accessed. By default, these checks are controlled by NDEBUG just like the standard assert().

See also
SBEPP_DISABLE_ASSERTS, SBEPP_ASSERT_HANDLER, SBEPP_ENABLE_ASSERTS_WITH_HANDLER

Encoding vs. decoding

sbepp doesn't make a distinction between encoding and decoding. It only provides an SBE view of the provided buffer. Functions have no hidden side-effects beyond their main functionality. The only thing which is done implicitly is handling of the SBE schema extension mechanism by respecting blockLength. This has several consequences. First, be careful not to change things which affect the offset of the following already-written fields. For example, if you have dynamic-length fields data1 and data2, don't change data1's size after you have filled data2, because data2's offset will change and its content will become garbage. The simple advice is to fill groups/data fields in order. Second, when you encode a new message, you need to explicitly fill the message/group header via sbepp::fill_message_header or sbepp::fill_group_header, and it's better to do this as early as possible.
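The data1/data2 hazard can be reproduced without sbepp at all. The sketch below assumes a simplified, hypothetical layout in which each data member is a 1-byte length followed by its payload (real SBE schemas define the length type in the schema), and all function names are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace sketch
{
// Hypothetical layout: consecutive data members, each stored as a 1-byte
// length followed by that many payload bytes.

// offset of the n-th data member, found by skipping all previous members
inline std::size_t data_offset(const unsigned char* buf, const std::size_t n)
{
    std::size_t offset = 0;
    for(std::size_t i = 0; i != n; ++i)
    {
        offset += 1 + buf[offset];
    }
    return offset;
}

// writes length and payload of the n-th member (at most 255 bytes)
inline void write_data(
    unsigned char* buf, const std::size_t n, const std::string& s)
{
    const auto offset = data_offset(buf, n);
    buf[offset] = static_cast<unsigned char>(s.size());
    std::memcpy(buf + offset + 1, s.data(), s.size());
}

// reads the n-th member using whatever length is stored in the buffer
inline std::string read_data(const unsigned char* buf, const std::size_t n)
{
    const auto offset = data_offset(buf, n);
    return std::string(
        reinterpret_cast<const char*>(buf + offset + 1), buf[offset]);
}
} // namespace sketch
```

Writing member 0, then member 1, and then rewriting member 0 with a longer payload overwrites the start of member 1 and shifts the offset at which member 1 is looked up, so a later read of member 1 no longer returns what was written. Filling members in order avoids the problem.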


Messages

Messages and composites are two root things from which any work with SBE data starts. Like any reference semantics type, they are created from pointer and size:

// check out base class documentation for more constructors
template<typename Byte>
class msg : public ::sbepp::detail::message_base<Byte>{
public:
// field accessors...
};
std::array<std::byte, 64> buf;
schema_name::messages::msg<std::byte> m{buf.data(), buf.size()};

There are a couple of helpers which deduce the byte type for you: sbepp::make_view and sbepp::make_const_view. They are applicable to any reference semantics type except group entries.

std::array<std::byte, 64> buf;
// mutable message
auto m1 = sbepp::make_view<schema_name::messages::msg>(buf.data(), buf.size());
// read-only message
auto m2 =
sbepp::make_const_view<schema_name::messages::msg>(buf.data(), buf.size());

The main purpose of messages, composites and group entries is of course to contain fields. Their interface is discussed later in the accessors section.

Encoding

Since sbepp provides only a view of a message, the message header should be filled explicitly via sbepp::fill_message_header when a new message is created:

std::array<std::byte, 64> buf;
auto m = sbepp::make_view<schema_name::messages::msg>(buf.data(), buf.size());
sbepp::fill_message_header(m);
Note
It's strongly recommended to fill the message header as early as possible because its blockLength is required to correctly interpret the underlying data.

You can also fill it by hand using sbepp::get_header:

std::array<std::byte, 64> buf;
auto m = sbepp::make_view<schema_name::messages::msg>(buf.data(), buf.size());
auto header = sbepp::get_header(m);
header.blockLength(10);
// ...

Composites

Typically, you first need to create a message header view to check the type of the incoming message. A composite has the same form and is constructed the same way as a message:

template<typename Byte>
class messageHeader : public ::sbepp::detail::composite_base<Byte>{
public:
// field accessors...
};
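std::array<char, 64> buf;
auto header = sbepp::make_const_view<schema_name::types::messageHeader>(
buf.data(), buf.size());
if(header.templateId()
== sbepp::message_traits<schema_name::schema::messages::msg1>::id())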
{
// handle msg1
}

Groups

In general, a group has a container-like interface with iterators and other members you'd expect from a standard container. There are two kinds of groups (message levels in a general sense) which provide different interfaces:

  1. Flat - a group which has only fixed-size fields, no groups or data members. Such a group is represented like a random-access container similar to std::vector. See sbepp::detail::flat_group_base for complete reference.
  2. Nested - a group which has other groups or data members. Because its entries have variable size, it's represented as a forward-only container (it's very expensive to navigate over it in a random fashion). See sbepp::detail::nested_group_base for complete reference.
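Why the two kinds of groups get different container categories can be seen from how an entry's offset is found. The sketch below is standalone and hypothetical: it assumes a nested entry is a fixed-size block followed by exactly one data member with a 1-byte length prefix, which is a simplification made for the illustration and not sbepp's code.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
// Flat group: every entry has the same size, so the i-th entry is found with
// a single multiplication (random access).
inline std::size_t flat_entry_offset(
    const std::size_t header_size,
    const std::size_t block_length,
    const std::size_t i)
{
    return header_size + i * block_length;
}

// Nested group (hypothetical layout): each entry is a fixed-size block
// followed by one data member stored as a 1-byte length plus payload.
// Entry sizes differ, so all previous entries have to be visited.
inline std::size_t nested_entry_offset(
    const unsigned char* group,
    const std::size_t header_size,
    const std::size_t block_length,
    const std::size_t i)
{
    std::size_t offset = header_size;
    for(std::size_t k = 0; k != i; ++k)
    {
        offset += block_length;      // skip fixed-size fields
        offset += 1 + group[offset]; // skip data length prefix and payload
    }
    return offset;
}
} // namespace sketch
```

The flat case costs the same for any index, while the nested case grows linearly with the index, which is why forward-only iteration is the natural interface for it.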

Encoding

Because sbepp provides only views, there are no functions like push_back: there's nothing to push. To encode a group, one first needs to set its size and then access the corresponding group entries. Similar to the message header, the group header has to be filled explicitly, either by sbepp::fill_group_header, the resize() method, or manually via sbepp::get_header:

auto g = msg.group_name();
// set group size
auto group_size = 2;
sbepp::fill_group_header(g, group_size);
// fill entries
for(const auto entry : g)
{
entry.field(1);
}
// change size later if you need
g.clear();
g.resize(1);
Note
As with the message header, the group header has to be filled as early as possible because its numInGroup and blockLength are used to correctly interpret the underlying data.

Group entries

Group entries have no special properties and normally are never created explicitly. See sbepp::detail::entry_base for details if you are interested.

template<typename Byte>
class entry : public sbepp::detail::entry_base<Byte>{
public:
// field accessors...
};

Data members

Variable-length arrays are represented using sbepp::detail::dynamic_array_ref. This type works like a reference to a vector-like type.

auto d = msg.data();
if(!d.empty())
{
std::cout.write(d.data(), d.size());
}
d.clear();
std::vector<char> v;
// fill somehow...
d.assign_range(v);

Strings

According to the SBE standard, strings stored inside data members never have a terminating null character. Conversion from sbepp::detail::dynamic_array_ref to a more string-specific type can be done using the data() and size() methods:

auto d = msg.data();
std::string s1{d.data(), d.size()};
std::string_view s2{d.data(), d.size()};
// since C++23 it's even simpler using range constructor:
std::string_view s3{d};

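There are 2 options for string assignment:

  • sbepp::detail::dynamic_array_ref::assign_string(), for a null-terminated string such as a string literal.
  • sbepp::detail::dynamic_array_ref::assign_range(), for a string range like std::string or std::string_view.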

Example:

auto d = msg.data();
d.assign_string("hello");
d.assign_range(std::string{"abc"});
d.assign_range(std::string_view{"def"});

Fixed-size arrays

sbepp treats all <type>s with length != 1 (including 0) as fixed-size arrays. They are implemented in terms of sbepp::detail::static_array_ref which has a std::span-like interface with assignment helpers.

auto array = msg.array();
std::cout.write(array.data(), array.size());

Strings

Assignment from a string can be done using sbepp::detail::static_array_ref::assign_string(). It can handle both raw string pointers and string ranges like std::string or std::string_view. As a second parameter, it takes an sbepp::eos_null eos_mode that controls how to set trailing null bytes (if any). If the stored string is shorter than the array, the SBE standard requires all the remaining bytes to be set to null, so sbepp::eos_null::all is the default argument for the eos_mode parameter. In practice, however, it's not always required because:

  • a single null character is enough for decoder to calculate string length.
  • underlying memory might be zero-initialized from the start, in this case, sbepp::eos_null::none is enough to correctly encode a string.

Example:

auto array = msg.array();
array.assign_string("abc");
array.assign_string(std::string{"abc"});
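To make the effect of the padding modes concrete, here is a standalone sketch that works on a plain std::array. The `pad_mode` enum and the `assign_string` function below are hypothetical stand-ins written for this illustration; they are not sbepp's implementation and only mirror the behavior described above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch
{
// hypothetical counterpart of the padding modes discussed above
enum class pad_mode
{
    none,   // leave the remaining bytes untouched
    single, // write one null byte right after the string
    all     // set all the remaining bytes to null
};

// copies `src` into a fixed-size array and pads the tail according to `mode`
template<std::size_t N>
void assign_string(
    std::array<char, N>& dst,
    const std::string& src,
    const pad_mode mode = pad_mode::all)
{
    assert(src.size() <= N);
    std::copy(src.begin(), src.end(), dst.begin());

    const std::size_t len = src.size();
    if((len == N) || (mode == pad_mode::none))
    {
        // either there is no tail or the caller knows it's already zeroed
        return;
    }

    if(mode == pad_mode::single)
    {
        dst[len] = '\0';
    }
    else
    {
        std::fill(dst.begin() + len, dst.end(), '\0');
    }
}
} // namespace sketch
```

Skipping the padding is only safe when the tail is known to be zeroed already; otherwise stale bytes after the string remain in the array and a decoder can pick them up.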

To convert sbepp::detail::static_array_ref to a more string-specific type, string length has to be calculated explicitly because the stored string might occupy the entire array without having the terminating null character. There are two ways to do this:

  • sbepp::detail::static_array_ref::strlen(), calculates string length by looking for the first null character from left to right.
  • sbepp::detail::static_array_ref::strlen_r(), calculates string length by looking for the first non-null character from right to left. This reversed approach might be useful when user expects that string end is closer to the end of the array than to its start. For it to work, it requires all padding bytes (if any) to be set to null.

Example:

auto array = msg.array();
std::string_view sv{array.data(), array.strlen()};
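The difference between the forward and the reversed length calculation can be sketched on a raw character array. The two functions below are hypothetical helpers written for this illustration, not sbepp's code; they follow the descriptions given in the list above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

namespace sketch
{
// forward scan: position of the first null byte, or `n` if there is none
inline std::size_t strlen_fwd(const char* data, const std::size_t n)
{
    const char* end = std::find(data, data + n, '\0');
    return static_cast<std::size_t>(end - data);
}

// reversed scan: position right after the last non-null byte
inline std::size_t strlen_rev(const char* data, std::size_t n)
{
    while((n != 0) && (data[n - 1] == '\0'))
    {
        --n;
    }
    return n;
}
} // namespace sketch
```

Both agree when the string fills the whole array or when the padding is all nulls. They disagree when non-null bytes follow the first null byte, which is why the reversed variant needs properly zeroed padding.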

Optional/required types

See sbepp::char_t and sbepp::char_opt_t for examples of a required and an optional type respectively.

Note
While optional/required types provide methods like sbepp::detail::required_base::in_range() and sbepp::detail::optional_base::has_value(), they don't enforce any checks on the underlying value.

Enums

Enums are represented using scoped enumerations. For example:

<enum name="enumeration" encodingType="uint8">
<validValue name="One">1</validValue>
<validValue name="Two">2</validValue>
</enum>

is represented like:

enum class enumeration : std::uint8_t
{
One = 1,
Two = 2
};
See also
sbepp::to_underlying, sbepp::enum_to_string

Sets

In set representation, each choice has a corresponding getter and setter, for example:

<set name="bitset" encodingType="uint8">
<choice name="A">0</choice>
<choice name="B">2</choice>
</set>

is represented like:

class bitset : public ::sbepp::detail::bitset_base<::std::uint8_t>
{
public:
// check out base class documentation for inherited methods
using ::sbepp::detail::bitset_base<::std::uint8_t>::bitset_base;
// pair of getter and setter for each choice
constexpr bool A() const noexcept;
bitset& A(const bool v) noexcept;
constexpr bool B() const noexcept;
bitset& B(const bool v) noexcept;
};
Warning
It's important to remember that sets have value semantics to avoid such mistakes:
// writing a message
schema_name::messages::msg<char> m; // initialize somehow
// does nothing!!! modifies a temporary returned by `m.bitset()`
m.bitset().A(true).B(true);
// also does nothing, assigns to a temporary `bool`
m.bitset().A() = true;
// correct way
m.bitset(schema_name::types::bitset{}.A(true).B(true));
// or, if storage was 0-initialized
m.bitset(m.bitset().A(true).B(true));
See also
sbepp::visit_set
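The value-semantics pitfall from the warning above can be reproduced in isolation. In the sketch below, `flags` and `message_view` are hypothetical types made up for the illustration (with choice A at bit 0 and choice B at bit 2, as in the XML above); they are not the generated classes.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
// hypothetical value-semantics set: holds its own copy of the bits
class flags
{
public:
    flags() = default;

    explicit flags(const std::uint8_t raw) noexcept : raw_{raw}
    {
    }

    bool A() const noexcept
    {
        return (raw_ & (1u << 0)) != 0;
    }

    flags& A(const bool v) noexcept
    {
        set(0, v);
        return *this;
    }

    bool B() const noexcept
    {
        return (raw_ & (1u << 2)) != 0;
    }

    flags& B(const bool v) noexcept
    {
        set(2, v);
        return *this;
    }

    std::uint8_t raw() const noexcept
    {
        return raw_;
    }

private:
    void set(const unsigned bit, const bool v) noexcept
    {
        raw_ = static_cast<std::uint8_t>(
            v ? (raw_ | (1u << bit)) : (raw_ & ~(1u << bit)));
    }

    std::uint8_t raw_{};
};

// hypothetical reference-semantics message with a single set field
class message_view
{
public:
    explicit message_view(std::uint8_t* ptr) noexcept : ptr_{ptr}
    {
    }

    // getter returns a copy of the stored bits
    flags bits() const noexcept
    {
        return flags{*ptr_};
    }

    // setter is the only way to write the bits back to the buffer
    void bits(const flags f) const noexcept
    {
        *ptr_ = f.raw();
    }

private:
    std::uint8_t* ptr_;
};
} // namespace sketch
```

Calling `m.bits().A(true)` on such a view changes only the temporary copy returned by the getter; the buffer changes only when the modified value is passed to the setter.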

Constants

Constant accessors are represented via static functions. Non-array constants return the underlying value directly, without any wrapper. Only a <field> can return it as an enum type. Array-like constants (strings) are represented using sbepp::detail::static_array_ref like a fixed-size array.

<composite name="constants">
<type name="num_const" primitiveType="uint32"
presence="constant">123</type>
<type name="str_const" primitiveType="char"
presence="constant">hello world</type>
</composite>
<sbe:message name="message" id="1">
<field name="enum_const" id="1" type="enumeration"
presence="constant" valueRef="enumeration.Two"/>
</sbe:message>
// somewhere inside `constants` class
static constexpr ::std::uint32_t num_const() noexcept;
static constexpr implementation_defined_alias str_const() noexcept;
// somewhere inside `message` class
static constexpr schema_name::types::enumeration enum_const() noexcept;

Field accessors

There are multiple entities which can hold fields: messages, group entries and composites. They all provide the same interface for accessors:

template<typename Byte>
class FieldContainer
{
public:
FieldRepresentation value_semantics_field();
void value_semantics_field(FieldRepresentation);
View<Byte> reference_semantics_field();
};

That is, for value semantics fields there is a getter/setter pair, and for reference semantics fields there is only a getter which returns a view with the same byte type as its enclosing object. When the byte type is const-qualified, setters are not available.
Unlike cursor-based accessors, these "normal" accessors can be used in any order.

schema_name::messages::msg<char> m; // init somehow
// value semantics types
auto required = m.required();
m.required(1);
auto optional = m.optional();
m.optional(2);
auto enumeration = m.enumeration();
m.enumeration(schema_name::types::my_enum::A);
auto set = m.set();
m.set(schema_name::types::my_set{}.choice(true));
// reference semantics types, they all have byte type `char`
auto c1 = m.composite();
auto c2 = m.composite().nested_composite();
auto g = m.group();
auto d = m.data();

Cursor-based accessors

Note
The API described in this section makes sense only for messages with a complex structure. I recommend always benchmarking before using it.

While normal accessors can be used in any order, this is not always efficient. Consider this message:

<sbe:message name="nested_msg" id="1">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<group name="group" id="2">
<field name="number" id="1" type="uint32"/>
<data name="data3" id="2" type="varDataEncoding"/>
</group>
<data name="data1" id="3" type="varDataEncoding"/>
<data name="data2" id="4" type="varDataEncoding"/>
</sbe:message>

Here, access to field1/2/3 is still fast but access to data1 is not. To get its offset we need to:

  • read blockLength from nested_msg header to get group's offset
  • read numInGroup from group header
  • read blockLength from group header
  • for each group entry
    • use blockLength to get offset to data3
    • read data3 length
  • finally, get data1 offset

Moreover, to access data2 next, all that work has to be repeated! Now imagine if there were more groups and data in that message. It's a lot of work, and current compilers can't optimize normal accessors well even when everything is accessed in order.

The way to solve it is to access things in a forward-only manner to avoid recalculation of the next field's offset each time from the message start. In this way, after we've read the group, offset for data1 is ready for free. This is the core idea behind cursor-based API.
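The cost difference between the two access styles can be counted in a small standalone model. The sketch below assumes a hypothetical layout of consecutive data members, each a 1-byte length plus payload, and counts how many length prefixes are read. The names and the `stats` counter are invented for the illustration and this is not how sbepp is implemented.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch
{
// counts how many length prefixes were read
struct stats
{
    std::size_t length_reads = 0;
};

// "Normal" access: the n-th member is located by walking from the start of
// the buffer every time.
inline const unsigned char*
    data_at(const unsigned char* buf, const std::size_t n, stats& s)
{
    const unsigned char* p = buf;
    for(std::size_t i = 0; i != n; ++i)
    {
        ++s.length_reads;
        p += 1 + *p;
    }
    return p;
}

// Cursor access: returns the member at the cursor and moves the cursor past
// it, so the next member's position is already known.
inline const unsigned char* next_data(const unsigned char*& cursor, stats& s)
{
    const unsigned char* current = cursor;
    ++s.length_reads;
    cursor += 1 + *cursor;
    return current;
}
} // namespace sketch
```

For 4 members visited in order, the walk-from-start version reads 0 + 1 + 2 + 3 = 6 prefixes just to locate them, while the cursor version reads 4, one per member. The gap grows quadratically with the number of variable-length members, which matches the motivation above.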

Note
The cursor-based API is used for forward-only access.

A cursor (sbepp::cursor<Byte>) is just a pointer wrapper which is passed to field accessors as an additional parameter:

template<typename Byte>
class FieldContainer
{
public:
template<typename Cursor>
FieldRepresentation value_semantics_field(Cursor&&);
template<typename Cursor>
void value_semantics_field(FieldRepresentation, Cursor&&);
template<typename Cursor>
View<Byte> reference_semantics_field(Cursor&&);
};

It's parameterized with a Byte type which has the same meaning as for other reference semantics types. Note that it can be more const-qualified than the byte type of the enclosing view; setters are not available for such cursors.

By default, each field assumes that cursor points to the end of the previous field (or to the end of group/message header) so the offset to current field can be calculated efficiently (usually a no-op), then cursor is advanced using these rules:

  • field moves cursor up by field's size
  • last field moves cursor to the end of the block (calculated using blockLength)
  • first variable length member (group or data) of the message/entry unconditionally initializes cursor to the end of the block before using it. It means that it's possible to use uninitialized cursor with them but I don't recommend it
  • group moves cursor to the end of group's header
  • data moves cursor to the end of the data (data() + size())

A cursor has to be initialized before its first use. This can be done via sbepp::init_cursor/sbepp::init_const_cursor, by using sbepp::cursor_ops::init, or even by hand from a provided pointer. Only messages and group entries provide cursor-based accessors (composites cannot contain variable-sized fields). Note that cursor-based and normal accessors return the same objects. Those objects don't care how they were created.
Here's an example of how to read the above message:

auto m = sbepp::make_view<nested_msg>(buf.data(), buf.size());
auto c = sbepp::init_cursor(m);
m.field1(c);
m.field2(c);
m.field3(c);
for(const auto entry : m.group(c).cursor_range(c))
{
entry.number(c);
entry.data3(c);
}
m.data1(c);
m.data2(c);

Note that you need to use cursor_range() to iterate over group entries. That's because now each entry is created from the cursor.

Here's how the compiler will see it (simplified, of course):

char* ptr = msg_start;
auto field1 = *reinterpret_cast<field1_type*>(ptr);
ptr += sizeof(field1);
auto field2 = *reinterpret_cast<field2_type*>(ptr);
ptr += sizeof(field2);
// ...

This approach is very efficient, but the downside is that to access a field, you need to access all previous fields in their schema order.
To provide some flexibility, there are various sbepp::cursor_ops helpers which control the cursor's position. Check out their documentation for examples. Here, I only want to repeat one tricky case from sbepp::cursor_ops::dont_move, using a cursor to write a data member:

auto c = sbepp::init_cursor(msg);
auto d = msg.data(sbepp::cursor_ops::dont_move(c));
d.resize(1); // set data size somehow
msg.data(c); // advance the cursor

Recall that data accessors by default move the cursor to data() + size(), so when done in a naive way:

auto d = msg.data(c);
d.resize(1);

at the time we access data, its length can have any value (your best hope is a message buffer initialized with 0). Thus, the cursor will be moved to an unknown position and its further use is unpredictable or even UB.

As you can see, using the cursor-based API might be tricky and requires additional care. Most schemas I have seen have only a single flat group and no data; for them, normal accessors work great. I recommend using cursors only for messages with a complex structure, or when you have done a benchmark and know for sure that you'll benefit from them.