sbepp
In this section I'll gradually describe the structure of the generated code. It's just a brief description; for detailed documentation, see the corresponding reference pages.
For each schema compiled by sbeppc, there will be two main sources of information: traits, which are accessed via tags, and representation types.
Sometimes, traits and representation types provide functionality with the same name that is very different in nature. For example, sbepp::message_traits<msg_tag>::size_bytes() returns a value precomputed from the message structure in the XML, and hence is guaranteed to be valid only for the current schema version. On the other hand, sbepp::size_bytes(msg) calculates the message size from the message buffer and the values it holds; it returns a valid value even for newer schema versions because the message representation type correctly handles schema extension.
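To make the difference concrete, here is a minimal standalone sketch that does not use sbepp itself. It assumes a hypothetical 8-byte header whose first two bytes hold blockLength in native byte order, and a schema version that knows a 12-byte block. All names (header_size, known_block_length, static_size_bytes, dynamic_size_bytes, write_block_length) are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header size; only the blockLength field matters here.
constexpr std::size_t header_size = 8;
// Block length of the message as known to *this* schema version.
constexpr std::uint16_t known_block_length = 12;

// "Traits" flavour: computed from the schema alone, no buffer involved.
constexpr std::size_t static_size_bytes()
{
    return header_size + known_block_length;
}

// "Representation" flavour: reads blockLength from the buffer's header,
// so it follows whatever block length the sender actually used.
inline std::size_t dynamic_size_bytes(
    const unsigned char* buf, std::size_t size)
{
    assert(size >= header_size);
    std::uint16_t block_length;
    std::memcpy(&block_length, buf, sizeof(block_length));
    return header_size + block_length;
}

// Helper for the sketch: stores blockLength at the start of the header.
inline void write_block_length(unsigned char* buf, std::uint16_t block_length)
{
    std::memcpy(buf, &block_length, sizeof(block_length));
}
```

For a buffer written by a newer schema version that uses a block length of 16, static_size_bytes() still reports 20, while dynamic_size_bytes() reports 24.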
It's possible to get a representation type from a tag using the value_type member of the traits (e.g. sbepp::message_traits<msg_tag>::value_type), and vice versa, to get a tag from a representation type using sbepp::traits_tag_t. These helpers let you avoid naming both the tag and the representation type explicitly by deducing one from the other.
Here's the structure of generated code after compilation:
Here, detail, schema, messages and types are hardcoded and don't depend on schema names.
sbepp preserves the names of all schema entities without any modification. It means that messages, types, fields, etc. will have the same class/function names in the public part of the generated code. Of course, standard C++ naming rules still apply, and usually you'll get an error from sbeppc if the schema uses an invalid name.
Although original names are preserved in the public interface, sometimes the underlying implementation is located in detail and a public alias is provided for it. Names from detail should never be used explicitly, but the compiler usually shows them in its error messages, so it's useful to know how they are formed.
sbepp tries to preserve schema names for everything, but when that's not possible, the class name is mangled like <original_name>_<N>, where N is a number. Group entries have class names like <group_name>_entry, where group_name is a potentially mangled group name. Tag types can be mangled as well, and their names always match those of the corresponding implementation types.
For example, the public encoding User is always accessible as schema_name::types::User but can actually be an alias to schema_name::detail::types::User_0. Similarly, its tag is schema_name::schema::types::User, but it can be an alias to schema_name::detail::schema::types::User_0. When a schema doesn't have a lot of repeated names, looking at the trailing class name is enough to understand which schema entity it represents.
Most types generated by sbepp have reference semantics. It means they are just pointers to the actual data, which they don't own and don't manage in any way.
They usually contain a single pointer (with one additional pointer in Debug mode) and are cheap to pass by value. Creating such an object usually involves no actions or parsing other than pointer initialization. They are templates with a Byte template parameter, which is a byte type (can be cv-qualified). Another consequence of reference semantics is that making an object const doesn't make the underlying data const, i.e., you can modify a message via a const msg<char> object. You need to use a const-qualified Byte type to make a view read-only.
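The const behaviour described above can be illustrated with a small standalone sketch. The view class below is hypothetical and is not how sbepp implements its types; it only assumes a pointer-plus-size wrapper whose setter is enabled when Byte is not const.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical view type: holds a pointer and a size, owns nothing.
template<typename Byte>
class view
{
public:
    view(Byte* ptr, std::size_t size) : ptr_{ptr}, size_{size} {}

    // Getter: always available.
    char first() const
    {
        assert(size_ >= 1); // checks only the data being accessed
        return static_cast<char>(ptr_[0]);
    }

    // Setter: a const member function, so constness of the view object
    // does not matter. It is enabled only when Byte itself is non-const.
    template<
        typename B = Byte,
        typename = typename std::enable_if<!std::is_const<B>::value>::type>
    void first(char value) const
    {
        assert(size_ >= 1);
        ptr_[0] = value;
    }

private:
    Byte* ptr_;
    std::size_t size_;
};
```

With this sketch, a const view<char> can still write through its pointer, while a view<const char> only exposes the getter; calling the setter on it fails to compile.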
There are helpers to access the raw underlying data, available for all reference semantics types: sbepp::size_bytes, sbepp::size_bytes_checked, sbepp::addressof.
Non-array types, enums and sets are the only schema entities represented with value semantics types (including constants). They are small types, 64 bits at most, which behave like int.
As was said before, the client is responsible for ensuring that the provided buffer is large enough to hold the corresponding SBE data. To provide some sort of safety, sbepp inserts assertions in many places. They check only the accessed data, not the whole SBE message or other entity. For example, if a message has 10 fields and the provided buffer can only hold 5 of them, an assertion fires only when one of the last 5 fields is accessed. By default, these checks are controlled by NDEBUG, just like the standard assert().
sbepp doesn't make a distinction between encoding and decoding. It only provides an SBE view of the provided buffer. Functions have no hidden side effects beyond their main functionality. The only thing done implicitly is the handling of the SBE schema extension mechanism by respecting blockLength. This has several consequences. First, be careful not to change things that affect the offset of following, already-written fields. For example, if you have dynamic-length fields data1 and data2, don't change data1's size after you've filled data2, because data2's offset will change and its data will become garbage. The simple advice is to fill groups/data fields in order. Second, when you encode a new message, you need to explicitly fill the message/group header via sbepp::fill_message_header or sbepp::fill_group_header, and it's better to do this as early as possible.
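The first consequence can be reproduced with a tiny standalone model, independent of sbepp. It assumes a hypothetical layout in which each data field is a 1-byte length followed by its bytes, with data2 placed directly after data1. The helper names are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// data2 starts right after data1, so its offset depends on data1's length.
inline std::size_t data2_offset(const unsigned char* buf)
{
    return 1 + buf[0];
}

// Writes a length-prefixed field at the given offset.
inline void write_data(
    unsigned char* buf, std::size_t offset, const std::string& s)
{
    assert(s.size() <= 255); // length must fit into one byte
    buf[offset] = static_cast<unsigned char>(s.size());
    std::memcpy(buf + offset + 1, s.data(), s.size());
}

// Reads a length-prefixed field at the given offset.
inline std::string read_data(const unsigned char* buf, std::size_t offset)
{
    return std::string(
        reinterpret_cast<const char*>(buf + offset + 1), buf[offset]);
}
```

If data1 is written as "ab", data2 as "xyz", and data1 is then rewritten as "abcd", data2_offset() points into the middle of the old data2 bytes and read_data() no longer returns "xyz". Filling the fields in order avoids this.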
Messages and composites are the two root entities from which any work with SBE data starts. Like any reference semantics type, they are created from a pointer and a size:
There are a couple of helpers which deduce the byte type for you: sbepp::make_view and sbepp::make_const_view. They are applicable to any reference semantics type except group entries.
The main purpose of messages, composites and group entries is, of course, to contain fields. Their interface will be discussed later in the accessors section.
Since sbepp provides only a view of a message, the message header should be filled explicitly via sbepp::fill_message_header when a new message is created:
blockLength is required to correctly interpret the underlying data. You can also fill it by hand using sbepp::get_header:
Typically, you first need to create a message header to check the type of an incoming message. A composite has the same form and construction approach as a message:
In general, a group has a container-like interface with iterators and other members you'd expect from a standard container. There are two kinds of groups (message levels in the general sense) which provide different interfaces:
- flat groups, whose interface resembles std::vector. See sbepp::detail::flat_group_base for complete reference.
- nested groups. See sbepp::detail::nested_group_base for complete reference.
Because sbepp provides only views, there are no functions like push_back; there's nothing to push. To encode a group, one first needs to set its size and then access the corresponding group entries. Similar to the message header, the group header has to be filled explicitly, either by sbepp::fill_group_header, the resize() method, or manually via sbepp::get_header:
numInGroup and blockLength are used to correctly interpret the underlying data.
Group entries have no special properties and are normally never created explicitly. See sbepp::detail::entry_base for details if you are interested.
Variable-length arrays are represented using sbepp::detail::dynamic_array_ref. This type works like a reference to a vector-like type.
According to the SBE standard, strings stored inside data members never have a terminating null character. Conversion from sbepp::detail::dynamic_array_ref to a more string-specific type can be done using the data() and size() methods:
There are 2 options for string assignment:
- sbepp::detail::dynamic_array_ref::assign_string() to assign a raw string pointer.
- sbepp::detail::dynamic_array_ref::assign_range(), a generic method to assign from ranges. Since a range requires just begin()/end() methods, most string-specific types satisfy this requirement.
Example:
sbepp treats all <type>s with length != 1 (including 0) as fixed-size arrays. They are implemented in terms of sbepp::detail::static_array_ref, which has a std::span-like interface with assignment helpers.
Assignment from a string can be done using sbepp::detail::static_array_ref::assign_string(). It can handle both raw string pointers and string ranges like std::string or std::string_view. As a second parameter, it takes sbepp::eos_null eos_mode, which controls how trailing null bytes (if any) are set. If a stored string is shorter than the array, the SBE standard requires all the remaining bytes to be set to null, and sbepp::eos_null::all is the default argument for the eos_mode parameter. In practice, however, it's not always required because:
sbepp::eos_null::none is enough to correctly encode a string.
Example:
To convert sbepp::detail::static_array_ref to a more string-specific type, the string length has to be calculated explicitly because the stored string might occupy the entire array without a terminating null character. There are two ways to do this:
- sbepp::detail::static_array_ref::strlen(), calculates the string length by looking for the first null character from left to right.
- sbepp::detail::static_array_ref::strlen_r(), calculates the string length by looking for the first non-null character from right to left. This reversed approach might be useful when the user expects the string end to be closer to the end of the array than to its start. For it to work, it requires all padding bytes (if any) to be set to null.
Example:
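The two scanning strategies can be sketched in standalone C++ on a plain character array. This is only a model of the behaviour described above, not sbepp's implementation; strlen_forward and strlen_backward are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Scans left to right and stops at the first null byte (or at the array end
// when the string occupies the whole array).
inline std::size_t strlen_forward(const char* data, std::size_t size)
{
    assert(data != nullptr);
    std::size_t i = 0;
    while (i != size && data[i] != '\0')
    {
        ++i;
    }
    return i;
}

// Scans right to left and stops at the first non-null byte.
// Correct only when every padding byte after the string is null.
inline std::size_t strlen_backward(const char* data, std::size_t size)
{
    assert(data != nullptr);
    std::size_t i = size;
    while (i != 0 && data[i - 1] == '\0')
    {
        --i;
    }
    return i;
}
```

Both functions agree on a properly padded array and on a fully occupied one. They diverge when the padding is not all null: for the bytes 'a', 'b', 'c', 0, 'x', 0, 0, 0 the forward scan returns 3 and the backward scan returns 5.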
See sbepp::char_t and sbepp::char_opt_t for examples of a required and an optional type, respectively.
sbepp::detail::required_base::in_range() and sbepp::detail::optional_base::has_value(), they don't enforce any checks on the underlying value.
Enums are represented using scoped enumerations. For example:
is represented like:
In set representation, each choice
has a corresponding getter and setter, for example:
is represented like:
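As a rough illustration of the shape of such representations, here is a standalone sketch. It assumes a hypothetical char-based enum named side and a hypothetical uint8 set named flags with choices A (bit 0) and B (bit 2). The names, the raw() accessor and the chaining setters are assumptions of this sketch, not actual sbeppc output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum: a scoped enumeration with one enumerator per value.
enum class side : char
{
    buy = 'B',
    sell = 'S'
};

// Hypothetical set: a small value type with a getter and a setter per choice.
class flags
{
public:
    flags() = default;
    explicit flags(std::uint8_t bits) : bits_{bits} {}

    bool A() const { return ((bits_ >> 0) & 1u) != 0; }
    flags& A(bool v) { return set_bit(0, v); }

    bool B() const { return ((bits_ >> 2) & 1u) != 0; }
    flags& B(bool v) { return set_bit(2, v); }

    // Underlying value (accessor name is an assumption of this sketch).
    std::uint8_t raw() const { return bits_; }

private:
    flags& set_bit(unsigned n, bool v)
    {
        assert(n < 8);
        const unsigned cleared = bits_ & ~(1u << n);
        bits_ = static_cast<std::uint8_t>(cleared | ((v ? 1u : 0u) << n));
        return *this;
    }

    std::uint8_t bits_{};
};
```

Setting both choices yields the underlying value 5 (bits 0 and 2); clearing A afterwards leaves 4.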
sbepp::visit_set
Constant accessors are represented via static functions. Non-array constants return the underlying value directly, without any wrapper. Only <field> can return it as an enum type. Array-like constants (strings) are represented using sbepp::detail::static_array_ref, like a fixed-size array.
There are multiple entities which can hold fields: messages, group entries and composites. They all provide the same interface for accessors:
That is, for value semantics fields there is a getter/setter pair; for reference semantics fields there is only a getter, which returns a view with the same byte type as its enclosing object. When the byte type is const-qualified, setters are not available.
Unlike cursor-based accessors, these "normal" accessors can be used in any order.
While normal accessors can be used in any order, this is not always efficient. Consider this message:
Here, access to field1/2/3 is still fast, but access to data1 is not. To get its offset we need to:
- read blockLength from the nested_msg header to get group's offset
- read numInGroup from the group header
- read blockLength from the group header
- for each group entry:
  - use blockLength to get the offset of data2
  - read the data2 length
- calculate the data1 offset
Moreover, to access the next data2, all that work has to be repeated! Now imagine there were more groups and data in that message. It's a lot of work, and current compilers can't optimize normal accessors well even when everything is accessed in order.
The way to solve this is to access things in a forward-only manner, to avoid recalculating the next field's offset from the message start each time. This way, after we've read the group, the offset of data1 comes for free. This is the core idea behind the cursor-based API.
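The idea can be shown with a standalone model that is unrelated to sbepp's real types. It assumes a hypothetical buffer of consecutive data fields, each a 1-byte length followed by its payload. read_nth recomputes the offset from the start on every call, while read_next keeps a cursor that each read advances.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Normal" access: walks from the buffer start every time, O(index) per call.
inline std::string read_nth(
    const unsigned char* buf, std::size_t size, std::size_t index)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i != index; ++i)
    {
        assert(offset < size);
        offset += 1 + buf[offset]; // skip length byte and payload
    }
    assert(offset < size);
    return std::string(
        reinterpret_cast<const char*>(buf + offset + 1), buf[offset]);
}

// Hypothetical cursor: just a pointer to the next unread field.
struct cursor
{
    const unsigned char* ptr;
};

// Cursor access: reads the field under the cursor and advances past it,
// so the next field's offset is already known.
inline std::string read_next(cursor& c)
{
    const std::size_t len = c.ptr[0];
    std::string result(reinterpret_cast<const char*>(c.ptr + 1), len);
    c.ptr += 1 + len;
    return result;
}
```

Reading all N fields through read_nth costs a number of offset steps proportional to N squared, while the cursor version touches each length byte exactly once. The price is that fields have to be read in order.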
A cursor (sbepp::cursor<Byte>) is just a pointer wrapper which is passed to field accessors as an additional parameter:
It's parameterized with a Byte type, which has the same meaning as for other reference semantics types. Note that it can be more const-qualified than the byte type of the enclosing view; setters are not available for such cursors.
By default, each field assumes that the cursor points to the end of the previous field (or to the end of the group/message header), so the offset of the current field can be calculated efficiently (usually a no-op). The cursor is then advanced using these rules:
- a non-last field moves the cursor up by the field's size
- the last field moves the cursor to the end of the block (calculated using blockLength)
- the first variable-length member (group or data) of the message/entry unconditionally initializes the cursor to the end of the block before using it. It means that it's possible to use an uninitialized cursor with them, but I don't recommend it
- group moves the cursor to the end of the group's header
- data moves the cursor to the end of the data (data() + size())
A cursor has to be initialized before its first use. This can be done via sbepp::init_cursor/sbepp::init_const_cursor, by using sbepp::cursor_ops::init, or even by hand from the provided pointer. Only messages and group entries provide cursor-based accessors (composites cannot contain variable-sized fields). Note that cursor-based and normal accessors return the same objects. Those objects don't care how they were created.
Here's an example of how to read the above message:
Note that you need to use cursor_range() to iterate over group entries. That's because each entry is now created from the cursor.
Here's how the compiler will see it (simplified, of course):
This approach is very efficient, but the downside is that to access a field, you need to access all previous fields in their schema order.
To provide some flexibility, there are various sbepp::cursor_ops helpers which control the cursor's position. Check out their documentation for examples. Here, I only want to repeat one tricky case from sbepp::cursor_ops::dont_move, using a cursor to write a data member:
Recall that data accessors by default move the cursor to data() + size(), so when done in a naive way:
at the time we access data, its length can have any value (your best hope is a message buffer initialized with 0); thus, the cursor will be moved to an unknown position and its further use is unpredictable or even UB.
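The pitfall can be modelled without sbepp. The sketch below assumes a hypothetical data field made of a 1-byte length plus payload and a hypothetical cursor struct. access_and_advance mimics an accessor that moves the cursor by the currently stored length, while access_dont_move followed by skip mimics the "don't move, write, then skip" order. None of these names are sbepp functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical cursor: a pointer to the field about to be accessed.
struct cursor
{
    unsigned char* ptr;
};

// Returns the field and advances the cursor by the length stored *now*,
// which is garbage if the field has not been written yet.
inline unsigned char* access_and_advance(cursor& c)
{
    unsigned char* field = c.ptr;
    c.ptr += 1 + field[0];
    return field;
}

// Returns the field and leaves the cursor where it is.
inline unsigned char* access_dont_move(cursor& c)
{
    return c.ptr;
}

// Advances the cursor past the field using its (now valid) length.
inline void skip(cursor& c)
{
    c.ptr += 1 + c.ptr[0];
}

// Writes the length byte and the payload.
inline void assign(unsigned char* field, const std::string& s)
{
    assert(s.size() <= 255); // length must fit into one byte
    field[0] = static_cast<unsigned char>(s.size());
    std::memcpy(field + 1, s.data(), s.size());
}
```

On a buffer pre-filled with 0xAA, the naive order moves the cursor 171 bytes ahead before "abc" is even written. Accessing without moving, assigning, and then skipping leaves the cursor exactly 4 bytes past the field start.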
As you can see, using the cursor-based API might be tricky and requires additional care. Most schemas I've seen have only a single flat group and no data; for them, normal accessors work great. I recommend using cursors only for messages with a complex structure, or when you've run a benchmark and know for sure that you'll benefit from them.