- Note
- The benchmarks discussed in this section are meant only for reasoning about performance relative to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.
To reason about the performance of generated code I've made a set of benchmarks around this message:
<sbe:message name="msg1" id="1">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<group name="flat_group" id="10">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
</group>
<group name="nested_group" id="20">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<data name="data" id="6" type="varDataEncoding"/>
</group>
<group name="nested_group2" id="30">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<group name="nested_group" id="20">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<data name="data" id="6" type="varDataEncoding"/>
</group>
</group>
<data name="data" id="6" type="varDataEncoding"/>
</sbe:message>
They all use the same scenario: read all message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields and all fields in all entries of flat_group, and so on.
There are 4 different reading methods:
- raw_reader, a hand-written reader that uses pointer arithmetic and casts
- sbepp_reader, a reader that uses the normal accessors of sbepp-generated code
- sbepp_cursor_reader, a reader that uses the cursor-based accessors of sbepp-generated code
- real_logic_reader, a reader that uses code generated by RealLogic, which provides forward-only access
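To give a feel for what a hand-written reader in the spirit of raw_reader might look like, here is a minimal sketch. It is not the benchmark's actual code. The function names (`load`, `sum_top_level_and_flat_group`) are hypothetical, and the layout is an assumption: an 8-byte message header, five uint32 fields, then a group header made of a uint16 block length and a uint16 entry count, all in host byte order. It uses `std::memcpy` instead of a direct pointer cast to avoid alignment and aliasing issues.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: load a T stored in host byte order at `offset`.
template<typename T>
T load(const unsigned char* buf, std::size_t offset)
{
    T value;
    std::memcpy(&value, buf + offset, sizeof(T));
    return value;
}

// Reads the 5 top-level fields and every field of every flat_group entry,
// returning their sum so that the reads cannot be optimized away.
// Assumed layout (not taken from the benchmark sources):
//   [8-byte message header][5 x uint32 fields]
//   [uint16 blockLength][uint16 numInGroup][numInGroup entries]
std::uint64_t sum_top_level_and_flat_group(const unsigned char* buf)
{
    constexpr std::size_t header_size = 8;
    constexpr std::size_t field_count = 5;
    constexpr std::size_t field_size = sizeof(std::uint32_t);

    std::uint64_t sum = 0;
    std::size_t offset = header_size;

    // Top-level fields are at fixed offsets right after the header.
    for (std::size_t i = 0; i < field_count; ++i)
    {
        sum += load<std::uint32_t>(buf, offset);
        offset += field_size;
    }

    // Group header: entry size followed by the number of entries.
    const auto block_length = load<std::uint16_t>(buf, offset);
    const auto num_in_group = load<std::uint16_t>(buf, offset + 2);
    offset += 4;

    // Entries are fixed-size, so each one starts block_length bytes
    // after the previous one.
    for (std::uint16_t entry = 0; entry < num_in_group; ++entry)
    {
        for (std::size_t i = 0; i < field_count; ++i)
        {
            sum += load<std::uint32_t>(buf, offset + i * field_size);
        }
        offset += block_length;
    }

    return sum;
}
```

The point of such a reader is that it keeps a single running offset and never revisits data it has already passed, which is the baseline the generated accessors are compared against.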
The idea was to compare performance of normal and cursor-based accessors to the code written by hand with a message of gradually increasing complexity. All the measurements were done for a pack of 1000 messages but using two different strategies:
Fixed group size and data field size of 10. Since all messages have the same structure, this benchmark is quite stable and was used for the following analysis.
Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
Benchmark Time CPU Iterations
sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10 1515 ns 1515 ns 439318
sbepp_reader::flat_group_benchmark/1000/10/10/10/10 23784 ns 23783 ns 29424
sbepp_reader::nested_group_benchmark/1000/10/10/10/10 60434 ns 60431 ns 11524
sbepp_reader::nested_group2_benchmark/1000/10/10/10/10 580107 ns 580068 ns 1208
sbepp_reader::whole_message_benchmark/1000/10/10/10/10 822789 ns 822741 ns 848
sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10 1516 ns 1516 ns 462815
sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10 23767 ns 23765 ns 29446
sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10 59644 ns 59642 ns 11640
sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10 397326 ns 397305 ns 1732
sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10 412343 ns 412322 ns 1716
raw_reader::top_level_fields_benchmark/1000/10/10/10/10 1518 ns 1517 ns 460772
raw_reader::flat_group_benchmark/1000/10/10/10/10 23761 ns 23759 ns 29490
raw_reader::nested_group_benchmark/1000/10/10/10/10 62226 ns 62219 ns 11198
raw_reader::nested_group2_benchmark/1000/10/10/10/10 431421 ns 431394 ns 1617
raw_reader::whole_message_benchmark/1000/10/10/10/10 423216 ns 423194 ns 1654
real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10 1524 ns 1524 ns 462506
real_logic_reader::flat_group_benchmark/1000/10/10/10/10 23044 ns 23042 ns 30361
real_logic_reader::nested_group_benchmark/1000/10/10/10/10 60635 ns 60632 ns 11447
real_logic_reader::nested_group2_benchmark/1000/10/10/10/10 422053 ns 422028 ns 1642
real_logic_reader::whole_message_benchmark/1000/10/10/10/10 431510 ns 431489 ns 1642
Randomized group size in range [0; 20] and data size in range [0; 32]. This one cannot be used to compare different reading approaches because the message structure varies heavily; it is provided for reference only.
Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
Benchmark Time CPU Iterations
sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32 1520 ns 1520 ns 460833
sbepp_reader::flat_group_benchmark/1000/0/20/0/32 21984 ns 21983 ns 29613
sbepp_reader::nested_group_benchmark/1000/0/20/0/32 139916 ns 139912 ns 4900
sbepp_reader::nested_group2_benchmark/1000/0/20/0/32 1507963 ns 1507874 ns 481
sbepp_reader::whole_message_benchmark/1000/0/20/0/32 1818439 ns 1818343 ns 388
sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32 1511 ns 1511 ns 463569
sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32 22442 ns 22442 ns 30635
sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32 137442 ns 137438 ns 5036
sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32 1251388 ns 1251352 ns 540
sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32 1304626 ns 1304581 ns 538
raw_reader::top_level_fields_benchmark/1000/0/20/0/32 1511 ns 1511 ns 463647
raw_reader::flat_group_benchmark/1000/0/20/0/32 22794 ns 22793 ns 29730
raw_reader::nested_group_benchmark/1000/0/20/0/32 137293 ns 137289 ns 4893
raw_reader::nested_group2_benchmark/1000/0/20/0/32 1307361 ns 1307269 ns 533
raw_reader::whole_message_benchmark/1000/0/20/0/32 1296803 ns 1296747 ns 544
real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32 1510 ns 1510 ns 463907
real_logic_reader::flat_group_benchmark/1000/0/20/0/32 23054 ns 23053 ns 30689
real_logic_reader::nested_group_benchmark/1000/0/20/0/32 141231 ns 141225 ns 5048
real_logic_reader::nested_group2_benchmark/1000/0/20/0/32 1301144 ns 1301107 ns 524
real_logic_reader::whole_message_benchmark/1000/0/20/0/32 1371855 ns 1371795 ns 539
We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark there's no significant gain because a single data member is not a big deal: computing its length is a single memory read. Only starting from nested_group2_benchmark does the cursor-based API really start to shine, since the message structure becomes really complex at that point. In the fixed-size run, sbepp_reader takes about 580 µs versus about 397 µs for sbepp_cursor_reader in nested_group2_benchmark, and about 823 µs versus about 412 µs in whole_message_benchmark.
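The effect can be illustrated with a simplified model that is not sbepp's actual implementation. Assume a buffer of back-to-back variable-length records, each a uint32 length prefix followed by that many payload bytes. All names below (`read_stats`, `offset_of`, `visit_with_offsets`, `visit_with_cursor`) are hypothetical. The first visitor recomputes each record's offset from the start, as an accessor without remembered state has to; the second keeps a running cursor. Both count how many length prefixes they read.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Counts how many length prefixes were read from the buffer.
struct read_stats
{
    std::size_t length_reads = 0;
};

// Reads the uint32 length prefix (host byte order) at `offset`.
inline std::uint32_t read_length(
    const unsigned char* buf, std::size_t offset, read_stats& stats)
{
    std::uint32_t len;
    std::memcpy(&len, buf + offset, sizeof(len));
    ++stats.length_reads;
    return len;
}

// Stateless lookup: finds record `index` by walking all records before it.
inline std::size_t offset_of(
    const unsigned char* buf, std::size_t index, read_stats& stats)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i)
    {
        offset += sizeof(std::uint32_t) + read_length(buf, offset, stats);
    }
    return offset;
}

// Visits every record, recomputing its offset from the start each time.
// Returns the total payload size.
inline std::size_t visit_with_offsets(
    const unsigned char* buf, std::size_t count, read_stats& stats)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::size_t offset = offset_of(buf, i, stats);
        total += read_length(buf, offset, stats);
    }
    return total;
}

// Visits every record once, carrying the current offset forward.
// Returns the total payload size.
inline std::size_t visit_with_cursor(
    const unsigned char* buf, std::size_t count, read_stats& stats)
{
    std::size_t total = 0;
    std::size_t cursor = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::uint32_t len = read_length(buf, cursor, stats);
        total += len;
        cursor += sizeof(std::uint32_t) + len;
    }
    return total;
}
```

For 10 records, `visit_with_offsets` performs 55 length reads (1 + 2 + ... + 10) while `visit_with_cursor` performs 10. With a single variable-length member the extra work is negligible, but it grows quickly once variable-length entries are nested inside other groups.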