- Note
- The benchmarks discussed in this section are only meant for reasoning about performance relative to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.
To reason about the performance of the generated code, I made a set of benchmarks around this message:
<sbe:message name="msg1" id="1">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<group name="flat_group" id="10">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
</group>
<group name="nested_group" id="20">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<data name="data" id="6" type="varDataEncoding"/>
</group>
<group name="nested_group2" id="30">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<group name="nested_group" id="20">
<field name="field1" id="1" type="uint32"/>
<field name="field2" id="2" type="uint32"/>
<field name="field3" id="3" type="uint32"/>
<field name="field4" id="4" type="uint32"/>
<field name="field5" id="5" type="uint32"/>
<data name="data" id="6" type="varDataEncoding"/>
</group>
</group>
<data name="data" id="6" type="varDataEncoding"/>
</sbe:message>
They all use the same scenario: read all message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields plus all fields in all entries of flat_group, and so on.
There are 4 different reading methods:
raw_reader, a reader written by hand which uses pointer arithmetic and casts
sbepp_reader, a reader which uses normal accessors of sbepp generated code
sbepp_cursor_reader, a reader which uses cursor-based accessors of sbepp generated code
real_logic_reader, a reader which uses code generated by RealLogic, which provides forward-only access
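To make the hand-written baseline more concrete, here is a minimal sketch of what reading the five top-level fields by offset might look like. It is not the actual raw_reader from the benchmarks. It assumes a standard 8-byte SBE message header in front of the fields, a host whose byte order matches the encoding, and uses `std::memcpy` instead of pointer casts to stay clear of alignment and aliasing issues. The names `header_size`, `read_u32`, `sum_top_level_fields` and `make_message` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: an 8-byte SBE message header precedes the top-level fields.
constexpr std::size_t header_size = 8;
constexpr std::size_t top_level_field_count = 5;

// Reads a uint32 located `offset` bytes from `ptr`.
// memcpy is used instead of a cast to avoid unaligned/aliasing access.
inline std::uint32_t read_u32(const unsigned char* ptr, std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, ptr + offset, sizeof(value));
    return value;
}

// Reads field1..field5 in order and returns their sum.
inline std::uint64_t sum_top_level_fields(
    const unsigned char* msg, std::size_t size)
{
    assert(size >= header_size
               + top_level_field_count * sizeof(std::uint32_t));
    std::uint64_t sum = 0;
    for(std::size_t i = 0; i < top_level_field_count; ++i)
    {
        sum += read_u32(msg, header_size + i * sizeof(std::uint32_t));
    }
    return sum;
}

// Hypothetical helper: builds a buffer holding a zeroed header followed by
// the five field values in host byte order.
inline std::vector<unsigned char> make_message(
    const std::uint32_t (&fields)[top_level_field_count])
{
    std::vector<unsigned char> buf(header_size + sizeof(fields));
    std::memcpy(buf.data() + header_size, fields, sizeof(fields));
    return buf;
}
```

Because every top-level field sits at a fixed offset, there is nothing to compute at run time here, which is why all four readers are expected to behave the same for such simple cases.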
The idea was to compare the performance of normal and cursor-based accessors against hand-written code on a message of gradually increasing complexity. All measurements were taken on a batch of 1000 messages, using two different strategies:
Group size and data field size are both fixed to 10. Since all messages have the same structure, this benchmark is quite stable and was used for the following analysis.
Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
Benchmark Time CPU Iterations
sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10 1515 ns 1515 ns 439318
sbepp_reader::flat_group_benchmark/1000/10/10/10/10 23784 ns 23783 ns 29424
sbepp_reader::nested_group_benchmark/1000/10/10/10/10 60434 ns 60431 ns 11524
sbepp_reader::nested_group2_benchmark/1000/10/10/10/10 580107 ns 580068 ns 1208
sbepp_reader::whole_message_benchmark/1000/10/10/10/10 822789 ns 822741 ns 848
sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10 1516 ns 1516 ns 462815
sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10 23767 ns 23765 ns 29446
sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10 59644 ns 59642 ns 11640
sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10 397326 ns 397305 ns 1732
sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10 412343 ns 412322 ns 1716
raw_reader::top_level_fields_benchmark/1000/10/10/10/10 1518 ns 1517 ns 460772
raw_reader::flat_group_benchmark/1000/10/10/10/10 23761 ns 23759 ns 29490
raw_reader::nested_group_benchmark/1000/10/10/10/10 62226 ns 62219 ns 11198
raw_reader::nested_group2_benchmark/1000/10/10/10/10 431421 ns 431394 ns 1617
raw_reader::whole_message_benchmark/1000/10/10/10/10 423216 ns 423194 ns 1654
real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10 1524 ns 1524 ns 462506
real_logic_reader::flat_group_benchmark/1000/10/10/10/10 23044 ns 23042 ns 30361
real_logic_reader::nested_group_benchmark/1000/10/10/10/10 60635 ns 60632 ns 11447
real_logic_reader::nested_group2_benchmark/1000/10/10/10/10 422053 ns 422028 ns 1642
real_logic_reader::whole_message_benchmark/1000/10/10/10/10 431510 ns 431489 ns 1642
Group size is randomized in the range [0; 20] and data size in the range [0; 32]. This benchmark cannot be used to compare reading approaches because the message structure varies heavily; it is provided for reference only.
Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
Benchmark Time CPU Iterations
sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32 1520 ns 1520 ns 460833
sbepp_reader::flat_group_benchmark/1000/0/20/0/32 21984 ns 21983 ns 29613
sbepp_reader::nested_group_benchmark/1000/0/20/0/32 139916 ns 139912 ns 4900
sbepp_reader::nested_group2_benchmark/1000/0/20/0/32 1507963 ns 1507874 ns 481
sbepp_reader::whole_message_benchmark/1000/0/20/0/32 1818439 ns 1818343 ns 388
sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32 1511 ns 1511 ns 463569
sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32 22442 ns 22442 ns 30635
sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32 137442 ns 137438 ns 5036
sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32 1251388 ns 1251352 ns 540
sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32 1304626 ns 1304581 ns 538
raw_reader::top_level_fields_benchmark/1000/0/20/0/32 1511 ns 1511 ns 463647
raw_reader::flat_group_benchmark/1000/0/20/0/32 22794 ns 22793 ns 29730
raw_reader::nested_group_benchmark/1000/0/20/0/32 137293 ns 137289 ns 4893
raw_reader::nested_group2_benchmark/1000/0/20/0/32 1307361 ns 1307269 ns 533
raw_reader::whole_message_benchmark/1000/0/20/0/32 1296803 ns 1296747 ns 544
real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32 1510 ns 1510 ns 463907
real_logic_reader::flat_group_benchmark/1000/0/20/0/32 23054 ns 23053 ns 30689
real_logic_reader::nested_group_benchmark/1000/0/20/0/32 141231 ns 141225 ns 5048
real_logic_reader::nested_group2_benchmark/1000/0/20/0/32 1301144 ns 1301107 ns 524
real_logic_reader::whole_message_benchmark/1000/0/20/0/32 1371855 ns 1371795 ns 539
We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark there's no significant gain, because a single data member is cheap to skip: computing its length is a single memory read. Only from nested_group2_benchmark onward does the cursor-based API really start to shine, since the message structure becomes complex at that point: in the fixed-size run, sbepp_reader takes about 580 µs versus about 397 µs for sbepp_cursor_reader, and in whole_message_benchmark the gap grows to about 823 µs versus about 412 µs.
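The reasoning above can be illustrated with a deliberately simplified model. This is not sbepp's implementation, only a sketch of the general idea. It assumes a buffer of entries, each made of a uint32 length prefix followed by that many payload bytes. An index-based accessor has to walk from the start on every call, while a cursor remembers where the previous entry ended. All names (`walk_stats`, `length_by_index`, `cursor`, `next_length`, `make_entries`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Counts how many length prefixes were read from memory.
struct walk_stats
{
    std::size_t length_reads = 0;
};

inline std::uint32_t read_length(
    const std::vector<unsigned char>& buf,
    std::size_t offset,
    walk_stats& stats)
{
    std::uint32_t len;
    assert(offset + sizeof(len) <= buf.size());
    std::memcpy(&len, buf.data() + offset, sizeof(len));
    ++stats.length_reads;
    return len;
}

// Index-based access: every call starts at the beginning and skips all
// preceding entries to find the requested one.
inline std::uint32_t length_by_index(
    const std::vector<unsigned char>& buf,
    std::size_t index,
    walk_stats& stats)
{
    std::size_t offset = 0;
    for(std::size_t i = 0; i < index; ++i)
    {
        offset += sizeof(std::uint32_t) + read_length(buf, offset, stats);
    }
    return read_length(buf, offset, stats);
}

// Cursor-based access: the cursor stores the offset of the next entry,
// so each step costs exactly one length read.
struct cursor
{
    std::size_t offset = 0;
};

inline std::uint32_t next_length(
    const std::vector<unsigned char>& buf, cursor& c, walk_stats& stats)
{
    const std::uint32_t len = read_length(buf, c.offset, stats);
    c.offset += sizeof(std::uint32_t) + len;
    return len;
}

// Hypothetical helper: builds `count` entries with `payload_size` zero bytes.
inline std::vector<unsigned char> make_entries(
    std::size_t count, std::uint32_t payload_size)
{
    std::vector<unsigned char> buf;
    for(std::size_t i = 0; i < count; ++i)
    {
        unsigned char prefix[sizeof(payload_size)];
        std::memcpy(prefix, &payload_size, sizeof(payload_size));
        buf.insert(buf.end(), prefix, prefix + sizeof(prefix));
        buf.insert(
            buf.end(),
            static_cast<std::size_t>(payload_size),
            static_cast<unsigned char>(0));
    }
    return buf;
}
```

In this model, visiting 10 entries in order costs 55 length reads through the index-based function but only 10 through the cursor. With a single variable-length member the two approaches are nearly identical, which matches the nested_group_benchmark observation. The difference only builds up when many such members have to be skipped repeatedly.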