sbepp
Benchmarks

Note
The benchmarks discussed in this section are intended only for reasoning about performance relative to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.

To reason about the performance of the generated code, I've made a set of benchmarks around this message:

<sbe:message name="msg1" id="1">
    <field name="field1" id="1" type="uint32"/>
    <field name="field2" id="2" type="uint32"/>
    <field name="field3" id="3" type="uint32"/>
    <field name="field4" id="4" type="uint32"/>
    <field name="field5" id="5" type="uint32"/>
    <group name="flat_group" id="10">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
    </group>
    <group name="nested_group" id="20">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <data name="data" id="6" type="varDataEncoding"/>
    </group>
    <group name="nested_group2" id="30">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <group name="nested_group" id="20">
            <field name="field1" id="1" type="uint32"/>
            <field name="field2" id="2" type="uint32"/>
            <field name="field3" id="3" type="uint32"/>
            <field name="field4" id="4" type="uint32"/>
            <field name="field5" id="5" type="uint32"/>
            <data name="data" id="6" type="varDataEncoding"/>
        </group>
    </group>
    <data name="data" id="6" type="varDataEncoding"/>
</sbe:message>

They all use the same scenario: read all message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields plus all fields in all entries of flat_group, and so on.
There are 4 different reading methods:

  • raw_reader, a reader written by hand which uses pointer arithmetic and casts
  • sbepp_reader, a reader which uses normal accessors of sbepp generated code
  • sbepp_cursor_reader, a reader which uses cursor-based accessors of sbepp generated code
  • real_logic_reader, a reader which uses code generated by RealLogic, which provides forward-only access
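
For illustration, a hand-written reader in the spirit of raw_reader could look like the sketch below. This is not the benchmark's actual code: the header sizes, the host-byte-order reads, and the names read_at, sum_fields, read_up_to_flat_group and make_sample are all assumptions made for the example. It covers only the top-level fields and flat_group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout: an 8-byte message header that starts with a uint16
// blockLength, and a 4-byte group header (uint16 blockLength, uint16 numInGroup).
constexpr std::size_t message_header_size = 8;
constexpr std::size_t group_header_size = 4;
constexpr std::size_t field_count = 5;
constexpr std::size_t block_size = field_count * sizeof(std::uint32_t);

// Hypothetical helper: copy a T out of the buffer in host byte order.
template<typename T>
T read_at(const unsigned char* buf, std::size_t offset)
{
    T value;
    std::memcpy(&value, buf + offset, sizeof(T));
    return value;
}

// Sum of field1..field5 stored back to back starting at `pos`.
std::uint64_t sum_fields(const unsigned char* buf, std::size_t pos)
{
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < field_count; ++i)
        sum += read_at<std::uint32_t>(buf, pos + i * sizeof(std::uint32_t));
    return sum;
}

// Reads the top-level fields, then every field of every flat_group entry.
std::uint64_t read_up_to_flat_group(const unsigned char* buf)
{
    const auto block_length = read_at<std::uint16_t>(buf, 0);
    assert(block_length >= block_size);
    std::size_t pos = message_header_size;
    std::uint64_t sum = sum_fields(buf, pos);
    pos += block_length;  // skip the top-level block

    const auto entry_size = read_at<std::uint16_t>(buf, pos);
    const auto entry_count = read_at<std::uint16_t>(buf, pos + 2);
    pos += group_header_size;
    for (std::uint16_t i = 0; i < entry_count; ++i) {
        sum += sum_fields(buf, pos);
        pos += entry_size;  // entries have a fixed size, so this is one addition
    }
    return sum;
}

// Hypothetical helper: builds a buffer in the assumed layout where the
// fields hold 1..5 at the top level and in each flat_group entry.
std::vector<unsigned char> make_sample(std::uint16_t entry_count)
{
    std::vector<unsigned char> buf(
        message_header_size + block_size + group_header_size
        + static_cast<std::size_t>(entry_count) * block_size);
    auto put16 = [&](std::size_t off, std::uint16_t v) {
        std::memcpy(buf.data() + off, &v, sizeof(v));
    };
    auto put_fields = [&](std::size_t off) {
        for (std::uint32_t i = 0; i < field_count; ++i) {
            const std::uint32_t v = i + 1;
            std::memcpy(buf.data() + off + i * sizeof(v), &v, sizeof(v));
        }
    };
    put16(0, static_cast<std::uint16_t>(block_size));
    put_fields(message_header_size);
    std::size_t pos = message_header_size + block_size;
    put16(pos, static_cast<std::uint16_t>(block_size));
    put16(pos + 2, entry_count);
    pos += group_header_size;
    for (std::uint16_t i = 0; i < entry_count; ++i, pos += block_size)
        put_fields(pos);
    return buf;
}
```

Because flat_group entries have a fixed size, moving to the next entry costs a single addition, which is consistent with all four readers performing alike on the simpler benchmarks.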

The idea was to compare the performance of normal and cursor-based accessors against hand-written code as the message gradually grows in complexity. All measurements were taken on a batch of 1000 messages, using two different strategies:

  1. Group size and data field size fixed to 10. Since all messages have the same structure, this benchmark is quite stable and was used for the analysis below.

    Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
    Benchmark Time CPU Iterations
    sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10 1515 ns 1515 ns 439318
    sbepp_reader::flat_group_benchmark/1000/10/10/10/10 23784 ns 23783 ns 29424
    sbepp_reader::nested_group_benchmark/1000/10/10/10/10 60434 ns 60431 ns 11524
    sbepp_reader::nested_group2_benchmark/1000/10/10/10/10 580107 ns 580068 ns 1208
    sbepp_reader::whole_message_benchmark/1000/10/10/10/10 822789 ns 822741 ns 848
    sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10 1516 ns 1516 ns 462815
    sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10 23767 ns 23765 ns 29446
    sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10 59644 ns 59642 ns 11640
    sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10 397326 ns 397305 ns 1732
    sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10 412343 ns 412322 ns 1716
    raw_reader::top_level_fields_benchmark/1000/10/10/10/10 1518 ns 1517 ns 460772
    raw_reader::flat_group_benchmark/1000/10/10/10/10 23761 ns 23759 ns 29490
    raw_reader::nested_group_benchmark/1000/10/10/10/10 62226 ns 62219 ns 11198
    raw_reader::nested_group2_benchmark/1000/10/10/10/10 431421 ns 431394 ns 1617
    raw_reader::whole_message_benchmark/1000/10/10/10/10 423216 ns 423194 ns 1654
    real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10 1524 ns 1524 ns 462506
    real_logic_reader::flat_group_benchmark/1000/10/10/10/10 23044 ns 23042 ns 30361
    real_logic_reader::nested_group_benchmark/1000/10/10/10/10 60635 ns 60632 ns 11447
    real_logic_reader::nested_group2_benchmark/1000/10/10/10/10 422053 ns 422028 ns 1642
    real_logic_reader::whole_message_benchmark/1000/10/10/10/10 431510 ns 431489 ns 1642
  2. Group size randomized in the range [0; 20] and data size in the range [0; 32]. This one cannot be used to compare the reading approaches because the message structure varies heavily; it is provided only for reference.

    Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
    Benchmark Time CPU Iterations
    sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32 1520 ns 1520 ns 460833
    sbepp_reader::flat_group_benchmark/1000/0/20/0/32 21984 ns 21983 ns 29613
    sbepp_reader::nested_group_benchmark/1000/0/20/0/32 139916 ns 139912 ns 4900
    sbepp_reader::nested_group2_benchmark/1000/0/20/0/32 1507963 ns 1507874 ns 481
    sbepp_reader::whole_message_benchmark/1000/0/20/0/32 1818439 ns 1818343 ns 388
    sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32 1511 ns 1511 ns 463569
    sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32 22442 ns 22442 ns 30635
    sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32 137442 ns 137438 ns 5036
    sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32 1251388 ns 1251352 ns 540
    sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32 1304626 ns 1304581 ns 538
    raw_reader::top_level_fields_benchmark/1000/0/20/0/32 1511 ns 1511 ns 463647
    raw_reader::flat_group_benchmark/1000/0/20/0/32 22794 ns 22793 ns 29730
    raw_reader::nested_group_benchmark/1000/0/20/0/32 137293 ns 137289 ns 4893
    raw_reader::nested_group2_benchmark/1000/0/20/0/32 1307361 ns 1307269 ns 533
    raw_reader::whole_message_benchmark/1000/0/20/0/32 1296803 ns 1296747 ns 544
    real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32 1510 ns 1510 ns 463907
    real_logic_reader::flat_group_benchmark/1000/0/20/0/32 23054 ns 23053 ns 30689
    real_logic_reader::nested_group_benchmark/1000/0/20/0/32 141231 ns 141225 ns 5048
    real_logic_reader::nested_group2_benchmark/1000/0/20/0/32 1301144 ns 1301107 ns 524
    real_logic_reader::whole_message_benchmark/1000/0/20/0/32 1371855 ns 1371795 ns 539

We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark there's no significant gain, because skipping a single data member is cheap: computing its length is a single memory read. Only from nested_group2_benchmark onward does the cursor-based API really start to shine, since the message structure becomes complex at that point. In the fixed-size run, normal accessors take 580107 ns against 397326 ns for cursor-based ones in nested_group2_benchmark, and 822789 ns against 412343 ns in whole_message_benchmark.
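
The reason the gap widens with complexity can be shown with a simplified model outside of sbepp. The sketch below is not sbepp's implementation; the item layout and the names make_items, read_length, offset_of, cursor and next_offset are assumptions for the example. Each item is a uint32 length prefix followed by a payload, so locating item N from scratch means reading the length of every item before it, while a cursor remembers where the previous item ended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: builds `count` items, each a uint32 length prefix
// (host byte order) followed by `payload_size` zero bytes.
std::vector<unsigned char> make_items(std::size_t count, std::uint32_t payload_size)
{
    std::vector<unsigned char> buf;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t pos = buf.size();
        buf.resize(pos + sizeof(payload_size) + payload_size);
        std::memcpy(buf.data() + pos, &payload_size, sizeof(payload_size));
    }
    return buf;
}

// Reads the length prefix at `pos` and counts the read.
std::uint32_t read_length(
    const std::vector<unsigned char>& buf, std::size_t pos, std::size_t& length_reads)
{
    assert(pos + sizeof(std::uint32_t) <= buf.size());
    std::uint32_t len;
    std::memcpy(&len, buf.data() + pos, sizeof(len));
    ++length_reads;
    return len;
}

// Models a normal accessor: walks from the start to find item `index`.
std::size_t offset_of(
    const std::vector<unsigned char>& buf, std::size_t index, std::size_t& length_reads)
{
    std::size_t pos = 0;
    for (std::size_t i = 0; i < index; ++i)
        pos += sizeof(std::uint32_t) + read_length(buf, pos, length_reads);
    return pos;
}

// Models a cursor: stores the position where the next item starts.
struct cursor
{
    std::size_t pos = 0;
};

// Returns the offset of the current item and advances the cursor past it.
std::size_t next_offset(
    const std::vector<unsigned char>& buf, cursor& c, std::size_t& length_reads)
{
    const std::size_t current = c.pos;
    c.pos = current + sizeof(std::uint32_t) + read_length(buf, current, length_reads);
    return current;
}
```

Visiting 10 items in order costs 0 + 1 + ... + 9 = 45 length reads with offset_of but only 10 with the cursor. With a single variable-length member the difference is negligible, which matches the nested_group_benchmark numbers; it grows once variable-length groups are nested.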