Note: The benchmarks discussed in this section are meant only for reasoning about performance relative to hand-written code. Real-world performance depends heavily on the particular message structure and access pattern.

To reason about the performance of generated code I've made a set of benchmarks around this message:

<sbe:message name="msg1" id="1">
    <field name="field1" id="1" type="uint32"/>
    <field name="field2" id="2" type="uint32"/>
    <field name="field3" id="3" type="uint32"/>
    <field name="field4" id="4" type="uint32"/>
    <field name="field5" id="5" type="uint32"/>
 
    <group name="flat_group" id="10">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
    </group>
 
    <group name="nested_group" id="20">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
        <data name="data" id="6" type="varDataEncoding"/>
    </group>
 
    <group name="nested_group2" id="30">
        <field name="field1" id="1" type="uint32"/>
        <field name="field2" id="2" type="uint32"/>
        <field name="field3" id="3" type="uint32"/>
        <field name="field4" id="4" type="uint32"/>
        <field name="field5" id="5" type="uint32"/>
 
        <group name="nested_group" id="20">
            <field name="field1" id="1" type="uint32"/>
            <field name="field2" id="2" type="uint32"/>
            <field name="field3" id="3" type="uint32"/>
            <field name="field4" id="4" type="uint32"/>
            <field name="field5" id="5" type="uint32"/>
            <data name="data" id="6" type="varDataEncoding"/>
        </group>
    </group>
 
    <data name="data" id="6" type="varDataEncoding"/>
</sbe:message>

They all use the same scenario: read all message fields in order up to a certain point. For example, top_level_fields_benchmark reads only the 5 top-level fields, flat_group_benchmark reads the top-level fields and all fields in all entries of flat_group, and so on.
There are 4 different reading methods:

raw_reader, a reader written by hand which uses pointer arithmetic and casts
sbepp_reader, a reader which uses normal accessors of sbepp generated code
sbepp_cursor_reader, a reader which uses cursor-based accessors of sbepp generated code
real_logic_reader, a reader which uses code generated by RealLogic, which provides forward-only access
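
To give an idea of what a hand-written reader based on pointer arithmetic might look like, here is a minimal sketch that reads the top-level fields and all flat_group entries. It is not the benchmark's actual raw_reader. All names (`load`, `append`, `make_sample`, `read_up_to_flat_group`) are hypothetical, and the layout is an assumption: the standard 8-byte SBE message header, a 4-byte group header made of two uint16 values, and a host byte order that matches the schema.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout (not taken from the benchmark sources):
// message header {uint16 blockLength, templateId, schemaId, version},
// group header {uint16 blockLength, uint16 numInGroup}.
constexpr std::size_t message_header_size = 8;
constexpr std::size_t group_header_size = 4;

// Reads a T from `p`; memcpy avoids alignment and aliasing problems.
template<typename T>
T load(const unsigned char* p)
{
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}

// Hypothetical raw reader: sums the 5 top-level fields and all fields of
// all flat_group entries.
std::uint64_t read_up_to_flat_group(const unsigned char* msg)
{
    std::uint64_t sum = 0;
    const auto block_length = load<std::uint16_t>(msg);
    assert(block_length >= 20); // the root block must hold 5 uint32 fields
    const unsigned char* p = msg + message_header_size;
    for(int i = 0; i < 5; ++i)
        sum += load<std::uint32_t>(p + i * 4);
    p += block_length; // skip the root block, flat_group starts here

    const auto entry_size = load<std::uint16_t>(p);
    const auto entry_count = load<std::uint16_t>(p + 2);
    p += group_header_size;
    for(std::uint16_t e = 0; e < entry_count; ++e, p += entry_size)
        for(int i = 0; i < 5; ++i)
            sum += load<std::uint32_t>(p + i * 4);
    return sum;
}

// Helper for the example only: appends the bytes of `value` to `buf`.
template<typename T>
void append(std::vector<unsigned char>& buf, T value)
{
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    buf.insert(buf.end(), bytes, bytes + sizeof(T));
}

// Builds a message prefix (header, root block, flat_group) in which every
// block holds the field values 1..5.
std::vector<unsigned char> make_sample(std::uint16_t entry_count)
{
    std::vector<unsigned char> buf;
    append<std::uint16_t>(buf, 20); // root blockLength
    append<std::uint16_t>(buf, 1);  // templateId
    append<std::uint16_t>(buf, 1);  // schemaId (arbitrary)
    append<std::uint16_t>(buf, 0);  // version
    for(std::uint32_t i = 1; i <= 5; ++i)
        append<std::uint32_t>(buf, i);
    append<std::uint16_t>(buf, 20); // flat_group entry size
    append<std::uint16_t>(buf, entry_count);
    for(std::uint16_t e = 0; e < entry_count; ++e)
        for(std::uint32_t i = 1; i <= 5; ++i)
            append<std::uint32_t>(buf, i);
    return buf;
}
```

Calling `read_up_to_flat_group(make_sample(10).data())` walks the same part of the message as flat_group_benchmark does with a group size of 10. The reader only ever moves forward, which is why such code serves as the baseline for the generated accessors.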

The idea was to compare the performance of normal and cursor-based accessors with hand-written code on a message of gradually increasing complexity. All measurements were done on a batch of 1000 messages, using two different strategies:

Fixed group size and data field size of 10. Since all messages have the same structure, this benchmark is quite stable and was used for the analysis that follows.

Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz

Benchmark                                                               Time         CPU  Iterations
sbepp_reader::top_level_fields_benchmark/1000/10/10/10/10            1515 ns     1515 ns      439318
sbepp_reader::flat_group_benchmark/1000/10/10/10/10                 23784 ns    23783 ns       29424
sbepp_reader::nested_group_benchmark/1000/10/10/10/10               60434 ns    60431 ns       11524
sbepp_reader::nested_group2_benchmark/1000/10/10/10/10             580107 ns   580068 ns        1208
sbepp_reader::whole_message_benchmark/1000/10/10/10/10             822789 ns   822741 ns         848
sbepp_cursor_reader::top_level_fields_benchmark/1000/10/10/10/10     1516 ns     1516 ns      462815
sbepp_cursor_reader::flat_group_benchmark/1000/10/10/10/10          23767 ns    23765 ns       29446
sbepp_cursor_reader::nested_group_benchmark/1000/10/10/10/10        59644 ns    59642 ns       11640
sbepp_cursor_reader::nested_group2_benchmark/1000/10/10/10/10      397326 ns   397305 ns        1732
sbepp_cursor_reader::whole_message_benchmark/1000/10/10/10/10      412343 ns   412322 ns        1716
raw_reader::top_level_fields_benchmark/1000/10/10/10/10              1518 ns     1517 ns      460772
raw_reader::flat_group_benchmark/1000/10/10/10/10                   23761 ns    23759 ns       29490
raw_reader::nested_group_benchmark/1000/10/10/10/10                 62226 ns    62219 ns       11198
raw_reader::nested_group2_benchmark/1000/10/10/10/10               431421 ns   431394 ns        1617
raw_reader::whole_message_benchmark/1000/10/10/10/10               423216 ns   423194 ns        1654
real_logic_reader::top_level_fields_benchmark/1000/10/10/10/10       1524 ns     1524 ns      462506
real_logic_reader::flat_group_benchmark/1000/10/10/10/10            23044 ns    23042 ns       30361
real_logic_reader::nested_group_benchmark/1000/10/10/10/10          60635 ns    60632 ns       11447
real_logic_reader::nested_group2_benchmark/1000/10/10/10/10        422053 ns   422028 ns        1642
real_logic_reader::whole_message_benchmark/1000/10/10/10/10        431510 ns   431489 ns        1642

Randomized group size in the range [0; 20] and data size in the range [0; 32]. This run cannot be used to compare the reading approaches because the message structure varies heavily between messages; it is provided only for reference.

Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz

Benchmark                                                               Time         CPU  Iterations
sbepp_reader::top_level_fields_benchmark/1000/0/20/0/32              1520 ns     1520 ns      460833
sbepp_reader::flat_group_benchmark/1000/0/20/0/32                   21984 ns    21983 ns       29613
sbepp_reader::nested_group_benchmark/1000/0/20/0/32                139916 ns   139912 ns        4900
sbepp_reader::nested_group2_benchmark/1000/0/20/0/32              1507963 ns  1507874 ns         481
sbepp_reader::whole_message_benchmark/1000/0/20/0/32              1818439 ns  1818343 ns         388
sbepp_cursor_reader::top_level_fields_benchmark/1000/0/20/0/32       1511 ns     1511 ns      463569
sbepp_cursor_reader::flat_group_benchmark/1000/0/20/0/32            22442 ns    22442 ns       30635
sbepp_cursor_reader::nested_group_benchmark/1000/0/20/0/32         137442 ns   137438 ns        5036
sbepp_cursor_reader::nested_group2_benchmark/1000/0/20/0/32       1251388 ns  1251352 ns         540
sbepp_cursor_reader::whole_message_benchmark/1000/0/20/0/32       1304626 ns  1304581 ns         538
raw_reader::top_level_fields_benchmark/1000/0/20/0/32                1511 ns     1511 ns      463647
raw_reader::flat_group_benchmark/1000/0/20/0/32                     22794 ns    22793 ns       29730
raw_reader::nested_group_benchmark/1000/0/20/0/32                  137293 ns   137289 ns        4893
raw_reader::nested_group2_benchmark/1000/0/20/0/32                1307361 ns  1307269 ns         533
raw_reader::whole_message_benchmark/1000/0/20/0/32                1296803 ns  1296747 ns         544
real_logic_reader::top_level_fields_benchmark/1000/0/20/0/32         1510 ns     1510 ns      463907
real_logic_reader::flat_group_benchmark/1000/0/20/0/32              23054 ns    23053 ns       30689
real_logic_reader::nested_group_benchmark/1000/0/20/0/32           141231 ns   141225 ns        5048
real_logic_reader::nested_group2_benchmark/1000/0/20/0/32         1301144 ns  1301107 ns         524
real_logic_reader::whole_message_benchmark/1000/0/20/0/32         1371855 ns  1371795 ns         539

We can see that when the message structure is simple, as in top_level_fields_benchmark and flat_group_benchmark, there's no reason to use the more complex cursor-based accessors. Even in nested_group_benchmark the gain is insignificant: each entry holds a single data member, and computing its length takes a single memory read. The cursor-based API only starts to pay off in nested_group2_benchmark, where the message structure becomes complex enough. In the fixed-size run it takes 397326 ns against 580107 ns for normal accessors (roughly 1.5x faster), and in whole_message_benchmark the gap grows to 412343 ns against 822789 ns (roughly 2x).
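
The effect can be illustrated with a deliberately simplified model. This is not sbepp code: the buffer holds only consecutive length-prefixed fields, and all names (`make_fields`, `load_length`, `reads_without_cursor`, `reads_with_cursor`) are hypothetical. The sketch assumes that an accessor without a cursor has to walk over every preceding variable-length field to find the one it wants, while a cursor remembers where the previous field ended. It counts how many length prefixes each approach loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified model: `count` consecutive variable-length fields, each stored
// as {uint32 length, payload bytes}.
std::vector<unsigned char> make_fields(
    std::size_t count, std::uint32_t payload_size)
{
    std::vector<unsigned char> buf;
    for(std::size_t i = 0; i < count; ++i)
    {
        unsigned char prefix[sizeof(payload_size)];
        std::memcpy(prefix, &payload_size, sizeof(payload_size));
        buf.insert(buf.end(), prefix, prefix + sizeof(prefix));
        buf.insert(
            buf.end(),
            static_cast<std::size_t>(payload_size),
            static_cast<unsigned char>(i));
    }
    return buf;
}

// Loads one length prefix and counts the load in `reads`.
std::uint32_t load_length(const unsigned char* p, std::size_t& reads)
{
    std::uint32_t length;
    std::memcpy(&length, p, sizeof(length));
    ++reads;
    return length;
}

// Without a cursor: every access starts from the beginning of the buffer
// and skips all preceding fields.
std::size_t reads_without_cursor(
    const std::vector<unsigned char>& buf, std::size_t count)
{
    std::size_t reads = 0;
    for(std::size_t index = 0; index < count; ++index)
    {
        const unsigned char* p = buf.data();
        for(std::size_t skipped = 0; skipped < index; ++skipped)
            p += sizeof(std::uint32_t) + load_length(p, reads);
        load_length(p, reads); // read the requested field itself
    }
    return reads;
}

// With a cursor: the end of the previous field is the start of the next.
std::size_t reads_with_cursor(
    const std::vector<unsigned char>& buf, std::size_t count)
{
    std::size_t reads = 0;
    const unsigned char* cursor = buf.data();
    for(std::size_t index = 0; index < count; ++index)
        cursor += sizeof(std::uint32_t) + load_length(cursor, reads);
    assert(cursor == buf.data() + buf.size()); // all fields were consumed
    return reads;
}
```

For 10 fields the first function performs 1 + 2 + ... + 10 = 55 length reads and the second performs 10. In this model the extra work grows with the number of variable-length parts that precede the accessed member, which is consistent with the gap appearing only from nested_group2_benchmark onwards.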