Generating AWS Embedded Metrics Format in C++

Amazon Web Services introduced^[ref] the Embedded Metric Format (EMF from here on), which allows metrics to be generated from the log stream.

For AWS Lambda functions this is even easier: writing the EMF messages to the standard output stream is enough, so the log data is ingested asynchronously without any additional network requests.
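To make the format concrete, here is a minimal sketch that builds one EMF document by hand and writes it as a single line to standard output. The helper names `make_emf_line` and `emit_to_stdout` are hypothetical and not part of the library; the JSON layout (`_aws`, `Timestamp`, `CloudWatchMetrics`, `Namespace`, `Dimensions`, `Metrics`) follows the published EMF specification.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical helper: builds one EMF document for a single Count metric
// with one static dimension. No JSON escaping is performed, so the
// function name must not contain quotes or control characters.
std::string make_emf_line(std::int64_t timestamp_ms,
                          const std::string& function_name,
                          std::int64_t value) {
    std::string out;
    out += R"({"_aws":{"Timestamp":)" + std::to_string(timestamp_ms);
    out += R"(,"CloudWatchMetrics":[{"Namespace":"test_ns",)";
    out += R"("Dimensions":[["function_name"]],)";
    out += R"("Metrics":[{"Name":"my_counter","Unit":"Count"}]}]},)";
    out += R"("function_name":")" + function_name + R"(",)";
    out += R"("my_counter":)" + std::to_string(value) + "}";
    return out;
}

// Writes the document as one line; in AWS Lambda, standard output ends up
// in CloudWatch Logs, where the metric is extracted.
void emit_to_stdout(const std::string& function_name, std::int64_t value) {
    using namespace std::chrono;
    const auto now_ms =
        duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
    std::cout << make_emf_line(now_ms, function_name, value) << '\n';
}
```

The metric value and the dimension value sit at the top level of the document, while the `_aws` block only describes which of those keys are metrics and which are dimensions.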

There are various libraries available for different languages, but I have not found one for C++ to be used in combination with the AWS Lambda Custom Runtime, so I decided to write one.

Generating EMF Messages

The example below creates an EMF logger for the metrics namespace test_ns with one metric of unit Count and two dimensions: requestId, whose value is set at runtime, and function_name, whose value is static.

Once the scope of the logger variable is left and the object is destroyed, it flushes the message to the default message sink which is the standard output sink.

The values for the metrics and dynamic dimensions can be set either by index, matching the position in the template parameter list, or by the name used in the logger setup.


emf_logger<"test_ns",
           emf_metrics<
               emf_metric<"my_counter", Aws::CloudWatch::Model::StandardUnit::Count>>,
           emf_dimensions<
               emf_dimension<"requestId">,
               emf_static_dimension<"function_name", "my_lambda_fun">>> logger;

...

logger.metric_value_by_name<"my_counter">(12);

// Or
// logger.put_metrics_value<0>(12);

...

logger.dimension_value_by_name<"requestId">(request.id());

Message Sink

A message sink is a type used to send the metric log messages. Sinks are defined by the following concept:


template<typename S>
concept emf_msg_sink_c = requires(S sink, const nlohmann::json& data) {
    { sink.sink(data) };

    { sink.generate() } -> std::same_as<bool>;
};

Currently there are two sinks available: the standard output sink and a null sink that silently drops all messages. More sinks could be added for environments outside AWS Lambda, which require the use of the CloudWatch API to send those messages.

Dimensions

To accommodate "multiple" parameter packs, the dimensions and the metrics are each wrapped in their own wrapper type. The compiler also ensures that no more than 9 dimensions are specified, as per the AWS EMF specification.

The individual dimensions are set up using types that satisfy the following concept:


template<typename D>
concept emf_dimension_c = requires(D dimension) {

    { dimension.name() } -> std::same_as<std::string_view>;

    { dimension.value() } -> std::same_as<std::string_view>;
};

Currently there are two implementations of this concept: one with a static value that is set at compile time, and one with a dynamic value that can be set at runtime.

Metrics

The metrics are set up using the interface defined by the following concept:


template<typename M>
concept emf_metric_c = requires (M metric, typename M::type value) {

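    // Return types are assumptions: std::string_view by analogy with
    // emf_dimension_c, std::size_t as the usual size type.
    { metric.name() } -> std::same_as<std::string_view>;
    { metric.unit_name() } -> std::same_as<std::string_view>;

    { metric.put_value(value) };

    { metric.size() } -> std::same_as<std::size_t>;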
};

Each metric is configured through three parameters: its name, its CloudWatch unit, and its C++ data type, which defaults to double.

If you call the put_value() method multiple times (through the logger's interface), the output contains a JSON array rather than a single numeric value. EMF has a limit of 100 elements per metric array. If that limit is exceeded, the library creates multiple messages with blocks of 100 values each, with the last message holding the remainder.
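The splitting rule can be sketched as a small free function. This is an illustration of the arithmetic only, not the library's implementation, and `split_values` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Splits `values` into consecutive blocks of at most `limit` elements.
// The default of 100 is the per-metric array limit mentioned in the text.
std::vector<std::vector<double>> split_values(const std::vector<double>& values,
                                              std::size_t limit = 100) {
    std::vector<std::vector<double>> blocks;
    if (limit == 0) {
        return blocks;  // nothing sensible to produce
    }
    for (std::size_t start = 0; start < values.size(); start += limit) {
        const std::size_t end = std::min(start + limit, values.size());
        blocks.emplace_back(values.begin() + static_cast<std::ptrdiff_t>(start),
                            values.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return blocks;
}
```

With 150 recorded values this yields two blocks, one of 100 and one of 50 values, which corresponds to two separate EMF messages.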

Benchmarks

To check the performance, I ran Catch2^[ref] benchmarks on an Intel Core i7-8550U CPU at 1.80GHz with 40GB of RAM and an M.2 SSD. The first set of benchmarks was produced without actually generating the message output, using optimised release builds on gcc 10.2 with C++20 standard settings.

The first benchmark uses a single integer metric with a single value and no output:



benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single metric, no output                       100          1093     2.6232 ms 
                                        23.2795 ns     23.114 ns    23.6399 ns 
                                        1.18915 ns   0.670797 ns    2.37045 ns

The next benchmark uses two metrics which are set via the name lookup:



benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Metric By Name, no output                      100           577     2.6542 ms 
                                        44.5421 ns    43.8828 ns     46.222 ns 
                                        4.79016 ns   0.696764 ns    9.05731 ns

Generating 150 metric values into a single metric using the index of the metric:



benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
150 Metrics, no output                         100            73     2.7594 ms 
                                        391.399 ns    386.147 ns    399.212 ns 
                                        32.4228 ns    23.6457 ns    43.2131 ns

And then all three benchmarks again, this time generating the output into a string:



benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Single metric                                  100             5      3.191 ms 
                                        5.86022 us    5.78607 us    6.16137 us 
                                        656.242 ns    108.839 ns    1.53253 us 
                                                                               
Metric By Name                                 100             4     3.2084 ms 
                                        8.40916 us    8.24599 us    8.68389 us 
                                        1.06261 us     738.91 ns    1.65966 us 
                                                                               
150 Metrics                                    100             2      4.648 ms 
                                        21.8884 us    21.5728 us    22.4238 us 
                                        2.04818 us    1.33857 us    3.20579 us

Closing Notes

Currently the library uses Niels Lohmann's JSON library^[ref] to generate the output. As a future performance enhancement, the library could write directly to the output stream without building an intermediate JSON data structure.

For "named" string template parameters, I use this utility class, which was inspired by a Stack Overflow comment:



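#include <algorithm>  // std::copy_n
#include <compare>    // defaulted operator<=>
#include <cstddef>    // std::size_t

// Wraps a string literal so it can be used as a non-type template parameter.
template<std::size_t N> struct named {

    constexpr named(char const (&s)[N]) {
        std::copy_n(s, N, this->m_elems);
    }

    constexpr auto operator<=>(named const&) const = default;

    constexpr const char* name() const {
        return &m_elems[0];
    }

    /**
     * Contained Data (includes the terminating '\0')
     */
    char m_elems[N]{};

};
// Deduces N from the length of the string literal.
template<std::size_t N> named(char const(&)[N])->named<N>;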