How Logging Affects Application Performance
If you’ve been exploring this site, you may know that I’m a fan of the Twelve-Factor app methodology. I’ve also been getting familiar with the OpenTelemetry specifications.
These two things overlap when it comes to describing the use and importance of logging (see OpenTelemetry and 12-Factor App).
The Twelve-Factor App says:
Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services.
OpenTelemetry says:
A log is a timestamped text record, either structured (recommended) or unstructured, with metadata… any data that is not part of a distributed trace or a metric is a log. For example, events are a specific type of log.
The aim for a non-trivial system should be to treat logs as a stream of structured records directed to the output stream of each process.
How streams behave
The vocabulary used when talking about data streams evokes the idea of flowing water and, in general, this is a useful metaphor.
However, while the term `stream` might conjure up the idea of a gently flowing creek, it’s really the term `pipe` that captures things most accurately.
A pipe has one end where water is coming in and one where water goes out. If you’re not letting the water out, there’s only so much you can push in. If you keep going, the water pressure goes up and up until no more can be added. To further build on the water metaphor, the term “backpressure” is used to describe the mechanism whereby a stream can indicate that no more data should be passed in.
For data streams, the “water flowing in” represents the bytes being written and the “water flowing out” represents the bytes being read.
As a result of backpressure, the rate at which data can be written to a stream depends upon the rate at which data is being read from the stream.
To handle this a little more gracefully, a stream will generally have a buffer. A buffer acts as a storage tank in the pipe. If data is flowing in more quickly than it’s flowing out, the buffer will fill before backpressure starts being applied. Once the write rate drops below the read rate, the buffer can “drain” until it’s empty again.
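In NodeJS, this mechanism is exposed directly in the stream API: `write()` returns `false` once the internal buffer is full, and the `'drain'` event fires when it has emptied. Here’s a minimal sketch (the file name and data are just placeholders):

```js
import { createWriteStream } from "node:fs";

const out = createWriteStream("./big.log"); // assumed destination

// Write an array of lines while respecting backpressure.
function writeLines(lines, i = 0) {
  while (i < lines.length) {
    // write() returns false once the stream's internal buffer is full;
    // that's backpressure telling us to stop pushing data in.
    if (!out.write(lines[i++] + "\n")) {
      // Resume only once the buffer has drained.
      out.once("drain", () => writeLines(lines, i));
      return;
    }
  }
  out.end();
}

writeLines(Array.from({ length: 100_000 }, (_, n) => `entry ${n}`));
```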
Writes to a stream can be synchronous or asynchronous, depending on what the other end of the stream is connected to. From the NodeJS docs:
| | Windows | POSIX |
| --- | --- | --- |
| Files | sync | sync |
| Terminals | async | sync |
| Pipes | sync | async |
Since logging events are written to a stream by application processes, and stream writes are constrained by stream reads, it follows that logging can have an impact on an application’s performance.
So let’s look into that…
Benchmarking the impact of logging
In order to explore the impact of logging, I created a benchmarking script to test two different approaches.
The test function
I created a `test` function that takes a `logger` object to test, a number of `cycles` to run, and a logging `payload` to pass to the logger.
The `logger` object has a `log` function, an `onDone` function to set a handler for when the logger has finished writing all its data, and a `done` function that can be called to tell the logger that no more data is coming.
I’m using the `performance` object from the `node:perf_hooks` module to take the measurements. To generate load on the CPU, I’m calling `Math.log` for each cycle.
The code measures three performance metrics:
- `loop`: The time taken to execute all the iterations of the internal loop (from `start_loop` to `end_loop`)
- `drain`: The time taken from `logger.done` being called to the logger reporting that it’s done, i.e. the time taken to drain the logger’s buffer (from `end_loop` to `end`)
- `program`: The time taken to execute the whole function (from `start` to `end`)
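Putting that together, here’s a minimal sketch of the `test` function (the shape of the `logger` interface and the mark names are assumptions based on the description above):

```js
import { performance } from "node:perf_hooks";

// Sketch: run `cycles` iterations, logging `payload` on each one, and
// measure the loop, drain and program times described above.
function test(logger, cycles, payload) {
  return new Promise((resolve) => {
    performance.mark("start");

    performance.mark("start_loop");
    for (let i = 0; i < cycles; i++) {
      Math.log(i + 1); // generate some CPU load
      logger.log(payload); // log every single iteration
    }
    performance.mark("end_loop");

    logger.onDone(() => {
      performance.mark("end");
      resolve({
        loop: performance.measure("loop", "start_loop", "end_loop").duration,
        drain: performance.measure("drain", "end_loop", "end").duration,
        program: performance.measure("program", "start", "end").duration,
      });
    });
    logger.done(); // tell the logger that no more data is coming
  });
}
```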
The suite function
This `test` function is called by a suite function that calls `test` repeatedly, gathers performance statistics, and then generates averaged data.
The suite function makes `reps` calls to the `test` function, providing the appropriate logger, cycle count, and log payload.
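A sketch of that suite function (the names are assumptions; the real version also tracks minimums and maximums and removes outliers, as described under the results):

```js
// Sketch: run `test` reps times and average the measurements.
async function suite(logger, reps, cycles, payload) {
  const results = [];
  for (let i = 0; i < reps; i++) {
    results.push(await test(logger, cycles, payload));
  }
  const mean = (key) =>
    results.reduce((sum, r) => sum + r[key], 0) / results.length;
  return { loop: mean("loop"), drain: mean("drain"), program: mean("program") };
}
```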
Test suites were run for each combination of logger and payload, as described below.
The loggers
I tested three “loggers”:
- A `no` logger
- A `console` logger
- A `child` logger
The no logger
The `no` logger does nothing at all. Instead of logging the `payload`, it just drops it. This lets us see how the test runs when no logging is done.
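Something like this (assuming the handler passed to `onDone` is invoked immediately, since there’s never anything to drain):

```js
// Sketch of the "no" logger: drop everything, report done immediately.
const noLogger = {
  log: (_payload) => {}, // discard the payload
  onDone: (handler) => handler(), // nothing is buffered, so we're already done
  done: () => {},
};
```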
The console logger
The `console` logger just uses `console.log` to emit the `payload` as a logging event.
Most NodeJS applications use `console.log`, either directly or indirectly via some logging library.
For this logger, the `done` and `onDone` methods don’t do anything.
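In sketch form (I’m assuming `onDone` at least invokes its handler right away, otherwise the test could never complete):

```js
// Sketch of the console logger: stdout does all the work, so there is
// never a buffer of our own to drain.
const consoleLogger = {
  log: (payload) => console.log(payload),
  onDone: (handler) => handler(), // assumption: report done immediately
  done: () => {},
};
```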
The child logger
I created a very simple logging library that uses a separate child process to handle the logging stream.
This relieves the application process from the burden of dealing with backpressure from the log stream.
This library creates a child process (see below) to do the actual writing of log entries to an output stream. Whenever something is logged, the library serializes it, if necessary, and then writes it to the `stdin` stream of the child process.
When `done` is called, the library sets a flag (to prevent further writes to the log), creates a handler for the `disconnect` event from the child process, and then calls `end` on the `stdin` stream. By doing this, the library is telling the child process “Nothing else is coming,” and then waits for the child process to say “OK, I’m done with my work.”
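A minimal sketch of the library side, assuming `fork` from `node:child_process` (the function and file names are mine, and error handling is omitted):

```js
import { fork } from "node:child_process";

// Sketch of the child logger library: fork a child (which inherits our
// stdout) and pipe serialized log entries into its stdin.
function createChildLogger(childScript = "./log-child.js") {
  const child = fork(childScript, {
    stdio: ["pipe", "inherit", "inherit", "ipc"],
  });
  let closed = false;

  return {
    log(payload) {
      if (closed) return; // flag set by done(): no further writes
      const entry =
        typeof payload === "string" ? payload : JSON.stringify(payload);
      child.stdin.write(entry + "\n");
    },
    onDone(handler) {
      // The child disconnects once it has flushed everything.
      child.once("disconnect", handler);
    },
    done() {
      closed = true;
      child.stdin.end(); // "Nothing else is coming"
    },
  };
}
```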
This is, in sketch form, what the child process looks like:
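```js
// log-child.js (name assumed) — read log entries from stdin into a FIFO
// queue and write them to the stdout stream inherited from the parent.
const messages = []; // FIFO queue of pending log entries
let stdinDone = false;
let writing = false;

function flush() {
  if (writing) return;
  writing = true;
  while (messages.length > 0) {
    // write() returns false when stdout's buffer is full (backpressure);
    // wait for 'drain' before continuing.
    if (!process.stdout.write(messages.shift())) {
      process.stdout.once("drain", () => {
        writing = false;
        flush();
      });
      return;
    }
  }
  writing = false;
  if (stdinDone) process.disconnect(); // tell the parent we're finished
}

process.stdin.on("data", (chunk) => {
  messages.push(chunk);
  flush();
});

process.stdin.on("end", () => {
  stdinDone = true;
  flush();
});
```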
It is intentionally simple for test purposes. In a real system, you’d definitely need to add some error handling to make things more robust.
All this process does is read from `stdin` and put the data into a FIFO queue (`messages`). It then proceeds to take items from the FIFO and write them to the `stdout` stream that it inherited from the library.
Test environments
I ran these test suites in three different environments:
- In a normal terminal, with the output scrolling by: `node ./index.js`
- In a normal terminal, with the output redirected to a file: `node ./index.js > results.out`
- In a detached `screen` session: `screen`, then `node ./index.js`, then `Ctrl+a, d`
The results
For each round, I calculated the minimum and maximum run times for a `test` call, as well as a mean with outliers removed. Outliers were removed by calculating the interquartile range (IQR) and only using values $x$ where $Q_1 - 1.5 \times \mathrm{IQR} \le x \le Q_3 + 1.5 \times \mathrm{IQR}$.
In the tables below, all times are in milliseconds, shown as the mean together with the minimum and maximum.
The no logger
First, let’s look at how long it took to run the tests with the `no` logger. This gives us a baseline against which to measure the performance impact of logging.
| Environment | Payload | Program | Loop | Drain |
| --- | --- | --- | --- | --- |
| Terminal | string | | | |
| Terminal | object | | | |
| Redirect | string | | | |
| Redirect | object | | | |
| Screen | string | | | |
| Screen | object | | | |
Unsurprisingly, the mean times are all pretty much the same. Since there’s no logging, there is no data being streamed that could slow things down.
The maximum time of 9.25ms for the Screen/object test is definitely an outlier and probably resulted from my laptop doing something else while I was running the test.
The console logger
Next, let’s look at the `console` logger, which is what most folks use for their logging.
| Environment | Payload | Program | Loop | Drain |
| --- | --- | --- | --- | --- |
| Terminal | string | | | |
| Terminal | object | | | |
| Redirect | string | | | |
| Redirect | object | | | |
| Screen | string | | | |
| Screen | object | | | |
Perhaps the most glaring observation here is just how much logging slows things down. Now, before anyone gets carried away, this test is somewhat artificial, in that you would never really log every single iteration of a CPU-intensive workflow like this. Still, it is worth taking a moment to note that logging does have an impact, even if, in general, its benefits far outweigh the cost.
Perhaps not unexpectedly, it takes a little longer to log objects than strings. This is due to the cost of serialization. In these tests, serialization added a 7-14% penalty.
Finally, almost all of the execution time of the test is spent in the loop, with essentially no time spent draining the stream buffer. This is to be expected, since writes to STDOUT are synchronous both when writing to the terminal and when writing to a file.
I was surprised to see that the slowest tests were the ones with redirected output. Generally, redirecting output sends data to a file, which is quicker than sending it to the screen. I don’t really have a hypothesis for why this happened here, but it’s not really important to the general point of this post.
The child logger
Now, let’s look at the `child` logger.
| Environment | Payload | Program | Loop | Drain |
| --- | --- | --- | --- | --- |
| Terminal | string | | | |
| Terminal | object | | | |
| Redirect | string | | | |
| Redirect | object | | | |
| Screen | string | | | |
| Screen | object | | | |
First of all, we can see that the tests complete way faster with the `child` logger than with the `console` logger. Perhaps this is to be expected, since we have two processes running, rather than one.
We can see that the time spent in the loop is much lower for the `child` logger than the `console` logger. However, we’re “paying” for this with a longer drain time.
What we can conclude from this is that the logging is less impactful for the CPU-intensive workload, but we are buffering the log entries in memory until we are able to write them to the output stream.
The conclusion
When it comes to logging, you will get much better performance if you offload the work of logging to a child process of your application process.
There are, however, a few caveats here:
- There’s a little overhead to creating a child process with `fork`. For long-running application processes (such as a server), this will be insignificant, but for short scripts, it’s simpler and quicker to just use `console.log`.
- Creating a child process brings the need to manage this process properly. The child process could fail, so failure recovery code is needed.
- Since the child process has its own internal buffer of pending log entries, it’s important that the application process does its best to let the child process shut down gracefully so that pending log entries are not lost (see the sketch after this list).
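For example, assuming the `createChildLogger` sketch from earlier, a graceful shutdown could look like this:

```js
// Sketch: flush pending log entries in the child before exiting.
const logger = createChildLogger();

function shutdown() {
  logger.onDone(() => process.exit(0)); // exit once the child has flushed
  logger.done(); // no more log entries are coming
}

process.on("SIGTERM", shutdown);
process.on("SIGINT", shutdown);
```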
You may be wondering why I didn’t use Worker threads instead of a child process. Unfortunately, per the Node docs:
Workers utilize message passing via `<MessagePort>` to implement interactions with stdio. This means that stdio output originating from a Worker can get blocked by synchronous code on the receiving end that is blocking the Node.js event loop.
Only by creating a separate process can we keep handling log entries while the application process does its thing.
Next steps
My next steps are to get started on implementing a logging library that uses a child process. I’ll be sure to add a link here, once it’s ready.
As for you: well, for most people, `console.log` (or some library that wraps around it) is probably more than good enough.
However, as an application scales and the number of log entries generated per second grows, there may come a time when you need to reassess your choice and switch to a dedicated logging process. When that happens, check back here for my library 😊