Wednesday, April 8, 2015

What is really included in load average on Linux?

Everyone "knows" that load average = number of runnable processes + processes blocked on I/O. While this may be close enough for a lot of use cases, it is incorrect.

The purpose of this article is to note briefly what is really counted, not to enumerate all possibilities.

First, a short note: the kernel counts threads, not processes.

With that out of the way, let's take a look at a relevant comment (source):
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 *      nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;

An alert reader may note that there are lies, damned lies, statistics, and code comments. I have to agree, so this requires validation.

A quick eye-grep reveals:

long calc_load_fold_active(struct rq *this_rq)
{
        long nr_active, delta = 0;

        nr_active = this_rq->nr_running;
        nr_active += (long) this_rq->nr_uninterruptible;

        if (nr_active != this_rq->calc_load_active) {
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
        }

        return delta;
}
While this is not the whole story, it is good enough for this article.
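To check that folding per-CPU deltas ends up with the same total as the direct sum in the comment, here is a minimal user-space model. It is a sketch, not kernel code: struct rq_model, direct_sum(), fold_all() and calc_load_tasks_model are hypothetical names, and the global accumulator only stands in for the kernel's calc_load_tasks counter. The model also allows a single CPU's nr_uninterruptible to be negative, because a thread may enter uninterruptible sleep on one CPU and be woken from another; only the sum over all CPUs is meaningful.

```c
#include <assert.h>

/* Hypothetical model of the two struct rq fields summed above, plus the
 * per-CPU snapshot kept between folds. Not the kernel's struct rq. */
struct rq_model {
    long nr_running;         /* runnable threads queued on this CPU */
    long nr_uninterruptible; /* may be negative on one CPU; only the sum matters */
    long calc_load_active;   /* what this CPU contributed at its last fold */
};

/* Stand-in for the kernel's global calc_load_tasks accumulator. */
static long calc_load_tasks_model;

/* The comment's formula: sum both counters over every CPU. */
static long direct_sum(const struct rq_model *rqs, int ncpus)
{
    long nr_active = 0;

    for (int i = 0; i < ncpus; i++)
        nr_active += rqs[i].nr_running + rqs[i].nr_uninterruptible;
    return nr_active;
}

/* Mirrors calc_load_fold_active(): report only the change since the last fold. */
static long fold_active(struct rq_model *rq)
{
    long nr_active = rq->nr_running + rq->nr_uninterruptible;
    long delta = 0;

    if (nr_active != rq->calc_load_active) {
        delta = nr_active - rq->calc_load_active;
        rq->calc_load_active = nr_active;
    }
    return delta;
}

/* Fold every CPU into the global accumulator, check the result against the
 * comment's direct sum, and return the updated global count. */
static long fold_all(struct rq_model *rqs, int ncpus)
{
    for (int i = 0; i < ncpus; i++)
        calc_load_tasks_model += fold_active(&rqs[i]);
    assert(calc_load_tasks_model == direct_sum(rqs, ncpus));
    return calc_load_tasks_model;
}
```

For example, with CPU 0 at nr_running = 3, nr_uninterruptible = 1 and CPU 1 at 0 and -1, the first fold_all() returns 3. If two threads on CPU 0 then stop being runnable (nr_running drops to 1), the next call folds a delta of -2 and returns 1. The check inside fold_all() holds in this model only because every CPU is folded in each round; in the kernel each CPU folds from its own tick, with extra handling for idle CPUs.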

So the criterion is not "threads blocked on I/O", but threads which contribute to the nr_uninterruptible counter.

nr_uninterruptible counts threads in the TASK_UNINTERRUPTIBLE state (and not frozen, though what that means is beyond the scope of this article).

When can this happen?
  • while waiting for event completion (also used when dealing with I/O)
  • while trying to acquire a sleepable locking primitive such as a semaphore

The significance of this is that when a server with an abnormally high load (say, > 1k on a 64-way machine) is encountered, people tend to assume I/O is at fault (e.g. a dead NFS server), which may very well be false. For instance, one thread could take a semaphore for writing and then block for some reason, and many other threads could pile up behind it while trying to take it for reading.
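Before blaming I/O, it helps to see how many threads are actually runnable and how many are in uninterruptible sleep. The sketch below approximates both counts from user space by walking /proc/<pid>/task/<tid>/stat and tallying the state letter (R for running or runnable, D for uninterruptible sleep). It is an illustration under stated assumptions, not how the kernel computes load: the scan is racy, it counts the scanning thread itself, the R count only roughly matches nr_running, and stat_state() and count_r_and_d() are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the one-letter state from a stat line.
 * The comm field is in parentheses and may itself contain spaces or ')',
 * so find the last ')' and take the next non-space character.
 * Returns '?' if the line is malformed. */
static char stat_state(const char *line)
{
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return '?';
    p++;
    while (*p == ' ')
        p++;
    return *p ? *p : '?';
}

/* Hypothetical helper: count threads system-wide in state R and D.
 * Returns 0 on success, -1 if /proc cannot be opened. The result is a
 * racy snapshot; threads may appear or exit while we scan. */
static int count_r_and_d(long *running, long *uninterruptible)
{
    DIR *proc;
    struct dirent *pe;

    assert(running != NULL && uninterruptible != NULL);
    proc = opendir("/proc");
    if (proc == NULL)
        return -1;
    *running = *uninterruptible = 0;

    while ((pe = readdir(proc)) != NULL) {
        char path[512];
        DIR *tasks;
        struct dirent *te;

        if (!isdigit((unsigned char)pe->d_name[0]))
            continue; /* not a PID directory */
        snprintf(path, sizeof(path), "/proc/%s/task", pe->d_name);
        tasks = opendir(path);
        if (tasks == NULL)
            continue; /* process exited meanwhile */

        /* Each entry under task/ is one thread, which is what the kernel counts. */
        while ((te = readdir(tasks)) != NULL) {
            char line[1024];
            FILE *f;

            if (!isdigit((unsigned char)te->d_name[0]))
                continue;
            snprintf(path, sizeof(path), "/proc/%s/task/%s/stat",
                     pe->d_name, te->d_name);
            f = fopen(path, "r");
            if (f == NULL)
                continue; /* thread exited meanwhile */
            if (fgets(line, sizeof(line), f) != NULL) {
                char s = stat_state(line);

                if (s == 'R')
                    (*running)++;
                else if (s == 'D')
                    (*uninterruptible)++;
            }
            fclose(f);
        }
        closedir(tasks);
    }
    closedir(proc);
    return 0;
}
```

On a machine like the one described above, a large D count with an ordinary R count is the interesting signal. Looking at /proc/<pid>/task/<tid>/stack (root only, on kernels that provide it) for a sample of the D threads shows whether they are waiting on I/O completion or queued on the same lock.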
