Monday, June 27, 2016

when the kernel can kill a process

Common cases of the kernel killing a process include the OOM killer, signals like SIGSEGV/SIGBUS delivered on an invalid memory access, and a prosaic signal sent by another process.

Let's take a look at some less common ones.

1. OOPS

If the kernel detects an inconsistency in its own state, it prints information about the problem. Depending on the particular inconsistency, it may also decide to OOPS. What happens then depends on the kernel.panic_on_oops sysctl: the kernel either panics (crashing the whole system) or kills the thread that happened to be executing the offending code at the time.

Either way, an OOPS is typically an indication of a bug in the kernel itself.

2. failed execve

execve(2) is used to execute a new binary in the current process. Setting everything up is a complicated procedure with several failure points, some of which come after the original address space has been destroyed. If a failure happens at that stage, there is no address space to return to, and the kernel has no choice but to kill the process.

This is not a big issue: a process calling execve was going to either succeed and run the new image or exit with an error anyway.

3. failed copy on write of a non-transparent huge page

When a process forks, its private memory pages are shared between the parent and the child and marked copy-on-write. When either of them later writes to such a page, the page is unshared: a new page is allocated and the contents are copied into it, so the writer gets its own private copy.

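Huge pages are a special case and a much more limited resource. Interested parties can explicitly request them through hugetlbfs; unlike transparent huge pages, which the kernel uses opportunistically, these come from a dedicated pool.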

If no free huge page is available when copy-on-write needs one, the kernel kills the child.


There are many more situations in which such a kill can happen. Those interested in the subject are welcome to grep the kernel source for send_sig and investigate from there.