Friday, October 30, 2015

when strace fails to obtain syscall information

strace(1) (or truss(1) on BSDs) is a system call tracer. You may have seen it successfully report threads blocked in the kernel waiting for some operation to complete (e.g. open). Yet sometimes you attach to the target process and get no output at all. The boring answer is that no thread in the process is executing a syscall, so there is nothing to report. But what if we know for sure that at least one thread is executing a syscall, or called one and is now blocked in it?

Let's first see how strace works. The kernel provides a special interface, ptrace(2), which can be used to observe various actions of the target process and to interact with it. In particular, it can be told to stop the target process on syscall entry and exit. Once the target is stopped, the tracer can read its state and determine which syscall is being called and with what arguments. The key here is that the target thread has to actually reach the syscall entry or exit path for any of this to happen.
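To make the entry/exit stops concrete, here is a minimal sketch of the mechanism, assuming Linux and calling ptrace(2) from Python through ctypes. The helper names (ptrace, count_syscall_stops) are made up for illustration. The sketch only counts syscall stops and does not decode registers, because which register holds the syscall number is architecture specific. It also ignores multi-threaded tracees and group-stops, both of which a real tracer such as strace has to handle.

```python
import ctypes
import ctypes.util
import os
import signal

# Request numbers and options from <sys/ptrace.h> on Linux.
PTRACE_TRACEME = 0
PTRACE_SYSCALL = 24
PTRACE_SETOPTIONS = 0x4200
PTRACE_O_TRACESYSGOOD = 0x1

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long, ctypes.c_void_p, ctypes.c_void_p]
libc.ptrace.restype = ctypes.c_long


def ptrace(request, pid, addr=None, data=None):
    """Hypothetical thin wrapper: call ptrace(2) and raise OSError on failure."""
    if libc.ptrace(request, pid, addr, data) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def count_syscall_stops(argv):
    """Run argv under ptrace and count its syscall-entry and syscall-exit stops."""
    pid = os.fork()
    if pid == 0:
        # Child: ask to be traced by the parent, then exec the target.
        try:
            ptrace(PTRACE_TRACEME, 0)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if ptrace or exec failed

    # After a successful execve the traced child stops with SIGTRAP.
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child did not stop; is ptrace permitted here?")
    # Report syscall stops as SIGTRAP | 0x80 so they differ from real SIGTRAPs.
    ptrace(PTRACE_SETOPTIONS, pid, None, PTRACE_O_TRACESYSGOOD)

    stops = 0
    sig = 0
    while True:
        # Resume the child until its next syscall entry or exit.
        ptrace(PTRACE_SYSCALL, pid, None, sig)
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return stops, os.WEXITSTATUS(status)
        if os.WIFSIGNALED(status):
            return stops, -os.WTERMSIG(status)
        stopsig = os.WSTOPSIG(status)
        if stopsig == signal.SIGTRAP | 0x80:
            # A real tracer would read the registers here to find the
            # syscall number and its arguments.
            stops += 1
            sig = 0
        else:
            sig = stopsig  # pass any other signal on to the child
```

For example, count_syscall_stops(["true"]) returns the number of stops and the exit status. Note that nothing is counted unless the child actually passes through the syscall entry or exit path.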

So how does strace manage to properly report a thread waiting in open? [1] A thread in such a state is in an interruptible sleep. It is woken up, goes all the way back to the kernel<->userspace boundary, where it executes the ptrace bits, and then re-executes the syscall.
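The fifo case from the footnote is easy to reproduce. Below is a small sketch, assuming Linux /proc and Python 3.8+ for threading.get_native_id(). One thread opens a fifo for reading and blocks, and the main thread then reads that thread's state letter from /proc. The helper names are hypothetical, and the 0.2 s sleep is a crude way to let the reader enter open().

```python
import os
import tempfile
import threading
import time


def thread_state(tid):
    """Return the one-letter state of a thread of this process, from /proc."""
    with open(f"/proc/self/task/{tid}/stat") as f:
        stat = f.read()
    # The command name (field 2) may contain spaces, so split after its ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def state_of_reader_blocked_on_fifo():
    path = os.path.join(tempfile.mkdtemp(), "fifo")
    os.mkfifo(path)
    tids = []
    started = threading.Event()

    def reader():
        tids.append(threading.get_native_id())
        started.set()
        fd = os.open(path, os.O_RDONLY)  # blocks until a writer opens the fifo
        os.close(fd)

    t = threading.Thread(target=reader)
    t.start()
    started.wait()
    time.sleep(0.2)  # crude: give the reader time to block inside open()
    state = thread_state(tids[0])
    # A writer shows up, so both opens return and the reader can finish.
    os.close(os.open(path, os.O_WRONLY))
    t.join()
    os.unlink(path)
    os.rmdir(os.path.dirname(path))
    return state
```

On Linux this is expected to return "S", the letter /proc uses for an interruptible sleep.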

For which threads will strace fail to obtain syscall information? Definitely ones blocked in an uninterruptible sleep: they cannot be woken up like that, so they never go back to the boundary to let the tracer do its thing. The other possibility is a thread actively executing code in the kernel: it is not sleeping, and there is no mechanism to tell it to go back to the boundary.
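A quick way to spot candidates for both cases is to look at the state of every thread in /proc/<pid>/task. The following is a sketch under that assumption. The HINTS wording is my own summary of the paragraph above, not kernel or strace output. Note that "R" alone cannot tell userspace execution apart from kernel execution.

```python
import os

# Hypothetical hints summarizing what strace can expect for each state letter.
HINTS = {
    "S": "interruptible sleep: strace can usually get the syscall reported",
    "D": "uninterruptible sleep: cannot be woken up, strace will likely stay silent",
    "R": "running: either in userspace or actively executing kernel code",
}


def thread_states(pid):
    """Map each thread id of pid to its one-letter state from /proc."""
    states = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                stat = f.read()
        except FileNotFoundError:
            continue  # the thread exited while we were looking
        # Field 3 (state) follows the parenthesized command name.
        states[int(tid)] = stat.rsplit(")", 1)[1].split()[0]
    return states


def report(pid):
    """Print one line per thread: tid, state letter and a hint."""
    for tid, state in sorted(thread_states(pid).items()):
        print(tid, state, HINTS.get(state, "other state"))
```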

What can be done about such threads? In most (not all!) cases it is possible to read the thread's kernel backtrace (/proc/<tid>/stack) and try to work out from there what it is doing.
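Reading that file programmatically might look like the sketch below. It assumes the usual "[<address>] function+offset/size" line format, where recent kernels print the address as 0. It also assumes that reading typically requires root (CAP_SYS_ADMIN) and that the file may be missing on some kernels, so failures are turned into None. The function names and the sample frames in the usage line are illustrative only.

```python
import os


def parse_kernel_stack(text):
    """Extract function names from lines like '[<0>] pipe_read+0x2a5/0x440'."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the '[<address>] ' prefix, then the '+offset/size' suffix.
        symbol = line.split("] ", 1)[-1]
        frames.append(symbol.split("+", 1)[0])
    return frames


def kernel_stack(tid):
    """Return the kernel backtrace of tid as a list of names, or None."""
    try:
        with open(f"/proc/{tid}/stack") as f:
            return parse_kernel_stack(f.read())
    except OSError:
        # Usually a permission error for non-root users, or no such file.
        return None
```

For example, parse_kernel_stack("[<0>] pipe_read+0x2a5/0x440\n[<0>] vfs_read+0x1a2/0x2c0\n") gives ["pipe_read", "vfs_read"].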

As a final remark, not all threads entering the kernel are executing syscalls. Typical examples are page faults and floating point exceptions, neither of which is reported by strace.
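Page faults are easy to observe without any syscall in the loop. The sketch below assumes Linux, where the first write to each page of a fresh private anonymous mapping traps into the kernel as a minor fault. It counts those faults via getrusage's ru_minflt. The faults_from_touching helper is hypothetical.

```python
import mmap
import resource


def minor_faults():
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def faults_from_touching(pages=256):
    """Write one byte to each page of fresh anonymous memory and count faults."""
    size = pages * mmap.PAGESIZE
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    before = minor_faults()
    for off in range(0, size, mmap.PAGESIZE):
        m[off] = 1  # a plain memory write, yet the first one per page enters the kernel
    after = minor_faults()
    m.close()
    return after - before
```

The returned count should be at least the number of pages touched, even though the loop itself issues no syscalls, so strace has nothing to show for it.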

[1] Of course, there is no guarantee that every open operation is interruptible, but a popular example, waiting for a writer when opening a fifo, is.