Wednesday, January 13, 2016

can strace affect traced processes?

Please read "when strace fails to obtain syscall" first if you don't know how strace works.

Can strace/truss/whatever usage change the result of running a program?

Using ptrace introduces quite a lot of overhead (i.e. it slows things down). But besides that, can strace usage result in changing what the process ends up doing?

Yes it can and the key lies in waking up threads in an interruptible sleep. Note that you can count yourself unlucky if you encounter any of things below. strace is perfectly fine to use most of the time.

Depending on the code which put the thread to sleep, the thread will:
- do something without going back to the syscall boundary
- go back to the boundary and restart the syscall
- exit the kernel with EINTR

All of these are invasive in principle, but most of the time harmless.

One funny example of EINTR interferring with troubleshooting was few years back, when nginx would just hang around doing nothing when it was supposed to shutdown. Attaching to it revealed it was blocked in epoll_wait, which now returned with EINTR and that prompted nginx to exit.

Let's see a demo for the second case - automagic restart.

We are going to create a named pipe. The thread opening it for reading will block until it gets opened for writing (writers are blocked in the same manner).

shell1$ mkfifo fifo
shell1$ cat > fifo

shell2$ cat fifo

Stuff typed into the first cat should be read by the second cat.

Now consider:

shell1$ mkfifo fifo
shell1$ cat > fifo

shell2$ rm fifo 
shell2$ mkfifo fifo
shell2$ cat fifo

"cat > fifo" results in the shell trying to open the file, before it can proceed to execve cat. It got blocked:
[<ffffffff81226480>] pipe_wait+0x70/0xc0
[<ffffffff81226501>] wait_for_partner+0x31/0x70
[<ffffffff81226f60>] fifo_open+0x1b0/0x310
[<ffffffff8121bbff>] do_dentry_open+0x1ff/0x2f0
[<ffffffff8121d096>] vfs_open+0x56/0x60
[<ffffffff8122c0c4>] path_openat+0x1e4/0x12a0
[<ffffffff8122e34a>] do_filp_open+0x8a/0x100
[<ffffffff8121d45a>] do_sys_open+0x13a/0x230
[<ffffffff8121d56e>] SyS_open+0x1e/0x20
[<ffffffff817795ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff


But the name got unlinked and the other cat is waiting on a different pipe.

What if we strace? The sleep is interruptible and open is going to be restarted. And indeed:

open("fifo", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC, 0666) = 3

Since it went all the way back to the boundary, the name "fifo" got looked up again and the thread proceeded to open the new pipe. And indeed, messages get relayed:

shell1$ cat > fifo
test

shell2$ cat fifo 
test

So, stracing blindly can sometimes give interesting results.

There are other methods which are not invasive in this manner, including systemtap, dtrace or ktrace (BSD-specific).

No comments:

Post a Comment