Tuesday, January 19, 2016

Fun fact: transient ETXTBSY when trying to open a script on Linux

When you run given binary, the kernel marks it as being executed which is then used to disallow opening it for writing. Similarly, when a file is opened for writing, it is marked as such and execve fails.

The error is ETXTBSY or Text file busy.

Situation with scripts is a little bit more involved. Doing "sh script.sh" will typically result in execve of /bin/sh, i.e. the kernel does not really know nor care what script.sh is.

Let's consider a file with the following content being passed to execve:
#!/bin/sh

echo meh
exit 0


A special handler recognizes #! and proceeds to change the executed binary to /bin/sh.

However, once execution gets going, you can open the file for writing no problem.

This poses two questions:
- will the execution fail if the script is opened for writing?
- will opening the file for writing ever fail because the script is being executed?

Let's experiment. The following program will help:

#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        int fd;

        if (argc != 2)
                return 1;

        for (;;) {
                fd = open(argv[1], O_WRONLY);
                if (fd != -1) {
                        close(fd);
                        continue;
                }
                perror("open");
        }
       
        return 1;
}


As you can see the program just repeatedly tries to open the file for writing.
We will run the script ("script.sh") in one terminal, while running the program in another. That is:

shell1$ ./write script.sh

shell2$ while true; do ./script.sh; done

And this gives....

shell1$ ./write script.sh
open: Text file busy
open: Text file busy
open: Text file busy
open: Text file busy
open: Text file busy
open: Text file busy
[snip]


shell2$ while true; do ./script.sh; done
zsh: text file busy: ./script.sh
zsh: text file busy: ./script.sh
meh
meh
zsh: text file busy: ./script.sh
meh

[snip]

So we see 2 failure modes:
- sometimes we fail to execve because the file is opened for writing
- sometimes we fail to open for writing because the file is being executed

The second condition is transient - the file is unmarked as the kernel proceeds to look up /bin/sh instead of the script.

A side observation is that if you have a file which is executable by others, they may interfere with your attempts to write it by repeatedly calling execve.h

Wednesday, January 13, 2016

can strace affect traced processes?

Please read "when strace fails to obtain syscall" first if you don't know how strace works.

Can strace/truss/whatever usage change the result of running a program?

Using ptrace introduces quite a lot of overhead (i.e. it slows things down). But besides that, can strace usage result in changing what the process ends up doing?

Yes it can and the key lies in waking up threads in an interruptible sleep. Note that you can count yourself unlucky if you encounter any of things below. strace is perfectly fine to use most of the time.

Depending on the code which put the thread to sleep, the thread will:
- do something without going back to the syscall boundary
- go back to the boundary and restart the syscall
- exit the kernel with EINTR

All of these are invasive in principle, but most of the time harmless.

One funny example of EINTR interferring with troubleshooting was few years back, when nginx would just hang around doing nothing when it was supposed to shutdown. Attaching to it revealed it was blocked in epoll_wait, which now returned with EINTR and that prompted nginx to exit.

Let's see a demo for the second case - automagic restart.

We are going to create a named pipe. The thread opening it for reading will block until it gets opened for writing (writers are blocked in the same manner).

shell1$ mkfifo fifo
shell1$ cat > fifo

shell2$ cat fifo

Stuff typed into the first cat should be read by the second cat.

Now consider:

shell1$ mkfifo fifo
shell1$ cat > fifo

shell2$ rm fifo 
shell2$ mkfifo fifo
shell2$ cat fifo

"cat > fifo" results in the shell trying to open the file, before it can proceed to execve cat. It got blocked:
[<ffffffff81226480>] pipe_wait+0x70/0xc0
[<ffffffff81226501>] wait_for_partner+0x31/0x70
[<ffffffff81226f60>] fifo_open+0x1b0/0x310
[<ffffffff8121bbff>] do_dentry_open+0x1ff/0x2f0
[<ffffffff8121d096>] vfs_open+0x56/0x60
[<ffffffff8122c0c4>] path_openat+0x1e4/0x12a0
[<ffffffff8122e34a>] do_filp_open+0x8a/0x100
[<ffffffff8121d45a>] do_sys_open+0x13a/0x230
[<ffffffff8121d56e>] SyS_open+0x1e/0x20
[<ffffffff817795ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff


But the name got unlinked and the other cat is waiting on a different pipe.

What if we strace? The sleep is interruptible and open is going to be restarted. And indeed:

open("fifo", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC, 0666) = 3

Since it went all the way back to the boundary, the name "fifo" got looked up again and the thread proceeded to open the new pipe. And indeed, messages get relayed:

shell1$ cat > fifo
test

shell2$ cat fifo 
test

So, stracing blindly can sometimes give interesting results.

There are other methods which are not invasive in this manner, including systemtap, dtrace or ktrace (BSD-specific).