Monday, November 2, 2015

the kernel vs userspace arguments

Plenty of syscalls (e.g. open(2)) write to or read from userspace memory using dedicated primitives and maintain a local copy. Why not just treat it like regular kernel memory? As outlined in one of the previous posts, mere access should work.

The passed address may belong to kernelspace, so it has to be validated. But let's say we already did that.

Consider a toy syscall:

int
sys_meh(const char *name, int value)
{
        if (!is_root()) {
                if (strcmp(name, "special") == 0)
                        return -EPERM;
        }

        spin_lock(&meh_lock);
        meh_modify(name, value);
        spin_unlock(&meh_lock);
        return 0;
}

Here we accept a name and a value, but only root is allowed to modify the object identified as special.

The first access is the strcmp call on line 5 of the listing. What if the passed address is garbage? The read will trigger a page fault and, with no way to communicate the problem to strcmp, the kernel is forced to oops/panic.

So let's say the address is not garbage.

The name is read twice: by sys_meh itself and later by meh_modify. In other words, the code relies on the value not changing between the two reads. Is the expectation met? No. For instance, a second thread can modify the string after strcmp is done, but before meh_modify is called. This would in effect circumvent the protection we had in place.
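To make the window concrete, here is a userspace toy that replays the interleaving deterministically. It is only a sketch: user_name, racing_thread and modified are hypothetical stand-ins invented for the illustration, the "second thread" is a plain function call placed exactly in the gap, and -1 stands in for -EPERM.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a buffer in user memory shared with another thread. */
static char user_name[16] = "harmless";

/* Records which name the modify step actually acted on. */
static char modified[16];

/* Hypothetical stand-in for the second thread, run between check and use. */
static void
racing_thread(void)
{
        strcpy(user_name, "special");
}

static void
meh_modify(const char *name, int value)
{
        (void)value;
        assert(strlen(name) < sizeof(modified));
        strcpy(modified, name);         /* second fetch of the caller's buffer */
}

/* Same shape as sys_meh for a non-root caller, minus the locking. */
static int
sys_meh_racy(const char *name, int value)
{
        if (strcmp(name, "special") == 0)       /* first fetch */
                return -1;                      /* stands in for -EPERM */

        racing_thread();                        /* the window between fetches */

        meh_modify(name, value);
        return 0;
}
```

Calling sys_meh_racy(user_name, 1) passes the check, because the buffer still says "harmless" at that point, and yet meh_modify ends up operating on "special". With real threads the outcome depends on timing, but the attacker gets to retry as many times as needed.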

The situation is even worse than that. By the time the code reaches meh_modify, the kernel could have decided to evict the page backing the string. On access a page fault will occur and the kernel will try to bring the page back in. But the code holds a spinlock at that point, and servicing a page fault may require sleeping, which is illegal with a spinlock held due to deadlock potential.

In situations like this the standard approach is to copy the relevant data into a temporary kernel buffer once, before taking any locks, and to work only on that copy afterwards.
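A rough sketch of what that looks like for the toy syscall, again as a userspace model. Everything besides the sys_meh shape is an assumption: copy_name_in is a hypothetical stand-in for a real copy-in primitive (think strncpy_from_user() on Linux or copyinstr() on the BSDs, which additionally validate the address and recover from faults), MEH_NAME_MAX is an invented limit, and caller_is_root replaces is_root().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MEH_NAME_MAX 32                 /* hypothetical limit, including the NUL */

static int caller_is_root;              /* hypothetical stand-in for is_root() */
static char modified[MEH_NAME_MAX];     /* records the name meh_modify saw */

/*
 * Hypothetical stand-in for a copy-in primitive. Each source byte is read
 * exactly once. A real primitive would also return an error instead of
 * crashing when the address is bad.
 */
static int
copy_name_in(char *dst, const char *usrc, size_t dstsize)
{
        for (size_t i = 0; i < dstsize; i++) {
                dst[i] = usrc[i];
                if (dst[i] == '\0')
                        return 0;
        }
        return -ENAMETOOLONG;
}

static void
meh_modify(const char *name, int value)
{
        (void)value;
        assert(strlen(name) < sizeof(modified));
        strcpy(modified, name);
}

static int
sys_meh_fixed(const char *uname, int value)
{
        char name[MEH_NAME_MAX];
        int error;

        /* Single fetch from the caller's memory, done before any locking. */
        error = copy_name_in(name, uname, sizeof(name));
        if (error != 0)
                return error;

        /* From here on only the private copy is used. */
        if (!caller_is_root && strcmp(name, "special") == 0)
                return -EPERM;

        /* The kernel version would take meh_lock around this call. */
        meh_modify(name, value);
        return 0;
}
```

The check and the use now see the same bytes no matter what other threads do to the original buffer, and nothing touches user memory while the lock is held, so a fault can only happen in the copy-in step where it can be handled.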

Reading userspace arguments more than once caused serious trouble when various security-oriented syscall wrappers were implemented. For instance, code trying to restrict file access by monitoring filenames had the exact same bug as sys_meh above (although it could also be circumvented in a myriad of other ways, including symlinks). Interested parties are invited to read Exploiting Concurrency Vulnerabilities in System Call Wrappers.
