Tuesday, November 3, 2015

a primitive to read data from userspace

As outlined previously, a special primitive is needed to access userspace data safely. Both the Linux and FreeBSD kernels have several highly specialized variants, but they all work on the same principle. The example below is taken from FreeBSD, since the Linux equivalent is considerably more convoluted.

To reiterate, consider:
int val;

val = *some_userspace_pointer;
printf("%d\n", val);

If some_userspace_pointer contains garbage, for example, a page fault will occur. The page fault handler will conclude that the fault cannot be satisfied, but it has no way to report this to the code above: that code simply reads the value and assumes the read succeeded.

What's needed is a function that can actually detect the condition and return an error to the caller. With such a primitive in place, the code becomes:
int val, error;
error = copyin(some_userspace_pointer, &val, sizeof(val));
if (error != 0)
        return error;
printf("%d\n", val);

A super slow variant would lock the address space, make sure the relevant mappings are valid, and only then perform the read. That is a lot of work, and it is completely unnecessary in the common case.

Instead, the standard approach is to have a way to tell the page fault handler where to jump if the fault cannot be serviced. The code at that location is expected to clean up after the failed copy and return to the original caller.

In pseudo-code it would look like this:
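int
copyin(const void *from, void *to, size_t len)
{
        /* Tell the page fault handler where to go if the copy faults. */
        set_fault_handler(copyin_fault);
        if (len == 0)
                goto done_copyin;
        /* Reject ranges that wrap around or extend past userspace. */
        if (!fits_userspace(from, len))
                goto copyin_fault;
        /* May fault; an unserviceable fault resumes at copyin_fault. */
        memcpy(to, from, len);
done_copyin:
        set_fault_handler(0);
        return 0;
copyin_fault:
        set_fault_handler(0);
        return EFAULT;
}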
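The same pattern can be tried out in userspace, where a bad pointer raises SIGSEGV instead of entering the kernel's page fault handler. The sketch below is only an analogue, not how the kernel does it: a SIGSEGV/SIGBUS handler installed with sigaction() plays the role of the page fault handler, and a sigsetjmp() buffer plays the role of the registered fault address. The names install_fault_handler, copy_checked and onfault are made up for this example, and the single global slot assumes a single-threaded program. Jumping out of a fault handler with siglongjmp() is not strictly guaranteed by POSIX, but it is a commonly used technique on mainstream Unix systems.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Recovery point for the signal handler; NULL means "no handler set",
 * like set_fault_handler(0). One global slot: single-threaded only.
 */
static sigjmp_buf *volatile onfault;

static void
segv_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)info;
	(void)ctx;
	if (onfault != NULL)
		siglongjmp(*onfault, 1);	/* like jumping to copyin_fault */
	/* Fault outside a guarded copy: let the default action kill us. */
	signal(sig, SIG_DFL);
	raise(sig);
}

static int
install_fault_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
	    sigaction(SIGBUS, &sa, NULL) != 0)
		return -1;
	return 0;
}

static int
copy_checked(const void *from, void *to, size_t len)
{
	sigjmp_buf env;

	/* savemask = 1 so SIGSEGV is unblocked again after the jump. */
	if (sigsetjmp(env, 1) != 0) {
		/* Arrived here from segv_handler: the copy faulted. */
		onfault = NULL;
		return EFAULT;
	}
	onfault = &env;		/* like set_fault_handler(copyin_fault) */
	memcpy(to, from, len);	/* may fault part-way through */
	onfault = NULL;		/* like set_fault_handler(0) */
	return 0;
}
```

For example, after install_fault_handler(), calling copy_checked() on an mmap()ed PROT_NONE page should return EFAULT instead of killing the process, while a copy from a valid buffer returns 0. As with the kernel version, a failed copy may leave the destination partially written.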

Let's take a look at an actual implementation written in straightforward assembly, the amd64 copyin(9) from the FreeBSD tree:
/*
 * copyin(from_user, to_kernel, len) - MP SAFE
 *        %rdi,      %rsi,      %rdx
 */
ENTRY(copyin)
        PUSH_FRAME_POINTER
        movq    PCPU(CURPCB),%rax
        movq    $copyin_fault,PCB_ONFAULT(%rax)

The handler is first set...
        testq   %rdx,%rdx                       /* anything to do? */
        jz      done_copyin

        /*
         * make sure address is valid
         */
        movq    %rdi,%rax
        addq    %rdx,%rax
        jc      copyin_fault
        movq    $VM_MAXUSER_ADDRESS,%rcx
        cmpq    %rcx,%rax
        ja      copyin_fault

... the range is then validated ...
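In C, the two checks above amount to roughly the following. This is a sketch of a version of the hypothetical fits_userspace() helper from the pseudo-code, taking the address as an integer. EXAMPLE_VM_MAXUSER_ADDRESS is an illustrative stand-in for VM_MAXUSER_ADDRESS, whose real value comes from the kernel headers. The first test corresponds to jc (the addition carried out of 64 bits, so the range wraps around), and the second to ja (the end of the range lies above the user address space limit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative stand-in for VM_MAXUSER_ADDRESS: the exclusive upper
 * bound of the user address space. Not taken from real kernel headers.
 */
#define EXAMPLE_VM_MAXUSER_ADDRESS UINT64_C(0x800000000000)

/* Hypothetical helper: does [addr, addr + len) lie entirely in userspace? */
static bool
fits_userspace(uint64_t addr, uint64_t len)
{
	uint64_t end = addr + len;	/* unsigned arithmetic wraps on overflow */

	if (end < addr)			/* wrapped around: "jc copyin_fault" */
		return false;
	if (end > EXAMPLE_VM_MAXUSER_ADDRESS)	/* "ja copyin_fault" */
		return false;
	return true;
}
```

Note that a range ending exactly at the limit is accepted: the end address is exclusive, and ja only jumps when the end is strictly greater than the limit.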

        xchgq   %rdi,%rsi
        movq    %rdx,%rcx
        movb    %cl,%al
        shrq    $3,%rcx                         /* copy longword-wise */
        cld
        rep
        movsq
        movb    %al,%cl
        andb    $7,%cl                          /* copy remaining bytes */
        rep
        movsb

... and finally the copy is actually performed. In the event of a page fault that cannot be satisfied, the kernel jumps to the copyin_fault label, which unsets the handler and returns an error, effectively cleaning up after the function. The target buffer may now contain partially copied data, but that is acceptable: if the syscall failed, the buffer contents are unspecified. Otherwise, if a page fault could be serviced without an issue (e.g. a page was swapped in) or no page faults occurred at all, the copy completes and execution falls through to unset the handler and return 0.
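For readers less familiar with the string instructions, the copy sequence above corresponds roughly to the C below. shrq $3 yields the number of 8-byte words for rep movsq, andb $7 yields the number of leftover bytes for rep movsb, and cld makes both copy forward. The function name copy_words_then_bytes is made up for this sketch, and it ignores faults entirely; in copyin(9), a fault inside the equivalent instructions is exactly what the onfault handler is there for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len / 8 eight-byte words, then the len % 8 leftover bytes. */
static void
copy_words_then_bytes(void *to, const void *from, size_t len)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	size_t words = len >> 3;	/* shrq $3,%rcx */
	size_t tail = len & 7;		/* andb $7,%cl */

	while (words-- > 0) {		/* rep movsq */
		uint64_t w;

		/* memcpy avoids undefined behavior on unaligned pointers. */
		memcpy(&w, s, sizeof(w));
		memcpy(d, &w, sizeof(w));
		s += sizeof(w);
		d += sizeof(w);
	}
	while (tail-- > 0)		/* rep movsb */
		*d++ = *s++;
}
```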

done_copyin:
        xorl    %eax,%eax
        movq    PCPU(CURPCB),%rdx
        movq    %rax,PCB_ONFAULT(%rdx)
        POP_FRAME_POINTER
        ret

        ALIGN_TEXT
copyin_fault:
        movq    PCPU(CURPCB),%rdx
        movq    $0,PCB_ONFAULT(%rdx)
        movq    $EFAULT,%rax
        POP_FRAME_POINTER
        ret
END(copyin)
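The listing shows only one side of the contract. On the other side, FreeBSD's amd64 page fault code (trap_pfault()) roughly does the following for a kernel-mode fault that the VM system could not resolve: if the current thread's PCB has pcb_onfault set, it overwrites the saved instruction pointer in the trap frame with that address, so returning from the trap resumes at copyin_fault; otherwise the fault is fatal. The model below uses hypothetical, stripped-down structures (model_pcb, model_trapframe) and a made-up function name, and it omits the extra conditions the real code checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct pcb and struct trapframe. */
struct model_pcb {
	uintptr_t pcb_onfault;	/* 0 when no recovery point is set */
};

struct model_trapframe {
	uintptr_t tf_rip;	/* where execution resumes after the trap */
};

/*
 * Called for a kernel-mode page fault the VM system failed to resolve.
 * Returns true if execution can resume at the recovery address,
 * false if the kernel would have to treat the fault as fatal.
 */
static bool
model_unresolved_kernel_fault(const struct model_pcb *pcb,
    struct model_trapframe *frame)
{
	if (pcb->pcb_onfault != 0) {
		/* Redirect the faulting context to e.g. copyin_fault. */
		frame->tf_rip = pcb->pcb_onfault;
		return true;
	}
	return false;	/* fault in ordinary kernel code: a real bug */
}
```

This is why copyin must clear PCB_ONFAULT on both exit paths: a stale value would make an unrelated kernel bug silently "return EFAULT" from whatever code happened to fault next.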
