Just a few months ago LinuxJournal published an article I had written regarding Linux vDSO system calls. Anyways, after chatting with one of their employees, they have decided to put it on their front page. What is exciting is that this information is free and easy to access. Get yer' nerd on.
-Matt
Showing posts with label vdso. Show all posts
Showing posts with label vdso. Show all posts
Tuesday, February 7, 2012
Tuesday, November 8, 2011
Kernel and vDSO cookery...
A while back, I wrote an article on how to create a "virtual dynamic shared object" in the Linux kernel. Anyways, I sent this to LinuxJournal, and they accepted it. The November 2011 edition has the article "Creating a vDSO: the Colonel's Other Chicken"
-Matt
-Matt
Saturday, February 12, 2011
Linux syscall, vsyscall, and vDSO... Oh My!
The following article is a brief investigation I wrote up, which looks into the system call functionality of the Linux kernel. Mainly, I was trying to clarify the difference between vsyscall and vdso. This was conducted, more or less, on 2.6.32 and 2.6.37 branches, if I recall properly. Anyways here goes:
Often these calls are not even accessed directly by an application, rather they are executed via a wrapper. The GNU C library, glibc, has wrapper functions that do all the handy-dandy wrapping. For instance, a call to gettimeofday() in an application is really just a call to glibc's wrapper for the system call for gettimeofday(). There is no overhead because the wrapper, at link-time, just associates gettimeofday() to the system's appropriate version of the routine. If one were to desire to make a true system call, avoiding the wrapper, Linux provides a syscall() routine allowing such. Both wrapper and the explicit syscall versions of gettimeofday() are demonstrated below:
System calls are loaded at kernel boot-time. All of these calls are accessed by a number. In the example above SYS_gettimeofday is just a constant integer. And this integer is the index into an array (called an interrupt descriptor table) of system calls that are constructed at boot-time. This table/vector can be thought of as pointers to the actual routines [3]. So at index SYS_gettimeofday is a pointer to a piece of code which is where the actual sys_gettimeofday() routine exists.
Ok, so interrupts are relatively cheap in execution time. However, the way the kernel is designed does add some overhead when a syscall is executed. Linux is divided into two primary memory segments. User-space and kernel-space (userland, kernelland). User-space memory is that which user applications run in. Kernel-space is where all the kernel services run. This segmentation acts as a security barrier, so that crafty and/or malicious user apps cannot directly access the kernel's memory. Making a syscall, as mentioned previously, is just a bridge, a mere wormhole, between these two segments of memory. It is this hop between address spaces that causes quite some overhead, as the kernel must switch between user and kernel-land memory segments, and then back again as the syscall completes [5, 6]. Changing the memory addressing requires some register shuffling, and just imparts more overhead on the whole syscall process.
userland, that should not cause a security hole for the kernel. The page of memory where these calls lies is mapped into each running process' user-space. Thus, when a call to one of these syscalls is made, no context switch between the memory regions of user and kernel-space is conducted, thus less overhead.
Another interesting means of reducing syscall overhead comes specific to the underlying CPU architecture. Both the more recent AMD and Intel chips have implemented a fast syscall functionality. Instead of issuing an interrupt, programs can issue instructions (SYSCALL/SYSENTER and SYSRET/SYSEXIT) that act faster than a traditional interrupt. The usage of these are based on object code in the OS. Therefore, a programmer does not have to consider how to implement/request the use of a SYSCALL/SYSENTER over a traditional "int 0x80" [1]. When applications are built, based on the architecture and the system, vsyscall and vdso linkage is done automagically.
As demonstrated above, by using the syscall() routine, a traditional syscall will be conducted, even if there is vDSO support (virtual Dynamically Shared Object). However, despite this fact, that call might still be using the newer SYSCALL/SYSENTER CPU instructions. The glibc wrapped gettimeofday() call is what most programs would use. Since the kernel has been designed to use the most efficient mechanism of syscall, that version has the potential to be a virtualized syscall that is mapped into userspace.
To determine if a specific call is using a virtualized (user-space) syscall or a traditional, memory segment-shuffling, syscall, the strace utility can be used. If a true traditional syscall is being conducted, the routine will be output by strace, will look similar to the following:
According to comments in glibc-2.12.1: "The vsyscall page is a virtual DSO (Dynamic Shared Object) pre-mapped by the kernel" [7].
vsyscall and vDSO are similar in how they work, however there are some slight differences. vsyscall is limited to 4 entries, and is static in memory. Therefore, any statically linked applications can guarantee where vsyscalls are loaded. On-the-other-hand, vDSO is dynamically loaded into the user process, therefore it is not predictable due to Linux's randomized address space layout. If more than 4 vsyscalls are needed, then a vdso should be used instead [8].
For example, run `cat /proc/self/maps` and look at both the '[vdso]' and '[vsyscall]' entries. If your system supports these, the memory range for vDSO is different for each process issued, and vsyscall is totally predictable. If you dont believe me, run 'cat /proc/self/maps' a few times and note the addresses of vdso and vsyscall. Since this example is looking at '/proc/self/maps' the memory mappings displayed are for that 'cat' process [9, 10].
[1] http://en.wikipedia.org/wiki/System_call
[2] Linux User's Manual: intro(2)
[3] http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
[4] http://en.wikipedia.org/wiki/Interrupt
[5] http://en.wikipedia.org/wiki/Kernel_(computing)
[6] http://www.linux.it/~rubini/docs/ksys/ksys.html
[7] glibc-2.12.1
[8] Linux Kernel 2.6.37 arch/x86/kernel/vsyscall_64.c
[9] http://anomit.com/2010/04/18/examining-the-linux-vdso/
[10] http://www.trilithium.com/johan/2005/08/linux-gate/
-Matt
-= What are System Calls =-
System calls are routines that communicate directly with the operating system in hopes of attaining some specific piece of information, or to make a specific request that the OS needs to fulfill. For example, gettimeofday() is a request often from user-land for an application to obtain the current timing information from the operating system's kernel. In Linux this call is implemented as a system call. Often this interface exists as a means of fulfilling a request by a user for kernel information. This interface allows a lesser privileged application to access higher-privileged kernel information [1, 2].Often these calls are not even accessed directly by an application, rather they are executed via a wrapper. The GNU C library, glibc, has wrapper functions that do all the handy-dandy wrapping. For instance, a call to gettimeofday() in an application is really just a call to glibc's wrapper for the system call for gettimeofday(). There is no overhead because the wrapper, at link-time, just associates gettimeofday() to the system's appropriate version of the routine. If one were to desire to make a true system call, avoiding the wrapper, Linux provides a syscall() routine allowing such. Both wrapper and the explicit syscall versions of gettimeofday() are demonstrated below:
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/time.h>
void main(void)
{
struct timeval tv;
/* glibc wrapped */
gettimeofday(&tv, NULL);
/* Explicit syscall */
syscall(SYS_gettimeofday, &tv, NULL);
return 0;
}
System calls are loaded at kernel boot-time. All of these calls are accessed by a number. In the example above SYS_gettimeofday is just a constant integer. And this integer is the index into an array (called an interrupt descriptor table) of system calls that are constructed at boot-time. This table/vector can be thought of as pointers to the actual routines [3]. So at index SYS_gettimeofday is a pointer to a piece of code which is where the actual sys_gettimeofday() routine exists.
-= Overhead =-
Originally, invoking a syscall in Linux was actually a pretty expensive process, as it was implemented as a system interrupt (int 0x80). When the syscall interface was called, via the "int 0x80" instruction, the CPU would pass control to the OS. The OS, would look at what value was passed in, such as SYS_gettimeofday, and then the interrupt vector would be indexed and gettimeofday() would get called. Interrupts force the CPU to save the state of execution just before the system call was executed and the system interrupted. This state is restored after the interrupt (in this case a syscall) completes.Ok, so interrupts are relatively cheap in execution time. However, the way the kernel is designed does add some overhead when a syscall is executed. Linux is divided into two primary memory segments. User-space and kernel-space (userland, kernelland). User-space memory is that which user applications run in. Kernel-space is where all the kernel services run. This segmentation acts as a security barrier, so that crafty and/or malicious user apps cannot directly access the kernel's memory. Making a syscall, as mentioned previously, is just a bridge, a mere wormhole, between these two segments of memory. It is this hop between address spaces that causes quite some overhead, as the kernel must switch between user and kernel-land memory segments, and then back again as the syscall completes [5, 6]. Changing the memory addressing requires some register shuffling, and just imparts more overhead on the whole syscall process.
-= vsyscall and vDSO =-
To reduce the overhead of hopping between user and kernel spaces, a newer mechanism in Linux allows certain syscalls to be accessed directly from userspace, without the need to cross the user/kernel space barrier. This is just what the vsyscall and vdso (Virtual Dynamic Share Object) interfaces do. At boot-time a page of memory is dedicated to containing a subset of syscalls, deemed safe to execute fromuserland, that should not cause a security hole for the kernel. The page of memory where these calls lies is mapped into each running process' user-space. Thus, when a call to one of these syscalls is made, no context switch between the memory regions of user and kernel-space is conducted, thus less overhead.
Another interesting means of reducing syscall overhead comes specific to the underlying CPU architecture. Both the more recent AMD and Intel chips have implemented a fast syscall functionality. Instead of issuing an interrupt, programs can issue instructions (SYSCALL/SYSENTER and SYSRET/SYSEXIT) that act faster than a traditional interrupt. The usage of these are based on object code in the OS. Therefore, a programmer does not have to consider how to implement/request the use of a SYSCALL/SYSENTER over a traditional "int 0x80" [1]. When applications are built, based on the architecture and the system, vsyscall and vdso linkage is done automagically.
As demonstrated above, by using the syscall() routine, a traditional syscall will be conducted, even if there is vDSO support (virtual Dynamically Shared Object). However, despite this fact, that call might still be using the newer SYSCALL/SYSENTER CPU instructions. The glibc wrapped gettimeofday() call is what most programs would use. Since the kernel has been designed to use the most efficient mechanism of syscall, that version has the potential to be a virtualized syscall that is mapped into userspace.
To determine if a specific call is using a virtualized (user-space) syscall or a traditional, memory segment-shuffling, syscall, the strace utility can be used. If a true traditional syscall is being conducted, the routine will be output by strace, will look similar to the following:
gettimeofday({1297472587, 581519}, NULL) = 0
According to comments in glibc-2.12.1: "The vsyscall page is a virtual DSO (Dynamic Shared Object) pre-mapped by the kernel" [7].
vsyscall and vDSO are similar in how they work, however there are some slight differences. vsyscall is limited to 4 entries, and is static in memory. Therefore, any statically linked applications can guarantee where vsyscalls are loaded. On-the-other-hand, vDSO is dynamically loaded into the user process, therefore it is not predictable due to Linux's randomized address space layout. If more than 4 vsyscalls are needed, then a vdso should be used instead [8].
For example, run `cat /proc/self/maps` and look at both the '[vdso]' and '[vsyscall]' entries. If your system supports these, the memory range for vDSO is different for each process issued, and vsyscall is totally predictable. If you dont believe me, run 'cat /proc/self/maps' a few times and note the addresses of vdso and vsyscall. Since this example is looking at '/proc/self/maps' the memory mappings displayed are for that 'cat' process [9, 10].
[1] http://en.wikipedia.org/wiki/System_call
[2] Linux User's Manual: intro(2)
[3] http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
[4] http://en.wikipedia.org/wiki/Interrupt
[5] http://en.wikipedia.org/wiki/Kernel_(computing)
[6] http://www.linux.it/~rubini/docs/ksys/ksys.html
[7] glibc-2.12.1
[8] Linux Kernel 2.6.37 arch/x86/kernel/vsyscall_64.c
[9] http://anomit.com/2010/04/18/examining-the-linux-vdso/
[10] http://www.trilithium.com/johan/2005/08/linux-gate/
-Matt
Subscribe to:
Posts (Atom)