HVX is a high-performance hypervisor, or Virtual Machine Monitor (VMM), capable of running unmodified guests on top of already virtualized hardware. Conventional hypervisors such as VMware ESX, KVM and Xen are designed to run on physical x86 hardware and use the virtualization extensions offered by modern CPUs (Intel VT and AMD SVM) to virtualize the Intel architecture. HVX, on the other hand, is a nested hypervisor that runs inside a virtual machine, where these hardware extensions are not normally available. Instead, HVX employs a technology called binary translation to implement high-performance virtualization that does not require these extensions.
The job of the hypervisor is to provide the illusion that the guest operating systems running on top of it each have their own hardware, while in fact they do not. The hardware is shared with the hypervisor itself and with any other virtual machines running on the same host.
When virtualization extensions are available, the easiest way to implement the illusion is “trap and emulate”. Trap and emulate works as follows. The hypervisor configures the processor so that any instruction that can potentially “break the illusion” (e.g., accessing the memory of the hypervisor itself) generates a “trap”. The trap interrupts the guest and transfers control to the hypervisor. The hypervisor then examines the offending instruction, emulates it in a safe way, and allows the guest to continue executing.
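The trap-and-emulate loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HVX code: the instruction names (`add`, `read_cr3`, `write_cr3`), the `GuestState` fields and the `Trap` exception are all hypothetical stand-ins for what real hardware and a real hypervisor would do.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction set; these are not real x86 encodings.
SENSITIVE = {"read_cr3", "write_cr3"}  # instructions that could break the illusion


class Trap(Exception):
    """Raised by the toy CPU when the guest reaches a sensitive instruction."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


@dataclass
class GuestState:
    regs: dict = field(default_factory=lambda: {"acc": 0})
    virtual_cr3: int = 0  # the guest's private copy of a privileged register
    pc: int = 0


def cpu_step(state, program):
    """Toy CPU: runs harmless instructions directly, traps on sensitive ones."""
    op, arg = program[state.pc]
    if op in SENSITIVE:
        raise Trap((op, arg))
    if op == "add":
        state.regs["acc"] += arg
    state.pc += 1


def emulate(state, instr):
    """Hypervisor side: apply the effect to virtual state, not real hardware."""
    op, arg = instr
    if op == "write_cr3":
        state.virtual_cr3 = arg
    elif op == "read_cr3":
        state.regs["acc"] = state.virtual_cr3
    state.pc += 1  # resume the guest after the emulated instruction


def run(program):
    """Run the guest to completion, counting how many traps were handled."""
    state, traps = GuestState(), 0
    while state.pc < len(program):
        try:
            cpu_step(state, program)
        except Trap as trap:
            traps += 1
            emulate(state, trap.instr)
    return state, traps
```

In this model the guest never touches a real control register: every sensitive instruction is redirected to `virtual_cr3`, which is exactly the illusion the hypervisor has to maintain.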
The trap and emulate approach is well understood and performs well. But it depends critically on virtualization extensions. Without the extensions, the Intel architecture is not able to generate all the necessary traps: some sensitive instructions simply behave differently when executed without privilege instead of trapping. So in the cloud, where these extensions are not available, it cannot be used to implement nested virtualization.
HVX, the Ravello hypervisor, uses a technology called binary translation. Unlike the trap-and-emulate method, binary translation does work when virtualization extensions are not available.
Binary translation was first described by Digital Equipment Corporation (DEC) in the early 1990s. DEC used it to run programs written for the VAX computer on the Alpha AXP processor. The binary translation software would examine the instructions that make up a VAX program, translate them into equivalent Alpha instructions, and then run the translated instructions directly on the Alpha processor. The translation can be done ahead of time for an entire program, or a few instructions at a time while executing. The former is called Static Binary Translation (SBT); the latter is called Dynamic Binary Translation (DBT).
Ravello uses DBT for virtualization. The concept is the same as in the VAX-to-Alpha example. But instead of translating instructions from one CPU architecture to another, HVX uses DBT to find the “illusion breaking” instructions and translate them into safe equivalents, while all other instructions run unchanged.
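As a rough illustration of this idea, the sketch below translates one block of guest code at a time, copying harmless instructions through unchanged and replacing sensitive ones with a call into the hypervisor. The instruction names, the `call_vmm` pseudo-instruction and the rule that a block ends at a `jmp` or `ret` are assumptions made for the example; the actual HVX translator is not described in this document.

```python
# Hypothetical toy instruction set; these are not real x86 encodings.
SENSITIVE = {"read_cr3", "write_cr3"}
BLOCK_ENDERS = {"jmp", "ret"}  # assumed: a block ends at a control transfer


def translate_block(guest_code, start):
    """Translate guest instructions from `start` through the next branch.

    Returns the translated block and the index of the first instruction
    after it. Safe instructions are copied as-is; sensitive ones become a
    ("call_vmm", original_instruction) pair that a hypervisor helper would
    emulate against the guest's virtual state.
    """
    translated = []
    pc = start
    while pc < len(guest_code):
        op, arg = guest_code[pc]
        if op in SENSITIVE:
            translated.append(("call_vmm", (op, arg)))
        else:
            translated.append((op, arg))
        pc += 1
        if op in BLOCK_ENDERS:
            break
    return translated, pc
```

Because translation happens a block at a time and only when the block is about to run, this is the dynamic (DBT) flavour rather than the static one.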
HVX has been extensively optimized to run as a virtual guest. Many of our optimizations are novel and patent pending. Two basic optimizations that we implement are described below.
Firstly, modern operating systems prevent applications from meddling with their internals. It turns out that all the “illusion breaking” instructions that an application could possibly execute are also unsafe for the operating system (OS) and are already intercepted by it using trap-and-emulate. (In this case the Intel architecture is able to generate sufficient traps.) This means that we only need to translate the OS itself (so-called “ring 0”), not any of the applications running inside it (“ring 3”). This greatly reduces the amount of code that needs to be translated.
The second optimization is based on the fact that executable code does not normally change. This means that we can translate a block of code once, save it, and re-use it if it is executed again later. Very quickly the entire guest OS will have been translated, and no more translation happens. In addition, a block of translated code can be generated in such a way that when it is done, it directly calls into another translated block. This is called block chaining.
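A minimal sketch of these two ideas, a translation cache plus block chaining, might look as follows. The `Block` and `TranslationCache` classes, the `translate` callback and the counters are hypothetical names introduced for illustration, and each block is assumed to have a single successor to keep the example short.

```python
class Block:
    """A translated block; `next_addr` is its single successor (assumed)."""

    def __init__(self, guest_addr, next_addr):
        self.guest_addr = guest_addr
        self.next_addr = next_addr  # guest address of the successor, or None
        self.chained = None         # direct link to the translated successor


class TranslationCache:
    """Translate each guest address at most once and remember the result."""

    def __init__(self, translate):
        self._translate = translate  # callback: guest address -> Block
        self._blocks = {}
        self.translations = 0

    def lookup(self, guest_addr):
        block = self._blocks.get(guest_addr)
        if block is None:
            block = self._translate(guest_addr)
            self._blocks[guest_addr] = block
            self.translations += 1
        return block


def run(cache, entry, max_blocks):
    """Execute up to `max_blocks` blocks, chaining them as they are met.

    Returns the guest addresses executed and the number of cache lookups.
    Once two blocks are chained, control passes between them without
    going back through the cache.
    """
    executed, lookups = [], 1
    block = cache.lookup(entry)
    for _ in range(max_blocks):
        executed.append(block.guest_addr)
        if block.next_addr is None:
            break
        if block.chained is None:  # first time through: resolve and link
            block.chained = cache.lookup(block.next_addr)
            lookups += 1
        block = block.chained
    return executed, lookups
```

For a guest loop that alternates between two blocks, only two translations and three lookups are ever needed, however many times the loop runs; after that, execution stays entirely inside already translated, chained code.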
The base functionality of HVX is to virtualize the Intel architecture and run unmodified guest operating systems. But because HVX fully controls the execution of its guest VMs, it is also a foundation that enables a lot of advanced features. Below are some examples of current features that are enabled by HVX:
In addition, HVX enables some interesting future use cases that Ravello is actively investigating. For example, it could support advanced capabilities that today are only available in the datacenter, such as live migration, agentless backups and hot-plugging of CPU and memory.