HVX: High performance nested virtualization
An integral part of HVX is a high-performance nested hypervisor, or Virtual Machine Monitor (VMM), capable of running unmodified guests on top of already virtualized hardware. Conventional hypervisors such as VMware ESX™, KVM and Xen are designed to run on physical x86 hardware and use the virtualization extensions offered by modern CPUs (Intel VT and AMD SVM) to virtualize the Intel architecture. HVX, on the other hand, is a nested hypervisor that runs inside a virtual machine, where these hardware extensions are not normally available. Instead, HVX employs a technology called binary translation to implement high-performance virtualization without requiring these extensions.
Virtualization on the x86 Architecture
The job of the hypervisor is to provide the illusion that the guest operating systems running on top of it each have their own hardware, while in fact they do not. The hardware is shared with the hypervisor itself and with any other virtual machines running on the same host.
When virtualization extensions are available, the easiest way to implement the illusion is "trap and emulate". Trap and emulate works as follows. The hypervisor configures the processor so that any instruction that can potentially "break the illusion" (e.g., accessing the memory of the hypervisor itself) generates a "trap". This trap interrupts the guest and transfers control to the hypervisor. The hypervisor then examines the offending instruction, emulates it in a safe way, and allows the guest to continue executing.
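The trap-and-emulate loop described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how HVX or any real hypervisor is implemented: the instruction tuples, the `write_cr3` opcode name, and the `Trap`, `cpu_execute` and `Hypervisor` names are all hypothetical.

```python
class Trap(Exception):
    """Raised by the toy CPU when the guest executes a sensitive instruction."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


def cpu_execute(instr, regs):
    """Toy CPU: runs harmless instructions directly, traps on sensitive ones."""
    op, *args = instr
    if op == "add":
        reg, value = args
        regs[reg] += value
    elif op == "write_cr3":
        # Sensitive: on real hardware this would change the page tables.
        raise Trap(instr)
    else:
        raise ValueError(f"unknown instruction: {op}")


class Hypervisor:
    def __init__(self):
        self.virtual_cr3 = 0    # the guest's private view of the register
        self.traps_handled = 0

    def emulate(self, instr):
        """Apply the effect of a trapped instruction to virtual state only."""
        op, value = instr
        if op == "write_cr3":
            self.virtual_cr3 = value
        self.traps_handled += 1

    def run(self, program, regs):
        for instr in program:
            try:
                cpu_execute(instr, regs)     # guest runs directly
            except Trap as trap:
                self.emulate(trap.instr)     # hypervisor takes over, then resumes
        return regs


guest_program = [("add", "rax", 2), ("write_cr3", 0x1000), ("add", "rax", 3)]
hv = Hypervisor()
final_regs = hv.run(guest_program, {"rax": 0})
```

In this sketch the guest never touches the real register: only the hypervisor's `virtual_cr3` changes, while the ordinary `add` instructions run without any intervention.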
The trap and emulate approach is well understood and has good performance. But it depends critically on virtualization extensions. Without the extensions, the Intel architecture is not able to generate all the necessary traps. So in the cloud, where these extensions are not available, it cannot be used to implement nested virtualization.
Nested Virtualization using Dynamic Binary Translation
HVX, the Ravello hypervisor, uses a technology called binary translation. Unlike the trap-and-emulate method, binary translation does work when virtualization extensions are not available.
Binary translation was first described by Digital Equipment Corporation (DEC) in the early 1990s. DEC used it to run programs written for the VAX computer on the Alpha AXP processor. The binary translation software would examine the instructions that make up a VAX program, translate them into equivalent Alpha instructions, and then run the translated instructions directly on the Alpha processor. The translation can be done ahead of time for an entire program, or a few instructions at a time while executing. The former is called Static Binary Translation (SBT); the latter is called Dynamic Binary Translation (DBT).
Ravello uses DBT for virtualization. The concept is the same as in the VAX-to-Alpha example. But instead of translating instructions from one CPU architecture to another, HVX uses DBT to find the "illusion breaking" instructions and translate them into safe equivalents.
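As a rough illustration of this idea, the sketch below scans a block of guest instructions and rewrites only the sensitive ones into calls into the VMM, leaving everything else untouched. It is a toy model, not HVX's translator: the opcode names in `SENSITIVE`, the `call_vmm` pseudo-instruction, and the function names are assumptions made for the example.

```python
SENSITIVE = {"write_cr3", "cli"}   # hypothetical "illusion breaking" opcodes


def translate_block(block):
    """Return a copy of `block` in which sensitive instructions are
    replaced by safe calls into the VMM; other instructions are kept as-is."""
    translated = []
    for instr in block:
        op = instr[0]
        if op in SENSITIVE:
            translated.append(("call_vmm", op) + tuple(instr[1:]))
        else:
            translated.append(instr)
    return translated


def run_translated(block, regs, vmm_state):
    """Execute a translated block; only virtual state is ever modified."""
    for instr in block:
        op = instr[0]
        if op == "add":
            _, reg, value = instr
            regs[reg] += value
        elif op == "call_vmm":
            _, original_op, *args = instr
            if original_op == "write_cr3":
                vmm_state["virtual_cr3"] = args[0]
            elif original_op == "cli":
                vmm_state["virtual_interrupts_enabled"] = False
        else:
            # A sensitive opcode must never reach execution untranslated.
            raise RuntimeError(f"unexpected instruction: {op}")


guest_block = [("add", "rax", 1), ("cli",), ("write_cr3", 0x2000), ("add", "rax", 4)]
translated = translate_block(guest_block)

regs = {"rax": 0}
vmm_state = {"virtual_cr3": 0, "virtual_interrupts_enabled": True}
run_translated(translated, regs, vmm_state)
```

Unlike trap and emulate, nothing here relies on the processor raising a trap: the sensitive instructions are found and replaced before the code runs.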
HVX Nested Virtualization: Performance
HVX has been extensively optimized to run inside a virtual machine. Many of our optimizations are novel and patent pending. Two basic optimizations that we implement are described below.
First, modern operating systems prevent applications from meddling with their internals. It turns out that all the "illusion breaking" instructions that an application could possibly execute are also unsafe for the operating system (OS) and are already intercepted by it using trap-and-emulate. (In this case the Intel architecture is able to generate sufficient traps.) This means that we only need to translate the OS itself (so-called "ring 0"), not any of the applications running on it ("ring 3"). This greatly reduces the amount of code that needs to be translated.
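A minimal sketch of the resulting decision is shown below. On x86, the current privilege level (the "ring") is held in the low two bits of the code segment selector; the function names are hypothetical, and the example selector values are the kernel and user code selectors used by 64-bit Linux, chosen only for illustration. This is not a description of how HVX makes the decision internally.

```python
def current_privilege_level(cs_selector):
    """The low two bits of the CS selector give the ring (0 to 3)."""
    return cs_selector & 0b11


def needs_translation(cs_selector):
    """Translate only guest kernel code (ring 0); ring 3 code runs as-is."""
    return current_privilege_level(cs_selector) == 0


KERNEL_CS = 0x10   # 64-bit Linux kernel code selector (ring 0)
USER_CS = 0x33     # 64-bit Linux user code selector (ring 3)

translate_kernel = needs_translation(KERNEL_CS)
translate_user = needs_translation(USER_CS)
```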
The second optimization is based on the fact that executable code does not normally change. This means that we can translate a block of code once, save it, and re-use it if it is executed again later. Very quickly the entire guest OS will be translated, and no more translation happens. In addition, a block of translated code can be generated in such a way that, when it is done, it jumps directly into another translated block. This is called block chaining.
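Both ideas, the translation cache and block chaining, can be sketched in a few lines. This is a toy model under stated assumptions rather than HVX's implementation: guest code is represented as a dictionary from a block's address to its instructions and successor address, "translation" is just a copy, and the `Translator` and `TranslatedBlock` names are hypothetical.

```python
class TranslatedBlock:
    def __init__(self, ops, next_addr):
        self.ops = ops              # translated instructions: (register, increment)
        self.next_addr = next_addr  # guest address of the successor, or None
        self.chained = None         # direct link to the successor's translation


class Translator:
    def __init__(self, guest_code):
        self.guest_code = guest_code   # guest address -> (instructions, next address)
        self.cache = {}                # guest address -> TranslatedBlock
        self.translations = 0          # how many blocks were actually translated
        self.lookups = 0               # how often the cache had to be consulted

    def lookup(self, addr):
        """Return the translation for `addr`, translating on a cache miss."""
        self.lookups += 1
        block = self.cache.get(addr)
        if block is None:
            instrs, next_addr = self.guest_code[addr]
            block = TranslatedBlock(list(instrs), next_addr)
            self.cache[addr] = block
            self.translations += 1
        return block

    def run(self, entry, regs):
        block = self.lookup(entry)
        while block is not None:
            for reg, value in block.ops:
                regs[reg] += value
            if block.next_addr is None:
                break
            if block.chained is None:
                # First time through: resolve the successor and chain to it.
                block.chained = self.lookup(block.next_addr)
            block = block.chained      # chained jump, no cache lookup needed
        return regs


guest_code = {
    0x100: ([("rax", 1)], 0x200),
    0x200: ([("rax", 10)], None),
}
translator = Translator(guest_code)
first = translator.run(0x100, {"rax": 0})
second = translator.run(0x100, {"rax": 0})
```

In this sketch the second run translates nothing and consults the cache only once, for the entry block; the jump to the second block follows the chained link directly.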
HVX Nested Virtualization: Foundational Technology
The base functionality of HVX nested virtualization is to virtualize the Intel architecture and run unmodified guest operating systems. But because HVX fully controls the execution of its guest VMs, it is also a foundation that enables a lot of advanced features. Below are some examples of current features that are enabled by HVX:
- HVX allows us to intercept all network traffic, which is what makes our software-defined network possible.
- HVX allows us to intercept all storage access, making our storage overlay possible.
- HVX allows us to run multiple virtual machines inside a single cloud VM. This is called consolidation. Consolidation allows much higher utilization and, in some cases, helps increase performance.
- HVX allows us to virtualize console access: a graphical console for each virtual machine is available.
HVX Nested Virtualization: Running VMware or KVM VMs unmodified on leading clouds
In addition to the core nested hypervisor functionality, HVX exposes VMware or KVM devices to the virtual machine running on top. This enables enterprises to run their existing VMware virtual machines unmodified on top of Ravello on top of Oracle Public Cloud. Everything about the VM stays the same: the operating system, paravirtualized drivers (VMXNet3 network driver, PVSCSI storage driver, etc.), application settings, network settings, VMware Tools, and so on.
HVX Nested^2 Virtualization: Run ESXi or KVM on leading clouds
Not only can HVX run VMware or KVM virtual machines, but we recently implemented the functionality of the virtualization hardware extensions (Intel VT and AMD SVM) in software inside HVX. HVX can now expose a true x86 platform type to the "VM" running on top. This allows enterprises to run hypervisors like KVM and ESXi on top of Ravello on top of Oracle Public Cloud. From an implementation perspective, we have adapted our binary translation so that it recognizes the double nesting, effectively removes one layer of nesting, and runs the guest directly on top of HVX. As a result, the performance overhead is relatively low. We have also implemented nested paging support inside HVX, which will make running a hypervisor on top of HVX even more efficient.
HVX also enables some interesting future use cases that Ravello is actively investigating. For example, it can support advanced capabilities that today are only available in the datacenter, such as live migration, agentless backups and hot-plugging of CPU and memory.