HVX - A High Performance Nested Hypervisor

HVX - run unmodified guests on top of already virtualized hardware

HVX is a high-performance hypervisor, or Virtual Machine Monitor (VMM), capable of running unmodified guests on top of already virtualized hardware. Conventional hypervisors such as VMware ESX, KVM and Xen are designed to run on physical x86 hardware and use the virtualization extensions offered by modern CPUs (Intel VT-x and AMD SVM) to achieve high performance. HVX, on the other hand, runs inside a virtual machine, that is, on already virtualized hardware, and therefore does not have access to these hardware extensions. Instead, HVX uses binary translation (originally introduced for x86 by VMware) and other patent-pending technology to achieve high performance.

A brief introduction to x86 virtualization

To understand how HVX works, it helps to start with a brief introduction to x86 virtualization. There are two main types of virtualization: Para-Virtualization (PV) and Hardware Virtualization (HVM).

In PV, a guest VM is aware that it is running on top of a VMM. This awareness is achieved by modifying the guest kernel. The guest VM usually runs in an unprivileged mode (for example, Ring 3 in the x86 architecture), so to perform privileged operations it uses hypercalls, an interface between the guest VM and the VMM itself. The best-known hypervisor supporting PV is Xen, and because the guest kernel must be modified, PV is in practice limited to guests such as Linux whose kernels can be changed.

The second type of virtualization is HVM. The biggest advantage of HVM is that it does not require guest modification. When a guest performs a privileged operation, the underlying VMM traps the offending instruction, simulates it, and then returns control to the guest. Modern VMMs use hardware assist (implemented by both Intel and AMD CPUs) to run virtualized guests. In this model, the CPU does most of the “hard” work on the chip, including trapping privileged instructions and supporting guest page-table handling (the job of shadow pages), the virtual APIC and other features.
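
The trap, simulate and return cycle described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HVX or any real VMM: the instruction names (`add`, `cli`, `sti`), the `VirtualCPU` fields and the helper functions are all hypothetical and exist only to illustrate the control flow.

```python
from dataclasses import dataclass

# Toy stand-ins for privileged x86 instructions (illustrative assumption).
PRIVILEGED = {"cli", "sti"}


class Trap(Exception):
    """Raised by the toy CPU when deprivileged code hits a privileged instruction."""

    def __init__(self, instr):
        super().__init__(f"privileged instruction: {instr[0]}")
        self.instr = instr


@dataclass
class VirtualCPU:
    pc: int = 0                      # index of the next guest instruction
    acc: int = 0                     # a single toy register
    interrupts_enabled: bool = True  # virtual interrupt flag kept by the VMM
    traps_handled: int = 0


def execute_directly(vcpu, instr):
    """Run an unprivileged instruction natively; privileged ones trap."""
    op, *args = instr
    if op in PRIVILEGED:
        raise Trap(instr)
    if op == "add":
        vcpu.acc += args[0]
    else:
        raise ValueError(f"unknown instruction: {op}")
    vcpu.pc += 1


def emulate(vcpu, instr):
    """Simulate the trapped instruction against the virtual CPU state only."""
    op = instr[0]
    if op == "cli":
        vcpu.interrupts_enabled = False
    elif op == "sti":
        vcpu.interrupts_enabled = True
    vcpu.traps_handled += 1
    vcpu.pc += 1  # skip the emulated instruction, then resume the guest


def run_guest(program):
    vcpu = VirtualCPU()
    while vcpu.pc < len(program):
        try:
            execute_directly(vcpu, program[vcpu.pc])
        except Trap as trap:
            emulate(vcpu, trap.instr)
    return vcpu


vcpu = run_guest([("add", 2), ("cli",), ("add", 3)])
```

With hardware assist, the trap itself is delivered by the CPU; in this sketch a Python exception plays that role.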

HVX does not use hardware-assisted virtualization because nested hardware assist is either still not available or not enabled by cloud providers. In addition, HVX needs to run on top of both HVM and PV hypervisors. Hence, HVX employs another technique, binary translation, which was originally introduced by VMware for x86 virtualization.

HVX binary translation

In binary translation, the VMM reads a block of the original guest machine code and “compiles” it into another instruction sequence that can be executed safely on the CPU without causing a trap. Privileged instructions are replaced with simulation code that operates on a software CPU rather than on the real CPU. HVX reads the guest code until it reaches a jump or any other instruction that changes the instruction pointer. This block of code is then translated, and a jump back to HVX is placed at the end of the translated sequence. After the block has executed, the CPU passes control back to HVX, which repeats the process for the next block.

HVX is a complex piece of software that has to overcome many challenges in order to work well, including support for both PV and HVM hosts, memory protection and isolation between guest VMs, and resource limits and control.
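
The block-at-a-time loop described above can be illustrated with a small Python sketch. It assumes a made-up four-instruction guest ISA (`add`, `cli`, `sti`, plus the control-flow instructions `jmp`, `jnz` and `halt`) and represents translated code as Python closures; none of these names come from HVX, and a real translator emits machine code rather than closures.

```python
# Instructions that change the instruction pointer end a block (toy ISA, assumed).
CONTROL_FLOW = {"jmp", "jnz", "halt"}


def translate_instr(instr):
    """Return a safe replacement for one guest instruction."""
    op, *args = instr
    if op == "add":
        n = args[0]

        def step(state):
            state["acc"] += n
    elif op == "cli":
        # Privileged: replaced by simulation code acting on the software CPU.
        def step(state):
            state["if"] = False
    elif op == "sti":
        def step(state):
            state["if"] = True
    else:
        raise ValueError(f"unknown instruction: {op}")
    return step


def translate_block(program, start):
    """Translate guest code from `start` up to the next control-flow instruction."""
    body = []
    pc = start
    while pc < len(program) and program[pc][0] not in CONTROL_FLOW:
        body.append(translate_instr(program[pc]))
        pc += 1
    terminator = program[pc] if pc < len(program) else ("halt",)
    fallthrough = pc + 1

    def block(state):
        for step in body:
            step(state)
        # The "jump back to the translator": report where the guest goes next.
        if terminator[0] == "jmp":
            return terminator[1]
        if terminator[0] == "jnz":
            return terminator[1] if state["acc"] != 0 else fallthrough
        return None  # halt

    return block


def run(program):
    state = {"acc": 0, "if": True}  # software CPU state
    pc = 0
    while pc is not None:
        block = translate_block(program, pc)  # translate, then execute
        pc = block(state)                     # control returns here after each block
    return state


guest = [
    ("add", 3),   # 0
    ("cli",),     # 1
    ("add", -1),  # 2: loop body
    ("jnz", 2),   # 3: loop while acc != 0
    ("sti",),     # 4
    ("halt",),    # 5
]
final_state = run(guest)
```

Note that this sketch retranslates the loop body on every iteration, which is exactly the cost that caching translated blocks is meant to avoid.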

HVX: Performance

One of the most difficult challenges in nested virtualization is enabling guest VMs to run with high performance. HVX uses a number of patent-pending technologies to achieve this: caching and reusing translated code to avoid recompilation; a fast shadow MMU and virtual APIC implementation; direct execution of user-space (Ring 3) code; paravirtualized devices for network and I/O; fast context switching between guest VMs and host kernels; and the use of Linux for guest VM scheduling and memory management.
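
To make the first of these techniques concrete, here is a minimal Python sketch of a translation cache keyed by guest address. The class name `TranslationCache`, its methods and the `toy_translate` function are hypothetical, and the `invalidate` method reflects an assumption (that a block must be retranslated if the guest code it came from changes) rather than a documented HVX behaviour.

```python
class TranslationCache:
    """Remember translated blocks so hot guest code is translated only once."""

    def __init__(self, translate):
        self._translate = translate  # function: guest address -> translated block
        self._blocks = {}            # guest address -> translated block
        self.misses = 0              # how many times the translator actually ran

    def lookup(self, guest_pc):
        block = self._blocks.get(guest_pc)
        if block is None:
            # Cache miss: translate once and keep the result for reuse.
            block = self._translate(guest_pc)
            self._blocks[guest_pc] = block
            self.misses += 1
        return block

    def invalidate(self, guest_pc):
        """Drop a stale translation, e.g. after the guest overwrites that code."""
        self._blocks.pop(guest_pc, None)


def toy_translate(guest_pc):
    # Stand-in for an expensive translation step (assumption for illustration).
    return lambda: guest_pc + 1


cache = TranslationCache(toy_translate)
for _ in range(1000):
    cache.lookup(0x1000)()  # a hot loop re-enters the same block many times
```

In this example the block at `0x1000` is executed a thousand times but translated only once, so `cache.misses` stays at 1.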

HVX: Consolidation

HVX also supports “consolidation” of several guest VMs on a single host VM. HVX can allocate more RAM to guest VMs than the total RAM available on the host VM without significantly degrading guest VM performance. It does this by scanning guest memory for identical pages and freeing the duplicates, so that only one copy of each identical page needs to be kept.
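
The idea of scanning for identical pages can be sketched as follows. This is an illustrative Python model, not HVX's implementation: the 4096-byte page size, the use of SHA-256 as a content fingerprint and the `deduplicate` function are all assumptions. A real system would also need copy-on-write so that a guest writing to a shared page receives its own private copy.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the example


def deduplicate(pages):
    """Map every guest page to a frame, sharing one frame among identical pages.

    Returns (mapping, frames): mapping[i] is the frame index backing page i,
    and frames holds the unique page contents that must stay in memory.
    """
    frames = []
    by_digest = {}  # content digest -> indices of frames with that digest
    mapping = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        for frame_no in by_digest.get(digest, []):
            # Confirm with a full comparison; the digest only narrows the search.
            if frames[frame_no] == page:
                mapping.append(frame_no)
                break
        else:
            # No identical page seen yet: this page needs its own frame.
            frames.append(page)
            by_digest.setdefault(digest, []).append(len(frames) - 1)
            mapping.append(len(frames) - 1)
    return mapping, frames


zero_page = bytes(PAGE_SIZE)
data_page = b"\x01" * PAGE_SIZE
mapping, frames = deduplicate([zero_page, data_page, zero_page, zero_page])
pages_freed = len(mapping) - len(frames)
```

Here four guest pages are backed by two frames, so two pages' worth of memory can be returned to the host.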