The topic of Enterprise Disaster Recovery in the “Cloud” often comes up when I am working with Ravello customers. Since Ravello can create high-fidelity, on-demand copies of your VMware and KVM environments with complex networking in Amazon AWS and Google Cloud, people often ask me about DR testing scenarios. A couple of years ago I was using Veeam to back up my VMware ESXi clusters and recover virtual machines in a few different on-premises data centers. I decided to take what I learned then, apply it to some “Cloud”-based DR use cases using Ravello’s service, and write a detailed blog. Since Ravello can now run ESXi on the public cloud, I can use AWS or Google as my remote site. This could be a useful lab deployment for service providers and/or their customers who want to demo, POC or just try out Veeam Cloud Connect for themselves, using ESXi on a remote site (in this case the remote site happens to be AWS or Google Cloud).
I have some Veeam contacts, so I reached out to discuss a couple of different use case ideas. I landed on one very interesting concept where I would utilize Veeam’s “Cloud Connect” offering.
Today, lots of service providers support “Cloud Connect Gateways” for off-site backups. This is great as it gives customers another copy of their very important virtual workloads. To restore a workload from those backups, customers would typically point to an ESXi (or Hyper-V, as the case may be) target either in their private cloud or in a hosted/public cloud running ESXi. We figured it would be an interesting exercise to use those off-site backups and restore the workload on AWS or Google Cloud using Ravello’s nested ESXi offering.
Here are some options I see that exist today:
– The customer repairs their local environment (or has another data center), points to the repository at the service provider and starts the restore process. This is not ideal in my opinion, as you have to transfer all the data required across the WAN to conduct the restore. You will need a huge pipe for that and it will take a considerable amount of time.
– A complete download of hosted backups is not the only option; many service providers offer in-place restores from Cloud Connect backups. The customer can use the service provider’s “shared” VMware environment to do the restore. Each service provider has its own pricing scheme, so please look them up, but they typically have a base subscription fee, may or may not have on-boarding fees, and then have usage-based charges. They also tend to provide value-added services and guidance to the customer in addition to the capacity.
– The customer could request to have an “always ready” virtual environment running and provisioned for them when they declare a disaster. Essentially a secondary data center sized for their needs. This would come with a large cost, as it needs to be ready and maintained on a regular basis.
Now back to my use case:
– Utilize the on-demand capacity of a public cloud such as AWS or Google and pay only for the capacity that you use and require at the time. There are no on-boarding fees or monthly subscription fees; environments can be created and destroyed entirely on-demand. Seems like a cool proof of concept (note that this is a lab scenario and a technology demonstration at this point).
Since Ravello has recently released support for VMware’s ESXi hypervisor, I can take this use case to some pretty cool levels and have an enormous amount of flexible options.
So for this blog and use case we have 2 environments, they will be known as:
1 – “Local Data Center” – That is the one that was running on the “Flex POD” but is now running on Google Cloud. But think of it as my “Local Data Center”
2 – “Cloud Data Center” – This is the one running in Amazon AWS. Think of it as my service provider. It just happens you have full control of the service provider and can spin it up on-demand when you like. Sweet!!!
I may soon expand this blog and use case as Veeam have some other cool options I did not dive into in this blog, keep an eye out for that.
Ok so let’s get into the technical stuff and details on the setup…
Below you can see that I have 2 applications deployed inside Ravello. One application is running in Amazon and the other application is running on Google Cloud. (Each Application has 6 VMs)
I will dive into the detailed application configuration below:
Below is the detailed application view inside Ravello. This application, deployed in Google, is acting as the “Local Data Center”. This is where my primary “protected” virtual machines are running.
You can see I have 2 VMware ESXi nodes in a cluster utilizing FreeNAS for NFS datastore(s). I also have a vCenter deployed and managing the cluster.
There are also 2 virtual machines running Windows and “Veeam Backup and Replication Server”.
One great thing about Veeam is all the software and features are available with the same package installed. You just configure the component you want to utilize.
In my case, 1 is acting as the primary backup server and the 2nd is acting as a WAN accelerator.
I simply keep a virtual machine image inside my Ravello library where I have the base software installed. I “drag” and “drop” a new one on the canvas to add an additional server and/or Veeam component(s).
Below is the “Cloud Data Center” application running in Amazon. Again, think of this application environment as my DR site. Later you will see I can recover virtual machines into this environment.
It is basically an exact copy. The only difference is I am running “Veeam Cloud Connect” on one of the Windows virtual machines. It receives backups from my “Local Data Center”. This can be ongoing or scheduled, as you will see later.
You will see later we are taking advantage of Veeam’s Deduplication and Compression capabilities to reduce the amount of data sent to the “Cloud Connect Machine”.
A nice option with this design is I can schedule when my “Backup Copy” runs on my “Local Data Center”, meaning I can schedule this to happen in the middle of the night.
Even better, I don’t have to run my “Cloud Data Center” application all the time. I can simply schedule a separate job to call the Ravello API to start my “Cloud Data Center” just before my replication job runs and shut it down after a few hours.
Here you can find some cool examples on how to utilize the API and Python SDK all orchestrated using Jenkins.
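To make the scheduling idea concrete, here is a minimal Python sketch of my own (not taken from the linked examples). It works out when to power the “Cloud Data Center” on and off around a nightly job, and builds the start/stop request. The base URL, the /applications/{id}/start and /stop paths and the use of HTTP Basic authentication are assumptions on my part, so check them against the current Ravello API documentation. The application ID, lead time and run time are hypothetical values.

```python
import base64
import urllib.request
from datetime import datetime, timedelta

# Assumption: base URL and action paths follow Ravello's REST API v1 layout.
# Verify against the official API documentation before relying on them.
API_BASE = "https://cloud.ravellosystems.com/api/v1"


def action_url(app_id: int, action: str) -> str:
    """Build the URL for a start or stop action on a Ravello application."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return f"{API_BASE}/applications/{app_id}/{action}"


def power_window(job_start: datetime, lead_minutes: int, run_hours: float):
    """Return (start_at, stop_at) for the "Cloud Data Center" application.

    start_at is lead_minutes before the backup copy job begins, so the VMs
    have time to boot; stop_at is run_hours after the job begins.
    """
    start_at = job_start - timedelta(minutes=lead_minutes)
    stop_at = job_start + timedelta(hours=run_hours)
    return start_at, stop_at


def send_action(app_id: int, action: str, user: str, password: str) -> int:
    """POST the action and return the HTTP status code.

    Assumption: the API accepts HTTP Basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        action_url(app_id, action),
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A scheduler such as cron or Jenkins could call send_action(app_id, "start", ...) at start_at and send_action(app_id, "stop", ...) at stop_at. Keeping the time calculation separate from the HTTP call makes it easy to check the window before anything is actually powered on.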
It simply comes down to your recovery objectives (RTO / RPO). In this case I have an entire copy of my virtual environment applied each night. You can also run this 24/7 if you want a shorter recovery point objective.
Below on the right – you can see the network configuration for my “Cloud Connect” virtual machine. We have a local network that we use for application communication. In this case it is L2 – 10.0.0.0/24
A very important aspect of this approach is making sure that when we boot up the “Cloud Connect” machine it is accessible via the same public IP address. (Otherwise the local side would need to be manually reconfigured each time.)
For this I am using Ravello’s “Elastic IP Address” feature.
This allows me to stop the environment, start it later on and maintain the same public IP address in front of “Cloud Connect”. You could go ahead and register a domain name against this public IP address if you like.
This IP could also move to Google Cloud if you wanted to move your “Cloud Connect” infrastructure to another public cloud or another Amazon region for latency purposes. Customers also like to take advantage of the best price for their applications; Ravello has an option to take advantage of the best-priced cloud provider at the particular time.
Below on the right – You can see what services I have exposed via the public IP address.
With Ravello you can either choose to completely “fence” your virtual machines or specify what ports you want to expose. You can see I have “fenced” the vCenter and the ESXi hosts; they can be accessed from the other VMs over the private network or via the console.
In this case I have opened port “6180” for replication traffic and “3389” so I can RDP to the “Cloud Connect” machine. I use this to configure “Cloud Connect” and I am also running “vCenter Client” on that VM.
Ravello also supports “IP Filtering”, this allows me to only let certain source addresses connect to the services or ports I described above.
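The idea behind source filtering can be illustrated with a few lines of Python using the standard ipaddress module. This is only a sketch of the concept, not how Ravello implements it; the allowed ranges below are hypothetical documentation addresses.

```python
import ipaddress

# Hypothetical allow-list: an office range and a single admin workstation.
ALLOWED_SOURCES = ["198.51.100.0/24", "203.0.113.25/32"]


def is_allowed(source_ip: str, allowed=ALLOWED_SOURCES) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed)
```

For example, is_allowed("198.51.100.7") is accepted because it sits inside the /24, while is_allowed("203.0.113.26") is rejected because only the single address 203.0.113.25 is listed.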
Below on the right – I have a 2nd disk attached to my “Cloud Connect” virtual machine. This disk is where I store all the backups coming from the “Local Data Center”.
You have complete control of the virtual machine’s configuration; you can also make this disk bigger later or add more disks, memory, CPUs and so on…
Here is a detailed view of my application networking; you can see I have many networks.
I have designed the VMware ESXi cluster in a traditional way, with separate networks for storage, vMotion and guest traffic, for example.
With Ravello you also have “console” access to view the virtual machine and apply initial configurations options.
Here you can see the VCenter virtual appliance running.
Below you can see one of my ESXi hypervisors up and running.
Here is my FreeNAS Virtual machine.
OK – so let’s dive into the Veeam configuration and backup examples.
Below you can see I am running a backup on my “Local Data Center”, I am doing a backup on a Linux virtual machine.
I also have a 2nd backup running, this one happens to be my local domain controller.
We are going to let those run and go ahead and configure our “Cloud Connect” virtual machine running in our “Cloud Data Center” on Amazon I described above.
I have RDP’ed into my virtual machine via the “Elastic IP Address” – We will go ahead and configure it.
We can go ahead and add a “Cloud Gateway”; in my example I already have one configured but will review the settings for you.
When you are finished you will see your “Cloud Gateway”. You can also edit these settings later, as I am doing in the example.
You can see I am running the service on port “6180”
Here I specify my “Public Elastic IP Address”
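Before registering the gateway on the local side, it can save some troubleshooting to confirm that the public address really answers on the gateway port. The helper below is a small sketch using only the Python standard library; the address in the usage note is a placeholder from the documentation range, not my real Elastic IP.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from the “Local Data Center” side, is_port_open("203.0.113.10", 6180) should return True once the “Cloud Data Center” application is started and port 6180 is exposed. A successful TCP connection only shows the port is reachable; it does not prove the Cloud Connect service itself is healthy.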
You will want to go ahead and create a user. “Cloud Connect” supports multi-tenancy, so you can have separate clients with separate user accounts, storage and quotas.
I will do this example using the user “kyle”. You can see I haven’t been active for 34 days, but we can go ahead and sync the changes since the last successful replication run.
I currently have 3 virtual machines protected.
Here I configure my user; you can control the lease for example.
Here you can configure the repository, also the user quota. You can also choose whether you want to utilize WAN acceleration.
Here you can see my WAN accelerator virtual machine. I am RDP’ed into it. It’s just another virtual machine in my “Cloud Data Center”.
Ok, so now that we have “Cloud Connect” all configured and running in our “Cloud Data Center”, we can go ahead and register it in our “Local Data Center”.
I am RDP’ed into my “Local” Veeam virtual machine. You can see I have already added my “Local” vCenter and ESXi cluster. I have 2 ESXi hosts up and running.
Let’s go ahead and register the “Cloud Connect Gateway”. It’s acting as your “Service Provider”, for example.
This is the one we just configured in “Amazon”.
Here we specify my “Public Elastic IP Address” – This is my “Cloud Connect” virtual machine running on Amazon.
Here we can see the “Certificate” and also specify the user we configured earlier.
It accepted my user account. You can see I have a 100GB quota and am also configured for WAN acceleration.
Ok we are all done; let’s move on to the next steps.
You can see our “Cloud Connect” endpoint up and running.
Earlier you saw me running a couple “Local” backups. These are stored on my “Local” environment.
Next we need to configure a “Backup Copy Job”. This is telling Veeam to take the “Local” backup copy and replicate it to our “Cloud Data Center” Environment.
You can see that you have the option to schedule the job to run at certain times.
Here you click “Add” – you are going to pick from “backups”.
You can see the existing “Local” backups, I am going to pick my Linux virtual machine.
Here I choose my repository – In this case I want to utilize my “Cloud Repository”.
You have some retention options for example. – You can also have a look at “Advanced Options”.
I am choosing 7 restore points. You can also see the quota I still have available.
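As a rough illustration of what “keep 7 restore points” means, here is a simplified Python sketch that keeps the newest N points and marks the rest for removal. It is only a conceptual model of count-based retention, not Veeam’s actual retention logic, which also has to deal with full and incremental backup chains.

```python
from datetime import date, timedelta


def apply_retention(restore_points, keep=7):
    """Split restore points into (kept, pruned), keeping the newest `keep`."""
    if keep <= 0:
        raise ValueError("keep must be a positive number of restore points")
    ordered = sorted(restore_points)
    return ordered[-keep:], ordered[:-keep]


# Example: ten nightly restore points, hypothetical dates.
points = [date(2015, 6, 1) + timedelta(days=i) for i in range(10)]
kept, pruned = apply_retention(points, keep=7)
```

With ten nightly points and a setting of 7, the three oldest dates end up in pruned and the seven most recent in kept.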
I want to make sure to utilize my WAN accelerator on my “Local Environment”.
All right now we can see the job running, we are now replicating this virtual machine to our “Cloud Data Center” Environment.
You can see lots of details at the bottom of the running job.
Here you can see I am connected to the “Cloud Data Center”, I see “kyle” as an active user. I have 3 protected virtual machines.
Here I jump back to the “Local Data Center” side. Have a look at the compression and dedupe, you can see we have processed 29.4GB but only copied 477.3MB over the WAN.
You can see the WAN is the bottleneck, not the virtual infrastructure.
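To put those two numbers in perspective, the short Python sketch below works out the reduction ratio and the share of data that never had to cross the WAN. It assumes the job statistics use binary units (1 GB = 1024 MB), which is my assumption rather than something shown in the job window.

```python
def reduction_ratio(processed_gb: float, transferred_mb: float) -> float:
    """Ratio of data processed to data actually sent over the WAN."""
    processed_mb = processed_gb * 1024  # assumption: 1 GB = 1024 MB
    return processed_mb / transferred_mb


def percent_saved(processed_gb: float, transferred_mb: float) -> float:
    """Percentage of the processed data that did not need to be transferred."""
    return 100.0 * (1.0 - 1.0 / reduction_ratio(processed_gb, transferred_mb))


ratio = reduction_ratio(29.4, 477.3)
saved = percent_saved(29.4, 477.3)
print(f"reduction: {ratio:.1f}x, saved: {saved:.1f}%")
```

For the figures in this job that works out to roughly a 63x reduction, meaning a little over 98% of the processed data stayed off the WAN.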
Back to the “Cloud Data Center”. You can see the job was successful and I now have 4 protected virtual machines in the “Cloud”.
I am up to 55% of my quota of 100GB.
Ok, so now we have some “Local” backups and have also replicated a new virtual machine to the “Cloud Data Center”.
As you saw above I have 4 vm’s protected in the “Cloud Data Center”.
First – Let’s “Rescan the Repository”, this will allow us to pickup the new changes that have been applied to the Repository.
You can see below, 1 new virtual machine has been added (That is our most recent “backup copy” job).
Let’s get into some “Cloud Data Center” restore options. Go ahead and choose the “Restore” button.
We will start with a “Entire VM Restore”.
Here you select “From Backup”.
You can see all the virtual machines I have available to restore, along with the number of restore points available for each virtual machine.
You can see I picked my Linux virtual machine. It shows the date of the available restore point.
Here I want to select a new location where I want to restore to my “Cloud Data Center”.
I choose to restore to “esx01”
I select the “datastore” I want the files to be restored to.
I have the option to choose the disk type, to save space I always select “thin disk”.
Here I select the “virtual network” I want assigned to the virtual machine after it boots up.
Ok let’s go ahead and kick off the restore. We can review all the options we selected.
I have the option to power on after restore, this time I will do it manually via VCenter.
Now looking at VCenter I can see the virtual machines I have started to restore to my “Cloud Data Center”.
Going back into the Veeam screens, the restore is currently running (you can see lots of data in the log).
All right, we have successfully restored our virtual machine!!!
I have booted up the virtual machine in “Cloud Data Center” VCenter and I am able to use the console to connect to the virtual machine.
The vm is in the exact same state as it was running in my “Local Data Center”
Now let’s go ahead and do an “Instant VM Recovery”. As before, start with the “Restore” button above.
Same thing, pick the virtual machine you want to restore.
I will pick the “Full” backup I have available.
Same as last time, I want to choose to restore to a new location.
I will use “esx01” again.
Make the other selections required.
Pick the datastore attached to the cluster.
Ok our “Instant VM Recovery” has started.
Looking inside VCenter we can see the recovery has started.
So now Veeam is waiting for “user to start migration”.
I can go ahead and boot up my virtual machine in VCenter.
You can see the virtual machine is “Mounted”.
I can choose “Migrate to Production”
I am able to choose the options available.
If you are utilizing separate “Proxy Servers” you can make the appropriate selections.
Ok we can finish up the “Recovery”.
You can view the status of the “Migration”.
Ok – SUCCESS!
We can get on the “console” of the virtual machine
Ok, let’s go ahead and try out a “File Level Restore” to our “Cloud Datacenter” environment.
This time we pick a “windows virtual machine” to do a file level restore.
I can pick my restore points available.
Veeam shows me my file system and all the files available that can be restored.
As you can see we also have some “application” options inside Veeam.
This virtual machine is a domain controller so I can restore “Active Directory Items”.
This Virtual machine had a D Drive.
You can simply “Copy” and “Paste to local machine” the files you want to recover.
We can also do file level restores of Linux file systems.
For this option Veeam will boot up a temporary virtual machine in our ESXi cluster.
Choose your Linux virtual machine
You can select “Customize” to make the appropriate selections.
You can see the “VeeamFLR” virtual machine boot up.
You can now make the appropriate file level restores you desire.
Now I am going to go ahead and turn off just my ESXi cluster environment. I want to leave the “Cloud Connect” and “WAN Accelerator” machines online to continue to receive VM replication.
In conclusion, by utilizing Ravello’s service and its unique ability to run ESXi hypervisors in the public cloud, I can take my disaster recovery options to the next level. It’s a great use case for Veeam training and also for doing dry runs to ensure your disaster recovery plans are ready for when you need them most. Not to mention it is all on-demand and runs on public cloud infrastructure.
Go ahead and sign up for the ESXi beta and give it a try for yourself.
Please send us your feedback, and let us know if you need any help with your specific use case.
About Ravello Systems
Ravello is the industry’s leading nested virtualization and software-defined networking SaaS. It enables enterprises to create cloud-based development, test, UAT, integration and staging environments by automatically cloning their VMware-based applications in AWS. Ravello is built by the same team that developed the KVM hypervisor in Linux.