CloudStack + KVM + HA

Update (20140114): I have tested KVM HA on Apache CloudStack 4.2.1 and it's functioning as expected. Please see this new blog post.

The Linux Kernel Virtual Machine (KVM) is a very popular hypervisor choice amongst CloudStack and OpenStack users. It is free and comes ready with popular Linux distributions like CentOS/RedHat and Ubuntu. In some cases, customers insist on using an end-to-end Open Source solution for their private cloud and KVM ends up being the only choice available.

So in a recent deployment, where the private cloud had to run entirely on mature open source solutions, the obvious choice was to use Apache CloudStack 4.1 with KVM hypervisors on CentOS 6.x. Building the management tier and the KVM hosts themselves was a breeze with CentOS KickStart and the SSH-based Ansible for post-install configuration of services.

The infrastructure too was built for resilience: dual power supplies, dual 4-port network controllers wired across east-west switches with LACP, and HA for storage. On the CloudStack side, the management servers sat behind load balancers with MySQL replication, and there were multiple PODs, multiple Clusters and multiple Hosts in each cluster. New service offerings were also created with HA enabled.

One of the resilience tests was to simply power off a random KVM hypervisor within a logical cluster and watch the affected HA-enabled VM(s) auto start on another host within the same cluster after the timeout period. To everyone's surprise, the Guest VMs just sat there marked in ‘Up’ state despite physically being offline. A close look at the management logs showed little to no sign that CloudStack even cared about these affected guest VMs and the KVM host.

CloudStack VM HA with KVM was simply not working.
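
The symptom from that test can be written down as a small check: find HA-enabled VMs that are still reported as running on a host that is no longer reachable. This is only a sketch with made-up data. The `VmRecord` type, the state names and the host names are assumptions for illustration, and nothing here talks to the CloudStack API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmRecord:
    """Hypothetical record of what the management layer believes about a VM."""
    name: str
    host: str
    reported_state: str  # assumed state names, e.g. "Running" or "Stopped"
    ha_enabled: bool


def find_stale_vms(vms, reachable_hosts):
    """Return HA-enabled VMs still reported as running on unreachable hosts."""
    return [
        vm for vm in vms
        if vm.ha_enabled
        and vm.reported_state == "Running"
        and vm.host not in reachable_hosts
    ]


# Made-up inventory for illustration only.
inventory = [
    VmRecord("web-01", "kvm-a", "Running", True),
    VmRecord("web-02", "kvm-b", "Running", True),
    VmRecord("batch-01", "kvm-b", "Running", False),
]

# Pretend kvm-b was powered off during the resilience test.
stale = find_stale_vms(inventory, reachable_hosts={"kvm-a"})
print([vm.name for vm in stale])
```

In the failed test above, every HA-enabled VM on the powered-off host would have shown up in a list like `stale`, and none of them was restarted.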

After spending some time on the Apache CloudStack mailing lists and JIRA, it turns out that it's a CloudStack feature to “do nothing” in a host down scenario. This is primarily to avoid split-brain situations, where network connectivity problems could leave multiple copies of the same guest VM running on more than one physical host. Since KVM does not have built-in clustering/HA features, it is up to the CloudStack layer to decide on a corrective course of action. At this time, CloudStack simply chooses to ignore failed KVM hosts.
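
To make the split-brain reasoning concrete, here is a minimal sketch of the decision involved. It is not CloudStack's actual code. The `HostEvidence` values and the action names are hypothetical, and the only point is that an unreachable host is not the same thing as a host that is confirmed down.

```python
from enum import Enum


class HostEvidence(Enum):
    """Hypothetical outcomes of investigating an unresponsive host."""
    ALIVE = "alive"                    # host answered a health check
    UNKNOWN = "unknown"                # no answer; could just be a network fault
    CONFIRMED_DOWN = "confirmed_down"  # e.g. power-off verified out of band


def ha_action(evidence: HostEvidence) -> str:
    """Decide what to do with the HA-enabled VMs on a host.

    Restarting is only safe when the host is known to be down. Otherwise the
    original VM may still be running, and a restart elsewhere would create a
    second copy of it.
    """
    if evidence is HostEvidence.CONFIRMED_DOWN:
        return "restart_elsewhere"
    return "do_nothing"


for evidence in HostEvidence:
    print(evidence.value, "->", ha_action(evidence))
```

Seen this way, the behaviour described above amounts to never reaching the `CONFIRMED_DOWN` case for a KVM host, so the answer is always “do nothing”.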

The situation could be even more problematic if you happen to have the CloudStack “virtual router” also running on the failed host. All basic network services like DHCP, DNS and routing for that POD will fail, as the router would be offline. This actually happened to someone on the mailing lists. The “fix” would be to go into the CloudStack database and mark the Virtual Router as “destroyed”. CloudStack would then create a new virtual router and services would resume.

This issue is currently being discussed in the JIRA ticket “No HA actions are performed when a KVM host goes offline”, and there is developer interest in coming up with a solution for an upcoming Apache CloudStack 4.1.x release. Also see the thread “HA not working – CloudStack 4.1.0 and KVM hypervisor hosts” on the cloudstack-users mailing list.

Please note that this problem is specific to KVM hypervisors, as they do not have built-in clustering capabilities. CloudStack with VMware or XenServer does not have this issue. Both VMware and XenServer clusters automatically do the right thing using their built-in clustering features.

As a side note, Citrix XenServer 6.2 was fully open sourced in July and installation ISOs are available from XenServer.org. Given the enterprise features that XenServer already has over KVM (like HA clustering and fault tolerance), it is very likely to see massive adoption in fully open source clouds with future releases of Apache CloudStack.

Update: According to this thread, XCP is also affected.

Shanker Balan

Shanker Balan is a devops and infrastructure freelancer with over 14 years of industry experience in large scale Internet systems. He is available for both short term and long term projects on contract. Please use the Contact Form for any enquiry.



13 thoughts on “CloudStack + KVM + HA”

  1. Hi again, I found that CloudStack 4.4.1 is compatible with XenServer 6.0.2. The question is:
    if I want to configure a host with the XenServer hypervisor, should I install XenServer on Ubuntu 12.04 LTS, or download the .iso and install it as a VM (I use VMware)?

  2. Hello Shanker.

    Congratulations for your post, this is amazing!!!
    I’m sorry for my English. I have practiced a little,
    but I still use the translator.
    Please, can you help me with one question?
    I use CloudStack with CentOS 7 + KVM + Open vSwitch and NFS storage,
    but VM HA does not work. Is there something that must be enabled on the servers?
    I enabled the HA option only in the CloudStack management server!
    Thank you so much and, have a nice day.

    Best regards

    Angelo Moreira
    Network Administrator in Brazil.

    Email: angelinux2011@gmail.com
    Youtube: https://www.youtube.com/nerdsociety
    Website: http://www.nerdsociety.com.br/
