A quick read around the web suggests that the grass isn’t any greener in the OpenStack camp when it comes to handling KVM host failures. See this earlier blog post for context.
According to the OpenStack Hypervisor feature support matrix, there is a feature called evacuate for dealing with failed KVM hosts, and the documentation reads as follows:
As cloud administrator, while you are managing your cloud, you may get to the point where one of the cloud compute nodes fails. For example, due to hardware malfunction. At that point you may use server evacuation in order to make managed instances available again.
Evacuation appears to be a manual operational task. A large public or private cloud deployment will have nightmares maintaining SLAs (yes, some cloud providers do have SLAs, and their ops teams have even tighter internal ones) every time a KVM host fails. I suspect that commercial OpenStack distributions (Rackspace/Mirantis/Piston Cloud) and OpenStack public clouds (HP/Rackspace) carry custom patches for automatic evacuation.
Alternatively, I may simply not have Googled enough to find references to “stock” OpenStack handling KVM host failures automatically.
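To make the "manual task" point concrete, here is a rough sketch of the glue an ops team might write around evacuate. It is not how any distribution actually does it. It assumes the `openstack` and legacy `nova` command-line clients are installed and authenticated, that `openstack compute service list -f json` returns records with `Binary`, `Host` and `State` keys, and that `nova host-evacuate` is available in the installed client. The helper names are made up for illustration. A real setup would also need to fence the failed host before evacuating, which this sketch does not attempt.

```python
import json
import subprocess


def down_compute_hosts(services):
    """Return the sorted hosts whose nova-compute service is reported down.

    `services` is a list of dicts shaped like the JSON output of
    `openstack compute service list -f json` (assumed keys: Binary, Host, State).
    """
    return sorted(
        s["Host"]
        for s in services
        if s.get("Binary") == "nova-compute" and s.get("State") == "down"
    )


def evacuate_commands(hosts):
    """Build one `nova host-evacuate <host>` command per failed host."""
    return [["nova", "host-evacuate", host] for host in hosts]


def list_compute_services():
    """Ask the OpenStack CLI for the current compute service states."""
    out = subprocess.check_output(
        ["openstack", "compute", "service", "list", "-f", "json"],
        text=True,
    )
    return json.loads(out)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them, because a
    # "down" service is not proof that the host is really dead.
    for cmd in evacuate_commands(down_compute_hosts(list_compute_services())):
        print(" ".join(cmd))
```

Run from cron or a monitoring hook, this only prints what it would do. Deciding when a "down" report is trustworthy enough to act on is exactly the part that stays manual.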