CloudStack And Primary Storage NFS Performance

A minimal CloudStack deployment has one zone, one pod and one cluster running with a single shared NFS primary storage. The shared storage hosts the virtual disks for all instances running on the same cluster.

Multiple instances of different types and workloads are hosted on the same cluster. Instances running disk bound workloads like database services compete with instances running CPU/memory bound tasks like application services. This can lead to unpredictable IO performance, as all instances are competing for the same shared disk resources. As load increases on the storage server, disk performance on all instances starts to degrade.
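To get a feel for the effect, here is a deliberately crude Python model of that contention. It assumes the storage server has a fixed IOPS ceiling and that, once demand exceeds it, every instance is throttled by the same factor. Real NFS servers are not this fair, and the instance names and numbers are made up for illustration.

```python
def share_iops(capacity_iops, demands):
    """Split a storage server's IOPS ceiling across competing instances.

    Assumption: when total demand exceeds capacity, every instance is
    scaled back by the same factor. This is a simplification.
    """
    total = sum(demands.values())
    if total <= capacity_iops:
        return dict(demands)  # enough headroom, everyone gets what they ask for
    scale = capacity_iops / total
    return {name: demand * scale for name, demand in demands.items()}


# Hypothetical cluster: an app server and a database on a 1000 IOPS pool.
quiet = share_iops(1000, {"app-1": 200, "db-1": 600})

# A second, heavier database lands on the same primary storage.
busy = share_iops(1000, {"app-1": 200, "db-1": 600, "db-2": 1200})

print(quiet)
print(busy)
```

In this model, the arrival of db-2 halves the IOPS available to app-1 and db-1 even though their own workloads did not change.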

A multi-tenant cloud deployment therefore needs to provide instances with Quality of Service (QoS) guarantees for disk resources, such as:

  1. High disk throughput (MB/s)
  2. High disk input/output operations per second (IOPS)
  3. Low latency disk access
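These three numbers are not independent. As a rough sketch, throughput is IOPS multiplied by the IO size, and (by Little's law) the IOPS a client can reach is bounded by its number of outstanding requests divided by the average latency. The function names and the sample figures below are assumptions for illustration, not measurements.

```python
def throughput_mib_s(iops, io_size_kib):
    """Approximate throughput in MiB/s for a given IOPS rate and IO size."""
    return iops * io_size_kib / 1024.0


def max_iops(queue_depth, latency_ms):
    """Little's law: achievable IOPS = outstanding requests / average latency."""
    return queue_depth * 1000.0 / latency_ms


# The same 2000 IOPS means very different throughput at 4 KiB and 64 KiB.
print(throughput_mib_s(2000, 4))
print(throughput_mib_s(2000, 64))

# At 5 ms per request, a single outstanding request cannot exceed 200 IOPS.
print(max_iops(1, 5.0))
print(max_iops(32, 5.0))
```

This is why a QoS promise usually has to name all three: a high IOPS figure says little without the IO size and the latency at which it was achieved.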

NFS does not work well in a multi-tenant cloud environment because the protocol does not guarantee predictable performance. Clients cannot specify any QoS features as mount options. Besides, mount options apply to the hypervisor host as a whole, and therefore to every instance running on it. NFS is, however, much simpler to manage and operate than iSCSI.

Improving Multi-Tenant Storage Performance

Creating Tiered CloudStack Clusters

The lowest organisational unit CloudStack provides with a dedicated primary storage is a “cluster”. A cluster consists of multiple hosts of similar type running a common hypervisor, and a dedicated primary storage device.

A workaround would be to have multiple clusters with different NFS based primary storage pools.

  1. Create multiple clusters with a dedicated shared primary storage over NFS
    1. A regular cluster with a shared storage for low performance disk IO
    2. A premium cluster with a shared storage for high performance disk IO
  2. Create instance service offerings with differing network interface speeds and capabilities
    1. Regular offerings for Low CPU/Low Memory/Low Network Speeds
    2. Premium offerings for High CPU/High Memory/High Network Speeds
  3. Provision instances to match with the tiered cluster(s) using tags
    1. Premium instances on high performance premium clusters
    2. Regular instances on non-premium clusters
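The tag matching in step 3 can be sketched as follows. This is a minimal Python model, assuming a pool is eligible when it carries every tag the offering asks for. The pool names, offering names and tags are hypothetical, and this is not CloudStack's actual allocator code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePool:
    name: str
    tags: frozenset  # tags assigned to the primary storage


@dataclass(frozen=True)
class Offering:
    name: str
    storage_tags: frozenset  # tags the offering requires


def eligible_pools(offering, pools):
    """Return the names of pools that carry all tags the offering requires."""
    return [p.name for p in pools if offering.storage_tags <= p.tags]


# Hypothetical tiers: one NFS pool per cluster.
pools = [
    StoragePool("nfs-regular", frozenset({"regular"})),
    StoragePool("nfs-premium", frozenset({"premium"})),
]
premium = Offering("premium-large", frozenset({"premium"}))
regular = Offering("regular-small", frozenset({"regular"}))
untagged = Offering("untagged", frozenset())

print(eligible_pools(premium, pools))
print(eligible_pools(regular, pools))
print(eligible_pools(untagged, pools))
```

Note that in this model an offering with no tags matches every pool, so every offering needs a tag if the tiers are to stay separate.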

Caveats

  1. The partitioning strategy prevents instances running on low priority clusters from competing with premium instances
  2. Instances running within the same cluster continue to compete with each other for IOPS
  3. Instance offerings can only limit overall network interface speeds. They cannot control disk IOPS or overall disk throughput.

NFS Server Optimisations

If a dedicated NFS server per cluster is not available, the storage server itself can be reconfigured to provide a tiered NFS primary storage.

  1. Redo the storage server RAID configuration to export more than one NFS volume having desired performance characteristics
    1. Create a high performance RAID volume using more disk stripes
    2. Create a regular performance RAID volume using normal disk stripes
  2. Create multiple CloudStack clusters with one or more hosts
    1. A high performance cluster using the high performance NFS volume as shared storage
    2. A regular performance cluster using the regular NFS volume as shared storage
  3. Provision instances to match the tiered cluster using tags
    1. High performance instances on the premium clusters
    2. Regular instances on the regular clusters
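How much faster the high performance volume ends up depends on the RAID layout. A back-of-the-envelope Python estimate, using the usual rule of thumb that each write costs several backend IOs (a penalty of roughly 2 for RAID 10 and 4 for RAID 5), might look like this. The disk counts, the 150 IOPS per disk and the 50% read mix are assumed figures, not benchmarks.

```python
def estimated_iops(disks, per_disk_iops, write_penalty, read_fraction):
    """Rule-of-thumb frontend IOPS for a RAID volume.

    Each read costs one backend IO and each write costs `write_penalty`
    backend IOs. Caches and controller behaviour are ignored.
    """
    raw = disks * per_disk_iops
    cost_per_io = read_fraction + (1.0 - read_fraction) * write_penalty
    return raw / cost_per_io


# Hypothetical premium volume: 8 disks in RAID 10.
print(estimated_iops(8, 150, 2, 0.5))

# Hypothetical regular volumes: 4 disks in RAID 10 or in RAID 5.
print(estimated_iops(4, 150, 2, 0.5))
print(estimated_iops(4, 150, 4, 0.5))
```

The numbers are only a starting point for sizing the tiers; the exported volumes should still be benchmarked over NFS from a hypervisor host.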

Caveats

  1. Like before, the isolation is only at the cluster level. Instances running within the same cluster continue to compete for disk resources

Currently, there does not seem to be an effective multi-tenant solution for implementing QoS over NFS on the client side. A rogue instance can certainly make life miserable for other instances hosted on the same shared primary storage.

For iSCSI based primary storage, XenServer does allow IO prioritisation, but when multiple hosts access the same LUN the priority only applies between virtual disks accessed from the same host, not across hosts. See the XenServer 6.0.0 reference manual. IO prioritisation is not the same as true QoS, but it's better than having none.

While many enterprise storage vendors support QoS features, it is not clear whether they provide AWS style “Provisioned IOPS” (PIOPS). Such a feature could be implemented by a storage layer that understands multi tenancy and provides a native API to limit or guarantee raw read/write IO operations on a per file, object or directory basis. In other words, a storage layer that does QoS on a wide range of factors. CloudStack could then use these native APIs, through a plugin connector, to set the desired PIOPS while provisioning instances.
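To make the idea concrete, here is a minimal Python sketch of how such a storage layer might cap IOPS per volume with a token bucket. The class name, the volume IDs and the limits are made up for illustration; this is not an API of CloudStack or of any storage vendor. It only enforces an upper limit, and guaranteeing a minimum would also need admission control on the storage side.

```python
class IopsLimiter:
    """Token bucket: refills at `iops_limit` tokens/second, holds up to `burst`."""

    def __init__(self, iops_limit, burst):
        self.rate = float(iops_limit)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now, ops=1):
        """Return True if `ops` IO operations may proceed at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= ops:
            self.tokens -= ops
            return True
        return False  # over the provisioned rate: queue or reject the IO


# Hypothetical per-volume limits, as a provisioning plugin might set them.
limits = {
    "vol-premium": IopsLimiter(iops_limit=1000, burst=100),
    "vol-regular": IopsLimiter(iops_limit=100, burst=10),
}

# A burst of 50 IOs arriving at t=0 on each volume.
for volume, limiter in limits.items():
    admitted = sum(limiter.allow(now=0.0) for _ in range(50))
    print(volume, admitted)
```

Time is passed in explicitly to keep the example deterministic; a real implementation would read a monotonic clock instead.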

It's a very interesting problem to solve, as clouds in general are notorious for their IO performance.

Shanker Balan

Shanker Balan is a devops and infrastructure freelancer with over 14 years of industry experience in large scale Internet systems. He is available for both short term and long term projects on contract. Please use the Contact Form for any enquiry.
