CloudStack Advanced Networking With GRE SDN Tunnels

  • Update (20140124): The Open vSwitch integration required for GRE SDN networking is currently broken in Apache CloudStack 4.2.x (and the master branch) according to this.

CloudStack has two types of zones: Basic Zones and Advanced Zones.

A Basic Zone is a flat network where Security Groups are used for traffic isolation, while Advanced Zones can use VLAN, GRE or STT based isolation. A zone can have at most 4096 VLAN IDs (the 802.1Q VLAN ID field is 12 bits wide), and the usable number depends on the L3 switching device’s capabilities. Since every guest tenant is allocated one VLAN ID, a single zone can support roughly 4K tenants at most. If more than 4K customers must be supported within the same zone, SDN technologies like GRE and STT can be used.
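To put the difference in ID space into numbers: the 802.1Q VLAN ID is a 12-bit field, while the GRE key (RFC 2890) is a 32-bit field. The short Python sketch below only does that arithmetic. The function and constant names are illustrative, and the practical per-zone limit will also depend on CloudStack and on the hardware in use.

```python
def id_space(bits: int) -> int:
    """Return how many distinct values an identifier field of this width can hold."""
    return 2 ** bits


VLAN_ID_BITS = 12  # width of the 802.1Q VLAN ID field
GRE_KEY_BITS = 32  # width of the GRE key field (RFC 2890)

vlan_ids = id_space(VLAN_ID_BITS)
gre_keys = id_space(GRE_KEY_BITS)

print("VLAN IDs:", vlan_ids)
print("GRE keys:", gre_keys)
```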

The GRE SDN feature seems to be more mature on the XenServer hypervisor than on KVM at this time. To use GRE isolation, the additional steps are roughly as below:

  1. Update global setting sdn.ovs.controller to true
  2. Update global setting sdn.ovs.controller.default.label to match the label of your XenServer’s physical guest interface. Mine is labeled GUEST:
    [root@vXen-2-1 ~]# xe network-list name-label=GUEST 
    uuid ( RO)                : 41c845e6-4110-5d36-9f7e-b85b4909600d
              name-label ( RW): GUEST
        name-description ( RW): 
                  bridge ( RO): xenbr2
    
  3. Create an Advanced Zone with isolation set to GRE instead of VLAN for the guest network
    [Screenshot: CloudStack GRE Physical Network]
  4. Add a VLAN range for the guest network. Please note that these are not actually VLAN IDs, but just a free range of keys used by CloudStack to create the GRE tunnels. Since the range is inclusive, a range of 100 – 110 will support 11 tenants over GRE.
  5. Assign IP addresses to the XenServer physical guest interface. I have used 192.168.65.0/24 across all the XenServer hosts
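As a quick check on the key range in step 4, here is a minimal Python sketch that counts how many tenants an inclusive range can serve. The helper name parse_key_range and the "start-end" string format are assumptions made for illustration; this is not how CloudStack itself parses the setting.

```python
def parse_key_range(text: str) -> range:
    """Turn an inclusive 'start-end' string into a range of GRE keys.

    Hypothetical helper for illustration; not part of CloudStack.
    """
    start_text, end_text = text.split("-")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError("range start must not exceed range end")
    return range(start, end + 1)  # +1 because the configured range is inclusive


keys = parse_key_range("100-110")
print("tenants supported:", len(keys))
```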

Turn on the Zone once these settings are in place. When a new instance is created, GRE tunnels are created per tenant. Sample ovs-vsctl, ovs-ofctl and tcpdump output is shown below. All VMs of a tenant share the same tunnel. In this case, it is the tunnel with GRE key 103 (shown as key=0x67 in the tcpdump output).

ovs-vsctl

[root@vXen-2-1 ~]# ovs-vsctl show
    Bridge "xapi1"
        fail_mode: standalone
        Port "xapi1"
            Interface "xapi1"
                type: internal
        Port "vif4.0"
            Interface "vif4.0"
        Port "vif3.0"
            Interface "vif3.0"
        Port "t103-1-3"
            Interface "t103-1-3"
                type: gre
                options: {key="103", remote_ip="192.168.65.246"}
    ovs_version: "1.4.2"
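The tunnel port above carries its GRE key both in its name and in its options. A small Python sketch can cross-check the two. Note the assumptions: the port name layout t<key>-<id>-<id> is inferred only from the sample output here, and the meaning of the two trailing numbers (presumably host IDs) is a guess, so the code treats them as opaque integers.

```python
import re


def parse_options(line: str) -> dict:
    """Extract key="value" pairs from an 'options: {...}' line of ovs-vsctl show."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


def parse_tunnel_port(name: str) -> tuple:
    """Split a port name such as 't103-1-3' into three integers.

    The layout is an assumption based on the sample output only.
    """
    match = re.fullmatch(r"t(\d+)-(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected tunnel port name: {name}")
    return tuple(int(part) for part in match.groups())


options = parse_options('options: {key="103", remote_ip="192.168.65.246"}')
key_from_name, first_id, second_id = parse_tunnel_port("t103-1-3")

# The key embedded in the port name should agree with the key in the options.
print(key_from_name == int(options["key"]), options["remote_ip"])
```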

ovs-ofctl

[root@vXen-2-1 ~]# ovs-ofctl show xapi1
OFPT_FEATURES_REPLY (xid=0x1): ver:0x1, dpid:00008219499edf3d
n_tables:255, n_buffers:256
features: capabilities:0xc7, actions:0xfff
 1(vif3.0): addr:fe:ff:ff:ff:ff:ff
     config:     0
     state:      0
 2(t103-1-3): addr:06:19:81:4a:ce:38
     config:     0
     state:      0
 3(vif4.0): addr:fe:ff:ff:ff:ff:ff
     config:     0
     state:      0
 LOCAL(xapi1): addr:4a:68:9f:b2:cc:46
     config:     0
     state:      0
OFPT_GET_CONFIG_REPLY (xid=0x3): frags=normal miss_send_len=0

tcpdump

[root@vXen-2-1 log]# tcpdump -n -i xenbr2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on xenbr2, link-type EN10MB (Ethernet), capture size 65535 bytes
01:36:34.580322 IP 192.168.65.247 > 192.168.65.246: GREv0, key=0x67, length 82: IP 10.1.1.63.38795 > 10.1.1.1.domain: 1255+ A? VM2.xen2.local. (32)
01:36:34.583055 IP 192.168.65.246 > 192.168.65.247: GREv0, key=0x67, length 98: IP 10.1.1.1.domain > 10.1.1.63.38795: 1255* 1/0/0 A 10.1.1.196 (48)
01:36:34.590466 IP 192.168.65.246 > 192.168.65.247: GREv0, key=0x67, length 119: IP 10.1.1.1.domain > 10.1.1.63.60399: 40706* 1/0/0 PTR VM2.xen2.local. (69)
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
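tcpdump prints the GRE key in hexadecimal, so key=0x67 in the capture is the same tunnel as key 103 in the ovs-vsctl output. The following Python sketch pulls the key out of a captured line and converts it. The function name is illustrative, and the regular expression assumes the key=0x... format seen in this capture.

```python
import re


def gre_key_from_tcpdump(line: str):
    """Return the GRE key of a tcpdump line as a decimal int, or None if absent."""
    match = re.search(r"key=0x([0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


sample = (
    "01:36:34.580322 IP 192.168.65.247 > 192.168.65.246: GREv0, key=0x67, "
    "length 82: IP 10.1.1.63.38795 > 10.1.1.1.domain: 1255+ A? VM2.xen2.local. (32)"
)

key = gre_key_from_tcpdump(sample)
print("GRE key:", key)
```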

Virtual Router Network Interfaces

[Screenshot: virtual router network interfaces]

VM Instance Network Interface

[Screenshot: VM instance network interface]

/var/log/ovstunnel.log

2013-09-16 01:07:03    DEBUG [root] Entering create_tunnel
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-vsctl', '--timeout=30', 'wait-until', 'bridge', 'xapi1', '--', 'get', 'bridge', 'xapi1', 'name']
2013-09-16 01:07:03    DEBUG [root] bridge xapi1 for creating tunnel - VERIFIED
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-vsctl', 'add-port', 'xapi1', 't103-3-1', '--', 'set', 'interface', 't103-3-1', 'type=gre', 'options:key=103', 'options:remote_ip=192.168.65.247']
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-vsctl', 'get', 'port', 't103-3-1', 'interfaces']
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-vsctl', 'get', 'interface', 'af37904a-6f7b-4391-9b84-61bd3dba0877', 'options:key']
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-vsctl', 'get', 'interface', 'af37904a-6f7b-4391-9b84-61bd3dba0877', 'options:remote_ip']
2013-09-16 01:07:03    DEBUG [root] Tunnel interface validated:['/usr/bin/ovs-vsctl', 'get', 'interface', 'af37904a-6f7b-4391-9b84-61bd3dba0877', 'options:remote_ip']
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-vsctl', 'get', 'interface', 'af37904a-6f7b-4391-9b84-61bd3dba0877', 'ofport']
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-ofctl', 'add-flow', 'xapi1', 'hard_timeout=0,idle_timeout=0,priority=1000,in_port=4,dl_dst=ff:ff:ff:ff:ff:ff,actions=drop']
2013-09-16 01:07:03    DEBUG [root] Executing:['/usr/bin/ovs-ofctl', 'add-flow', 'xapi1', 'hard_timeout=0,idle_timeout=0,priority=1000,in_port=4,ip,nw_dst=224.0.0.0/24,actions=drop']
2013-09-16 01:07:03    DEBUG [root] Broadcast drop rules added
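The log shows the sequence used to bring up a tunnel: add a GRE port to the bridge, then install two flows that drop broadcast and multicast traffic arriving on that port. The Python sketch below only rebuilds the argument lists seen in the log and does not execute anything. It is not the ovstunnel plugin's actual code; the function names and the t<key>-<id>-<id> naming parameters are assumptions taken from the log lines alone.

```python
def add_gre_port_command(bridge, key, src_id, dst_id, remote_ip):
    """Build the ovs-vsctl argv that adds a keyed GRE port, as seen in the log."""
    name = f"t{key}-{src_id}-{dst_id}"  # naming inferred from the log, not documented
    return [
        "/usr/bin/ovs-vsctl", "add-port", bridge, name,
        "--", "set", "interface", name,
        "type=gre", f"options:key={key}", f"options:remote_ip={remote_ip}",
    ]


def broadcast_drop_commands(bridge, ofport):
    """Build the two ovs-ofctl argvs that drop broadcast and 224.0.0.0/24 traffic."""
    base = f"hard_timeout=0,idle_timeout=0,priority=1000,in_port={ofport}"
    return [
        ["/usr/bin/ovs-ofctl", "add-flow", bridge,
         f"{base},dl_dst=ff:ff:ff:ff:ff:ff,actions=drop"],
        ["/usr/bin/ovs-ofctl", "add-flow", bridge,
         f"{base},ip,nw_dst=224.0.0.0/24,actions=drop"],
    ]


port_cmd = add_gre_port_command("xapi1", 103, 3, 1, "192.168.65.247")
flow_cmds = broadcast_drop_commands("xapi1", 4)

print(" ".join(port_cmd))
for cmd in flow_cmds:
    print(" ".join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try on any machine; on a real host they would need root and a working Open vSwitch installation.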

For further reading, please see https://cwiki.apache.org/confluence/display/CLOUDSTACK/Enhancements+to+GRE-based+SDN+overlay

Shanker Balan

Shanker Balan is a devops and infrastructure freelancer with over 14 years of industry experience in large scale Internet systems. He is available for both short term and long term projects on contract. Please use the Contact Form for any enquiry.

18 thoughts on “CloudStack Advanced Networking With GRE SDN Tunnels”

  1. Great article.
    Do you need to use separate physical NICs on your XEN server for each network in cloudstack or could you bond up the NICs and shove everything down the bond?

  2. Hi Luca,

    I believe you can use the same NIC in an Advanced Network zone for Management, Storage, Public and Guest traffic, though I have not personally tried it. Production clouds I have been involved with have chosen to go with multiple physical NICs for performance reasons.

    If you are using Advanced Network with GRE isolation, you do require a dedicated NIC for the GRE traffic.

  3. We have a prod environment where we use a single bond for everything, it works quite well.
    I’m trying to use the new GRE features, I’ll try it with a dedicated NIC.

  4. Hi Shankar – great resource here. Just wondering if we need to do anything special in CS to allow full 1500 byte frames within the GRE tunnel (i.e. make MTU 1524 bytes) or can this be set on the Xen side?

    This also makes me wonder if it means that tenants could then use VLAN tagging between hosts inside of the GRE tunnels? You’d want to add another 4 bytes to the MTU but is this even possible?

  5. Hi Adrian,

    I am not aware of any tunables on the CloudStack end to control MTUs. CloudStack mailing list would probably be a better place to check. Also, regarding the VLAN tagging between hosts, once you are on GRE, there are no VLAN tags used anywhere – There are only GRE tunnels with the guest traffic encapsulated within the tunnel. Every customer has his own dedicated GRE tunnel.

  6. What versions of Cloudstack/Xenserver was this with? It didn’t work for me with CS 4.2 and XS 6.2 (ovstunnel.log is empty which indicates a bug in CS).

  7. Hi Shanker,

    Good article. I’ve set everything up as you describe:

    – CloudStack 4.2.1 + XS61 with all patches, advanced zone, no security groups, “sdn” global settings set.
    – public + management running on VLAN networks (bonds)
    – guest traffic on a GRE network (both bond and single NIC networks tried), IP configured on the guest network
    – “VLAN range” configured

    However I can’t get the tunnels (or VMs) to come up – VMs as well as VR start fails, logs show various entries with the main one being “createTunnelNetwork failed”. The same Cloudstack runs “normal” VLAN zones successfully so I know my configuration and network is OK.

    Any ideas? I’ve scanned through the Apache Cloudstack mailing list logs (including your own posts) and whilst I see other people have had the same issue I can’t find any solutions as such.

  8. Hi,

    I have the same issue “createTunnelNetwork failed” with CloudStack 4.2.1 and XS6.2
    Have a look at the following issues:

    https://issues.apache.org/jira/browse/CLOUDSTACK-4599
    https://issues.apache.org/jira/browse/CLOUDSTACK-3174 (this seems to be the root cause)

    CLOUDSTACK-3174 is mentioned as fixed. However the fix is comprised of an updated ‘ovstunnel’ plugin along with CloudStack code fix. You can check the code review here:

    https://reviews.apache.org/r/12445/

    But it is not clear where this fix can be found in a Cloudstack supplement package for XenServer (if at all).
    I guess I am just going to try to manually patch the plugin on XenServer, assuming the CloudStack fix itself is already in 4.2.0 or 4.2.1.

  9. Hi Florin,

    I personally consider GRE as a technology preview rather than a mature production feature.

    Unless you can commit engineering resources to GRE, I would stick to regular VLAN isolation.

    Regards.

  10. Hi,

    For our use case GRE is the key. This is because we do not need it so much for an isolation purpose, but for the encapsulation of all L2 traffic in L3 tunnels and the fact this is nicely orchestrated in an SDN fashion. So GRE and SDN are the keywords that made us choose Cloudstack in the first place. Unfortunately if it doesn’t work, we will probably have to drop CloudStack (don’t think we have enough runway to wait for a next major release). The release notes seem to claim however it is there and it actually works.

    Shanker, I have just tried a setup which, from a software version point of view, matches yours: CS 4.1.1 and XenServer 6.1. I still have the following issues:


    2014-01-20 12:06:07,092 WARN [xen.resource.CitrixResourceBase] (DirectAgent-3:null) createandConfigureTunnelNetwork failed
    The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem.

    and on XenServer xensource.log:


    Jan 20 11:52:33 xenserver-129 xapi: [error|xenserver-129|818 INET 0.0.0.0:80|VIF.plug R:b57831dbf21a|xenops] Re-raising as INTERNAL_ERROR [ Object with type extra and id 267a5237-5238-4850-9601-a5017e2be9b2 does not exist in xenopsd ]

    If the setup you have tried successfully is still fresh in memory, could you provide a few more details such as:

    – how many hosts and VMs, how are they deployed (POD, Cluster, etc)
    – NIC configuration on the XenServer (how many NICs, assigned to which traffic type, labels on them).
    – Guest network setup on CS (subnet, GW) and correspondingly on Guest traffic NIC on XenServer

    Thank you,
    Florin

  11. Hi Florin,

    I did a new GRE setup using ACS 4.2.1 and XenServer 6.2 and ran into the same issue you are facing. Sorry, I don’t have a solution or workaround yet.

  12. Hi Shanker,
    We are planning to work with CS 4.5.2 + XenServer 6.5 with multiple zones using OVS (GRE).

    Can you please confirm if all the mentioned issues are already resolved with this version?

    Thanks for your posts, all of them are very helpful
