Analyzing public cloud and hybrid networks

Public cloud and hybrid networks can be hard to debug and secure. Many of the standard tools (e.g., traceroute) do not work in the cloud setting, even though the types of paths that can emerge are highly complicated, depending on whether the endpoints are in the same region, different regions, across physical and virtual infrastructure, or whether public or private IPs of cloud instances are being used.

At the same time, these networks evolve quickly, with different groups of people rapidly spinning up new subnets and instances, and this creates a significant security risk. Network engineers need tools that can provide comprehensive guarantees that services and applications are available and secure as intended at all times.

In this notebook, we show how Batfish can predict and help debug network paths for cloud and hybrid networks and how it can guarantee that the network’s availability and security posture is exactly as desired.

Analytics

[1]:
# Import packages and load questions
%run startup.py

load_questions()

def show_first_trace(trace_answer_frame):
    """
    Prints the first trace in the answer frame.

    In the presence of multipath routing, Batfish outputs all traces
    from the source to destination. This function picks the first one.
    """
    if len(trace_answer_frame) == 0:
        print("No flows found")
    else:
        show("Flow: {}".format(trace_answer_frame.iloc[0]['Flow']))
        show(trace_answer_frame.iloc[0]['Traces'][0])

def is_reachable(start_location, end_location, headers=None):
    """
    Checks if the start_location can reach the end_location using specified packet headers.

    All possible headers are considered if headers is None.
    """
    ans = bfq.reachability(pathConstraints=PathConstraints(startLocation=start_location,
                                                           endLocation=end_location),
                          headers=headers).answer()
    return len(ans.frame()) > 0

Initializing the Network and Snapshot

SNAPSHOT_PATH below can be updated to point to a custom snapshot directory. See instructions for how to package data for analysis.

[2]:
# Initialize a network and snapshot
NETWORK_NAME = "hybrid-cloud"
SNAPSHOT_NAME = "snapshot"

SNAPSHOT_PATH = "networks/hybrid-cloud"

bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
[2]:
'snapshot'

The network snapshot that we just initialized is illustrated below. It has a datacenter network with the standard leaf-spine design on the left. Though not strictly necessary, we have included a host srv-101 in this network to enable end-to-end analysis. The exit gateway of the datacenter connects to an Internet service provider (ASN 65200) that we call isp_dc.

The AWS network is shown on the right. It is spread across two regions, us-east-2 and us-west-2. Each region has two VPCs, one of which is meant to host Internet-facing services and the other is meant to host only private services. Subnets in the public-facing VPCs use an Internet gateway to send and receive traffic outside of AWS. The two VPCs in a region peer via a transit gateway. Each VPC has two subnets, and we have some instances running as well.

The physical network connects to the AWS network using IPSec tunnels, shown in pink, between exitgw and the two transit gateways. BGP sessions run atop these tunnels to make endpoints aware of prefixes on the other side.

You can view configuration files that we used here. The AWS portion of the configuration is in the aws_configs subfolder. It has JSON files obtained via AWS APIs. An example script that packages AWS data into a Batfish snapshot is here.

[Figure: hybrid-cloud-network topology]

Analyzing network paths

Batfish can help analyze cloud and hybrid networks by showing how exactly traffic flows (or not) in the network, which can help debug and fix configuration errors. Batfish can also help ensure that the network is configured exactly as desired, with respect to reachability and security policies. We illustrate these types of analysis below.

First, let's define a couple of maps to help with the analysis.

[3]:
# Instances in AWS in each region and VPC type (public, private)
hosts = {}
hosts["east2_private"] = "i-04cd3db5124a05ee6"
hosts["east2_public"] = "i-01602d9efaed4409a"
hosts["west2_private"] = "i-0a5d64b8b58c6dd09"
hosts["west2_public"] = "i-02cae6eaa9edeed70"

# Public IPs of instances in AWS
public_ips = {}
public_ips["east2_public"] = "13.59.144.125" # of i-01602d9efaed4409a
public_ips["west2_public"] = "54.191.42.182" # of i-02cae6eaa9edeed70

Paths across VPCs within an AWS region

To see how traffic flows between two instances in the same region but in different VPCs, say, from hosts["east2_private"] to hosts["east2_public"], we can run a traceroute query between them as follows.

In the query below, we use the name of the instance as the destination for the traceroute. This makes Batfish pick the instance’s private (i.e., non-Elastic) IP (10.20.1.207). It does not pick the public IP because public IPs do not reside on instances; instead, the Internet gateway uses them to NAT instances’ traffic in and out of AWS (see documentation). If an instance has multiple private IPs, Batfish will pick one at random. To make Batfish use a specific IP, supply that IP as the argument to the dstIps parameter.

[4]:
# traceroute between instances in the same region, using SSH
ans = bfq.traceroute(startLocation=hosts["east2_private"],
                     headers=HeaderConstraints(dstIps=hosts["east2_public"],
                                               applications="ssh")).answer()
show_first_trace(ans.frame())
'Flow: start=i-04cd3db5124a05ee6 [10.30.1.166:49152->10.20.1.207:22 TCP length=512]'
ACCEPTED
1. node: i-04cd3db5124a05ee6
  ORIGINATED(default)
  FORWARDED(ARP IP: 10.30.1.1, Output Interface: eni-05452497daf80ccb3, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:10.30.1.1)])
  PERMITTED(~EGRESS_ACL~eni-05452497daf80ccb3 (EGRESS_FILTER))
  SETUP_SESSION(Incoming Interfaces: [eni-05452497daf80ccb3], Action: Accept, Match Criteria: [ipProtocol=TCP, srcIp=10.20.1.207, dstIp=10.30.1.166, srcPort=22, dstPort=49152])
  TRANSMITTED(eni-05452497daf80ccb3)
2. node: subnet-0cb5f4c094bee5214
  RECEIVED(to-instances)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0276455718806058a-vrf-tgw-attach-021f89744fac566dd, Routes: [static (Network: 10.20.0.0/16, Next Hop IP:169.254.0.1)])
  PERMITTED(acl-0b3d0f6b0978f09f8_egress (EGRESS_FILTER))
  TRANSMITTED(vpc-0276455718806058a-vrf-tgw-attach-021f89744fac566dd)
3. node: vpc-0276455718806058a
  RECEIVED(subnet-0cb5f4c094bee5214-vrf-tgw-attach-021f89744fac566dd)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: tgw-06b348adabd13452d-tgw-rtb-00e37bc5142347b03, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  TRANSMITTED(tgw-06b348adabd13452d-tgw-rtb-00e37bc5142347b03)
4. node: tgw-06b348adabd13452d
  RECEIVED(vpc-0276455718806058a-tgw-rtb-00e37bc5142347b03)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0574d08f8d05917e4-tgw-rtb-00e37bc5142347b03, Routes: [static (Network: 10.20.0.0/16, Next Hop IP:169.254.0.1)])
  TRANSMITTED(vpc-0574d08f8d05917e4-tgw-rtb-00e37bc5142347b03)
5. node: vpc-0574d08f8d05917e4
  RECEIVED(tgw-06b348adabd13452d-tgw-rtb-00e37bc5142347b03)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: subnet-06a692ed4ef84368d-vrf-tgw-attach-0648110513acd6de5, Routes: [static (Network: 10.20.1.0/24, Next Hop IP:169.254.0.1)])
  TRANSMITTED(subnet-06a692ed4ef84368d-vrf-tgw-attach-0648110513acd6de5)
6. node: subnet-06a692ed4ef84368d
  RECEIVED(vpc-0574d08f8d05917e4-vrf-tgw-attach-0648110513acd6de5)
  PERMITTED(acl-09c0bb4e71ae5f9e4_ingress (INGRESS_FILTER))
  FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: to-instances, Routes: [connected (Network: 10.20.1.0/24, Next Hop IP:AUTO/NONE(-1l))])
  TRANSMITTED(to-instances)
7. node: i-01602d9efaed4409a
  RECEIVED(eni-01997085076a9b98a)
  PERMITTED(~SECURITY_GROUP_INGRESS_ACL~ (INGRESS_FILTER))
  SETUP_SESSION(Originating VRF: default, Action: FibLookup, Match Criteria: [ipProtocol=TCP, srcIp=10.20.1.207, dstIp=10.30.1.166, srcPort=22, dstPort=49152])
  ACCEPTED(eni-01997085076a9b98a)

The trace above shows how traffic goes from hosts["east2_private"] to hosts["east2_public"]: via the source subnet and VPC, then to the transit gateway, and finally to the destination VPC and subnet. Along the way, it also shows where the flow encounters security groups (at both instances) and network ACLs (at subnets). In this case, all security groups and network ACLs permit this particular flow.

This kind of insight into traffic paths, which helps users understand and debug network configuration, is difficult to obtain otherwise. Traceroutes on the live AWS network yield no information if the flow does not make it through, and they do not show where or why a packet is dropped.

Paths across AWS regions

The traceroute query below shows paths between instances in two different regions.

[5]:
# traceroute between instances across regions using the destination's private IP
ans = bfq.traceroute(startLocation=hosts["east2_public"],
                     headers=HeaderConstraints(dstIps=hosts["west2_public"],
                                               applications="ssh")).answer()
show_first_trace(ans.frame())
'Flow: start=i-01602d9efaed4409a [10.20.1.207:49152->10.40.2.80:22 TCP length=512]'
DENIED_OUT
1. node: i-01602d9efaed4409a
  ORIGINATED(default)
  FORWARDED(ARP IP: 10.20.1.1, Output Interface: eni-01997085076a9b98a, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:10.20.1.1)])
  PERMITTED(~EGRESS_ACL~eni-01997085076a9b98a (EGRESS_FILTER))
  SETUP_SESSION(Incoming Interfaces: [eni-01997085076a9b98a], Action: Accept, Match Criteria: [ipProtocol=TCP, srcIp=10.40.2.80, dstIp=10.20.1.207, srcPort=22, dstPort=49152])
  TRANSMITTED(eni-01997085076a9b98a)
2. node: subnet-06a692ed4ef84368d
  RECEIVED(to-instances)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0574d08f8d05917e4-vrf-igw-02fd68f94367a67c7, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  PERMITTED(acl-09c0bb4e71ae5f9e4_egress (EGRESS_FILTER))
  TRANSMITTED(vpc-0574d08f8d05917e4-vrf-igw-02fd68f94367a67c7)
3. node: vpc-0574d08f8d05917e4
  RECEIVED(subnet-06a692ed4ef84368d-vrf-igw-02fd68f94367a67c7)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: igw-02fd68f94367a67c7, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  TRANSMITTED(igw-02fd68f94367a67c7)
4. node: igw-02fd68f94367a67c7
  RECEIVED(vpc-0574d08f8d05917e4)
  PERMITTED(~DENY~UNASSOCIATED~PRIVATE~IPs~ (INGRESS_FILTER))
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: backbone, Routes: [bgp (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  TRANSFORMED(SOURCE_NAT srcIp: 10.20.1.207 -> 13.59.144.125)
  TRANSMITTED(backbone)
5. node: isp_16509
  RECEIVED(To-igw-02fd68f94367a67c7)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-Internet, Routes: [bgp (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  DENIED(Block outgoing traffic using reserved addresses (EGRESS_FILTER))

We see that such traffic does not reach the destination; instead, the AWS backbone (ASN 16509, node isp_16509) drops it. This happens because our network has no peering (transit gateway or VPC) between VPCs in different regions. The source subnet therefore has no route for the destination subnet’s address space and uses its default route, which points to the Internet gateway (igw-02fd68f94367a67c7). The Internet gateway forwards the packet to the AWS backbone after NAT’ing its source IP. The backbone then drops the packet because its destination is a private address. Recall that using the instance name as the destination amounts to using its private IP.
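The two decisions in the paragraph above (falling through to the default route, then being dropped for using a private destination) can be mimicked in a few lines. This is a simplified sketch, not Batfish’s model: the route table only contains the two entries that appear for subnet-06a692ed4ef84368d in the traces, and `backbone_permits` is a hypothetical stand-in that approximates the backbone’s “Block outgoing traffic using reserved addresses” filter with Python’s `is_private` check; the real filter in the snapshot may cover different ranges.

```python
import ipaddress

# Simplified route table for subnet-06a692ed4ef84368d, built from routes visible
# in the traces. Note that nothing covers 10.40.0.0/16 (the us-west-2 VPC).
SUBNET_ROUTES = [
    (ipaddress.ip_network("10.20.1.0/24"), "connected (to-instances)"),
    (ipaddress.ip_network("0.0.0.0/0"), "igw-02fd68f94367a67c7"),
]

def longest_prefix_match(routes, dst):
    """Return the next hop of the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    candidates = [(net, hop) for net, hop in routes if addr in net]
    if not candidates:
        return None
    _net, hop = max(candidates, key=lambda entry: entry[0].prefixlen)
    return hop

def backbone_permits(dst):
    """Hypothetical approximation of the backbone's reserved-address egress filter."""
    return not ipaddress.ip_address(dst).is_private

dst = "10.40.2.80"  # private IP of hosts["west2_public"]
print(longest_prefix_match(SUBNET_ROUTES, dst))  # igw-02fd68f94367a67c7
print(backbone_permits(dst))                      # False
```

Because no route is more specific than 0.0.0.0/0 for 10.40.2.80, the default route wins, and the private destination then fails the backbone check.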

The behavior is different if we use the public IP instead, as shown below.

[6]:
# traceroute between instances across regions using the destination's public IP
ans = bfq.traceroute(startLocation=hosts["east2_public"],
                     headers=HeaderConstraints(dstIps=public_ips["west2_public"],
                                               applications="ssh")).answer()
show_first_trace(ans.frame())
'Flow: start=i-01602d9efaed4409a [10.20.1.207:49152->54.191.42.182:22 TCP length=512]'
ACCEPTED
1. node: i-01602d9efaed4409a
  ORIGINATED(default)
  FORWARDED(ARP IP: 10.20.1.1, Output Interface: eni-01997085076a9b98a, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:10.20.1.1)])
  PERMITTED(~EGRESS_ACL~eni-01997085076a9b98a (EGRESS_FILTER))
  SETUP_SESSION(Incoming Interfaces: [eni-01997085076a9b98a], Action: Accept, Match Criteria: [ipProtocol=TCP, srcIp=54.191.42.182, dstIp=10.20.1.207, srcPort=22, dstPort=49152])
  TRANSMITTED(eni-01997085076a9b98a)
2. node: subnet-06a692ed4ef84368d
  RECEIVED(to-instances)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0574d08f8d05917e4-vrf-igw-02fd68f94367a67c7, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  PERMITTED(acl-09c0bb4e71ae5f9e4_egress (EGRESS_FILTER))
  TRANSMITTED(vpc-0574d08f8d05917e4-vrf-igw-02fd68f94367a67c7)
3. node: vpc-0574d08f8d05917e4
  RECEIVED(subnet-06a692ed4ef84368d-vrf-igw-02fd68f94367a67c7)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: igw-02fd68f94367a67c7, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  TRANSMITTED(igw-02fd68f94367a67c7)
4. node: igw-02fd68f94367a67c7
  RECEIVED(vpc-0574d08f8d05917e4)
  PERMITTED(~DENY~UNASSOCIATED~PRIVATE~IPs~ (INGRESS_FILTER))
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: backbone, Routes: [bgp (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  TRANSFORMED(SOURCE_NAT srcIp: 10.20.1.207 -> 13.59.144.125)
  TRANSMITTED(backbone)
5. node: isp_16509
  RECEIVED(To-igw-02fd68f94367a67c7)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-igw-0a8309f3192e7cea3, Routes: [bgp (Network: 54.191.42.182/32, Next Hop IP:169.254.0.1)])
  TRANSMITTED(To-igw-0a8309f3192e7cea3)
6. node: igw-0a8309f3192e7cea3
  RECEIVED(backbone)
  TRANSFORMED(DEST_NAT dstIp: 54.191.42.182 -> 10.40.2.80)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-00b65e98077106059, Routes: [static (Network: 10.40.0.0/16, Next Hop IP:169.254.0.1)])
  TRANSMITTED(vpc-00b65e98077106059)
7. node: vpc-00b65e98077106059
  RECEIVED(igw-0a8309f3192e7cea3)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: subnet-06005943afe32f714-vrf-igw-0a8309f3192e7cea3, Routes: [static (Network: 10.40.2.0/24, Next Hop IP:169.254.0.1)])
  TRANSMITTED(subnet-06005943afe32f714-vrf-igw-0a8309f3192e7cea3)
8. node: subnet-06005943afe32f714
  RECEIVED(vpc-00b65e98077106059-vrf-igw-0a8309f3192e7cea3)
  PERMITTED(acl-087574e8620270842_ingress (INGRESS_FILTER))
  FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: to-instances, Routes: [connected (Network: 10.40.2.0/24, Next Hop IP:AUTO/NONE(-1l))])
  TRANSMITTED(to-instances)
9. node: i-02cae6eaa9edeed70
  RECEIVED(eni-087e18628dadd9b48)
  PERMITTED(~SECURITY_GROUP_INGRESS_ACL~ (INGRESS_FILTER))
  SETUP_SESSION(Originating VRF: default, Action: FibLookup, Match Criteria: [ipProtocol=TCP, srcIp=10.40.2.80, dstIp=13.59.144.125, srcPort=22, dstPort=49152])
  ACCEPTED(eni-087e18628dadd9b48)

This traceroute starts out like the previous one, up to the AWS backbone (isp_16509): the packet goes from the source subnet to the Internet gateway, which source-NATs it and forwards it to the backbone. The backbone then carries it to the Internet gateway in the destination region (igw-0a8309f3192e7cea3), which NATs the packet’s destination from the instance’s public IP to its private IP.
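The two NAT steps in this path can be illustrated with plain dictionaries. This is a minimal sketch that assumes each Internet gateway holds a 1:1 public-to-private mapping; the addresses come from the trace above, but the dictionary representation is ours and is not how AWS or Batfish model NAT internally.

```python
# 1:1 mappings (public IP -> private IP) taken from the trace above.
IGW_EAST2 = {"13.59.144.125": "10.20.1.207"}   # igw-02fd68f94367a67c7
IGW_WEST2 = {"54.191.42.182": "10.40.2.80"}    # igw-0a8309f3192e7cea3

def egress_source_nat(igw_map, src_ip, dst_ip):
    """Leaving the VPC: rewrite the private source IP to its public IP."""
    private_to_public = {priv: pub for pub, priv in igw_map.items()}
    return private_to_public[src_ip], dst_ip  # KeyError stands in for "no mapping"

def ingress_dest_nat(igw_map, src_ip, dst_ip):
    """Entering the VPC: rewrite the public destination IP to the private IP."""
    return src_ip, igw_map[dst_ip]

packet = ("10.20.1.207", "54.191.42.182")         # east2_public -> west2_public's public IP
packet = egress_source_nat(IGW_EAST2, *packet)    # ('13.59.144.125', '54.191.42.182')
packet = ingress_dest_nat(IGW_WEST2, *packet)     # ('13.59.144.125', '10.40.2.80')
print(packet)
```

The final (source, destination) pair matches the session recorded at i-02cae6eaa9edeed70, whose return traffic goes from 10.40.2.80 to 13.59.144.125.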

Connectivity between DC and AWS

A common way to connect to AWS is with VPNs and BGP: establish IPSec tunnels between exit gateways on the physical side and AWS gateways, and run BGP over these tunnels to exchange prefixes. If the IPSec or BGP settings on the two sides are incompatible, connectivity between the DC and AWS will not work.

Batfish can determine if the two sides are compatibly configured with respect to IPSec and BGP settings and if those sessions will come up.
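To make “compatibly configured” concrete for BGP, here is a deliberately tiny sketch of one class of checks: each side’s expectations about its peer (AS number and address) must match what the peer actually uses. The `BgpPeerConfig` class and `compatible` function are hypothetical, and Batfish checks far more than this. The exitgw values come from the BGP session table later in this notebook; the transit gateway’s expected remote AS (65100) is an assumption consistent with the session being established.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BgpPeerConfig:
    node: str
    local_as: int
    local_ip: str
    remote_as: int   # AS number this side expects its peer to have
    remote_ip: str   # address this side expects its peer to use

def compatible(a: BgpPeerConfig, b: BgpPeerConfig) -> bool:
    """Simplified check: each side's expectations must match the other's settings."""
    return (a.remote_as == b.local_as and b.remote_as == a.local_as
            and a.remote_ip == b.local_ip and b.remote_ip == a.local_ip)

exitgw = BgpPeerConfig("exitgw", 65100, "169.254.25.162", 64512, "169.254.25.161")
tgw = BgpPeerConfig("tgw-06b348adabd13452d", 64512, "169.254.25.161", 65100, "169.254.25.162")
print(compatible(exitgw, tgw))  # True

# A wrong remote AS on one side is enough to keep the session down.
broken = BgpPeerConfig("exitgw", 65100, "169.254.25.162", 64513, "169.254.25.161")
print(compatible(broken, tgw))  # False
```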

The query below lists the status of all IPSec sessions between exitgw and the AWS transit gateways (selected using the regular expression ^tgw-, which matches their node names). This filtering lets us ignore any other IPSec sessions that may exist in our network and focus on DC-AWS connectivity.
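The effect of the `/^tgw-/` filter can be previewed with Python’s `re` module on node names that appear in this notebook. This sketch assumes the pattern between the slashes is applied as a regular-expression search over node names; it does not use Batfish’s own specifier parser. The second call shows why the `^` anchor matters.

```python
import re

# A sample of node names that appear in this notebook's traces.
NODES = [
    "exitgw", "srv-101", "leaf1", "spine1", "isp_16509",
    "tgw-06b348adabd13452d", "tgw-0888a76c8a371246d",
    "igw-02fd68f94367a67c7", "vpc-0574d08f8d05917e4",
    "subnet-06a692ed4ef84368d",
]

def select_nodes(pattern, nodes):
    """Return the node names in which the regex finds a match."""
    regex = re.compile(pattern)
    return [n for n in nodes if regex.search(n)]

print(select_nodes(r"^tgw-", NODES))
# ['tgw-06b348adabd13452d', 'tgw-0888a76c8a371246d']
print(select_nodes(r"gw-", NODES))
# ['tgw-06b348adabd13452d', 'tgw-0888a76c8a371246d', 'igw-02fd68f94367a67c7']
```

Without the anchor, the Internet gateway would also be selected.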

[7]:
# show the status of all IPSec tunnels between exitgw and AWS transit gateways
ans = bfq.ipsecSessionStatus(nodes="exitgw", remoteNodes="/^tgw-/").answer()
show(ans.frame())
Node Node_Interface Node_IP Remote_Node Remote_Node_Interface Remote_Node_IP Tunnel_Interfaces Status
0 exitgw exitgw[GigabitEthernet3] 147.75.69.27 tgw-0888a76c8a371246d tgw-0888a76c8a371246d[external-vpn-0dc7abdb974ff8a69-2] 44.227.244.7 Tunnel4 -> vpn-vpn-0dc7abdb974ff8a69-2 IPSEC_SESSION_ESTABLISHED
1 exitgw exitgw[GigabitEthernet3] 147.75.69.27 tgw-06b348adabd13452d tgw-06b348adabd13452d[external-vpn-01c45673532d3e33e-2] 52.14.53.162 Tunnel2 -> vpn-vpn-01c45673532d3e33e-2 IPSEC_SESSION_ESTABLISHED
2 exitgw exitgw[GigabitEthernet3] 147.75.69.27 tgw-06b348adabd13452d tgw-06b348adabd13452d[external-vpn-01c45673532d3e33e-1] 3.19.24.131 Tunnel1 -> vpn-vpn-01c45673532d3e33e-1 IPSEC_SESSION_ESTABLISHED
3 exitgw exitgw[GigabitEthernet3] 147.75.69.27 tgw-0888a76c8a371246d tgw-0888a76c8a371246d[external-vpn-0dc7abdb974ff8a69-1] 34.209.88.227 Tunnel3 -> vpn-vpn-0dc7abdb974ff8a69-1 IPSEC_SESSION_ESTABLISHED

In the output above, we see all expected tunnels: each transit gateway has two established sessions to exitgw. This matches the default AWS behavior of setting up two IPSec tunnels between an AWS gateway and a physical node.
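Eyeballing the table works for four rows, but the same expectation (two established tunnels per transit gateway) can be checked programmatically. This is a sketch that copies the (Remote_Node, Status) pairs from the output above; the helper `gateways_missing_tunnels` is hypothetical, and the list of expected gateways is something you would supply from your own inventory.

```python
from collections import Counter

# (Remote_Node, Status) pairs copied from the ipsecSessionStatus output above.
SESSIONS = [
    ("tgw-0888a76c8a371246d", "IPSEC_SESSION_ESTABLISHED"),
    ("tgw-06b348adabd13452d", "IPSEC_SESSION_ESTABLISHED"),
    ("tgw-06b348adabd13452d", "IPSEC_SESSION_ESTABLISHED"),
    ("tgw-0888a76c8a371246d", "IPSEC_SESSION_ESTABLISHED"),
]
EXPECTED_GATEWAYS = ["tgw-06b348adabd13452d", "tgw-0888a76c8a371246d"]

def gateways_missing_tunnels(sessions, expected_gateways, expected_per_gateway=2):
    """Return expected gateways with fewer established tunnels than required."""
    established = Counter(node for node, status in sessions
                          if status == "IPSEC_SESSION_ESTABLISHED")
    # Counter returns 0 for gateways that never appear, so absent gateways are caught too.
    return sorted(g for g in expected_gateways if established[g] < expected_per_gateway)

print(gateways_missing_tunnels(SESSIONS, EXPECTED_GATEWAYS))  # []
```

In the notebook, the pairs could presumably be taken straight from the answer, e.g. `ans.frame()[["Remote_Node", "Status"]].itertuples(index=False)`, possibly after converting the values with `str()`; we have not verified how pybatfish represents these cells.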

Now that we know the IPSec tunnels are working, we can check the BGP sessions. The query below lists the status of all BGP sessions between exitgw and the AWS transit gateways.

[8]:
# show the status of all BGP sessions between exitgw and AWS transit gateways
ans = bfq.bgpSessionStatus(nodes="exitgw", remoteNodes="/^tgw-/").answer()
show(ans.frame())
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
0 exitgw default 65100 None 169.254.25.162 64512 tgw-06b348adabd13452d None 169.254.25.161 IPV4_UNICAST EBGP_SINGLEHOP ESTABLISHED
1 exitgw default 65100 None 169.254.172.2 64512 tgw-06b348adabd13452d None 169.254.172.1 IPV4_UNICAST EBGP_SINGLEHOP ESTABLISHED
2 exitgw default 65100 None 169.254.215.82 64512 tgw-0888a76c8a371246d None 169.254.215.81 IPV4_UNICAST EBGP_SINGLEHOP ESTABLISHED
3 exitgw default 65100 None 169.254.252.78 64512 tgw-0888a76c8a371246d None 169.254.252.77 IPV4_UNICAST EBGP_SINGLEHOP ESTABLISHED

The output above shows that all BGP sessions are established as expected.
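A further sanity check on the same output is that both ends of each BGP session sit on the same small link subnet inside the tunnel. The sketch below assumes AWS’s inside-tunnel addressing uses /30 subnets from 169.254.0.0/16, and it copies the Local_IP, Remote_IP and Established_Status columns from the table above.

```python
import ipaddress

# (Local_IP, Remote_IP, Established_Status) rows from the bgpSessionStatus output above.
BGP_ROWS = [
    ("169.254.25.162", "169.254.25.161", "ESTABLISHED"),
    ("169.254.172.2", "169.254.172.1", "ESTABLISHED"),
    ("169.254.215.82", "169.254.215.81", "ESTABLISHED"),
    ("169.254.252.78", "169.254.252.77", "ESTABLISHED"),
]

def same_slash30(ip_a, ip_b):
    """True if both addresses fall in the same /30 (assumed inside-tunnel subnet size)."""
    net = ipaddress.ip_network(f"{ip_a}/30", strict=False)
    return ipaddress.ip_address(ip_b) in net

problems = [row for row in BGP_ROWS
            if row[2] != "ESTABLISHED" or not same_slash30(row[0], row[1])]
print(problems)  # []
```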

Paths from the DC to AWS

Finally, let's look at paths from the datacenter to AWS. The query below uses the private IP of the public instance in the us-east-2 region.

[9]:
# traceroute from DC host to an instance using its private IP
ans = bfq.traceroute(startLocation="srv-101",
                     headers=HeaderConstraints(dstIps=hosts["east2_public"],
                                               applications="ssh")).answer()
show_first_trace(ans.frame())
'Flow: start=srv-101 [203.0.113.12:49152->10.20.1.207:22 TCP length=512]'
ACCEPTED
1. node: srv-101
  ORIGINATED(default)
  FORWARDED(ARP IP: 203.0.113.2, Output Interface: eth0, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:203.0.113.2)])
  TRANSMITTED(eth0)
2. node: leaf1
  RECEIVED(Ethernet10)
  FORWARDED(ARP IP: 10.10.11.1, Output Interface: Ethernet1, Routes: [bgp (Network: 10.20.0.0/16, Next Hop IP:10.10.11.1)])
  TRANSMITTED(Ethernet1)
3. node: spine1
  RECEIVED(Ethernet1)
  FORWARDED(ARP IP: 10.10.100.2, Output Interface: Ethernet10, Routes: [bgp (Network: 10.20.0.0/16, Next Hop IP:10.10.100.2)])
  TRANSMITTED(Ethernet10)
4. node: exitgw
  RECEIVED(GigabitEthernet1)
  FORWARDED(ARP IP: 169.254.25.161, Output Interface: Tunnel1, Routes: [bgp (Network: 10.20.0.0/16, Next Hop IP:169.254.25.161)])
  TRANSMITTED(Tunnel1)
5. node: tgw-06b348adabd13452d
  RECEIVED(vpn-vpn-01c45673532d3e33e-1)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0574d08f8d05917e4-tgw-rtb-00e37bc5142347b03, Routes: [static (Network: 10.20.0.0/16, Next Hop IP:169.254.0.1)])
  TRANSMITTED(vpc-0574d08f8d05917e4-tgw-rtb-00e37bc5142347b03)
6. node: vpc-0574d08f8d05917e4
  RECEIVED(tgw-06b348adabd13452d-tgw-rtb-00e37bc5142347b03)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: subnet-06a692ed4ef84368d-vrf-tgw-attach-0648110513acd6de5, Routes: [static (Network: 10.20.1.0/24, Next Hop IP:169.254.0.1)])
  TRANSMITTED(subnet-06a692ed4ef84368d-vrf-tgw-attach-0648110513acd6de5)
7. node: subnet-06a692ed4ef84368d
  RECEIVED(vpc-0574d08f8d05917e4-vrf-tgw-attach-0648110513acd6de5)
  PERMITTED(acl-09c0bb4e71ae5f9e4_ingress (INGRESS_FILTER))
  FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: to-instances, Routes: [connected (Network: 10.20.1.0/24, Next Hop IP:AUTO/NONE(-1l))])
  TRANSMITTED(to-instances)
8. node: i-01602d9efaed4409a
  RECEIVED(eni-01997085076a9b98a)
  PERMITTED(~SECURITY_GROUP_INGRESS_ACL~ (INGRESS_FILTER))
  SETUP_SESSION(Originating VRF: default, Action: FibLookup, Match Criteria: [ipProtocol=TCP, srcIp=10.20.1.207, dstIp=203.0.113.12, srcPort=22, dstPort=49152])
  ACCEPTED(eni-01997085076a9b98a)

We see that this traffic travels over the IPSec tunnel between the datacenter’s exitgw and the transit gateway in the destination region (tgw-06b348adabd13452d). It then reaches the destination instance after passing the network ACL on the subnet node and the security group on the instance.

A different path emerges if we use the public IP of the same instance, as shown below.

[10]:
# traceroute from DC host to an instance using its public IP
ans = bfq.traceroute(startLocation="srv-101",
                     headers=HeaderConstraints(dstIps=public_ips["east2_public"],
                                               applications="ssh")).answer()
show_first_trace(ans.frame())
'Flow: start=srv-101 [203.0.113.12:49152->13.59.144.125:22 TCP length=512]'
ACCEPTED
1. node: srv-101
  ORIGINATED(default)
  FORWARDED(ARP IP: 203.0.113.2, Output Interface: eth0, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:203.0.113.2)])
  TRANSMITTED(eth0)
2. node: leaf1
  RECEIVED(Ethernet10)
  FORWARDED(ARP IP: 10.10.11.1, Output Interface: Ethernet1, Routes: [bgp (Network: 0.0.0.0/0, Next Hop IP:10.10.11.1)])
  TRANSMITTED(Ethernet1)
3. node: spine1
  RECEIVED(Ethernet1)
  FORWARDED(ARP IP: 10.10.100.2, Output Interface: Ethernet10, Routes: [bgp (Network: 0.0.0.0/0, Next Hop IP:10.10.100.2)])
  TRANSMITTED(Ethernet10)
4. node: exitgw
  RECEIVED(GigabitEthernet1)
  FORWARDED(ARP IP: 147.75.69.26, Output Interface: GigabitEthernet3, Routes: [static (Network: 0.0.0.0/0, Next Hop IP:147.75.69.26)])
  TRANSMITTED(GigabitEthernet3)
5. node: isp_65200
  RECEIVED(To-exitgw)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-Internet, Routes: [bgp (Network: 0.0.0.0/0, Next Hop IP:169.254.0.1)])
  PERMITTED(Block outgoing traffic using reserved addresses (EGRESS_FILTER))
  TRANSMITTED(To-Internet)
6. node: internet
  RECEIVED(To-isp_65200)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-isp_16509, Routes: [bgp (Network: 13.59.144.125/32, Next Hop IP:169.254.0.1)])
  TRANSMITTED(To-isp_16509)
7. node: isp_16509
  RECEIVED(To-Internet)
  PERMITTED(Block incoming traffic using reserved addresses (INGRESS_FILTER))
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-igw-02fd68f94367a67c7, Routes: [bgp (Network: 13.59.144.125/32, Next Hop IP:169.254.0.1)])
  TRANSMITTED(To-igw-02fd68f94367a67c7)
8. node: igw-02fd68f94367a67c7
  RECEIVED(backbone)
  TRANSFORMED(DEST_NAT dstIp: 13.59.144.125 -> 10.20.1.207)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0574d08f8d05917e4, Routes: [static (Network: 10.20.0.0/16, Next Hop IP:169.254.0.1)])
  TRANSMITTED(vpc-0574d08f8d05917e4)
9. node: vpc-0574d08f8d05917e4
  RECEIVED(igw-02fd68f94367a67c7)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: subnet-06a692ed4ef84368d-vrf-igw-02fd68f94367a67c7, Routes: [static (Network: 10.20.1.0/24, Next Hop IP:169.254.0.1)])
  TRANSMITTED(subnet-06a692ed4ef84368d-vrf-igw-02fd68f94367a67c7)
10. node: subnet-06a692ed4ef84368d
  RECEIVED(vpc-0574d08f8d05917e4-vrf-igw-02fd68f94367a67c7)
  PERMITTED(acl-09c0bb4e71ae5f9e4_ingress (INGRESS_FILTER))
  FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: to-instances, Routes: [connected (Network: 10.20.1.0/24, Next Hop IP:AUTO/NONE(-1l))])
  TRANSMITTED(to-instances)
11. node: i-01602d9efaed4409a
  RECEIVED(eni-01997085076a9b98a)
  PERMITTED(~SECURITY_GROUP_INGRESS_ACL~ (INGRESS_FILTER))
  SETUP_SESSION(Originating VRF: default, Action: FibLookup, Match Criteria: [ipProtocol=TCP, srcIp=10.20.1.207, dstIp=203.0.113.12, srcPort=22, dstPort=49152])
  ACCEPTED(eni-01997085076a9b98a)

We now see that the traffic traverses the Internet: it leaves the datacenter via isp_65200, crosses the internet node to the AWS backbone (isp_16509), and reaches the Internet gateway (igw-02fd68f94367a67c7), which NATs the packet’s destination address from the public to the private IP.
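The divergence between the last two traces happens at exitgw, and it comes down to a longest-prefix match. The sketch below uses only the two exitgw routes that appear in those traces (the BGP-learned 10.20.0.0/16 via Tunnel1 and the static default via GigabitEthernet3); exitgw’s real table almost certainly has more entries, so treat this as an illustration rather than its actual RIB.

```python
import ipaddress

# Two exitgw routes copied from the traces above: (prefix, output interface, protocol).
EXITGW_ROUTES = [
    (ipaddress.ip_network("10.20.0.0/16"), "Tunnel1", "bgp"),
    (ipaddress.ip_network("0.0.0.0/0"), "GigabitEthernet3", "static"),
]

def output_interface(dst):
    """Pick the interface of the most specific matching route (the default route always matches)."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in EXITGW_ROUTES if addr in route[0]]
    _prefix, iface, _protocol = max(matches, key=lambda route: route[0].prefixlen)
    return iface

print(output_interface("10.20.1.207"))    # Tunnel1 (IPSec tunnel to AWS)
print(output_interface("13.59.144.125"))  # GigabitEthernet3 (towards isp_65200)
```

The private IP is covered by the prefix learned over BGP from AWS, so it rides the tunnel; the public IP only matches the default route and goes out to the ISP.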

Evaluating the network’s availability and security

In addition to helping you understand and debug network paths, Batfish can also help ensure that the network is correctly configured with respect to its availability and security policies.

For example, the queries below determine which instances are or are not reachable from the Internet.

[11]:
# compute which instances are open to the Internet
reachable_from_internet = [key for (key, value) in hosts.items() if is_reachable("internet", value)]
print("\nInstances reachable from the Internet: {}".format(sorted(reachable_from_internet)))

# compute which instances are NOT open to the Internet
unreachable_from_internet = [key for (key, value) in hosts.items() if not is_reachable("internet", value)]
print("\nInstances NOT reachable from the Internet: {}".format(sorted(unreachable_from_internet)))
Instances reachable from the Internet: ['east2_public', 'west2_public']

Instances NOT reachable from the Internet: ['east2_private', 'west2_private']

We see that Batfish correctly computes that the two instances in the public subnets are accessible from the Internet, and the other two are not.

We can compare the answers produced by Batfish with what network policy expects. This comparison ensures that all instances expected to host public-facing services are indeed reachable from the Internet, and that all instances expected to host private services are indeed not accessible from the Internet.
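One way to express that comparison is as two set differences. In this sketch the intent (`INTENDED_PUBLIC`) is our assumption, inferred from the instance naming and the network description, and `policy_violations` is a hypothetical helper. The computed list is the result printed above; in the notebook you could pass `reachable_from_internet` and `set(hosts)` directly.

```python
# Assumed intent: which instances are meant to be Internet-facing.
INTENDED_PUBLIC = {"east2_public", "west2_public"}
ALL_INSTANCES = {"east2_private", "east2_public", "west2_private", "west2_public"}

def policy_violations(reachable_from_internet, intended_public, all_instances):
    """Report intended-public instances that are unreachable and private ones that are exposed."""
    reachable = set(reachable_from_internet)
    return {
        "public_but_unreachable": sorted(intended_public - reachable),
        "private_but_exposed": sorted((all_instances - intended_public) & reachable),
    }

computed = ["east2_public", "west2_public"]  # result reported by Batfish above
print(policy_violations(computed, INTENDED_PUBLIC, ALL_INSTANCES))
# {'public_but_unreachable': [], 'private_but_exposed': []}
```

Both lists being empty means the computed reachability matches the intent; any entry points at a specific instance to investigate.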

We can similarly compute which instances are reachable from hosts in the datacenter, using a query like the following.

[12]:
# compute which instances are reachable from data center
reachable_from_dc = [key for (key,value) in hosts.items() if is_reachable("srv-101", value)]
print("\nInstances reachable from the DC: {}".format(sorted(reachable_from_dc)))
Instances reachable from the DC: ['east2_private', 'east2_public', 'west2_private', 'west2_public']

We see that all four instances are accessible from the datacenter host.

Batfish also allows finer-grained evaluation of security policy. In our network, the intent is that the public instances allow only SSH traffic. Let us check whether this invariant actually holds.

[13]:
tcp_non_ssh = HeaderConstraints(ipProtocols="tcp", dstPorts="!22")
reachable_from_internet_non_ssh = [key for (key, value) in hosts.items()
                                   if is_reachable("internet", value, tcp_non_ssh)]
print("\nInstances reachable from the Internet with non-SSH traffic: {}".format(
    sorted(reachable_from_internet_non_ssh)))
Instances reachable from the Internet with non-SSH traffic: ['east2_public']

We see that, contrary to our policy, one of the public-facing instances (east2_public) allows non-SSH traffic. To see examples of such traffic, we can run the following query.

[14]:
ans = bfq.reachability(pathConstraints=PathConstraints(startLocation="internet",
                                                       endLocation=hosts["east2_public"]),
                      headers=tcp_non_ssh).answer()
show_first_trace(ans.frame())
'Flow: start=internet interface=out [8.8.8.8:49152->13.59.144.125:3306 TCP length=512]'
ACCEPTED
1. node: internet
  RECEIVED(out)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-isp_16509, Routes: [bgp (Network: 13.59.144.125/32, Next Hop IP:169.254.0.1)])
  TRANSMITTED(To-isp_16509)
2. node: isp_16509
  RECEIVED(To-Internet)
  PERMITTED(Block incoming traffic using reserved addresses (INGRESS_FILTER))
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: To-igw-02fd68f94367a67c7, Routes: [bgp (Network: 13.59.144.125/32, Next Hop IP:169.254.0.1)])
  TRANSMITTED(To-igw-02fd68f94367a67c7)
3. node: igw-02fd68f94367a67c7
  RECEIVED(backbone)
  TRANSFORMED(DEST_NAT dstIp: 13.59.144.125 -> 10.20.1.207)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: vpc-0574d08f8d05917e4, Routes: [static (Network: 10.20.0.0/16, Next Hop IP:169.254.0.1)])
  TRANSMITTED(vpc-0574d08f8d05917e4)
4. node: vpc-0574d08f8d05917e4
  RECEIVED(igw-02fd68f94367a67c7)
  FORWARDED(ARP IP: 169.254.0.1, Output Interface: subnet-06a692ed4ef84368d-vrf-igw-02fd68f94367a67c7, Routes: [static (Network: 10.20.1.0/24, Next Hop IP:169.254.0.1)])
  TRANSMITTED(subnet-06a692ed4ef84368d-vrf-igw-02fd68f94367a67c7)
5. node: subnet-06a692ed4ef84368d
  RECEIVED(vpc-0574d08f8d05917e4-vrf-igw-02fd68f94367a67c7)
  PERMITTED(acl-09c0bb4e71ae5f9e4_ingress (INGRESS_FILTER))
  FORWARDED(ARP IP: AUTO/NONE(-1l), Output Interface: to-instances, Routes: [connected (Network: 10.20.1.0/24, Next Hop IP:AUTO/NONE(-1l))])
  TRANSMITTED(to-instances)
6. node: i-01602d9efaed4409a
  RECEIVED(eni-01997085076a9b98a)
  PERMITTED(~SECURITY_GROUP_INGRESS_ACL~ (INGRESS_FILTER))
  SETUP_SESSION(Originating VRF: default, Action: FibLookup, Match Criteria: [ipProtocol=TCP, srcIp=10.20.1.207, dstIp=8.8.8.8, srcPort=3306, dstPort=49152])
  ACCEPTED(eni-01997085076a9b98a)

We thus see that our misconfigured public instance allows TCP traffic to port 3306 (MySQL).

In this and the earlier reachability queries, we did not tell Batfish the flow’s destination address. It automatically figures out that a flow from the Internet that reaches hosts["east2_public"] must have 13.59.144.125 as its destination address, which becomes the instance’s private IP after NAT. Such exhaustive analysis over all possible headers is unique to Batfish, which makes it an ideal tool for comprehensive availability and security analysis.
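To make the destination-address constraint concrete, the toy below enumerates a handful of candidate destinations and keeps those that the Internet gateway would deliver to the instance. It is only an illustration: Batfish does not sample addresses like this but reasons over the whole header space at once, and the toy models nothing except the gateway’s destination NAT (taken from the trace above).

```python
import ipaddress

PRIVATE_IP = "10.20.1.207"                    # hosts["east2_public"]
IGW_DEST_NAT = {"13.59.144.125": PRIVATE_IP}  # igw-02fd68f94367a67c7, from the trace

def delivered_to_instance(dst_ip):
    """Toy inbound step: the gateway only forwards destinations it can translate."""
    return IGW_DEST_NAT.get(dst_ip) == PRIVATE_IP

# A tiny slice of the destination space: a /29 around the public IP,
# plus the private IP itself (which Internet traffic cannot target directly).
candidates = [str(ip) for ip in ipaddress.ip_network("13.59.144.120/29")] + [PRIVATE_IP]
print([ip for ip in candidates if delivered_to_instance(ip)])  # ['13.59.144.125']
```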

Batfish can also diagnose why certain traffic makes it past security groups and network ACLs. For example, we can run the testFilters question as below to reveal why the flow above made it past the security group on hosts["east2_public"].

[15]:
flow = ans.frame().iloc[0]['Flow']  # the rogue flow uncovered by Batfish above
ans = bfq.testFilters(nodes=hosts["east2_public"],
                      filters="~SECURITY_GROUP_INGRESS_ACL~",
                      headers=HeaderConstraints(srcIps=flow.srcIp,
                                                dstIps="10.20.1.207", # destination IP after the NAT at Step 3 above
                                                srcPorts=flow.srcPort,
                                                dstPorts=flow.dstPort,
                                                ipProtocols=flow.ipProtocol)).answer()
show(ans.frame())
Node: i-01602d9efaed4409a
Filter_Name: ~SECURITY_GROUP_INGRESS_ACL~
Flow:
  Start Location: i-01602d9efaed4409a
  Src IP: 8.8.8.8
  Src Port: 49152
  Dst IP: 10.20.1.207
  Dst Port: 3306
  IP Protocol: TCP
Action: PERMIT
Line_Content: Security Group launch-wizard-1
Trace:
  • Matched security group launch-wizard-1
    • Matched rule with description Connectivity test
      • Matched protocol TCP
      • Matched destination port 3306
      • Matched source address CIDR IP 0.0.0.0/0

The “Trace” column shows that the flow was permitted because the security group “launch-wizard-1” has a matching rule called “Connectivity test.” (Perhaps someone added this rule to test connectivity but forgot to remove it.)
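The matching logic in that trace can be sketched as follows. AWS security groups contain only allow rules, so a flow is permitted if any rule matches its protocol, destination port and source address, and everything else is implicitly denied. The “Connectivity test” rule mirrors the trace above; the SSH rule is an assumption, consistent with SSH from the Internet being reachable earlier. `SgRule` and `matching_rules` are hypothetical names, not part of pybatfish.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class SgRule:
    description: str
    protocol: str
    port_from: int
    port_to: int
    cidr: str

LAUNCH_WIZARD_1 = [
    SgRule("SSH (assumed)", "tcp", 22, 22, "0.0.0.0/0"),
    SgRule("Connectivity test", "tcp", 3306, 3306, "0.0.0.0/0"),  # from the trace above
]

def matching_rules(rules, protocol, src_ip, dst_port):
    """Return descriptions of all allow rules matching the flow (empty means implicit deny)."""
    src = ipaddress.ip_address(src_ip)
    return [r.description for r in rules
            if r.protocol == protocol
            and r.port_from <= dst_port <= r.port_to
            and src in ipaddress.ip_network(r.cidr)]

print(matching_rules(LAUNCH_WIZARD_1, "tcp", "8.8.8.8", 3306))  # ['Connectivity test']
print(matching_rules(LAUNCH_WIZARD_1, "tcp", "8.8.8.8", 80))    # []
```

Removing the “Connectivity test” rule from the list would leave port 3306 with no matching rule, which is the fix the policy calls for.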

Such introspection capability is indispensable for complex security groups and network ACLs. See this notebook for a more detailed illustration of these capabilities of Batfish.

Summary

Batfish allows you to analyze, debug, and secure your cloud and hybrid networks. It can shed light on different types of traffic paths between different types of endpoints (e.g., intra-region, cross-region, across hybrid links), and it can reveal the detailed availability and security posture of the network.

Want to learn more? Come find us on Slack or GitHub