Monitoring AWS IP Availability

Running a virtual private cloud, or VPC, on AWS is common, and even required in some cases.

VPCs are virtual networks. They logically group AWS resources. Those resources are connected to the internet or not depending on the subnet they sit in. All the relevant information is well documented on Amazon’s website.

Resources can be added to VPCs and subnets as long as those have available IPs. Running out of addresses makes it impossible to launch new instances. This impacts services that scale automatically, like Lambda functions attached to a VPC.

There are many opinions on how to pick the right subnet size, and I doubt mine has any real value. Instead, I will show how to monitor available IPs.

Code

To avoid dealing with a full server, the application is best run as an AWS Lambda function. Any runtime with a quick cold start should work. I picked JavaScript.

The describeSubnets function, on the EC2 object, returns subnets.

const AWS = require("aws-sdk");
const EC2 = new AWS.EC2();

EC2.describeSubnets({}, (err, data) => {
  if (err !== null) console.error(err, err.stack);
  if (data !== null) console.log(data);
});

The result, data, contains only a single page. Using the NextToken parameter is a solution, but the eachPage and eachItem functions are easier.

const AWS = require("aws-sdk");
const EC2 = new AWS.EC2();

EC2.describeSubnets().eachItem((err, data) => {
  if (err !== null) console.error(err, err.stack);
  if (data !== null) console.log(data);
});
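
For comparison, the NextToken approach could look like the sketch below. The helper name allSubnets is made up for this example. It takes the EC2 client as an argument and assumes that describeSubnets returns a NextToken whenever more pages remain.

```javascript
// Hypothetical helper: follows NextToken until every page has been read,
// then hands the full list of subnets to `done`.
const allSubnets = (ec2, done, token, found = []) => {
  // Only send NextToken when there is one; the first call has none.
  const params = token ? { NextToken: token } : {};

  ec2.describeSubnets(params, (err, data) => {
    if (err) return done(err, found);

    const subnets = found.concat(data.Subnets || []);
    // A missing NextToken means this was the last page.
    if (!data.NextToken) return done(null, subnets);

    allSubnets(ec2, done, data.NextToken, subnets);
  });
};
```

It would be called as allSubnets(new AWS.EC2(), (err, subnets) => ...). The bookkeeping it needs is exactly what eachItem hides.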

AWS requests have another limitation. describeSubnets only returns subnets for a single region: either the one configured in the environment, or the one passed explicitly when creating the EC2 object.

const AWS = require("aws-sdk");
const EC2 = new AWS.EC2({
  region: "us-east-1"
});

To retrieve all subnets, the application must call the function with each region.

const AWS = require("aws-sdk");

const forEachSubnet = (region, callback) => {
  const EC2 = new AWS.EC2({
    region: region
  });

  EC2.describeSubnets().eachItem((err, data) => {
    if (err !== null) console.error(err, err.stack);
    if (data !== null) callback(data);
  });
};

forEachSubnet("us-east-1", console.log);
forEachSubnet("us-east-2", console.log);

Hardcoding a list of regions isn’t a viable solution. describeRegions is preferable: it retrieves the regions that are enabled for the account.

const AWS = require("aws-sdk");
const EC2 = new AWS.EC2();

const forEachRegion = callback => {
  EC2.describeRegions().eachItem((err, data) => {
    if (err !== null) console.error(err, err.stack);
    if (data !== null) callback(data);
  });
};

forEachRegion(region =>
  forEachSubnet(region.RegionName, console.log)
);
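
One thing the callbacks above do not say is when the last subnet has arrived, which a Lambda handler needs to know before it returns. Below is a minimal sketch of one way to track that. collectSubnets and its listSubnets argument are names invented here. listSubnets is assumed to call onSubnet for each subnet and onDone exactly once per region.

```javascript
// Hypothetical helper: runs `listSubnets(region, onSubnet, onDone)` for each
// region and calls `done(subnets)` once every region has reported completion.
const collectSubnets = (regions, listSubnets, done) => {
  const subnets = [];
  let pending = regions.length;
  if (pending === 0) return done(subnets);

  regions.forEach(region => {
    listSubnets(
      region,
      subnet => subnets.push(subnet),
      () => {
        pending -= 1;
        if (pending === 0) done(subnets);
      }
    );
  });
};

// A possible `listSubnets` on top of the SDK, assuming eachItem calls back
// with null data after the last item (as eachPage is documented to do):
//
//   const listSubnets = (region, onSubnet, onDone) =>
//     new AWS.EC2({ region }).describeSubnets().eachItem((err, data) => {
//       if (err) console.error(err, err.stack);
//       if (err || data === null) return onDone();
//       onSubnet(data);
//     });
```

A region that fails is logged and counted as done, so one broken region does not keep the others from being reported.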

The subnet objects have useful information to extract. Identifiers and the AvailableIpAddressCount attribute are obvious choices. The total number of IP addresses isn’t available on the object, but it can be derived from the CidrBlock. Instead of reinventing the wheel, the ip-cidr package offers a quick solution.

const IPCIDR = require("ip-cidr");

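// Counts the addresses in a CIDR block; returns 0 for an invalid block.
// toArray() builds every address in memory, which stays manageable because
// a VPC subnet is at most a /16.
const ipAddressCount = cidrStr => {
  const cidr = new IPCIDR(cidrStr);
  return cidr.isValid() ? cidr.toArray().length : 0;
};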
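
For IPv4 blocks, the count can also be worked out from the prefix length alone, which avoids building an array with every address. The sketch below is an alternative, not what the application uses, and its function names are made up. It also subtracts the five addresses AWS reserves in every subnet, so that the total lines up with AvailableIpAddressCount.

```javascript
// Number of addresses AWS keeps for itself in every subnet
// (the first four and the last one of the CIDR block).
const AWS_RESERVED = 5;

// Total IPv4 addresses in a block such as "10.0.0.0/24".
// The format check is loose (octets are not range-checked); returns 0 if
// the string does not look like an IPv4 CIDR block.
const ipv4AddressCount = cidrStr => {
  const match = /^(\d{1,3}\.){3}\d{1,3}\/(\d{1,2})$/.exec(cidrStr);
  if (match === null) return 0;

  const prefix = Number(match[2]);
  return prefix <= 32 ? Math.pow(2, 32 - prefix) : 0;
};

// Addresses that resources can actually use in an AWS subnet.
const usableAddressCount = cidrStr =>
  Math.max(ipv4AddressCount(cidrStr) - AWS_RESERVED, 0);

console.log(ipv4AddressCount("10.0.0.0/24")); // 256
console.log(usableAddressCount("10.0.0.0/24")); // 251
```

With the usable total at hand, the share of remaining addresses is AvailableIpAddressCount divided by that total.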

Once gathered, the application can push the data to AWS CloudWatch for alarms.

const CloudWatch = require("./aws/cloudwatch");

const params = {
  Namespace: "subnet-ip-availability",
  MetricData: ...
};

CloudWatch.putMetricData(params, (err, data) => {
  if (err !== null) console.error(err, err.stack);
});
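
The MetricData entries are left out above. Purely as an illustration, one possible shape is sketched below. The metric name AvailableIpAddressCount, the SubnetId dimension, and the helper names are assumptions of this sketch, not necessarily what the application publishes. PutMetricData limits how many metrics a single request may carry, so the second helper splits the list into batches. The batch size of 20 is an assumption to check against the current CloudWatch limits.

```javascript
// Hypothetical mapping from a subnet (as returned by describeSubnets)
// to one CloudWatch metric datum.
const toMetricDatum = subnet => ({
  MetricName: "AvailableIpAddressCount", // assumed metric name
  Dimensions: [{ Name: "SubnetId", Value: subnet.SubnetId }],
  Unit: "Count",
  Value: subnet.AvailableIpAddressCount
});

// Splits a list into batches of at most `size` items.
const inBatches = (items, size = 20) => {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};

// One putMetricData parameter object per batch of subnets.
const toPutMetricDataParams = subnets =>
  inBatches(subnets.map(toMetricDatum)).map(batch => ({
    Namespace: "subnet-ip-availability",
    MetricData: batch
  }));
```

Each element returned by toPutMetricDataParams can then be passed to putMetricData in turn.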

Once the code is finished, it needs to be deployed to AWS.

Infrastructure

AWS offers several official ways to create resources, including the console, the command line interface, and CloudFormation. The last option takes a bit of time, but guarantees that the infrastructure is the same each time.

The main resource is the AWS Lambda function. It requires a role.

AWSTemplateFormatVersion: "2010-09-09"
Description: AWS Lambda IP Availability
Parameters:
  Name:
    Type: String
    Default: "aws-lambda-ip-availability"
Resources:
  Role:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Ref Name
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Action: "sts:AssumeRole"
            Principal:
              Service: "lambda.amazonaws.com"
  Function:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Ref Name
      Role: !GetAtt Role.Arn
      Runtime: "nodejs8.10" # nodejs10.x doesn't support zip file
      Handler: "index.handler"
      Code:
        ZipFile: "//"

AWS Lambda writes to CloudWatch Logs. The function’s role needs permissions to interact with Log Groups and Log Streams.

LogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    RetentionInDays: 7
    LogGroupName: !Join [ "", [ "/aws/lambda/", !Ref Name ] ]
RoleCloudWatchLog:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Join [ "", [ !Ref Name, "-cloudwatch-log" ] ]
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Action: "logs:CreateLogGroup"
          Resource: !Join [ "", [ "arn:aws:logs:", !Ref "AWS::Region", ":", !Ref "AWS::AccountId", ":log-group:", !Ref LogGroup ] ]
        -
          Effect: "Allow"
          Action:
            - "logs:CreateLogStream"
            - "logs:PutLogEvents"
          Resource: !GetAtt LogGroup.Arn
    Roles:
      - !Ref Role

The application also interacts with EC2 and CloudWatch.

RoleEc2:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Join [ "", [ !Ref Name, "-ec2" ] ]
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Action:
            - "ec2:DescribeRegions"
            - "ec2:DescribeSubnets"
          Resource: "*"
    Roles:
      - !Ref Role
RoleCloudWatchMetric:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: !Join [ "", [ !Ref Name, "-cloudwatch-metric" ] ]
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        -
          Effect: "Allow"
          Action: "cloudwatch:PutMetricData"
          Resource: "*"
    Roles:
      - !Ref Role

The template can also hold a periodic event to trigger the application.

Event:
  Type: AWS::Events::Rule
  Properties:
    Name: !Ref Name
    ScheduleExpression: "rate(1 hour)"
    Targets:
      -
        Id: "Target-1"
        Arn: !GetAtt Function.Arn
EventPermission:
  Type: AWS::Lambda::Permission
  Properties:
    Principal: "events.amazonaws.com"
    Action: "lambda:InvokeFunction"
    FunctionName: !Ref Function
    SourceArn: !GetAtt Event.Arn

Every execution puts metrics on CloudWatch to trigger alarms.

Alarms

CloudWatch alarms monitor metrics. When certain conditions are met, a notification is sent to an SNS topic, and all of its subscribers receive it. Those can be email addresses, phone numbers, HTTP endpoints, and more.

The metric should be either the number of available IPs or the percentage of remaining ones.

When this metric falls below an acceptable threshold, the alarm should message the SNS topic. This draws attention to the subnet, so that another one can be added or it can be replaced by a larger one.
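
As a rough sketch, the parameters for such an alarm could be built as follows and passed to CloudWatch's putMetricAlarm. The metric name and dimension are assumptions and must match whatever the application actually publishes. The helper name, the alarm name format, and the default threshold are invented for the example.

```javascript
// Hypothetical builder for putMetricAlarm parameters: the alarm fires when
// a subnet has fewer than `threshold` available addresses.
const lowIpAlarmParams = (subnetId, topicArn, threshold = 10) => ({
  AlarmName: `subnet-ip-availability-${subnetId}`,
  Namespace: "subnet-ip-availability",
  MetricName: "AvailableIpAddressCount", // assumed metric name
  Dimensions: [{ Name: "SubnetId", Value: subnetId }],
  Statistic: "Minimum",
  Period: 3600, // seconds; one data point per hourly run
  EvaluationPeriods: 1,
  ComparisonOperator: "LessThanThreshold",
  Threshold: threshold,
  AlarmActions: [topicArn] // SNS topic to notify
});
```

Calling the builder once per subnet, with the same topic, gives every subnet its own alarm without writing each one by hand.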

Adding alarms for every subnet is a repetitive task. Using the CLI or building a small application can make the process less of a chore. Something for the next post … maybe.

Running out of available IPs is a silly mistake that can happen to anyone. The errors are rarely displayed in the appropriate service, which makes them very hard to debug. This little application, available on GitHub, will hopefully prevent a few sleepless nights.