Dendad Trainer Blog

Saturday, 6 July 2024

Building a Custom AMI and AutoScaling Group using CloudFormation Custom Resources

I recently had the challenge of demonstrating an AWS EC2 AutoScaling Group.

To set up this demo, I clearly needed to build an AutoScaling Group, Launch Template, Target Group and Application Load Balancer.

However, to get EC2 instances with the necessary software (such as the Apache web server and some specific web pages), I needed to identify a suitable AMI.

For demonstration purposes, I therefore needed to build a web server using the relevant User Data, and snapshot this to build a custom AMI which could be referenced by the Launch Template.

Since I use AWS CloudFormation (wherever possible) to build all my demos, this meant that I needed to define a CloudFormation Custom Resource to create the AMI.

This blog explains how I did this. I assume you are basically familiar with CloudFormation templates, so I have not included all the details of each step; instead I have highlighted the main tasks and key code snippets.

The final demo files are available in GitHub here:

https://github.com/dendad-trainer/simple-aws-demos/tree/main/custom-ami-autoscaling

Defining the WebServer

The first task is to define the web server which will be used as the basis for building our AMI.

For cost reasons, I decided to build this using AWS Graviton, using ARM architecture. The Parameter section of the CloudFormation Template looks like this:

Parameters:
  AMZN2023LinuxAMIId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64

Having defined a VPC with Public and Private subnets, I could then define the EC2 instance. The key part of this Resource Definition is as follows:

Resources:
  WebInstance:
    Type: 'AWS::EC2::Instance'
    Properties:
      ImageId: !Ref AMZN2023LinuxAMIId
      InstanceType: t4g.micro

      .....

      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -ex
          # Use latest Amazon Linux 2023
          dnf update -y
          dnf install -y httpd php-fpm php php-devel
          /usr/bin/systemctl enable httpd
          /usr/bin/systemctl start httpd
          cd /var/www/html
          cat <<EOF > index.php
          <?php
          ?>
          <!DOCTYPE html>
          <html>
          <head>
          <title>Amazon AWS Demo Website</title>
          </head>
          <body>
          <h2>Amazon AWS Demo Website</h2>
          <table border=1>
          <tr><th>Meta-Data</th><th>Value</th></tr>
          <?php
          # Get the instance ID
          echo "<tr><td>InstanceId</td><td><i>";
          echo shell_exec('ec2-metadata --instance-id');
          echo "</i></td></tr>";
          # Instance Type
          echo "<tr><td>Instance Type</td><td><i>";
          echo shell_exec('ec2-metadata --instance-type');
          echo "</i></td></tr>";
          # AMI ID
          echo "<tr><td>AMI</td><td><i>";
          echo shell_exec('ec2-metadata --ami-id');
          echo "</i></td></tr>";
          # User Data
          echo "<tr><td>User Data</td><td><i>";
          echo shell_exec('ec2-metadata --user-data');
          echo "</i></td></tr>";
          # Availability Zone
          echo "<tr><td>Availability Zone</td><td><i>";
          echo shell_exec('ec2-metadata --availability-zone');
          echo "</i></td></tr>";
          ?>
          </table>
          </body>
          </html>
          EOF
          # Sleep to ensure that the file system is synced before the snapshot is taken
          sleep 120
          # Signal to say it's OK to create an AMI from this Instance.
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} \
            --region ${AWS::Region} --resource AMICreate

This builds the web server that will be used as the basis of the AMI for the AutoScaling Group.

I placed this web server into a Public Subnet, protected by a Security Group which allows port 80. I also added port 22 to the Security Group Rules, so that I was able to connect to this instance using EC2 Instance Connect.

Synchronising AMI creation

Experience proved that there is a risk that the Snapshot for the AMI could be created before the EC2 filesystem has been fully 'sync'd to disk. To address this risk, I could either use the "reboot" option when creating the Snapshot, or put a suitable 'sleep' command in the Instance UserData to give the Volume time to sync.

I defined a Wait Condition for the cfn-signal command to trigger, so that the AMI creation only starts after the UserData has completed. The following part of the Resource section captures this:

  AMICreate:
    Type: AWS::CloudFormation::WaitCondition
    CreationPolicy:
      ResourceSignal:
        Timeout: PT10M

Defining the Custom Resource AMI

The AMI is built using a CloudFormation Custom Resource, which depends upon the Wait Condition:

  AMIBuilder:
    Type: Custom::AMI
    DependsOn: AMICreate
    Properties:
      ServiceToken: !GetAtt AMIFunction.Arn
      InstanceId: !Ref WebInstance

Note that this Custom Resource depends upon the AMICreate wait condition, and requires the InstanceId of the Webserver.

The Lambda Function 'AMIFunction' does the actual processing for this custom resource.

Defining the Lambda Function for the Custom Resource

The Lambda Function is written in Python. It requires a LambdaExecutionRole, which I have included in the GitHub distribution. I have not shown all the syntax here, but the role's policy needs to allow the following EC2 actions:

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: "Demo-LambdaExecutionRoleForAMIBuilder"
      AssumeRolePolicyDocument:
      .......
              - 'ec2:DescribeInstances'
              - 'ec2:DescribeImages'
              - 'ec2:CreateImage'
              - 'ec2:DeregisterImage'
              - 'ec2:CreateSnapshots'
              - 'ec2:DescribeSnapshots'
              - 'ec2:DeleteSnapshot'
              - 'ec2:CreateTags'
              - 'ec2:DeleteTags'

The Lambda function itself needs to extract the InstanceId of the EC2 Webserver that was created earlier, and the event RequestType. The latter is set to 'Create', 'Update' or 'Delete', depending on the action that the calling CloudFormation stack is currently performing.

import boto3
import cfnresponse

def handler(event, context):
    # Init ...
    rtype = event['RequestType']
    print("The event is: ", str(rtype))
    responseData = {}
    ec2api = boto3.client('ec2')
    image_available_waiter = ec2api.get_waiter('image_available')

    # Retrieve parameters
    instanceId = event['ResourceProperties']['InstanceId']
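For reference, the request that CloudFormation passes to the handler is a plain dictionary. The sketch below uses a hand-written sample event to show the fields that the handler reads. All the values are placeholders, and parse_request is a hypothetical helper for illustration only; it is not part of the template in GitHub.

```python
# Sketch only: sample values are placeholders, and parse_request is a
# hypothetical helper used to illustrate the shape of the event.
VALID_REQUEST_TYPES = ("Create", "Update", "Delete")


def parse_request(event):
    """Return (request_type, instance_id) from a custom resource event."""
    rtype = event["RequestType"]
    if rtype not in VALID_REQUEST_TYPES:
        raise ValueError(f"Unexpected RequestType: {rtype}")
    # ResourceProperties carries the Properties block of the custom resource
    # (ServiceToken included), so this is where InstanceId is found.
    instance_id = event["ResourceProperties"]["InstanceId"]
    return rtype, instance_id


sample_event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.invalid/presigned-url",
    "StackId": "arn:aws:cloudformation:eu-west-1:111122223333:stack/demo/guid",
    "RequestId": "unique-request-id",
    "ResourceType": "Custom::AMI",
    "LogicalResourceId": "AMIBuilder",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:eu-west-1:111122223333:function:demo",
        "InstanceId": "i-0123456789abcdef0",
    },
}

print(parse_request(sample_event))
```

Validating the RequestType up front means that an unexpected value fails loudly, rather than silently skipping both processing branches.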

The main processing block of this Lambda function needs to handle the 'Update' request type. When the CloudFormation stack is updated, it is assumed that an earlier AMI and snapshot were created. Therefore, the 'Update' code should first remove the old AMI and snapshot, before executing the 'Create' code to build new ones. The following code will therefore find the custom AMI and de-register it, and then find the old snapshot and delete it. The AMI has to be de-registered first, because a snapshot cannot be deleted while a registered AMI still uses it. Note that this depends upon the old AMI and snapshot having been specifically named and tagged in the earlier 'Create' operation:

    # Main processing block
    try:
        if rtype in ('Delete', 'Update'):
            # deregister the AMI and delete the snapshot
            print("Getting AMI ID")
            # Owners=['self'] restricts the search to AMIs owned by this account
            res = ec2api.describe_images(Owners=['self'], Filters=[{'Name': 'name', 'Values': ['DemoWebServerAMI']}])
            print("De-registering AMI")
            ec2api.deregister_image(ImageId=res['Images'][0]['ImageId'])
            print("Getting snapshot ID")
            res = ec2api.describe_snapshots(Filters=[{'Name': 'tag:Name', 'Values': ['DemoWebServerSnapshot']}])
            print("Deleting snapshot")
            ec2api.delete_snapshot(SnapshotId=res['Snapshots'][0]['SnapshotId'])
            responseData['SnapshotId'] = res['Snapshots'][0]['SnapshotId']
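One weakness of the lookup above is that indexing with [0] raises an IndexError if the AMI or snapshot has already been removed. As a variation, and not the approach used in the GitHub template, the following sketch reads the snapshot IDs from the BlockDeviceMappings of the image instead of searching by tag, and does nothing if no image is found. The function name cleanup_image is my own; it assumes that ec2api is a boto3 EC2 client or an object with the same three methods.

```python
def cleanup_image(ec2api, image_name):
    """De-register the named AMI and delete its snapshots, if it exists.

    ec2api is assumed to be a boto3 EC2 client (or any object offering the
    same describe_images / deregister_image / delete_snapshot methods).
    Returns the list of snapshot IDs that were deleted.
    """
    res = ec2api.describe_images(
        Owners=["self"],
        Filters=[{"Name": "name", "Values": [image_name]}],
    )
    deleted = []
    for image in res.get("Images", []):
        # Collect the snapshot IDs before de-registering the image.
        snapshot_ids = [
            mapping["Ebs"]["SnapshotId"]
            for mapping in image.get("BlockDeviceMappings", [])
            if "SnapshotId" in mapping.get("Ebs", {})
        ]
        # The AMI must be de-registered before its snapshots can be deleted.
        ec2api.deregister_image(ImageId=image["ImageId"])
        for snapshot_id in snapshot_ids:
            ec2api.delete_snapshot(SnapshotId=snapshot_id)
            deleted.append(snapshot_id)
    return deleted
```

Because the function tolerates a missing image, a 'Delete' request on a stack whose AMI was removed by hand would still succeed instead of leaving the stack in a failed state.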

The next part of this code block deals with the creation of the snapshot and AMI itself, and ensures that they have the correct tags which will be referenced when the stack is deleted later on:

        if rtype in ('Create', 'Update'):
            # create the AMI
            print("Creating AMI and waiting")
            res = ec2api.create_image(
                Description='Demo AMI created for autoscaling group',
                InstanceId=instanceId,
                Name='DemoWebServerAMI',
                NoReboot=True,
                TagSpecifications=[
                    {'ResourceType': 'image',
                     'Tags': [{'Key': 'Name', 'Value': 'DemoWebServerAMI'}]},
                    {'ResourceType': 'snapshot',
                     'Tags': [{'Key': 'Name', 'Value': 'DemoWebServerSnapshot'}]}]
            )
            image_available_waiter.wait(ImageIds=[res['ImageId']])
            responseData['ImageId'] = res['ImageId']

Finally, I use the cfnresponse utility to send back the signal. The ImageId is sent back in the responseData structure:

        # Everything OK... send the signal back
        print("Operation successful!")
        cfnresponse.send(event,
                         context,
                         cfnresponse.SUCCESS,
                         responseData)
    except Exception as e:
        print("Operation failed...")
        print(str(e))
        responseData['Data'] = str(e)
        cfnresponse.send(event,
                         context,
                         cfnresponse.FAILED,
                         responseData)
    #return True
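It helps to know what cfnresponse.send does behind the scenes: it issues an HTTP PUT of a small JSON document to the pre-signed URL supplied in event['ResponseURL']. The sketch below only builds that document, without sending it, so that the fields are visible. The helper name build_response_body and the sample values are my own, for illustration.

```python
import json


def build_response_body(event, status, data, physical_resource_id, reason=""):
    """Build the JSON document that CloudFormation expects in reply.

    Hypothetical helper: it mirrors the documented response fields but does
    not send anything over the network.
    """
    body = {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # explanation, needed when the status is FAILED
        "PhysicalResourceId": physical_resource_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,  # keys here become attributes readable with !GetAtt
    }
    return json.dumps(body)


# Placeholder event values, for illustration only.
demo_event = {
    "StackId": "arn:aws:cloudformation:eu-west-1:111122223333:stack/demo/guid",
    "RequestId": "unique-request-id",
    "LogicalResourceId": "AMIBuilder",
}

print(build_response_body(
    demo_event,
    "SUCCESS",
    {"ImageId": "ami-0123456789abcdef0"},
    physical_resource_id="ami-0123456789abcdef0",
))
```

The keys of the Data dictionary are what the rest of the template can read, which is why a reference such as !GetAtt AMIBuilder.ImageId resolves to the new AMI.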

Defining the AutoScaling Group using the AMI

The remainder of the CloudFormation template was relatively straightforward, defining the ElasticLoadBalancingV2 resources.

The resources I needed to define in CloudFormation are:

AWS::ElasticLoadBalancingV2::TargetGroup
AWS::ElasticLoadBalancingV2::LoadBalancer
AWS::ElasticLoadBalancingV2::Listener
AWS::EC2::LaunchTemplate
AWS::AutoScaling::AutoScalingGroup
AWS::AutoScaling::ScalingPolicy

The EC2 Launch Template references back to the Id of the AMI which I created earlier. The following is a snippet of this part of the CloudFormation template file.

  DemoAutoScalingLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !GetAtt AMIBuilder.ImageId
        InstanceType: 't4g.micro'
        Monitoring:
          Enabled: 'true'
        SecurityGroupIds:
          - !Ref WebSecurityGroup

Conclusions

This turned out to be a helpful exercise in creating CloudFormation Custom resources, and also passing back attributes using the cfnresponse.send utility.

Specific lessons I have noted are:

Allow time for the EBS volume on the source server to be sync'd. If not, you need to use the "reboot" option when building the AMI.
Ensure that the 'Update' scenario is coded for in the Lambda function.
Proper tagging of resources, specifically the EBS snapshot and the AMI, ensures that you can reference them at some future date.

To conclude, the following snippet shows how to extract the attributes from the initial web server, the load balancer and the ImageId which is passed back by the cfnresponse.send utility from the Lambda function:

Outputs:
  WebServer:
    Value: !GetAtt WebInstance.PublicIp
    Description: Public IP address of Web Server
  AMIBuilderRoutine:
    Value: !GetAtt 'AMIBuilder.ImageId'
    Description: Image created by the AMI Builder routine
  DNSName:
    Value: !GetAtt DemoLoadBalancer.DNSName
    Description: The DNS Name of the Elastic Load Balancer

The full CloudFormation template is available in GitHub here:

https://github.com/dendad-trainer/simple-aws-demos/tree/main/custom-ami-autoscaling

Saturday, 7 October 2023

Provisioning Amazon EC2 using just IPv6 (no IPv4)

This blog was prompted by the AWS announcement in July 2023 that with effect from February 1, 2024 there will be a charge for all public IPv4 addresses, whether or not they are attached to an EC2 instance:

https://aws.amazon.com/blogs/aws/new-aws-public-ipv4-address-charge-public-ip-insights/

For some customers, this could become a significant cost. The following AWS blog does give some insight into how to reduce costs, by using Elastic Load Balancing or NAT Gateways to reduce your usage of IPv4 address space:

https://aws.amazon.com/blogs/networking-and-content-delivery/identify-and-optimize-public-ipv4-address-usage-on-aws/

However, this got me thinking about whether it is possible to completely bypass IPv4 and just use IPv6 for EC2 instances. This would save the costs of using IPv4 public addresses.

In practice, there will be many cases where you will have to continue to use IPv4. There may be clients which want to access your AWS services, but do not support IPv6. One example I came across is a home router used by a well-known home broadband provider. Whilst my laptop and browser support IPv6, I cannot run it from home because this router does not currently allow it. Home ISPs, please note!

Also, some AWS services do not support IPv6 yet. See the following URL for the full list:

https://docs.aws.amazon.com/vpc/latest/userguide/aws-ipv6-support.html

So, in practice, we may have no choice but to implement a so-called "dual stack" approach, at least in the short term.

However, if we disregard these issues, is it possible to create a single IPv6-only environment? To answer that, we first need to understand some of the key differences between IPv4 and IPv6.

IPv6 compared with IPv4

Aside from the different numbering terminology (IPv4 uses a dotted decimal 32-bit notation, whereas IPv6 uses hex notation for a 128-bit number), one key difference between the different systems is the use of public-facing and private addressing.

In IPv4, there are certain address ranges defined by RFC 1918 which the Internet Assigned Numbers Authority (IANA) has reserved for private networks:

10.0.0.0 - 10.255.255.255 (10.0.0.0/8)
172.16.0.0 - 172.31.255.255 (172.16.0.0/12)
192.168.0.0 - 192.168.255.255 (192.168.0.0/16)

These are very familiar to anyone who has looked into their home internet router (192.168.0.254 anyone?) or the AWS Default VPC, which uses 172.31.0.0/16 from within the 172.16.0.0/12 range. Anyone provisioning a VPC is encouraged to use one of the RFC 1918 address ranges.
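As a quick way to check whether a given address falls inside one of these ranges, here is a minimal sketch using Python's standard ipaddress module. The helper name is_rfc1918 is my own, and the test addresses are just the examples mentioned in this blog.

```python
import ipaddress

# The three private ranges reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address):
    """Return True if the address lies in one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for candidate in ("192.168.0.254", "172.31.5.10", "78.65.4.80"):
    print(candidate, is_rfc1918(candidate))
```

The standard library also offers an is_private attribute on address objects, but that covers more than RFC 1918 (loopback and link-local ranges, for example), so the explicit list is closer to what is described here.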

With IPv4, if an EC2 instance needs to send/receive traffic to/from the Internet via an IGW, it has to be provisioned with two addresses: one for internal use, from the VPC subnet number range, and a second Public address which is accessible to the public. This Public IP address is either an "Elastic IP address" (one already pre-allocated to the AWS account), or a Public IP address allocated at instance launch, which would be released when the instance is shut down.

It is these Elastic IP addresses and allocated Public IP addresses which will become chargeable.

With IPv6, on the other hand, there is only one IPv6 address for an Instance. That address is either visible to the Internet (in the case of a Public subnet), or hidden in a Private Subnet without routing to the Internet. And these addresses are not chargeable.

Is an IPv6-only EC2 Stack possible?

Suppose you have an EC2 instance hosting a web server that is using an IPv4 Public IP address. If you browse to your website (e.g. http://ec2instance.amazonaws.com ), the DNS service will resolve that domain name into a public IPv4 address such as 78.65.4.80. Subsequently, the IPv4 protocol will send packets to that address.

With IPv6, on the other hand, we want the domain name to be resolved into an IPv6 address such as 2001:::c4:e6:86. Assuming that our browser supports IPv6 (most modern ones will do), then it will send packets via IPv6 to that address.

In order to implement this, we need to make some changes to our underlying VPC and subnets, and to the launching of the EC2 instances themselves.

VPC Configuration changes

It is worth pointing out that you should not use the Default VPC when designing your network architecture. The Default VPC uses IPv4 subnets, and is configured with the Auto-assign public IPv4 address set to "Yes". This allocates (chargeable) IPv4 addresses to EC2 instances when they are launched.

When we provision our own VPC, we come across a minor problem. Since Amazon VPC seems to use IPv4 internally, you must first specify an IPv4 address range. This address range could be quite small, but we must specify it anyway.

We then need a suitable IPv6 address range. However, unlike IPv4, we cannot easily specify this ourselves. Instead, AWS provides one fixed size (/56) IPv6 CIDR block. Large enterprises may want to control their own IPv6 address allocation. This can be done by using "bring your own IPv6" (BYOIPv6). But for the purpose of this blog, we can accept the address allocation from Amazon.
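To get a feel for how much room a /56 block gives you, the following sketch splits one into /64 blocks, which is the size typically used for an IPv6 subnet. The prefix shown is taken from the 2001:db8::/32 documentation range and is purely an assumption for illustration; AWS will allocate you a different one.

```python
import ipaddress

# Placeholder /56 from the IPv6 documentation range, not a real AWS allocation.
vpc_block = ipaddress.ip_network("2001:db8:1234:5600::/56")

# Split the VPC block into /64 blocks, one per candidate subnet.
subnets = list(vpc_block.subnets(new_prefix=64))

print("Number of /64 subnets:", len(subnets))
print("First:", subnets[0])
print("Second:", subnets[1])
print("Last:", subnets[-1])
```

The eight bits between /56 and /64 are the only part you choose when carving out subnets, which is why the console asks for a small hex number rather than a full CIDR.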

The following graphic shows the AWS console page for provisioning the VPC:

So the key changes we need to make to a VPC are:

Provision the VPC as a Dual Stack with both IPv4 private address range, and an IPv6 address range
Provision an Internet Gateway and attach it to the VPC, as before.

The following graphic shows the provisioned VPC:

Having specified an IPv4 address range, and received an Amazon-allocated IPv6 range, we can then go ahead and provision subnets.

Public Subnet changes

Typically, Public Subnets not only have a routing table entry to an Internet Gateway, they also have the auto-assign public IPv4 address option enabled. But we want to provision the subnet to use IPv6 only. The IPv6 address of the EC2 instance will be accessible from the internet. So in the console we need to choose the "IPv6-only" check-box, and select a CIDR block, as the following graphic shows:

The key changes we need to make in a Public Subnet are:

Provision the subnet to be IPv6 only.
Disable the auto-assign public IP address option.
Ensure that DNS hostnames and DNS resolution are both enabled.
Add an IPv6 default address route (i.e. ::/0) to the Internet Gateway. But we do not need a default address route for IPv4.
Ensure that the security group allows incoming traffic from the internet (::/0) on IPv6, and that the Network ACL (NACL) also allows the traffic using IPv6.

The following graphic shows an IPv6-only public subnet routing table, with just IPv6 routing to the Internet Gateway.

Note that there is no "default route" for IPv4, since there are no IPv4 addresses associated with this subnet.

Private Subnet and Firewall changes

As with Public subnets, we can also provision private subnets to only use IPv6.

Private subnets, by definition, do not have access to/from the Internet. But if we have an EC2 instance in a Private Subnet that requires outgoing access to the Internet, we need to provision an "egress only internet gateway" (instead of a NAT gateway), and route default IPv6 traffic to it.

Unlike a NAT gateway, which translates a private IP address into a public one, the egress only internet gateway simply exposes the IPv6 address to the internet without requiring any translation.

The key changes we need to make in a Private Subnet are:

Provision the subnet to be IPv6 only.
Provision an egress only internet gateway within the VPC.
Add an IPv6 default address route (i.e. ::/0) to the egress only internet gateway. This will expose the IPv6 address to the internet for Outgoing requests, but block requests coming from outside. We do not need a default address route for IPv4.
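If you prefer to script the routing rather than use the console, the two default routes (public and private) could be added along the following lines. This is a sketch rather than a tested deployment script: the function names are my own, and ec2api is assumed to be a boto3 EC2 client, created for example with boto3.client('ec2').

```python
IPV6_DEFAULT_ROUTE = "::/0"


def add_public_ipv6_default_route(ec2api, route_table_id, internet_gateway_id):
    """Route all IPv6 traffic from a public subnet to the Internet Gateway."""
    return ec2api.create_route(
        RouteTableId=route_table_id,
        DestinationIpv6CidrBlock=IPV6_DEFAULT_ROUTE,
        GatewayId=internet_gateway_id,
    )


def add_private_ipv6_default_route(ec2api, route_table_id, egress_only_igw_id):
    """Route outgoing IPv6 traffic from a private subnet to the
    egress only internet gateway."""
    return ec2api.create_route(
        RouteTableId=route_table_id,
        DestinationIpv6CidrBlock=IPV6_DEFAULT_ROUTE,
        EgressOnlyInternetGatewayId=egress_only_igw_id,
    )
```

The only difference between the two calls is the target parameter, which mirrors the difference between the two routing tables: the destination is the same, but the gateway type decides whether incoming connections are possible.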

The following graphic is the routing table for an IPv6-only private subnet, with a "default route" to the egress only internet gateway, which we have already provisioned and attached to this VPC:

Now we are able to launch an EC2 instance with just IPv6.

EC2 configurations

Amazon EC2 supports launching instances into IPv6-only subnets provided that they are based on the Nitro System.

If we provision our EC2 instance inside a Public subnet, it will receive only an IPv6 address. The following extract from the AWS console shows the "Auto-assign public IP" (meaning version 4) disabled, and the "Auto-assign IPv6 IP" enabled.

Once the EC2 instance has been launched, you can "ping" its IPv6 address, using the "ping6" command, or use an online site such as https://subnetonline.com/pages/ipv6-network-tools/online-ipv6-ping.php (don't forget to enable ICMP for IPv6 in the Security Group). Here is an example of the output of "ping6":

The EC2 instance will have a domain name (hostname). This will differ from the hostname that you may be familiar with when using IPv4, as follows:

When you launch an instance with IPv4, the private IPv4 address of the instance is included in the hostname. When used as the Private DNS hostname, it will only return the private IPv4 address (A record).
When you launch an instance with IPv6, the EC2 instance ID is included in the hostname of the instance. When used as the Private DNS hostname, it can return the private IPv4 address (A record) and/or the IPv6 Global Unicast Address (AAAA record).

If you run "nslookup" you should see this domain name resolve to an IPv6 address. You should then be able to browse to this address. When using "nslookup", you can add the "-q=aaaa" option to ask nslookup to return the resolved IPv6 addresses from the "AAAA" record instead of the normal "A" record:
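The same kind of lookup can be done from Python using the standard socket module, by asking the resolver for IPv6 results only. This is a minimal sketch: resolve_ipv6 is my own helper name, and the literal ::1 (the IPv6 loopback address) is used so that the example runs without needing a real hostname.

```python
import socket


def resolve_ipv6(hostname):
    """Return the unique IPv6 addresses that the hostname resolves to."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET6, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        # For IPv6 the sockaddr is (address, port, flowinfo, scope_id).
        address = sockaddr[0]
        if address not in addresses:
            addresses.append(address)
    return addresses


print(resolve_ipv6("::1"))
```

Replace the literal with the hostname of your instance to see the addresses from its AAAA record. If the name has no AAAA record, getaddrinfo raises socket.gaierror.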

Other Factors to consider

In this blog, I have only addressed the challenge of EC2 instances. I have ignored services such as Elastic Load Balancers, NAT Gateway or AWS Global Accelerators. And I have ignored complications such as VPC peering, and communicating with pre-existing IPv4 services.

There are other reasons why you might need to provision IPv4. For example, you might want to use SSH or RDP to connect to the instance. For this use-case, you can use EC2 Instance Connect, which now has the ability to connect to your instances using private IPv4 addresses.

For more information on how to take advantage of IPv6, and thereby save yourself money in using AWS services, I have used the following AWS resources:

If we were using a dual stack VPC, AWS helps IPv6 AWS resources communicate with IPv4 resources by supporting DNS64 on the subnets and NAT64 on the NAT gateways.

Finally, do look up the following web pages which cover IPv6 on AWS: the IPv6 White Paper and the list of AWS Services that Support IPv6.

IPv6 is certainly the way to go with networking in future. Let's hope that AWS continue to bring more initiatives to make this technology easier to implement.

Tuesday, 16 May 2023

Using AWS CloudFormation Custom Resources to populate an S3 bucket

Like many people using AWS, I am drawn to AWS CloudFormation as a quick way to build infrastructure and even full end-to-end application stacks in a programmable, repeatable way.

CloudFormation makes use of Template YAML or JSON files as a Declarative Language. That means that you define what AWS resources you want to build, and it is up to CloudFormation to determine how to build these resources, and in what order.

In many cases, resources could be built in parallel, and that is the approach that CloudFormation typically uses. If one Resource block references another (for example, an EC2 resource block that uses '!Ref' to point to a VPC Subnet block), then CloudFormation is able to identify these dependencies and only starts building a resource once the resources it depends on have completed.

However, if you have used CloudFormation for any reasonable length of time, you have probably come across the "circular reference gotcha"!

For example, suppose you define a CloudFormation Resource such as a CloudFront Distribution, referencing an S3 Bucket. Then, you need to define an Origin Access Control setting to enable access to this Bucket. And finally, you need to update the S3 Bucket resource with a Bucket Policy to enable access using this Origin Access Control setting. In the first case, you have created an implicit dependency; the S3 Bucket must be created first in order to associate the CloudFront Distribution to it. But on the other hand, you need to update the S3 Bucket policy after the CloudFront Distribution has been created.

Take another use-case: I want to create an S3 Bucket, and then subsequently upload data into that Bucket - maybe inline, or maybe by referencing a GitHub or CodeCommit Repo. Unfortunately, there is no mechanism inside CloudFormation to upload content to an S3 Bucket.

In both cases, one solution is to make use of AWS CloudFormation Lambda-backed Custom Resources. These are documented here: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources-lambda.html

They provide one mechanism for customising CloudFormation to address the "circular reference gotcha", and the "can't do it here" problem.

Taking the second simple example - populating an S3 Bucket with a file - the first step is to create a Lambda Resource inside your CloudFormation Template. In this case, I have written some code which will create an 'index.html' file and upload it to the S3 Bucket:

A few things to note about this code:

This example is written in Python. It uses the Python SDK ('boto3') to make API calls to the Amazon S3 service in order to upload a file which it first creates in the /tmp folder of the Lambda runtime environment. However, you could use other languages if you prefer.
Make sure you return a cfnresponse to CloudFormation. This is easy to forget, in which case your CloudFormation Stack hangs until it times out!
The event['RequestType'] is used to identify whether CloudFormation is processing the template in order to do a "Create", an "Update" on the Stack, or a "Delete" (when the Stack is deleted or rolled back). If you are creating a file on an S3 Bucket, don't forget to include code for the "Delete" action.
The event block also includes an event['ResourceProperties']. This carries the properties that you pass to the custom resource in the template. In this case, we have referenced the name of the S3 Bucket in the CloudFormation template.

The code itself is very straightforward. It simply creates the file and uploads it to the bucket name which has been passed as a resource property.
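As a rough illustration of the logic described above, here is a minimal sketch of the core of such a function. It is not the code from my template: it writes the object directly with put_object instead of creating a file in /tmp first, the property name BucketName and the function name populate_bucket are assumptions, and the S3 client is passed in as a parameter so that the logic can be exercised outside Lambda.

```python
INDEX_HTML = "<html><body><h1>Hello from CloudFormation</h1></body></html>"


def populate_bucket(event, s3_client):
    """Create or remove index.html according to the CloudFormation request.

    s3_client is assumed to be a boto3 S3 client. Returns a (status, data)
    tuple that the caller can pass on to cfnresponse.send.
    """
    # 'BucketName' is a hypothetical property name chosen for this sketch.
    bucket = event["ResourceProperties"]["BucketName"]
    try:
        if event["RequestType"] in ("Create", "Update"):
            s3_client.put_object(
                Bucket=bucket,
                Key="index.html",
                Body=INDEX_HTML.encode("utf-8"),
                ContentType="text/html",
            )
        elif event["RequestType"] == "Delete":
            # Remove the object so that the bucket itself can be deleted.
            s3_client.delete_object(Bucket=bucket, Key="index.html")
        return "SUCCESS", {"Key": "index.html"}
    except Exception as exc:
        # Always report back, otherwise the stack waits until it times out.
        return "FAILED", {"Error": str(exc)}


# Inside Lambda, the handler would look roughly like this:
#
#   import boto3, cfnresponse
#
#   def handler(event, context):
#       status, data = populate_bucket(event, boto3.client("s3"))
#       cfnresponse.send(event, context, status, data)
```

Keeping the S3 logic separate from the handler means that every code path ends in exactly one cfnresponse.send call, which guards against the hanging stack problem mentioned above.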

Having created the Lambda function, there are just a couple of other things to add. Firstly, you need to create a Role for the Lambda function to make use of. This is referenced as MakeIndexLambdaExecutionRole in the example above.

Secondly, you need to tie everything together by defining the Custom resource itself. This is a very straightforward piece of code, since all you need to do is give it a name, a reference to the Lambda function, and the resource properties you wish to pass it.

Since this Custom Resource will run after the other resources that it depends upon, you can resolve any circular dependencies by using a Custom Resource to update an existing resource once it has been created.

Happy AWS Building !

Dennis (Dendad Trainer)

The AWS CloudFormation Documentation set is here:

https://docs.aws.amazon.com/cloudformation/index.html

My GitHub Repository with example CloudFormation Templates:

https://github.com/dendad-trainer/simple-aws-demos