GitHub Repo
All of the code required can be found in the GitHub repo: github.com/edrandall-dev/kubernetes-on-ec2
Introduction
Last year, when I was learning Kubernetes, I wanted to create my own cluster on AWS using EC2 instances. The idea behind this was to go through the installation of Kubernetes from start to finish, learning everything I needed to along the way.
I’ve also used Amazon’s Elastic Kubernetes Service (EKS) to deploy Neo4j using the official Neo4j Helm Charts. Info on that can be found here.
When I originally created the terraform and ansible code for this endeavour, I’ll admit that I got a bit carried away. I created a Makefile designed to handle every possible part of the process, and complex shell scripts which lived alongside the ansible playbooks. If I’m honest, this extra effort did very little to support my learning and made the resulting code a lot more difficult to understand later.
As a result, I’ve decided to bite the bullet and revisit this project, listing out (and explaining) the manual steps instead of trying to abstract away the true complexity. Of course, the argument for automation has long since been fought and won, but how many times a day do we really need to spin up and tear down an entire kubernetes cluster with a single command? In the end, I was just troubleshooting automation and scripting instead of moving forward and actually doing stuff with Kubernetes. I suppose it was a bit like tidying your desk rather than getting started on that important project.
If you’re still with me, then the purpose of this post should now be pretty obvious: to break down and document the steps needed to manually create a kubernetes cluster on AWS, with a focus on learning as we go.
Later, I went on to repeat this exercise on a local workstation using vagrant VMs, which I’ve documented in this post.
Pre-requisite Steps
As you’d expect, there are a number of important pre-requisites that we need to satisfy in order to deploy resources into the cloud with terraform and configure them remotely.
- Clone the git repository
This git repository contains a simplified (yes, really) version of the code which you should clone to your local machine (or development environment).
git clone https://github.com/edrandall-dev/kubernetes-on-ec2
- Install the AWS CLI and configure access
If needed, follow the instructions to download and install the AWS command line interface (CLI). Once the CLI tool is installed, the following files will need to be created in your local environment:
~/.aws/config
[default]
region = us-east-1
~/.aws/credentials
[default]
aws_access_key_id = AJKYYIUJJLKX72VON324KL
aws_secret_access_key = 5COOK762PASS3BABTRIDGEXSZ26KVkJKJ4FP
To create a new aws_access_key_id and aws_secret_access_key, log into your AWS account, go to the security credentials menu and create a new AWS CLI key pair. The key pair shown here obviously isn’t real!
NOTE: Remember that the aws_access_key_id and aws_secret_access_key give access to your AWS account. These should not be shared, or uploaded to GitHub.
Once you have obtained your aws_access_key_id and aws_secret_access_key, the aws configure command will create the two files shown above. Alternatively, you can create the files manually in the format shown.
If configured correctly, a simple command like aws s3 ls should confirm that everything is working by listing the S3 buckets in your AWS account (assuming, of course, that you have the IAM permissions to do that).
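If you don’t have any S3 buckets (or lack the permissions to list them), an alternative sanity check is to ask the CLI which identity it is authenticated as. This isn’t part of the repo, just a standard AWS CLI call:
aws sts get-caller-identity
If the credentials files are set up correctly, this will print the account ID and ARN associated with your access key.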
- Create SSH key to connect to the environment
After the EC2 instances are running, we’ll need SSH keys to log into them to perform the various installation tasks.
Use the ssh-keygen command to create a new SSH key pair. In the first step, set the variable SSH_KEY_NAME to whatever you want to name your key. I use the date command to generate a timestamp, just to make it easier to see when the key was created:
SSH_KEY_NAME="key_$(date "+%Y-%m-%d_%H%M")"
ssh-keygen -N "" -q -t rsa -b 4096 -C "$SSH_KEY_NAME" -f $SSH_KEY_NAME
NOTE: We’ll also be relying on the contents of the SSH_KEY_NAME variable in later steps. Be sure to set it again manually if you end up using a different terminal session.
With the key created, you’ll need to upload it to AWS.
aws ec2 import-key-pair --key-name "$SSH_KEY_NAME" --public-key-material fileb://$SSH_KEY_NAME.pub
Check the key has been successfully uploaded to AWS, like this:
% aws ec2 describe-key-pairs
{
  "KeyFingerprint": "3d:09:3d:35:b0:8f:d1:e1:qd:3d:57:8h:aq:a9:1c:a5",
  "KeyName": "key_2023-08-15_0934",
  "KeyPairId": "key-09e05c960bb3d8e97"
}
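If your account already contains several key pairs, you can (optionally) narrow the output down to just the one you’ve uploaded by re-using the SSH_KEY_NAME variable:
aws ec2 describe-key-pairs --key-names "$SSH_KEY_NAME"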
- Install Terraform
On a Mac, terraform can be installed easily using homebrew:
brew install terraform
Once installed, double-check that the command is available in your shell’s search path:
% which terraform
/opt/homebrew/bin/terraform
You can check the version using:
terraform version
If either of these commands gives an error, refer back to terraform’s installation instructions to troubleshoot.
- Install Ansible
Full installation instructions for Ansible can be found in the ansible documentation. However, if you’re using a Mac you’ll probably find homebrew to be easiest:
brew install ansible
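As with terraform, it’s worth a quick check that the ansible commands are available on your path before carrying on:
ansible --version
ansible-playbook --version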
- Install jq
We’ll also need jq later to parse some JSON output from terraform. Again, it’s easy to obtain with homebrew (on a Mac):
brew install jq
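To illustrate what jq will be doing for us later, here’s a tiny example (using a made-up JSON array rather than real terraform output) of the -r '.[]' filter that the inventory script uses to turn a JSON list into one value per line:
% echo '["10.0.0.1","10.0.0.2"]' | jq -r '.[]'
10.0.0.1
10.0.0.2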
Deploy the cloud resources in AWS
First, from the terraform-ansible directory of the cloned repository, we need to initialise terraform:
terraform init
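If the initialisation succeeds, terraform will download the AWS provider and finish with a message along the lines of the following (the exact wording can vary between versions):
Terraform has been successfully initialized!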
We are now ready to use terraform to deploy the cloud resources that we need for our environment. However, before starting the deployment, review the terraform.tfvars file and customise the variable values to suit you:
region = "us-east-1"
base_cidr_block = "192.168.0.0/16"
creator = "Ed Randall"
qty_k8s_cp_instances = 1
qty_k8s_worker_instances = 4
instance_types = {
  "k8s_cp_instance"     = "t3.small",
  "k8s_worker_instance" = "t3.small"
}
env_prefix = "my-k8s-env"
NOTE: You should still be in the terraform-ansible directory, and the SSH_KEY_NAME variable should still contain the name of your SSH key.
If you are happy with the values inside the terraform.tfvars file, issue the following command to start the terraform apply process:
terraform apply -auto-approve -var="public_key_path=$SSH_KEY_NAME.pub"
If needed, the environment can be torn down with:
terraform destroy -auto-approve -var="public_key_path=$SSH_KEY_NAME.pub"
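As an optional extra step, if you’d like to preview exactly what terraform intends to create before applying anything, the same variable override can be passed to terraform plan:
terraform plan -var="public_key_path=$SSH_KEY_NAME.pub"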
Terraform will provide a lot of information about the resources being created, along with outputs which look like the following:
Apply complete! Resources: 21 added, 0 changed, 0 destroyed.
Outputs:
control_plane_public_ips = [
"13.53.205.52",
]
worker_public_ips = [
"16.171.161.205",
"13.53.41.85",
"16.171.129.198",
"16.170.243.115",
]
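These values can be re-displayed at any time after the apply (from the same directory) using terraform output, which is also what the inventory script later in this post relies on:
terraform output
terraform output -json worker_public_ips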
This output shows us the public IPv4 addresses of the control plane node and each of the worker nodes created by terraform.
Deployment diagram
The following diagram shows which cloud resources are deployed in EC2.
Install Kubernetes on EC2 using Ansible
Now that the cloud environment has been created by terraform, we need to do two things before we can execute the ansible playbooks:
- Create an ansible.cfg file
- Create an inventory file which will contain details of the EC2 instances which have been created by terraform
Create ansible config File
An ansible config file can be created by copying and pasting the following lines into a new file called ansible.cfg.
[defaults]
timeout = 60
inventory=ansible_inventory.ini
private_key_file=
host_key_checking=false
deprecation_warnings=False
remote_user=ec2-user
interpreter_python=auto_silent
[privilege_escalation]
become=True
become_method=sudo
become_user=root
You will need to paste the name of your private key file after the private_key_file= parameter. If the variable is still set, you should be able to view it with echo $SSH_KEY_NAME. The following command will append your key name to the correct line in the config file (the empty quotes after -i are the syntax used by the BSD sed that ships with macOS):
sed -i '' "/private_key_file=/s/$/$SSH_KEY_NAME/" ansible.cfg
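To confirm that the substitution worked, grep for the line and check that your key name now appears after the equals sign (the key name shown below is just the earlier example):
% grep private_key_file ansible.cfg
private_key_file=key_2023-08-15_0934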
Generate ansible inventory
In order for ansible to perform actions, it needs to be given an “inventory”. This is a list of the servers (listed by IP address or hostname) that are going to managed by ansible. In order to generate this easily, I wrote a simple script called create-inventory.sh
, which can be found in the scripts
directory and looks like this:
#!/bin/bash
#
# Script: create-inventory.sh
# Purpose: This script can be used to create an inventory for ansible using
# terraform's outputs. It can only be run after terraform has finished
# creating the new environment in AWS. This script also appends some
# variables to the bottom of the file, which will be needed by ansible.
#
# Get public IPs of EC2 instances
worker_ips=($(terraform output -json worker_public_ips | jq -r '.[]'))
ctrlplane_ip=($(terraform output -json control_plane_public_ips | jq -r '.[]'))
# Get private IP of ctrl-plane instance
ctrlplane_private_ip=($(terraform output -json ctrlplane_private_ip | jq -r '.[]'))
#Get instance ids of EC2 instances
worker_ids=($(terraform output -json worker_instance_ids | jq -r '.[]'))
ctrlplane_ids=($(terraform output -json ctrl_plane_instance_ids | jq -r '.[]'))
# Output control plane instance info
echo "[ctrlplane_instances]"
echo "ctrlplane-instance ansible_host=${ctrlplane_ip} ansible_user=ec2-user"
echo
# Loop through instances and populate the inventory file
echo "[worker_instances]"
for i in "${!worker_ids[@]}"; do
  echo "worker-instance-${i} ansible_host=${worker_ips[i]} ansible_user=ec2-user"
done
echo
echo "[all:vars]"
echo ctrl_plane_private_ip=${ctrlplane_private_ip}
echo pod_network_cidr="10.244.0.0/16"
Generate the ansible inventory with this command:
./create-inventory.sh > ansible_inventory.ini
You will then have an ansible inventory file called ansible_inventory.ini which looks something like this:
[ctrlplane_instances]
ctrlplane-instance ansible_host=16.170.249.23 ansible_user=ec2-user
[worker_instances]
worker-instance-0 ansible_host=16.171.250.17 ansible_user=ec2-user
worker-instance-1 ansible_host=16.171.196.198 ansible_user=ec2-user
worker-instance-2 ansible_host=16.170.245.34 ansible_user=ec2-user
worker-instance-3 ansible_host=16.171.148.130 ansible_user=ec2-user
[all:vars]
ctrl_plane_private_ip=192.168.2.237
pod_network_cidr=10.244.0.0/16
NOTE: There are several other ways to do this, but I thought that this simple script was an easy way to generate the inventory and demonstrate what’s required.
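Before moving on to the playbooks, it’s worth confirming that ansible can actually reach every host in the inventory. Ansible’s built-in ping module is a convenient way to do this, as it tests SSH connectivity and the availability of python on each node:
ansible all -m ping
Every instance should respond with "pong". If any host fails, re-check the inventory IP addresses, the security group rules and the private_key_file setting in ansible.cfg.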
Run ansible playbooks
With ansible installed and configured, the playbooks can now be executed. I have divided the configuration into three separate ansible playbooks, which are designed to be executed in order:
- 1-k8s-pre-flight.yaml
Some “pre-flight” checks and tasks which are common to both the control plane and worker nodes
- Turn off SELinux
- Upgrade operating system packages
- Configure package repositories for kubernetes & containerd and install them
- Configure necessary kernel modules
- Update the /etc/hosts file with names and IP addresses of all hosts
- 2-k8s-cp-instance-prep.yaml
Configuration of the control plane node
- Use kubeadm to create the kubernetes cluster
- Start the kubelet service
- Create a kubelet config file
- Install the CNI (Calico)
- 3-k8s-worker-instance-prep.yaml
Configuration of the worker nodes
- Generate the join command & execute it on the worker nodes
- Fetch the kubelet config file
The playbooks should be executed in order, with the ansible-playbook command:
ansible-playbook 1-k8s-pre-flight.yaml
ansible-playbook 2-k8s-cp-instance-prep.yaml
ansible-playbook 3-k8s-worker-instance-prep.yaml
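To run kubectl from your own machine, point it at the config file fetched by the third playbook. The exact location depends on where the playbook saves it; assuming, purely for illustration, that it ends up in the current directory as admin.conf, it would look like this:
export KUBECONFIG=$PWD/admin.conf   # adjust to wherever playbook 3 placed the fetched config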
Once completed, the kubernetes cluster should be up and running on EC2. Test with:
% kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
ctrl-plane-1   Ready    control-plane   24m   v1.28.0
worker-1       Ready    <none>          22m   v1.28.0
worker-2       Ready    <none>          22m   v1.28.0
worker-3       Ready    <none>          22m   v1.28.0
worker-4       Ready    <none>          22m   v1.28.0
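As a further check that the Calico CNI and the rest of the cluster components came up properly, you can list the pods across all namespaces (depending on how Calico was installed, its pods may appear in kube-system, calico-system or a tigera-operator namespace):
kubectl get pods -A
Everything should settle into a Running state within a few minutes of the workers joining.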
Conclusion
If the above steps are followed correctly, you should now have access to a kubernetes cluster running inside Amazon Web Services on EC2 instances. The purpose of this exercise was to facilitate learning, not to create a “production ready” environment. As such, there are several design decisions within this setup which may not follow “best practice” architectural principles.
Outstanding items
There are a few other tweaks that I’d like to apply to this environment when I have time. They include:
- Creating a “multi-zone” environment which has worker nodes distributed across different availability zones in AWS
- Creating separate security groups for the control plane and worker nodes.