<p>The Digital Cat - Lambda (https://www.thedigitalcatonline.com/): Adventures of a curious cat in the land of programming</p><h1>AWS Log Insights as CloudWatch metrics with Python and Terraform</h1><p>Leonardo Giordani, 2021-03-22T17:00:00+01:00</p><p> A step-by-step report on how to build a Lambda function with Terraform and Python to convert Log Insights queries into CloudWatch metrics</p><p>Recently I started using <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html">AWS CloudWatch Log Insights</a> and I find the tool really useful to extract data about the systems I'm running without having to set up dedicated monitoring tools, which come with their own set of permissions, rules, configuration language, and so forth.</p><p>Log Insights allows you to query log outputs with a language based on regular expressions with hints of SQL and to produce tables or graphs of quantities that you need to monitor. For example, the system I am monitoring runs Celery in ECS containers that log received tasks with a line like the following</p><div class="code"><div class="content"><div class="highlight"><pre>16:39:11,156 [32mINFO [0m [34m[celery.worker.strategy][0m [01mReceived task: lib.tasks.lists.trigger_list_log_notification[9b33b464-d4f9-4909-8d4e-1a3134fead97] [0m
</pre></div> </div> </div><p>In this case the specific function in the system that was triggered is <code>lib.tasks.lists.trigger_list_log_notification</code>, and I'm interested in knowing which functions are called the most, so I can easily count them with</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) as number by task
| sort number desc
| limit 9
</pre></div> </div> </div><p>This gives me a nice table of the 9 most frequently received tasks (the <code>task</code> field) and the number of times each one was received, and the time frame can be adjusted with the usual CloudWatch controls</p><div class="code"><div class="content"><div class="highlight"><pre>1 lib.tasks.lists.trigger_list_log_notification 4559
2 lib.tasks.notify.notify_recipient 397
3 lib.message._send_mobile_push_notification 353
4 lib.tasks.jobs.check_job_cutoffs 178
5 lib.tasks.notify.check_message_cutoffs 177
6 lib.tasks.notify.check_notification_retry 177
7 lib.tasks.notify.async_list_response 81
8 lib.tasks.hmrc_poll.govtalk_periodic_poll 59
9 lib.tasks.lists.recalculate_list_entry 56
</pre></div> </div> </div><p>Using time bins, quantities can also be easily plotted. For example, I can process and visualise the number of received tasks with</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) by bin(30s)
</pre></div> </div> </div><p>Unfortunately I quickly discovered an important limitation of Log Insights, namely that <strong>queries are not metrics</strong>, which immediately implies that I can't set up alarms on them. As fun as it is to look at nice plots, I need something automatic that sends me messages or scales up systems in reaction to specific events such as "too many submitted tasks".</p><p>The standard solution to this problem suggested by AWS is to write a Lambda that runs the query and stores the value in a custom CloudWatch metric, which I can then use to satisfy my automation needs. I did it, and in this post I will show you exactly how, using Terraform, Python and Zappa, CloudWatch, and DynamoDB. At the end of the post I will also briefly discuss the cost of the solution.</p><h2 id="the-big-picture-f6bc">The big picture<a class="headerlink" href="#the-big-picture-f6bc" title="Permanent link">¶</a></h2><p>Before I get into the details of the specific tools or solutions that I decided to implement, let me have a look at the bigger picture. The initial idea is very simple: a Lambda function can run a specific Log Insights query and store the results in a custom metric, which can in turn be used to trigger alarms and other actions.</p><p>For a single system I already have 4 or 5 of these queries that I'd like to run, and I have multiple systems, so I'd prefer to have a solution that doesn't require me to deploy and maintain a different Lambda for each query. Clearly the maintenance can be automated as well, but such a solution smells of duplicated code from miles away, and if there is no specific reason to go down that road I prefer to avoid it.</p><p>Since Log Insights queries are just strings of code, however, we can store them somewhere and then simply loop over all of them within the same Lambda function. 
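</p><p>The idea of a single function that loops over stored queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names <code>LogGroup</code>, <code>Query</code>, and <code>MetricName</code> and the two callables <code>run_query</code> and <code>put_metric</code> are hypothetical stand-ins for the stored data and for the CloudWatch calls, not the code of the actual Lambda.</p>

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical item layout: one dictionary of strings per stored query.
Item = Dict[str, str]


def process_items(
    items: Iterable[Item],
    run_query: Callable[[str, str], float],
    put_metric: Callable[[str, float], None],
) -> List[str]:
    """Run every stored query and publish its value as a metric.

    Returns the names of the slots that were processed.
    """
    processed = []
    for item in items:
        # Each item carries everything needed to run its own query.
        value = run_query(item["LogGroup"], item["Query"])
        put_metric(item["MetricName"], value)
        processed.append(item["SlotName"])
    return processed


if __name__ == "__main__":
    # Fake data and fake services, so the sketch runs without AWS.
    items = [
        {
            "SlotName": "received-tasks",
            "LogGroup": "my-celery-logs",
            "Query": "stats count(*) as Value by bin(1m)",
            "MetricName": "ReceivedTasks",
        }
    ]
    published: Dict[str, float] = {}
    names = process_items(
        items,
        run_query=lambda log_group, query: 42.0,
        put_metric=published.__setitem__,
    )
    print(names, published)
```

<p>Passing the two callables as arguments keeps the loop independent of AWS, so it can be tried offline with fake functions.</p><p>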
To implement this, I created a DynamoDB table in which every item contains all the data I need to run a query, such as the log group that I want to investigate and the name of the target metric.</p><h2 id="terraform-a3cb">Terraform<a class="headerlink" href="#terraform-a3cb" title="Permanent link">¶</a></h2><p>In the following sections I will discuss the main components of the solution from the infrastructural point of view, showing how I created them with Terraform. The four main AWS services that I will use are: <a href="https://aws.amazon.com/dynamodb/">DynamoDB</a>, <a href="https://aws.amazon.com/lambda/">Lambda</a>, <a href="https://aws.amazon.com/iam/">IAM</a>, <a href="https://aws.amazon.com/cloudwatch/">CloudWatch</a>.</p><p>I put the bulk of the code in a module so that I can easily create the same structure for multiple AWS accounts. While my current setup is a bit more complicated than that, the structure of the code can be simplified as</p><div class="code"><div class="content"><div class="highlight"><pre>+ common
  + lambda-loginsights2metrics
    + cloudwatch.tf
    + dynamodb.tf
    + iam.tf
    + lambda.tf
    + variables.tf
+ account1
  + lambda-loginsights2metrics
    + main.tf
    + variables.tf
</pre></div> </div> </div><h3 id="variables-7edf">Variables</h3><p>Since I will refer to them in the following sections, let me show you the four variables I defined for this module.</p><p>First I need to receive the items that I need to store in the DynamoDB table</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"items"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">list</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>I prefer to have a prefix in front of my components that allows me to duplicate them without clashes</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"prefix"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"loginsights2metrics"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The Lambda function will require a list of security groups that grant access to specific network components</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"security_groups"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">list</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Finally, Lambda functions need to be told which VPC subnets they can use to run</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"vpc_subnets"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">list</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://www.terraform.io/docs/configuration-0-11/variables.html">Terraform variables</a>.</li><li>An <a href="https://spacelift.io/blog/how-to-use-terraform-variables">in-depth post</a> that explains how to use variables in Terraform, by Sumeet Ninawe</li></ul><h3 id="dynamodb-55e8">DynamoDB</h3><p>Let's start with the cornerstone, which is the DynamoDB table that contains data for the queries. As DynamoDB is not a SQL database we don't need to define columns in advance. This clearly might get us into trouble later, so we need to be careful and consistent when we write items, adding everything that is needed by the Lambda code.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/dynamodb.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_dynamodb_table"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-items"</span>
<span class="w"> </span><span class="na">billing_mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"PAY_PER_REQUEST"</span>
<span class="w"> </span><span class="na">hash_key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"SlotName"</span>
<span class="w"> </span><span class="nb">attribute</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"SlotName"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"S"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Speaking of items, I assume I will pass them when I call the module, so here I just need to loop on the input variable <code>items</code></p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/dynamodb.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_dynamodb_table_item"</span><span class="w"> </span><span class="nv">"item"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="nv">var.items</span><span class="p">)</span>
<span class="w"> </span><span class="na">table_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">hash_key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.hash_key</span>
<span class="w"> </span><span class="na">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">jsonencode</span><span class="p">(</span><span class="nf">element</span><span class="p">(</span><span class="nv">var.items</span><span class="p">,</span><span class="w"> </span><span class="nv">count.index</span><span class="p">))</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Since the query is written as a Terraform string and will be read from Python there are two small caveats here. To be consistent with Terraform's syntax we need to escape double quotes in the query, and to avoid fights with Python we need to escape backslashes. So for example a valid query like</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) as Value by bin(1m)
</pre></div> </div> </div><p>will be stored as</p><div class="code"><div class="content"><div class="highlight"><pre>"parse @message /\\[celery\\.(?<source>[a-z.]+)\\].*Received task: (?<task>[a-z._]+)\\[/ | filter not isblank(source) | stats count(*) as Value by bin(1m)"
</pre></div> </div> </div><p>Another remark is that the Lambda I will write in Python will read data plotted with the name <code>Value</code> on bins of 1 minute, so the query should end with <code>stats X as Value by bin(1m)</code> where <code>X</code> is a specific stat, for example <code>stats count(*) as Value by bin(1m)</code>.</p><p>The reason behind 1 minute is that the maximum standard resolution of CloudWatch metrics is 1 minute. Should you want more you need to have a look at <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#high-resolution-metrics">CloudWatch High-Resolution Metrics</a>.</p><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://aws.amazon.com/dynamodb/">Amazon DynamoDB</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table">aws_dynamodb_table documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table_item">aws_dynamodb_table_item documentation</a></li></ul><h3 id="iam-part-1-cde2">IAM part 1</h3><p>IAM roles are central in AWS. In this specific case we have the so-called <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html">Lambda execution role</a>, which is the IAM role that the Lambda assumes when you run it. In AWS users or services (that is humans or AWS components) <em>assume</em> a role, receiving the permissions connected with it. 
To assume a role, however, they must be allowed to do so by the role's so-called <em>trust policy</em>, which lists the principals that can assume it.</p><p>Let's define a trust policy that allows the Lambda service to assume the role that we will define</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"trust"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s2">"sts:AssumeRole"</span><span class="p">]</span>
<span class="w"> </span><span class="nb">principals</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Service"</span>
<span class="w"> </span><span class="na">identifiers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"lambda.amazonaws.com"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>and after that the role in question</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.prefix</span>
<span class="w"> </span><span class="na">assume_role_policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.trust.json</span>
<span class="p">}</span>
</pre></div> </div> </div><p>To run, Lambdas need an initial set of permissions which can be found in the canned policy <code>AWSLambdaVPCAccessExecutionRole</code>. You can see the content of the policy in the IAM console or by dumping it with <code>aws iam get-policy</code> and <code>aws iam get-policy-version</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ aws iam get-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
{
"Policy": {
"PolicyName": "AWSLambdaVPCAccessExecutionRole",
"PolicyId": "ANPAJVTME3YLVNL72YR2K",
"Arn": "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
"Path": "/service-role/",
"DefaultVersionId": "v2",
"AttachmentCount": 0,
"PermissionsBoundaryUsageCount": 0,
"IsAttachable": true,
"Description": "Provides minimum permissions for a Lambda function to execute while accessing a resource within a VPC - create, describe, delete network interfaces and write permissions to CloudWatch Logs. ",
"CreateDate": "2016-02-11T23:15:26Z",
"UpdateDate": "2020-10-15T22:53:03Z"
}
}
$ aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole --version-id v2
{
"PolicyVersion": {
"Document": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses",
"ec2:UnassignPrivateIpAddresses"
],
"Resource": "*"
}
]
},
"VersionId": "v2",
"IsDefaultVersion": true,
"CreateDate": "2020-10-15T22:53:03Z"
}
}
</pre></div> </div> </div><p>Attaching a canned policy is just a matter of creating a specific <code>aws_iam_role_policy_attachment</code> resource</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy_attachment"</span><span class="w"> </span><span class="nv">"loginsights2metrics-"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy_arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Now that we have the IAM role and the basic policy we can assign custom permissions to it. We need to grant the Lambda permissions on other AWS components, namely CloudWatch, to run Log Insights queries and store metrics, and DynamoDB, to retrieve all the items from the queries table.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"cloudwatch:PutMetricData"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"cloudwatch:PutMetricAlarm"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:StartQuery"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:GetQueryResults"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:GetLogEvents"</span><span class="p">,</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s2">"*"</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"dynamodb:Scan"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_dynamodb_table.loginsights2metrics.arn</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Through <code>aws_iam_role_policy</code> we can create and assign the policy out of a <code>data</code> structure</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.prefix</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.loginsights2metrics.json</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document">aws_iam_policy_document documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role">aws_iam_role documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment">aws_iam_role_policy_attachment documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy">aws_iam_role_policy documentation</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/reference/iam/get-policy.html">AWS CLI iam get-policy documentation</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/reference/iam/get-policy-version.html">AWS CLI iam get-policy-version documentation</a></li></ul><h3 id="lambda-0ea2">Lambda</h3><p>We can now create the Lambda function container. I do not use Terraform as a deployer, as I think it should be used to define static infrastructure only, so I will use a dummy function here and later deploy the real code using the AWS CLI.</p><p>The dummy function can be easily created with</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"archive_file"</span><span class="w"> </span><span class="nv">"dummy"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"zip"</span>
<span class="w"> </span><span class="na">output_path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${path.module}/lambda.zip"</span>
<span class="w"> </span><span class="nb">source</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">content</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dummy"</span>
<span class="w"> </span><span class="na">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dummy.txt"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The Lambda function is a bit more complicated. As I mentioned, I'll use Zappa to package the function, so the <code>handler</code> has to be <code>"zappa.handler.lambda_handler"</code>. The IAM role given to the function is the one we defined previously, while <code>memory_size</code> and <code>timeout</code> clearly depend on the specific function. Lambdas should run in private networks, and I won't cover the steps to create them here. The AWS docs contain a lot of details on this topic, e.g. <a href="https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/">https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/</a>.</p><p>The environment variables allow me to inject the name of the DynamoDB table so that I don't need to hardcode it. I also pass another variable, the <a href="https://sentry.io/welcome/">Sentry DSN</a> that I use in my configuration. This is not essential for the problem at hand, but I left it there to show how to pass such values.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_lambda_function"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">function_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"loginsights2metrics"</span>
<span class="w"> </span><span class="na">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"zappa.handler.lambda_handler"</span>
<span class="w"> </span><span class="na">runtime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"python3.8"</span>
<span class="w"> </span><span class="na">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.archive_file.dummy.output_path</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.arn</span>
<span class="w"> </span><span class="na">memory_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">128</span>
<span class="w"> </span><span class="na">timeout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span>
<span class="w"> </span><span class="nb">vpc_config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_subnets</span>
<span class="w"> </span><span class="na">security_group_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.security_groups</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">environment</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">variables</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SENTRY_DSN"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://XXXXXX:@sentry.io/YYYYYY"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"DYNAMODB_TABLE"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">lifecycle</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">ignore_changes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nb">last_modified, filename</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that I instructed Terraform to ignore changes to the two attributes <code>last_modified</code> and <code>filename</code>, and that I haven't used any <code>source_code_hash</code>. This way I can safely apply Terraform to change parameters like <code>memory_size</code> or <code>timeout</code> without affecting what I deployed with the CI.</p><p>Since I want to trigger the function from AWS CloudWatch Events I need to grant the service <code>events.amazonaws.com</code> the <code>lambda:InvokeFunction</code> permission.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_lambda_permission"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">statement_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"AllowExecutionFromCloudWatch"</span>
<span class="w"> </span><span class="na">action</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda:InvokeFunction"</span>
<span class="w"> </span><span class="na">function_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_lambda_function.loginsights2metrics.function_name</span>
<span class="w"> </span><span class="na">principal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"events.amazonaws.com"</span>
<span class="w"> </span><span class="na">source_arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_cloudwatch_event_rule.rate.arn</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/archive/latest/docs/data-sources/archive_file">archive_file documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function">aws_lambda_function documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission">aws_lambda_permission documentation</a></li></ul><h3 id="iam-part-2-7f1e">IAM part 2</h3><p>Since 2018, Lambdas have a maximum execution time of 15 minutes (900 seconds), which is more than enough for many services, but to be conservative I preferred to leverage Zappa's asynchronous calls and to make the main Lambda call itself for each query. Clearly, the Lambda doesn't call the same Python function (it's not recursive), but from AWS's point of view we have a Lambda that calls itself, so we need to give it a specific permission to do this.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"loginsights2metrics_exec"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"lambda:InvokeAsync"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"lambda:InvokeFunction"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_lambda_function.loginsights2metrics.arn</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>I could not define this when I defined the rest of the IAM components because this needs the Lambda to be defined, but the resource is in the same file. Terraform doesn't care about which resource we defined first and where we define it as long as there are no loops in the definitions.</p><p>We can now assign the newly created policy document to the IAM role we created previously</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy"</span><span class="w"> </span><span class="nv">"loginsights2metrics_exec"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-exec"</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.loginsights2metrics_exec.json</span>
<span class="p">}</span>
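</pre></div> </div> </div><p>The policy attached above allows the function to invoke itself. To make this more concrete, here is a minimal sketch of what an asynchronous self-invocation looks like when done by hand with Boto3's <code>invoke</code> call. This is not Zappa's actual implementation: the helper <code>invoke_self_async</code> and the payload layout are hypothetical, and the client is passed in as a parameter, which in a real Lambda would be <code>boto3.client("lambda")</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>import json
import os


def invoke_self_async(client, payload, function_name=None):
    """Fire-and-forget invocation of the Lambda that is currently running."""
    # AWS_LAMBDA_FUNCTION_NAME is set by the Lambda runtime itself.
    name = function_name or os.environ["AWS_LAMBDA_FUNCTION_NAME"]
    # InvocationType="Event" queues the call and returns immediately;
    # the IAM role needs lambda:InvokeFunction on the function ARN.
    return client.invoke(
        FunctionName=name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )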
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document">aws_iam_policy_document documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy">aws_iam_role_policy documentation</a></li></ul><h3 id="cloudwatch-518e">CloudWatch</h3><p>Whenever you need to run Lambdas (or other things) periodically, the standard AWS solution is to use CloudWatch Events, which work as the AWS cron system. CloudWatch Events are made of rules and targets, so first of all I defined a rule that gets triggered every 2 minutes</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/cloudwatch.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_cloudwatch_event_rule"</span><span class="w"> </span><span class="nv">"rate"</span><span class="w"> </span><span class="p">{</span>
<span class="c1"> # Zappa requires the name to match the processing function</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"main.loginsights2metrics"</span>
<span class="w"> </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Trigger Lambda ${var.prefix}"</span>
<span class="w"> </span><span class="na">schedule_expression</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rate(2 minutes)"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that Zappa has a specific requirement for CloudWatch Events, so I left a comment to clarify this to my future self. The second part of the event is the target, which is the Lambda function that we defined in the previous section.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/cloudwatch.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_cloudwatch_event_target"</span><span class="w"> </span><span class="nv">"lambda"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">rule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_cloudwatch_event_rule.rate.name</span>
<span class="w"> </span><span class="na">target_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-target"</span>
<span class="w"> </span><span class="na">arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_lambda_function.loginsights2metrics.arn</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule">aws_cloudwatch_event_rule documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target">aws_cloudwatch_event_target documentation</a></li></ul><h3 id="using-the-module-5d88">Using the module</h3><p>Now the module is finished, so I just need to create some items for the DynamoDB table and to call the module itself</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="nb">locals</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs submitted tasks"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery\\.(?<source>[a-z.]+)\\].*Received task: (?<task>[a-z._]+)\\[/ | filter not isblank(source) | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Submitted tasks"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs succeeded tasks"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery.(?<source>[a-z\\._]+)].*Task (?<task>[a-z\\._]+)\\[.*\\] (?<event>[a-z]+)/ | filter source = \"app.trace\" | filter event = \"succeeded\" | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Succeeded tasks"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs retried tasks"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery.(?<source>[a-z\\._]+)].*Task (?<task>[a-z\\._]+)\\[.*\\] (?<event>[a-z]+)/ | filter source = \"app.trace\" | filter event = \"retry\" | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Retried tasks"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
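</pre></div> </div> </div><p>The items above are written in DynamoDB's typed attribute format, where every value is wrapped in a type descriptor such as <code>"S"</code> for strings, while the Python code shown later reads them through Boto3's table resource as plain dictionaries. As a rough illustration of that mapping, the following sketch unwraps a typed item by hand. The helper <code>unwrap_item</code> is hypothetical and only handles the <code>"S"</code> and <code>"N"</code> descriptors, whereas Boto3 ships a complete <code>TypeDeserializer</code> in <code>boto3.dynamodb.types</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>from decimal import Decimal


def unwrap_item(typed_item):
    """Convert {"Key": {"S": "value"}} into {"Key": "value"}."""
    plain = {}
    for key, wrapped in typed_item.items():
        # Each attribute is a one-entry dict: {type_descriptor: value}
        descriptor, value = next(iter(wrapped.items()))
        if descriptor == "S":
            plain[key] = value
        elif descriptor == "N":
            # DynamoDB transmits numbers as strings
            plain[key] = Decimal(value)
        else:
            raise ValueError(f"Unsupported type descriptor: {descriptor}")
    return plain


# Two attributes taken from the first item defined above
typed = {
    "SlotName": {"S": "Celery Logs submitted tasks"},
    "LogGroup": {"S": "mycluster/celery"},
}
item = unwrap_item(typed)
print(item["LogGroup"])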
</pre></div> </div> </div><p>I need to provide a security group for the Lambda, and in this case I can safely use the default one provided by the VPC</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_security_group"</span><span class="w"> </span><span class="nv">"default"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"default"</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_id</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And I can finally call the module</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">module</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"../../common/lambda-loginsights2metrics"</span>
<span class="w"> </span><span class="na">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">local.items</span>
<span class="w"> </span><span class="na">security_groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">data.aws_security_group.default.id</span><span class="p">]</span>
<span class="w"> </span><span class="na">vpc_subnets</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_private_subnets</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that the variable <code>vpc_private_subnets</code> is a list of subnet names that I created in another module.</p><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group">aws_security_group documentation</a></li><li><a href="https://www.terraform.io/docs/language/modules/develop/index.html">Creating Terraform modules</a></li></ul><h2 id="python-43d2">Python<a class="headerlink" href="#python-43d2" title="Permanent link">¶</a></h2><p>As I mentioned before, the Python code of the Lambda function is contained in a different repository and deployed with the CI using <a href="https://github.com/zappa/Zappa">Zappa</a>. Given we are interacting with AWS I am clearly using Boto3, the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html">AWS SDK for Python</a>. The code was developed locally without Zappa's support, to test out the Boto3 functions I wanted to use, then quickly adjusted to be executed in a Lambda.</p><p>I think the code is pretty straightforward, but I left my original comments to be sure everything is clear. </p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="kn">import</span> <span class="nn">boto3</span>
<span class="kn">from</span> <span class="nn">zappa.asynchronous</span> <span class="kn">import</span> <span class="n">task</span>
<span class="c1"># CONFIG</span>
<span class="n">logs</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s2">"logs"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="n">cw</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s2">"cloudwatch"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="n">dynamodb</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">resource</span><span class="p">(</span><span class="s2">"dynamodb"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="nd">@task</span>
<span class="k">def</span> <span class="nf">put_metric_data</span><span class="p">(</span><span class="n">item</span><span class="p">):</span> <span class="callout">3</span>
    <span class="n">slot_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"SlotName"</span><span class="p">]</span>
    <span class="n">log_group</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"LogGroup"</span><span class="p">]</span>
    <span class="n">cluster_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"ClusterName"</span><span class="p">]</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"Query"</span><span class="p">]</span>
    <span class="n">namespace</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"Namespace"</span><span class="p">]</span>
    <span class="n">metric_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"MetricName"</span><span class="p">]</span>

    <span class="c1"># This runs the Log Insights query fetching data</span>
    <span class="c1"># for the last 15 minutes.</span>
    <span class="c1"># As we deal with logs processing it's entirely possible</span>
    <span class="c1"># for the metric to be updated, for example because</span>
    <span class="c1"># a log was received a bit later.</span>
    <span class="c1"># When we put multiple values for the same timestamp</span>
    <span class="c1"># in the metric CW can show max, min, avg, and percentiles.</span>
    <span class="c1"># Since this is an update of a count we should then always</span>
    <span class="c1"># use "max".</span>
    <span class="n">start_query_response</span> <span class="o">=</span> <span class="n">logs</span><span class="o">.</span><span class="n">start_query</span><span class="p">(</span> <span class="callout">4</span>
        <span class="n">logGroupName</span><span class="o">=</span><span class="n">log_group</span><span class="p">,</span>
        <span class="n">startTime</span><span class="o">=</span><span class="nb">int</span><span class="p">((</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">15</span><span class="p">))</span><span class="o">.</span><span class="n">timestamp</span><span class="p">()),</span>
        <span class="n">endTime</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">timestamp</span><span class="p">()),</span>
        <span class="n">queryString</span><span class="o">=</span><span class="n">query</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="n">query_id</span> <span class="o">=</span> <span class="n">start_query_response</span><span class="p">[</span><span class="s2">"queryId"</span><span class="p">]</span>

    <span class="c1"># Just polling the API. 5 seconds seems to be a good</span>
    <span class="c1"># compromise between not pestering the API and not paying</span>
    <span class="c1"># too much for the Lambda.</span>
    <span class="c1"># "Scheduled" is also a non-final status, so keep polling on it too.</span>
    <span class="n">response</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="k">while</span> <span class="n">response</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">response</span><span class="p">[</span><span class="s2">"status"</span><span class="p">]</span> <span class="ow">in</span> <span class="p">(</span><span class="s2">"Scheduled"</span><span class="p">,</span> <span class="s2">"Running"</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">slot_name</span><span class="si">}</span><span class="s2">: waiting for query to complete ..."</span><span class="p">)</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">logs</span><span class="o">.</span><span class="n">get_query_results</span><span class="p">(</span><span class="n">queryId</span><span class="o">=</span><span class="n">query_id</span><span class="p">)</span>

    <span class="c1"># Data comes in a strange format: each row is a list of</span>
    <span class="c1"># {"field":name,"value":actual_value} dictionaries, so this</span>
    <span class="c1"># converts it into something that can be accessed through keys</span>
    <span class="n">data</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s2">"results"</span><span class="p">]:</span> <span class="callout">5</span>
        <span class="n">sample</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">d</span><span class="p">:</span>
            <span class="n">field</span> <span class="o">=</span> <span class="n">i</span><span class="p">[</span><span class="s2">"field"</span><span class="p">]</span>
            <span class="n">value</span> <span class="o">=</span> <span class="n">i</span><span class="p">[</span><span class="s2">"value"</span><span class="p">]</span>
            <span class="n">sample</span><span class="p">[</span><span class="n">field</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
        <span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>

    <span class="c1"># Now that we have the data, let's put them into a metric.</span>
    <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">timestamp</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s2">"bin(1m)"</span><span class="p">],</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2"> %H:%M:%S.000"</span><span class="p">)</span>
        <span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s2">"Value"</span><span class="p">])</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">slot_name</span><span class="si">}</span><span class="s2">: putting </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2"> on </span><span class="si">{</span><span class="n">timestamp</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
        <span class="n">cw</span><span class="o">.</span><span class="n">put_metric_data</span><span class="p">(</span> <span class="callout">6</span>
            <span class="n">Namespace</span><span class="o">=</span><span class="n">namespace</span><span class="p">,</span>
            <span class="n">MetricData</span><span class="o">=</span><span class="p">[</span>
                <span class="p">{</span>
                    <span class="s2">"MetricName"</span><span class="p">:</span> <span class="n">metric_name</span><span class="p">,</span>
                    <span class="s2">"Dimensions"</span><span class="p">:</span> <span class="p">[{</span><span class="s2">"Name"</span><span class="p">:</span> <span class="s2">"Cluster"</span><span class="p">,</span> <span class="s2">"Value"</span><span class="p">:</span> <span class="n">cluster_name</span><span class="p">}],</span>
                    <span class="s2">"Timestamp"</span><span class="p">:</span> <span class="n">timestamp</span><span class="p">,</span>
                    <span class="s2">"Value"</span><span class="p">:</span> <span class="n">value</span><span class="p">,</span>
                    <span class="s2">"Unit"</span><span class="p">:</span> <span class="s2">"None"</span><span class="p">,</span>
                <span class="p">}</span>
            <span class="p">],</span>
        <span class="p">)</span>
<span class="k">def</span> <span class="nf">loginsights2metrics</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span> <span class="callout">1</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"package_info.json"</span><span class="p">,</span> <span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">package_info</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">build_timestamp</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">package_info</span><span class="p">[</span><span class="s2">"build_time"</span><span class="p">])</span>
<span class="n">build_datetime</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">fromtimestamp</span><span class="p">(</span><span class="n">build_timestamp</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"###################################"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span>
<span class="s2">"LogInsights2Metrics - Build date: "</span>
<span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">build_datetime</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s2">"%Y/%m/</span><span class="si">%d</span><span class="s2"> %H:%M:%S"</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span>
<span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"###################################"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Reading task from DynamoDB table </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"DYNAMODB_TABLE"</span><span class="p">]</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="n">table</span> <span class="o">=</span> <span class="n">dynamodb</span><span class="o">.</span><span class="n">Table</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"DYNAMODB_TABLE"</span><span class="p">])</span>
<span class="c1"># This is the simplest way to get all entries in the table</span>
<span class="c1"># The next loop will asynchronously call `put_metric_data`</span>
<span class="c1"># on each entry.</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">scan</span><span class="p">(</span><span class="n">Select</span><span class="o">=</span><span class="s2">"ALL_ATTRIBUTES"</span><span class="p">)</span> <span class="callout">2</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s2">"Items"</span><span class="p">]:</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"* Processing item </span><span class="si">{</span><span class="n">i</span><span class="p">[</span><span class="s1">'SlotName'</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="n">put_metric_data</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
</pre></div> </div> </div><p>So, when the Lambda is executed, the entry point is the function <code>loginsights2metrics</code> <span class="callout">1</span>, which scans the DynamoDB table <span class="callout">2</span> and loops over all the items contained in it. The loop calls the function <code>put_metric_data</code> <span class="callout">3</span>, which, being a Zappa <code>task</code>, runs in a new Lambda invocation. This function runs the Log Insights query <span class="callout">4</span>, adjusts Boto3's output <span class="callout">5</span>, and finally puts the values in the custom metric <span class="callout">6</span>.</p><p>The problem I mention in the comment just before I run <code>logs.start_query</code> is interesting. Log Insights queries extract data from logs, so the result can change between two runs of the same query. Since consecutive runs overlap (we run a query on the last 15 minutes every 2 minutes), the function will put multiple values in the same bin of the metric. This is perfectly normal, and it's the reason why CloudWatch allows you to show the maximum, minimum, average, or various percentiles of the same metric. When it comes to counting events, the number can only increase or stay constant in time, but never decrease, so it's sensible to look at the maximum. This is not true if you are looking at execution times, for example, so pay attention to the nature of the underlying query when you graph the metric.</p><p>The Zappa settings I use for the function are</p><div class="code"><div class="title"><code>zappa_settings.json</code></div><div class="content"><div class="highlight"><pre><span class="p">{</span>
<span class="w">  </span><span class="nt">"main"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"app_module"</span><span class="p">:</span><span class="w"> </span><span class="s2">"main"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"app_function"</span><span class="p">:</span><span class="w"> </span><span class="s2">"main.loginsights2metrics"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"runtime"</span><span class="p">:</span><span class="w"> </span><span class="s2">"python3.8"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"log_level"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WARNING"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"xray_tracing"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"exception_handler"</span><span class="p">:</span><span class="w"> </span><span class="s2">"zappa_sentry.unhandled_exceptions"</span>
<span class="w">  </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And the requirements are</p><div class="code"><div class="title"><code>requirements.txt</code></div><div class="content"><div class="highlight"><pre>zappa
zappa-sentry
</pre></div> </div> </div><p>Please note that, as I mentioned before, <code>zappa-sentry</code> is not a strict requirement for this solution.</p><p>The code can be packaged and deployed with a simple bash script like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="ch">#!/bin/bash</span>
<span class="nv">VENV_DIRECTORY</span><span class="o">=</span>venv
<span class="nv">LAMBDA_PACKAGE</span><span class="o">=</span>lambda.zip
<span class="nv">REGION</span><span class="o">=</span>eu-west-1
<span class="nv">FUNCTION_NAME</span><span class="o">=</span>loginsights2metrics
<span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span>-d<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span>rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
<span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span>-f<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span>rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>
<span class="nb">source</span><span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>/bin/activate
pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
zappa<span class="w"> </span>package<span class="w"> </span>main<span class="w"> </span>-o<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span>
rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>
aws<span class="w"> </span>--region<span class="o">=</span><span class="si">${</span><span class="nv">REGION</span><span class="si">}</span><span class="w"> </span>lambda<span class="w"> </span>update-function-code<span class="w"> </span>--function-name<span class="w"> </span><span class="si">${</span><span class="nv">FUNCTION_NAME</span><span class="si">}</span><span class="w"> </span>--zip-file<span class="w"> </span><span class="s2">"fileb://</span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="s2">"</span>
</pre></div> </div> </div>
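<p>A note on the <code>table.scan</code> call used in the Lambda: DynamoDB returns at most 1 MB of data per scan and signals that more is available through <code>LastEvaluatedKey</code>. My table is small, so a single call is enough, but if yours grows you might want something like the following sketch. The helper <code>scan_all_items</code> and the class <code>FakeTable</code> are hypothetical names that are not part of the code above; the latter only mimics a Boto3 table so that the loop can be tried without an AWS account.</p>

```python
def scan_all_items(table, **scan_kwargs):
    """Return every item in the table, following LastEvaluatedKey page by page."""
    items = []
    kwargs = {"Select": "ALL_ATTRIBUTES", **scan_kwargs}
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Ask DynamoDB to resume right after the last key it evaluated
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a Boto3 Table, used only to try the helper offline."""

    def __init__(self, pages):
        self._pages = pages

    def scan(self, **kwargs):
        # The "key" here is simply the index of the next page to serve
        index = kwargs.get("ExclusiveStartKey", {"page": 0})["page"]
        response = {"Items": self._pages[index]}
        if index != len(self._pages) - 1:
            response["LastEvaluatedKey"] = {"page": index + 1}
        return response


table = FakeTable([[{"SlotName": "a"}, {"SlotName": "b"}], [{"SlotName": "c"}]])
slot_names = [item["SlotName"] for item in scan_all_items(table)]
```

<p>In the Lambda, the loop would then iterate over <code>scan_all_items(table)</code> instead of <code>response["Items"]</code>.</p>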
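<p>To make the earlier point about overlapping queries concrete, here is a minimal sketch with made-up numbers. It assumes that five consecutive runs of the Lambda each publish a count for the same bin of the metric while late log lines are still arriving. The values and the helper <code>summarise</code> are hypothetical, and the statistics are computed locally rather than by CloudWatch.</p>

```python
# Hypothetical values published for one bin by five consecutive runs.
# Each run re-counts the events in the bin and sees the same or a higher number.
samples_in_bin = [12, 15, 15, 17, 17]


def summarise(samples):
    """Compute the statistics that a graph could show for a single bin."""
    return {
        "Minimum": min(samples),
        "Average": sum(samples) / len(samples),
        "Maximum": max(samples),
    }


stats = summarise(samples_in_bin)
for name, value in stats.items():
    print(f"{name}: {value}")
```

<p>With a count that can only grow, the maximum is the value published by the most recent run, while the minimum and the average understate it.</p>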
<h2 id="costs-dbe1">Costs<a class="headerlink" href="#costs-dbe1" title="Permanent link">¶</a></h2><p>Here I will follow the <a href="https://aws.amazon.com/lambda/pricing/">AWS guide on Lambda pricing</a> and the calculations published in 2018 by my colleague João Neves on <a href="https://silvaneves.org/how-much-does-a-lambda-cost.html">his blog</a>.</p><p>I assume the following:</p><ul><li>The Lambda runs 4 queries, so we have 5 invocations (1 for the main Lambda and 4 asynchronous tasks)</li><li>Each invocation runs for 5 seconds. The current average time of each invocation in my AWS accounts is 4.6 seconds</li><li>I run the Lambda every 2 minutes</li></ul><p>Requests: <code>5 invocations/event * 30 events/hour * 24 hours/day * 31 days/month = 111600 requests</code></p><p>Duration: <code>0.128 GB/request * 111600 requests * 5 seconds = 71424 GB-seconds</code></p><p>Total: <code>$0.20 * 111600 / 10^6 + $0.0000166667 * 71424 ~= $1.21/month</code></p><p>As you can see, for applications like this it's extremely cost-effective to use a serverless solution like Lambda functions.</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach out to me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>
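<p>If you want to adapt the estimate in the Costs section to your own setup, the arithmetic can be reproduced with a few lines of Python. This is only a sketch: the variable names are mine, and the two prices are the ones quoted in the calculation above, so check the current AWS price list before relying on the result.</p>

```python
# Assumptions taken from the Costs section
invocations_per_event = 5          # 1 main Lambda + 4 asynchronous tasks
events_per_hour = 30               # one run every 2 minutes
hours_per_day = 24
days_per_month = 31
memory_gb = 0.128                  # memory figure used in the Duration formula
seconds_per_invocation = 5

# Prices quoted in the article (they may have changed since then)
price_per_million_requests = 0.20
price_per_gb_second = 0.0000166667

requests = invocations_per_event * events_per_hour * hours_per_day * days_per_month
gb_seconds = memory_gb * requests * seconds_per_invocation
total = price_per_million_requests * requests / 10**6 + price_per_gb_second * gb_seconds

print(f"Requests: {requests}")
print(f"Duration: {gb_seconds:.0f} GB-seconds")
print(f"Total: ${total:.2f}/month")
```

<p>Changing <code>events_per_hour</code> or <code>invocations_per_event</code> shows how the bill scales with the schedule and with the number of queries.</p>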