Created on March 14, 2021 at 2:32 pm
Updated on March 14, 2021 at 2:32 pm

Mini BigQuery Lambda Project w/ Go

Disclaimer: This project was mainly to test whether it could be done. I don't think there is a good use case for actually connecting these two services directly, since most of the data manipulation can be done in BigQuery itself.

This is a short write-up of my project connecting BigQuery to Lambda, exposed to a UI through API Gateway.

Uploading Google Credentials to AWS SSM Parameter Store

I had made a previous post about loading Google credentials into S3 and pulling them from S3 to create the BigQuery client. That approach is inefficient, and insecure given that the credentials can accidentally be exposed to the public via a simple URL.

A better way to do this is to use AWS Systems Manager's (formerly SSM) Parameter Store, a key/value store system. Each parameter in the credentials file could be uploaded manually, but that wouldn't be very programmatic. Instead, I used the AWS SDK for Go to read the file and unmarshal the data into a map that I subsequently uploaded. You can see the code here. Here is a bit of explanation for the putParams function shown below (line 43 on GitHub).

// putParams uploads a single credentials field as a parameter under
// the /bqconfig/ hierarchy.
func putParams(svc *ssm.SSM, key string, value string) {
	ssmInput := &ssm.PutParameterInput{
		Name:  aws.String("/bqconfig/" + key),
		Value: aws.String(value),
		Type:  aws.String("String"),
	}

	_, err := svc.PutParameter(ssmInput)
	if err != nil {
		log.Fatal(err)
	}
}

The PutParameter function takes a PutParameterInput struct with the three required properties shown above. You can read more about that in the AWS documentation. The /bqconfig/ prefix in the Name property creates a hierarchical parameter, which I used in my Lambda function to easily pull the necessary parameters from Parameter Store. This is what it ends up looking like in the AWS console.

gif

The Lambda Function

The goal of this project was to create a simple filter and search based on the publicly available SEC data set. You can see the code here.

The init function is similar to the one I made in the previous post except the credentials data is being pulled from SSM parameter store. I'll give a bit of explanation on the code below.

func paramsByPath(svc *ssm.SSM) map[string]string {
	pathInput := &ssm.GetParametersByPathInput{
		Path: aws.String("/bqconfig"),
	}

	params := make(map[string]string)

	// GetParametersByPath returns at most ten parameters per page, and
	// a service account file has more fields than that, so page
	// through the results rather than making a single call.
	err := svc.GetParametersByPathPages(pathInput,
		func(page *ssm.GetParametersByPathOutput, lastPage bool) bool {
			for _, param := range page.Parameters {
				name := strings.Replace(*param.Name, "/bqconfig/", "", -1)
				params[name] = *param.Value
			}
			return true
		})
	if err != nil {
		log.Fatal(err)
	}

	return params
}

As stated above, all the credential parameters were uploaded as a set of hierarchical parameters. Doing this allows me to get the entire set of parameters by specifying the hierarchy label. After that, it's just a matter of returning the map and marshaling it into a JSON byte slice to pass as credentials.

Parameterized Query

The bulk of the function is a single parameterized SQL query that is executed through the client.

q := client.Query(
	`SELECT company_name, measure_tag, value, units, period_end_date
	FROM ` + "`bigquery-public-data.sec_quarterly_financials.quick_summary`" + `
	WHERE fiscal_year = @fiscal_year
	AND form = "10-K"
	AND value BETWEEN @min AND @max
	AND measure_tag IN (@measure_tag)
	ORDER BY company_name ASC
	LIMIT 50;`)
q.Parameters = []bigquery.QueryParameter{
	{
		Name:  "fiscal_year",
		Value: fiscalYear,
	},
	{
		Name:  "min",
		Value: min,
	},
	{
		Name:  "max",
		Value: max,
	},
	{
		Name:  "measure_tag",
		Value: measureTag,
	},
}

BigQuery provides what are called parameterized queries, which are useful for safely handling user-driven query input. The parameter values will be pulled from the query string parameters exposed through API Gateway.

Connecting to API Gateway

Connecting Lambda to API Gateway is fairly straightforward. The following instructions are all done through the console, but it can also be done through the CLI or through infrastructure-as-code platforms such as AWS CloudFormation or Terraform.

First, go to the API Gateway console and click Create API.

gif

Depending on whether your front end is hosted externally, you will choose either a public REST API or a private REST API.

gif

I chose the top option. From there, choose New API, give your API a name, and choose either Regional or Edge Optimized. I chose Regional since this is just a test project.

gif

You should see this screen when you're done. From the Actions drop-down menu, choose Create Resource and then create a method on that resource, or create a method directly on the root resource URL.

gif

When creating the resource, leave the proxy resource unchecked and check Enable API Gateway CORS. This will be needed later for localhost testing.

gif

After that, create your methods on that resource. I only had one for this project. The integration type is Lambda Function. Turn on Lambda proxy integration so you don't have to manually set up the integration responses. Type in the name of your Lambda function and it should auto-populate.

gif

Once that's done, hit Create, and you can test out your method. There is additional setup you could do, but it's functional as is. Below is an image of me testing out another function I had created, and as you can see it pulls data properly from BigQuery.

gif

Now you can call this as an exposed API endpoint, which you can find after you've deployed the API. The URL will be in the Stages tab on the left-hand side.

As far as enabling CORS is concerned, from what I gathered, enabling CORS on the API isn't good enough. You'll also need to return the proper access-control headers in the response directly from your Lambda function. Also read this article about setting up stage variables and mapping templates for your mock integration response type. I won't go into detail here, because the way I did it was basically to enable everything until it worked, and I'm 100% sure that's not the best way to go about it. I'll be creating another blog post about how to properly enable CORS later.

Thanks for reading!