Using AWS WebSockets for file delivery

  • Discussion |
  • 2022-12-03 |
  • 🕑 3 mins |
  • Christopher Lai

This week I learnt how to deliver a file stored in S3 through a WebSocket API.

I was following this guide to build a WebSocket API that returns data after a step function has finished. In my use case I swapped the step function for a SageMaker pipeline.

I had everything up and running, with my client using the Python websockets library to receive the messages sent by post_to_connection as specified in the guide. All was good until I realized that the bytes of the file I was sending were being encoded as UTF-8 and sent as TEXT, which corrupts the file transfer. It seems that the function does this encoding itself and there's no option to stop the conversion. This is especially odd because the Data parameter of post_to_connection expects bytes.
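
To illustrate why a text frame is a problem for binary data, here is a small sketch. It does not reproduce what API Gateway does internally; it assumes the conversion behaves like a lossy UTF-8 decode, and the helper name `survives_text_round_trip` is made up for this example. It only shows that arbitrary bytes generally do not survive a trip through text.

```python
def survives_text_round_trip(payload: bytes) -> bool:
    """Return True if payload is unchanged after a lossy UTF-8 decode and re-encode."""
    # errors="replace" swaps every invalid byte sequence for U+FFFD (assumed behaviour)
    as_text = payload.decode("utf-8", errors="replace")
    return as_text.encode("utf-8") == payload


plain_text = "hello".encode("utf-8")
# PNG-like header bytes followed by bytes that are never valid UTF-8
binary_data = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0xFE])

print(survives_text_round_trip(plain_text))   # text payloads come back intact
print(survives_text_round_trip(binary_data))  # binary payloads do not
```

A URL is plain text, so it is unaffected by this, which is why sending a link instead of the file itself avoids the problem.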

At this point I considered building a separate REST API endpoint to transfer the file, until I found out about pre-signed S3 URLs. A pre-signed URL is a time-limited link to a file, and it can be generated using boto3. Very useful! The client side just needs to download the file through a simple requests.get().
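
Because the link is time limited, a client may want to check how long it has left before downloading. The sketch below assumes the URL was signed with Signature Version 4 and therefore carries `X-Amz-Date` and `X-Amz-Expires` query parameters; URLs signed another way return `None`. The function name and the example URL are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(url, now=None):
    """Return the seconds left on a SigV4 pre-signed URL, or None if it cannot be determined."""
    query = parse_qs(urlparse(url).query)
    if "X-Amz-Date" not in query or "X-Amz-Expires" not in query:
        return None  # not a SigV4 query-string signature (assumption of this sketch)
    # X-Amz-Date is the signing time in UTC, formatted like 20221203T120000Z
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    expires_at = signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).total_seconds()


# Made-up URL: signed at 12:00 UTC with a 15 minute (900 second) lifetime
example_url = (
    "https://test-bucket.s3.eu-west-2.amazonaws.com/path/to/file"
    "?X-Amz-Date=20221203T120000Z&X-Amz-Expires=900&X-Amz-Signature=abc"
)
five_minutes_in = datetime(2022, 12, 3, 12, 5, tzinfo=timezone.utc)
print(seconds_until_expiry(example_url, now=five_minutes_in))
```

A negative result means the link has already expired and a fresh one is needed.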

I wish the guide had this info upfront, but it's actually hidden in the application code, which you can only review after you deploy it.

Here is the final code for sending the pre-signed URL.

import boto3

api_region = 'eu-west-2'  # Change to your region
# Change to your websocket api url
# (the boto3 docs show this endpoint in its https:// form)
api_url = 'wss://sldfjal.'
bucket = 'test-bucket'
key = 'path/to/file'
expiration = 900  # Lifetime of the pre-signed URL in seconds (15 minutes)
connection_id = 'your-connection-id'  # Placeholder: ID of the connected WebSocket client


# Generate a pre-signed URL (defined before it is called below)
def generate_presigned_s3(api_region, bucket, key, expiration):
    params = {"Bucket": bucket, "Key": key}
    s3 = boto3.client("s3", region_name=api_region)
    url = s3.generate_presigned_url("get_object", Params=params, ExpiresIn=expiration)
    print("s3 pre-signed URL: " + url)
    return url


apiManagement = boto3.client(
    "apigatewaymanagementapi", region_name=api_region, endpoint_url=api_url
)
s3_presigned_url = generate_presigned_s3(api_region, bucket, key, expiration)
# Send the URL (plain text) to the connected client instead of the file bytes
apiManagement.post_to_connection(Data=s3_presigned_url, ConnectionId=connection_id)
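
The client shown later stops listening when it receives the message 'end_transmission', but the sending side of that marker is not shown in this post. The following is a sketch of how it might be sent, assuming the marker goes out as a second message right after the URL. `send_url_and_finish` and `RecordingClient` are hypothetical names; the recording class stands in for the boto3 client so the example runs without AWS credentials.

```python
def send_url_and_finish(api_client, connection_id, url, end_marker="end_transmission"):
    """Send the pre-signed URL, then the marker that tells the client to stop listening."""
    for payload in (url, end_marker):
        api_client.post_to_connection(
            Data=payload.encode("utf-8"), ConnectionId=connection_id
        )


class RecordingClient:
    """Hypothetical stand-in for the apigatewaymanagementapi client; it only records calls."""

    def __init__(self):
        self.sent = []

    def post_to_connection(self, Data, ConnectionId):
        self.sent.append((ConnectionId, Data))


fake = RecordingClient()
send_url_and_finish(fake, "abc123", "https://example.com/file?sig=xyz")
print(fake.sent)
```

In real use, the `apiManagement` client from the code above would be passed in place of `RecordingClient()`.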

On the client side:

import asyncio

import requests
import websockets

pipeline_execution_arn = 'your-pipeline-execution-arn'  # Placeholder: ARN of the pipeline run to wait for


def download_file(url, folder):
    # Stream the response so large files are not held in memory
    get_response = requests.get(url, stream=True)
    # Fail loudly (for example on an expired URL) instead of saving an error page
    get_response.raise_for_status()
    # The file name is the last path segment, without the query string
    file_name = url.split("/")[-1].split("?")[0]
    with open(f"{folder}/{file_name}", 'wb') as f:
        for chunk in get_response.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)


async def connect_to_websocket():
    try:
        async with websockets.connect(
            f"wss://websock.thesillyhome.ai/?pipeline_execution_arn={pipeline_execution_arn}"
        ) as websocket:
            async for message in websocket:
                if message == 'end_transmission':
                    await websocket.close()
                else:
                    try:
                        print(f'Received pre-signed url : {message}')
                        folder_path = "/path/to/folder"
                        download_file(message, folder_path)
                    except ValueError:
                        print('No data in buffer')
    except TimeoutError:
        print('Error connecting to websocket.')


asyncio.run(connect_to_websocket())
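
The file name in download_file is taken by splitting the URL on "/" and "?". That works when every slash in the query string is percent-encoded, but it would pick the wrong segment if one were not. A slightly more defensive sketch parses the URL first; `file_name_from_url` is a hypothetical helper and the URL below is made up.

```python
import posixpath
from urllib.parse import unquote, urlparse


def file_name_from_url(url):
    """Return the decoded last path segment of a URL, ignoring the query string."""
    path = urlparse(url).path  # everything between the host and the "?"
    return posixpath.basename(unquote(path))


# Made-up URL with an encoded space in the key and an unencoded "/" in the query
example = (
    "https://test-bucket.s3.eu-west-2.amazonaws.com/path/to/my%20file.csv"
    "?X-Amz-Credential=AKIA/20221203/eu-west-2&X-Amz-Expires=900"
)
print(file_name_from_url(example))  # my file.csv
```

With this example URL, the split-based approach would return part of the query string rather than the file name.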