Serverless Architecture

This project explores a serverless design that uses ECS and Fargate, EventBridge, SQS queues, and Lambda functions to convert documents and scan them with OCR, leaving the text as a JSON object that can be searched or used for other data processing. The conversion and scanning steps themselves are abstracted away in this presentation so that it can focus on the services AWS provides for building serverless systems beyond simply putting code into Lambda functions.

The first stage of my serverless architecture extends the functionality of the Static Hosting/CDN project documented on this website. Users must first log in or sign up for an account; they are then allowed through an API Gateway and routed to the S3 website. AJAX calls are then made to the routing Lambda function, which can start other backend processes. The Lambda function routes responses back to the success callback of each AJAX call, and the page notifies the end user.
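The routing step described above can be sketched as a small dispatch table. This is only an illustration, not the site's actual function: it assumes the AJAX request carries a JSON body with an `action` field and that API Gateway uses a proxy-style integration. The action names (`start_upload`, `check_status`) and the helper `respond` are hypothetical.

```python
import json


# Hypothetical backend actions; the real site may expose different ones.
def start_upload(payload):
    return {"message": f"Upload started for {payload.get('filename', 'unknown file')}"}


def check_status(payload):
    return {"message": f"Status requested for job {payload.get('job_id', 'unknown')}"}


# Maps the (assumed) 'action' field of the request to a handler function.
ROUTES = {"start_upload": start_upload, "check_status": check_status}


def respond(status_code, body):
    """Build a response in the shape an API Gateway proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    """Route an AJAX request to a backend action based on its 'action' field."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return respond(400, {"error": "Request body is not valid JSON"})
    if not isinstance(payload, dict):
        return respond(400, {"error": "Request body must be a JSON object"})

    action = ROUTES.get(payload.get("action"))
    if action is None:
        return respond(400, {"error": "Unknown action"})
    return respond(200, action(payload))
```

On the browser side, the AJAX success callback would read the `message` field from the returned JSON and show it to the user; error responses carry an `error` field instead.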

The primary purpose of this website is to provide an interface where users can upload files. The files need to be processed once they are stored. Processing could include a variety of operations, such as altering file formats, scanning content, or saving different versions.

The S3 buckets are configured with triggers that send a message to the queues about each stored file. In the first stage, an EventBridge rule is set to trigger the first ECS task (PDF-PNG) only when the uploaded file has a pdf file extension. Once the file is converted, the resulting image is stored in another bucket for PNG files only. This decouples the different file types; other approaches, such as a file naming convention or tags on the files, could also be used to categorize them in context.
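One way the pdf-only rule could be expressed is with an EventBridge event pattern that uses suffix matching on the object key. The sketch below assumes the upload bucket delivers "Object Created" events to EventBridge; the project's actual rule may be wired differently. The function names and the bucket name used in the usage note are hypothetical, and `matches_pdf_rule` is only a local approximation for reasoning about which uploads would start the task, not how EventBridge evaluates patterns internally.

```python
import json


def pdf_upload_pattern(bucket_name):
    """Event pattern matching S3 'Object Created' events for keys ending in .pdf."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Suffix matching is case-sensitive, so 'REPORT.PDF' would not match.
            "object": {"key": [{"suffix": ".pdf"}]},
        },
    }


def matches_pdf_rule(event, bucket_name):
    """Local approximation of the pattern above, useful for quick checks."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.s3"
        and event.get("detail-type") == "Object Created"
        and detail.get("bucket", {}).get("name") == bucket_name
        and detail.get("object", {}).get("key", "").endswith(".pdf")
    )


if __name__ == "__main__":
    # The JSON printed here is what would be supplied as the rule's event pattern.
    print(json.dumps(pdf_upload_pattern("example-upload-bucket"), indent=2))
```

Filtering at the rule keeps the PDF-PNG task from starting for uploads it cannot handle, so the container never has to inspect and reject files itself.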

A similar trigger is configured on the second bucket (PNG), where another rule starts the second-stage ECS task (PNG-TXT). This task polls a message about the file, scans the file for text, and saves the text in a database record.
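The polling step in the PNG-TXT task might look roughly like the sketch below. It assumes the queue messages use the standard S3 event notification JSON (a `Records` list with URL-encoded object keys) and that `sqs` is a boto3 SQS client created elsewhere. The OCR and database steps are abstracted in this write-up, so they appear here as the hypothetical callables `scan_text` and `save_record`.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(message_body):
    """Return (bucket, key) pairs from an S3 event notification message body."""
    records = json.loads(message_body).get("Records", [])
    return [
        # Object keys arrive URL-encoded in S3 notifications, so decode them.
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in records
        if "s3" in r
    ]


def poll_once(sqs, queue_url, scan_text, save_record):
    """Receive at most one message, process its files, then delete the message.

    scan_text(bucket, key) and save_record(bucket, key, text) are placeholders
    for the OCR and database steps. Returns the number of files processed.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling, to avoid a tight loop of empty receives
    )
    processed = 0
    for message in response.get("Messages", []):
        for bucket, key in extract_s3_objects(message["Body"]):
            save_record(bucket, key, scan_text(bucket, key))
            processed += 1
        # Delete only after processing succeeds; if an exception is raised above,
        # the message becomes visible again and can be retried.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return processed
```

Passing the client and the processing steps in as arguments keeps the loop easy to exercise locally with stand-ins before running it inside a Fargate container.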