Many analytics companies rely on machine learning or other data models to predict the probability of an outcome. The time and expense of executing a model increase with the number of features, or inputs, the model has. Training and validating a model against large data sets to improve its accuracy is particularly time consuming.
Testing models in a lab environment or deploying them into production on on-premises hardware can be expensive, and in many cases is not fast enough to meet the demands of the business use case. Additionally, on-premises systems are not elastic: CPU, RAM and storage are provisioned and paid for whether a machine is in use or not.
Migrating model training and operationalized model deployments to a scalable, elastic cloud architecture is an option worth considering. Amazon EMR (Elastic MapReduce) is a managed AWS platform that provides a low-cost, dynamically scalable alternative for processing a broad set of big data use cases. Because EMR supports many popular distributed frameworks, such as Apache Spark, it can be used as a platform for training and running machine learning models.
Because EMR decouples storage from compute, low-cost S3 storage can serve as the data layer for EMR clusters. Spot and reserved EC2 instance pricing, together with automatic scaling of cluster instances, can provide additional cost savings. Clusters can be automated to spin up, process a one-off job and terminate, or can remain on to handle long-running or always-on analytics. EMR Security Configurations are reusable templates of security settings that can be applied whenever a cluster is created. Depending on the configuration of your environment, additional security controls are available to protect data at rest and in transit. EMR also integrates with several other AWS data sources and with external third-party tools.
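As a rough illustration of the one-off pattern described above, the following Python sketch builds the request for a transient cluster that runs a single Spark step and then shuts down. It is a minimal sketch under stated assumptions, not a recommended configuration: the function names, cluster name, bucket paths, instance types, release label and IAM role names are placeholders chosen for illustration, and launching the cluster assumes the boto3 library and valid AWS credentials are available.

```python
from typing import Any, Dict


def build_transient_cluster_request(
    script_s3_uri: str,
    log_s3_uri: str,
    security_configuration: str,
    task_instance_count: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow API (boto3: run_job_flow).

    Every concrete value below (names, instance types, release label, roles)
    is an illustrative assumption and should be adapted to your account.
    """
    return {
        "Name": "model-training-one-off",        # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",            # assumed EMR release
        "LogUri": log_s3_uri,                     # cluster logs are written to S3
        "Applications": [{"Name": "Spark"}],
        # Name of a reusable EMR Security Configuration created beforehand.
        "SecurityConfiguration": security_configuration,
        "JobFlowRole": "EMR_EC2_DefaultRole",    # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",        # assumed default service role
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "Market": "ON_DEMAND", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2},
                # Task nodes hold no HDFS data, so Spot capacity is a
                # common way to add cheap compute.
                {"Name": "task", "InstanceRole": "TASK",
                 "Market": "SPOT", "InstanceType": "m5.xlarge",
                 "InstanceCount": task_instance_count},
            ],
            # Terminate the cluster once all steps finish (transient cluster).
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "train-model",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # The Spark job itself lives in S3, separate from compute.
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_s3_uri],
                },
            }
        ],
    }


def launch_cluster(request: Dict[str, Any], region_name: str) -> str:
    """Submit the request with boto3 and return the new cluster's ID."""
    import boto3  # imported lazily so the builder can be used without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Calling build_transient_cluster_request("s3://example-bucket/jobs/train.py", "s3://example-bucket/logs/", "example-security-config") and passing the result to launch_cluster would start such a cluster. For long-running or always-on analytics, setting KeepJobFlowAliveWhenNoSteps to True keeps the cluster up after its steps complete.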
EMR provides an affordable, elastic approach to processing machine learning and other predictive models, one that can speed time to insight while reducing total cost of ownership.