Costing Machine Learning Solutions Across Cloud Platforms

Bilal Karim

Jonah Alumnum

April 22, 2019

"The cloud is no longer relegated to handling ancillary jobs, but is quickly becoming the base for mission critical -- or even all -- enterprise IT operations, the head of Amazon Web Services said. The cloud is the new normal." - Andy Jassy, CEO - Amazon Web Services in 2014

Depending on an organization's size and the maturity of its cloud adoption, a comparison of cloud ML service providers and the associated feasibility analysis may be preceded by the question of cloud vs. on-premises development. However, organizations are increasingly developing proofs of concept (POCs) for cloud development and machine learning, and are specifically focusing on adopting the public cloud, so it seems relevant to discuss how much machine learning costs on each cloud platform.

In case you're only here for the numbers, I have provided the cost comparison first. A brief note about the lessons I learnt follows for those who may want to adapt and replicate this for their own businesses.

Cost comparison across the top 3 cloud providers

The intention of this analysis is to simulate the same project on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). I imagined a scenario where the machine learning problem has been defined, the project duration and overheads have been estimated, and the compute and storage infrastructure has been decided. What follows is the financial exercise of estimating the cost of running the same model on all 3 of the major cloud providers in the market.

Notes and Assumptions

Project specs: We estimated the project with the following parameters:

  • The project is operated entirely on the cloud. The machine images are provided by the cloud services for free.
  • The duration for the project is 1 year in total:

    • The machine learning model is trained for 2 weeks (~300 hours)
    • Once trained, the model is deployed as a RESTful service for 1 year (~9000 hours) to serve real-time predictions. We projected the cost of this at 3 levels of utilization: 10%, 50%, and 100%. Although the cost of the virtual machine remains constant throughout the year, the remaining capacity can be used for other projects, so only the utilized share of the hours is charged to this project.
  • Variable costs (such as storage and hourly HR) are excluded from the calculation.
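
The hour figures above are rounded. As a minimal sketch of the arithmetic (the function name `utilized_hours` is illustrative and not part of the original analysis), the exact durations and the hours attributed to the project at each utilization level can be worked out as follows:

```python
HOURS_PER_DAY = 24

# Exact durations, for comparison with the rounded figures in the list above
exact_training_hours = 14 * HOURS_PER_DAY      # 2 weeks
exact_deployment_hours = 365 * HOURS_PER_DAY   # 1 year

# Rounded figures used in the project specs
TRAINING_HOURS = 300
DEPLOYMENT_HOURS = 9000


def utilized_hours(total_hours, utilization_percent):
    """Hours charged to this project at a given utilization level (in percent)."""
    return total_hours * utilization_percent // 100


# Hours of the deployed service attributed to this project
hours_by_utilization = {
    pct: utilized_hours(DEPLOYMENT_HOURS, pct) for pct in (10, 50, 100)
}

print(exact_training_hours, exact_deployment_hours)
print(hours_by_utilization)
```

The rounding slightly understates training time (336 exact hours vs. ~300) and slightly overstates the deployment year (8760 exact hours vs. ~9000).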

Machine specs

| | AWS EC2 | Azure Virtual Machines | Google Compute Engine |
|---|---|---|---|
| Compute: name | p3.2xlarge | NC6v3 | n1-highmem-8 |
| Compute: GPUs | 1x NVIDIA V100 Tensor Core | 1x NVIDIA Tesla V100 | 1x NVIDIA Tesla V100 (billed separately) |
| Compute: vCPU | 8 | 6 | 8 |
| Compute: memory | 61 GB | 112 GB | 52 GB |
| Compute: GPU memory | 16 GB | 16 GB | 16 GB |
| Compute: storage | EBS-only | 736 GB | 375 GB SSD |
| Compute: EBS bandwidth | 1.5 Gbps | - | - |
| Compute: networking performance | Up to 10 Gigabit | - | - |
| Image | Deep Learning AMI (Conda on Ubuntu) | Data Science Virtual Machine for Linux (Ubuntu) | Deep Learning VM Image |
| Storage | EBS Throughput Optimized HDD (st1) volumes: $0.045 per GB-month of provisioned storage | Temp storage included with compute; Azure Managed Disk | Zonal persistent disk: $0.040 per GB-month |
| On-demand cost | $3.06/hour | $3.06/hour | $2.95/hour |
| Notes | - | Microsoft Azure price-matches AWS for similar computing capacity. | Hourly cost is $0.4736 (VM) + $2.48 (GPU) = $2.95. Google automatically applies a "sustained use" discount of 30% over the year, so the effective price is 0.70 × $2.95 = $2.07. The prices shown do not reflect this discount. |

Relevant links

  • AWS EC2
    • https://aws.amazon.com/ec2/instance-types/p3/
    • https://aws.amazon.com/marketplace/pp/B077GCH38C
  • Azure Virtual Machines
    • https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu
    • https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm-ubuntu?tab=PlansAndPrice
    • https://azure.microsoft.com/en-us/pricing/details/managed-disks/
  • Google Compute Engine
    • https://cloud.google.com/compute/pricing#predefined
    • https://cloud.google.com/deep-learning-vm/docs/
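
The Google hourly rate in the table is a sum of two separately billed components, with a discount applied on top. A minimal sketch of that arithmetic follows, assuming (as the notes do) a flat 30% sustained use discount on the combined rate; the variable names and the `to_cents` helper are illustrative only:

```python
from decimal import Decimal, ROUND_HALF_UP

VM_RATE = Decimal("0.4736")               # n1-highmem-8, USD per hour (from the notes)
GPU_RATE = Decimal("2.48")                # 1x Tesla V100, USD per hour (from the notes)
SUSTAINED_USE_DISCOUNT = Decimal("0.30")  # assumption: flat 30% over the year


def to_cents(amount):
    """Round a Decimal dollar amount to whole cents."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Combined on-demand rate: VM and GPU are billed separately on GCP
on_demand_rate = to_cents(VM_RATE + GPU_RATE)

# Effective rate after the sustained use discount
effective_rate = to_cents(on_demand_rate * (1 - SUSTAINED_USE_DISCOUNT))

print(on_demand_rate, effective_rate)
```

`Decimal` is used instead of floats so that the rounding to cents is exact.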

These virtual machines were selected for their ability to handle high-performance computing and machine learning workloads and for their compatibility with NVIDIA’s Tesla V100 GPU.

With virtual machines provisioned, the project team is able to immediately start working on the machine learning model because the machine images provided by the above vendors come with the data and development tools they need.

Below are the costs of doing the project for 1 year based on the project specifications above.

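| Provider | Model training costs | 10% utilization, monthly | 10% utilization, annual | 50% utilization, monthly | 50% utilization, annual | 100% utilization, monthly | 100% utilization, annual |
|---|---|---|---|---|---|---|---|
| AWS | $918.00 | $229.50 | $2,754.00 | $1,147.50 | $13,770.00 | $2,295.00 | $27,540.00 |
| Azure | $918.00 | $229.50 | $2,754.00 | $1,147.50 | $13,770.00 | $2,295.00 | $27,540.00 |
| GCP | $885.00 | $221.25 | $2,655.00 | $1,106.25 | $13,275.00 | $2,212.50 | $26,550.00 |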
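
The figures in this table follow from the project specs and the on-demand hourly rates. The sketch below reproduces them under those stated assumptions (300 training hours, 9000 deployment hours, monthly cost taken as one twelfth of the annual cost); the function and variable names are illustrative, not part of the original analysis:

```python
from decimal import Decimal, ROUND_HALF_UP

TRAINING_HOURS = 300      # ~2 weeks of training, as rounded in the specs
DEPLOYMENT_HOURS = 9000   # ~1 year of serving, as rounded in the specs
UTILIZATION_LEVELS = (Decimal("0.10"), Decimal("0.50"), Decimal("1.00"))

# On-demand hourly rates in USD, taken from the machine specs
HOURLY_RATES = {
    "AWS": Decimal("3.06"),
    "Azure": Decimal("3.06"),
    "GCP": Decimal("2.95"),
}


def to_cents(amount):
    """Round a Decimal dollar amount to whole cents."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def project_costs(hourly_rate):
    """Return (training cost, {utilization: (monthly, annual) serving cost})."""
    training = to_cents(TRAINING_HOURS * hourly_rate)
    serving = {}
    for utilization in UTILIZATION_LEVELS:
        annual = to_cents(DEPLOYMENT_HOURS * utilization * hourly_rate)
        monthly = to_cents(annual / 12)
        serving[utilization] = (monthly, annual)
    return training, serving


costs = {provider: project_costs(rate) for provider, rate in HOURLY_RATES.items()}

for provider, (training, serving) in costs.items():
    print(provider, "training:", training)
    for utilization, (monthly, annual) in serving.items():
        print(f"  {int(utilization * 100)}% utilization: {monthly}/month, {annual}/year")
```

Swapping in a different hourly rate (for example, a discounted or reserved rate) is enough to rerun the comparison for another pricing scenario.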

As has been the case for many years, Google undercuts the competition at every utilization level. At $2.95 per hour against $3.06, it is about 3.6% cheaper on on-demand pricing, which makes it the cheapest option for most organizations, whereas AWS and Microsoft match each other's pricing.

As we will see below, comparing machine learning development across platforms comes down to more than just the price. The business leader must answer a few questions and work within certain constraints.

Comparing apples to oranges

The biggest problem I faced in this analysis was the disparity in terminology and offerings across the 3 platforms compared above. Service providers vary in how they bill their customers, offer discounts, operate, and handle processes internally and with other providers.

This article notes that the majority of organizations have a multi-cloud strategy. Despite that, more often than not the choice of machine learning cloud platform comes down to the ecosystem that the organization already works in. While at Jonah we are platform-agnostic and constantly explore proof-of-concept options on all cloud services, most organizations will have some degree of lock-in with their vendors. Although it helps to occasionally take a peek across the ecosystem wall and look at the market, doing so is neither advisable nor reasonable every time a new decision is needed.

As Dave Bartoletti of Forrester tweeted, "Don't pick a cloud before you pick a strategy."
