Showing posts from April, 2017

High CPU Steal On AWS Burstable Instances

Seeing High CPU Steal on AWS Burstable Instance Types? At Tiller, we were, too. We have some backend systems that process data offline, using a job processing system that we’ve set up in one layer of an AWS OpsWorks stack, with Node-based Agenda job processors running on t2.small instances. For a while we had been having subtle problems that finally reached a point we could no longer ignore, so we dug in deeper. The symptoms were that, after a while, an instance in that layer would become busy enough to start missing deadlines and generating significant numbers of errors. We noticed a high amount of CPU Steal on those instances at those times, and initially thought we might be suffering from the ‘noisy neighbor’ problem. It turned out it wasn’t a noisy neighbor: it was us. Spoiler alert: our choice of instance type and our job processor algorithm weren’t a good match. This outstanding blog post by Leonid Mamchenkov was a great help in figuring this out,