Reducing costs on infrastructure can often feel like a chore when done in isolation. In this post, we’ll discuss the coordinated process that NextRoll is using to make the effort feel like a full-team project with the satisfaction of having a job well done.
Costs have a nasty habit of accumulating over time, usually because of a wide variety of decisions made by disparate teams that were made in the past and never revisited. It’s an ancient problem. Aristotle writes:
For that which is common to the greatest number has the least care bestowed upon it. Every one thinks chiefly of his own, hardly at all of the common interest; and only when he is himself concerned as an individual. For besides other considerations, everybody is more inclined to neglect the duty which he expects another to fulfill…
Of course, the difficulty is that many small problems demand many small solutions. As such, significantly reducing your costs can seem like an exceptionally burdensome task.
Here are our tips if you’re looking to save a chunk out of your budget.
1. Local vs. Global Optimization
Often, teams are asked to track costs on their own and apply some common-sense rules to keep them down. This works OK in practice, but it is far from optimal. One of the main issues is that it’s very easy for a team to look at a relatively small cost-saving opportunity and push it to the bottom of their priority list. I mean, in the grand scheme of things, does $1000 per month seem like that big of a deal?
Metaphor for a non-right-sized instance.
Well, probably every engineering team is making similar calls. NextRoll Engineering has dozens of teams as a moderately-sized company. Forty-two teams each saving $1000 every month results in an extra half million dollars per year in the company coffers. And the thing is, $1000 per month is not an onerous amount to be saving when you deal with data at the scale that we do; more on that later, but really, this represents a lower bound on what can be saved.
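To make the arithmetic above concrete, here is a minimal sketch. The function name is just illustrative, and it assumes every team saves the same amount each month.

```python
def annual_savings(teams: int, monthly_savings_per_team: float) -> float:
    """Total yearly savings when every team saves the same amount each month."""
    return teams * monthly_savings_per_team * 12


# The figures from the paragraph above: 42 teams, $1000 per team per month.
total = annual_savings(teams=42, monthly_savings_per_team=1000)
print(f"${total:,.0f} per year")
```

Forty-two teams at $1000 per month comes to $504,000 per year, which is where the "half million" figure comes from.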
Beyond this, operating costs affect the bottom line of any company. Ideally, this money would be invested so that returns can be made: hiring more people, spinning up new products, and other things that can meaningfully drive top-line revenue. Instead, in the best case, this money is simply wasted; in the worst case, the waste compounds over time, because money that isn’t invested elsewhere carries an opportunity cost. Also, as the company scales, these wasted costs will only increase. Getting your unit economics in line is critical.
To this end, we recommend creating a task force, with representatives across the engineering organization, that is responsible for talking through all the cost-saving initiatives. This team doesn’t need to meet frequently, but tracking and accountability are key. It also helps everyone get on the same page about how systems are built and interoperate, revealing more opportunities.
2. Solicit Ideas
No one knows the code better than the engineers who work on their systems every day. Some in management may be tempted to set arbitrary cost saving goals, but this is extremely counterproductive. It’s also not enough to demand some vague sense of “cost savings” as a project. In reality, each individual cost-saving initiative is unique in its own right and has its own set of tradeoffs that the engineers are best equipped to evaluate.
To this end, ask every engineering team to submit their ideas. Each team can meet amongst themselves, spend an hour or two just brainstorming, and write everything down. At NextRoll, we take an attitude of every idea being valid in this stage, no matter how difficult to accomplish. In some ways, this is similar to a design sprint: the ideas must be on the table before further discussions around feasibility and prioritization can happen. There are side benefits as well: small ideas can be enhanced or compounded; new product ideas can emerge; wild ideas can inspire other, more feasible ones.
One of NextRoll’s core values is, “Do more with less.” We try to consistently uphold this, but these cost-saving efforts provide a good opportunity for us to reflect on that value. It’s important during brainstorming that engineers ask themselves not, “What do we want,” or “What is convenient,” but rather, “What do we need?”
Everyone is contributing. This creates a shared sense of ownership and engineers become invested in their own ideas. Finger-pointing at the teams with the largest cost centers melts away, because, hey, we already know the little stuff adds up and we’re all in this together. Every team is trying to reduce their footprint to only what’s required. Every team lists out some potential objectives and it’s clear how these relate to the overarching theme and goal of the project.
Once all of the ideas are generated, they need to be submitted to the aforementioned task force. It’s up to the task force to meet and decide which tasks will be tackled. Hopefully the task force has enough representation across engineering that they know enough context to discuss the proposals intelligently. But just in case, these ideas should contain documentation that includes:
- The estimated amount of savings, in a standard unit (such as dollars/month)
- The estimated amount of engineering effort, in a standard unit (such as eng-weeks)
- A high-level description of the proposal
- A list of any open questions that would need to be resolved before tackling the project
- A list of impacted teams
Estimates do not need to be super accurate. The idea is to get a rough cut of what is possible. If you have enough ideas submitted, it is likely that some estimates will be over and some will be under. Statistically, over all submissions, you’ll probably hit a reasonable total estimate of how much money is on the table for the taking.
From here, the task force can compile the list of items in a centralized location, assess the total opportunity on each team and across the organization, and be prepared for their first meeting.
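As a rough illustration of the submission fields and the roll-up described above, here is a minimal sketch. The `Proposal` class, the `submitting_team` field, and the sample proposals are all hypothetical, not NextRoll’s actual tooling; a shared spreadsheet would do the same job.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """One cost-saving idea, mirroring the documentation fields listed above."""
    title: str
    description: str                  # high-level description of the proposal
    submitting_team: str              # assumed field, used for per-team roll-ups
    monthly_savings_usd: float        # estimated savings, in dollars/month
    effort_eng_weeks: float           # estimated effort, in eng-weeks
    open_questions: list[str] = field(default_factory=list)
    impacted_teams: list[str] = field(default_factory=list)


def opportunity_by_team(proposals: list[Proposal]) -> dict[str, float]:
    """Sum the estimated monthly savings per submitting team."""
    totals: dict[str, float] = defaultdict(float)
    for p in proposals:
        totals[p.submitting_team] += p.monthly_savings_usd
    return dict(totals)


# Made-up example submissions.
proposals = [
    Proposal("Shorten log retention", "Expire raw logs sooner", "data", 1200, 0.2),
    Proposal("Right-size API fleet", "Move to smaller instances", "platform", 3000, 2,
             open_questions=["How much peak-load headroom is needed?"],
             impacted_teams=["api"]),
    Proposal("Drop unused tables", "Delete stale derived data", "data", 800, 1),
]

per_team = opportunity_by_team(proposals)   # opportunity on each team
total = sum(per_team.values())              # opportunity across the organization
print(per_team, total)
```

Keeping savings and effort in one standard unit each is what makes the totals meaningful when the task force compares proposals from different teams.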
Another good tip for soliciting ideas is to set a reasonable minimum on how much a submission can save. This will keep brainstorming meetings focused and reduce the number of items the task force has to prioritize. What that minimum should be is going to depend on any number of variables, such as revenue, current total costs, number of teams, desired savings goals, etc. Pick something that fits your company’s situation.
3. Task Force Prioritization
The task force finally needs to meet and discuss all items. This will likely be a time-consuming meeting, but it will be valuable. We’ll talk about side benefits later, but for now, let’s talk about how to prioritize.
One thing that’s critical to keep in mind is that individual teams still have product roadmaps to execute on. Cost reduction doesn’t happen in a vacuum. Representatives should be able to explain to the group what the current workload for each individual team is, and how much engineering time is available to spare. But time isn’t free, so product management needs to be on board with the overall effort, and recognize that some time is going to be spent on getting unit costs down.
At this point, it’s common sense to tackle the biggest bang-for-the-buck items. As mentioned earlier, NextRoll is fundamentally a data company, and S3 is a really easy place for costs to build up. My background is data science; my personal inclination is to keep plenty of data around so I can look at historical models, their performance, and so on. But how often do we really need to go back, say, 90 days? Is 45 enough? These are the questions we were asking ourselves on my teams, and we realized there were a lot of five-minute tasks to set some TTLs that slashed our budgets pretty significantly.
Metaphor for a typical S3 bucket.
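For readers who haven’t set an S3 TTL before, here is a minimal sketch of one of those five-minute tasks, expressed as an S3 lifecycle expiration rule applied with boto3. The bucket name, prefix, and rule ID are hypothetical, and this is one way to do it rather than a description of NextRoll’s exact setup.

```python
def expiration_rule(prefix: str, days: int, rule_id: str) -> dict:
    """Build one S3 lifecycle rule that expires objects under `prefix` after `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def apply_ttl(bucket: str, rules: list[dict]) -> None:
    """Set the bucket's lifecycle configuration (needs boto3 and AWS credentials).

    Note: this call replaces any lifecycle configuration already on the bucket,
    so existing rules must be included in `rules` to be kept.
    """
    import boto3  # imported lazily so the rule builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# Hypothetical prefix and rule ID: keep 45 days of model history instead of 90.
rule = expiration_rule(prefix="model-history/", days=45, rule_id="expire-model-history")
# apply_ttl("example-bucket", [rule])  # hypothetical bucket; uncomment to apply
```

The retention question (90 days or 45?) is still the hard part; once a team has answered it, the change itself is a single rule.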
Right-sizing instances is another big opportunity; as products develop over time, the needs for their servers can shift. It’s also possible to identify less-used features in the product that are supported by data and infrastructure and to just delete and shut off those services entirely. For high-volume systems, even a 10% improvement in efficiency can meaningfully impact your budget.
A common question at NextRoll was, “What about AWS reservations? If we’ve already paid for servers, why should we shut them down?” Sure, but this is something that should largely factor into prioritization. The reservations buy you time to reduce the number of servers you need. Try to avoid falling for the sunk cost fallacy: once the money is spent, it’s gone. Those reservations could be used for other services that need them more, or could be used to spin up new product features. Also, reservations will eventually expire. Even though you’ve bought some time, try to get ahead of the problem and don’t lose focus on the end goal. Chances are, that expiration date will sneak up on you, and you’ll be in a new product development cycle where it may be hard to find the time. The point is to strike while the iron’s hot.
Once the task force has balanced every team’s roadmaps and selected a set of appropriate items, they organize their selections into a centralized document that all teams can reference. The selected items get pushed down to the individual teams for implementation with some expectation on when the items will be completed. At NextRoll, we’ve opted for a quarterly deadline. Like any engineering project, deadlines can slip, but what’s important is to be on top of the tracking.
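One simple way to sketch the bang-for-the-buck selection described in this section is to rank items by estimated savings per eng-week and take them greedily until the available engineering time runs out. This is only a heuristic under assumed inputs (a single shared capacity number and made-up items), not the task force’s actual procedure, which also weighs roadmaps and open questions.

```python
def pick_items(items: list[dict], capacity_eng_weeks: float) -> list[str]:
    """Greedy pick: highest savings per eng-week first, while the effort still fits."""
    ranked = sorted(
        (i for i in items if i["effort"] > 0),       # skip items with no effort estimate
        key=lambda i: i["savings"] / i["effort"],    # dollars/month per eng-week
        reverse=True,
    )
    chosen, remaining = [], capacity_eng_weeks
    for item in ranked:
        if item["effort"] <= remaining:
            chosen.append(item["name"])
            remaining -= item["effort"]
    return chosen


# Made-up items: savings in dollars/month, effort in eng-weeks.
items = [
    {"name": "s3-ttl", "savings": 1200, "effort": 0.2},
    {"name": "right-size", "savings": 3000, "effort": 2},
    {"name": "rewrite-pipeline", "savings": 8000, "effort": 10},
    {"name": "drop-feature", "savings": 900, "effort": 1},
]

selected = pick_items(items, capacity_eng_weeks=4)
print(selected)
```

With four eng-weeks to spare, the large rewrite doesn’t fit and waits for a later round, which matches the idea of revisiting harder items once the quick wins are done.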
Some of the side benefits of this task force meeting are that participants gain a better sense of how systems outside their own responsibility work and interact with each other. They also get a better sense of other teams’ product roadmaps. If they’re technical leaders for their respective teams, they can bring that knowledge back. On top of all this, the task force meeting is yet another opportunity to brainstorm and share notes. As the team goes through all the items, new opportunities and solutions may present themselves that can be discussed with the implementing teams.
4. Measure and Report
I’m a big fan of this quote, probably apocryphally attributed to Karl Pearson, and likely expressed by others before him:
“That which is measured improves. That which is measured and reported improves exponentially.”
After each team implements an item, they should spend some time to measure the actualized savings and report back. The task force can log these data as they come in, and track how well estimates line up with reality. This feedback loop will hopefully lead to better estimates in the future.
The most obvious benefit here is that these numbers can help FP&A teams with their jobs. Understanding the overall impact on the company justifies the effort spent achieving the results. Measuring the return on investment will also help in the future when it’s time to tackle further initiatives.
And, of course, measuring the results gives engineers the real satisfaction of seeing the fruits of their labor. Since we approached this collectively, every engineer contributed to the outcome. The flipside of “death by a thousand cuts” is that the thousand bandages add up to something big and meaningful. The task force should take some time to communicate the summarized results not only to the whole engineering team, but to the whole company. Whatever medium works for you is fine; at NextRoll we report on our results at our All Hands meetings.
At NextRoll, the task force meets monthly to quickly go over updates. We chat about progress, blockers, and celebrate successes. It’s not particularly different from a scrum of scrums. With the distributed contributions of every engineer, the task force meeting is not particularly onerous because they’ve been provided with the information they need ahead of time.
5. Lather, Rinse, Repeat
Once the original deadline has passed, the task force reviews unfinished items, reevaluates items that weren’t prioritized in the first round, and continues to prioritize and push things down to the teams for the next period. This ensures that harder, but perhaps bigger, opportunities aren’t forgotten or dropped. Recall that the first pass was about the best bang-for-the-buck projects. Things will get harder, but with an appropriate balance, the payoffs should be higher in the next round.
Momentum shouldn’t be lost. Cost saving should not be a one-off initiative because, as stated, costs tend to accumulate over time. In some sense, it’s easier to tackle things as they come up; on the other hand, it’s not easy to change culture. Explicitly maintaining cost reduction as a priority sends a strong message that the value of “Do more with less” is not empty.
One thing that I’ve noticed at NextRoll that’s been great is that individual engineers continue to ping me about new opportunities they’ve managed to uncover, even outside of brainstorming sessions. We add these items to our overall tracking and make sure these ideas don’t go unnoticed or unprioritized. Sometimes I even get pinged with cost reduction measures that have already been completed, and I make sure even those things are documented and tracked to recognize what these engineers have accomplished.
And this is really the attitude that we’re looking for. Yes, it’s a decent amount of upfront effort to document and coordinate everything. But once the process is in place to solicit creative ideas from all sides, the tracking mechanisms are in place to measure accomplishments, and recognition is provided, cost saving becomes less of a chore and, hopefully, a rewarding process unto itself.
I hope this provides a glimpse into our cost saving process at NextRoll. Obviously, not all of these processes are necessarily applicable to your own company or situation. Adapt as you see fit, but whatever process you decide to implement, I strongly recommend adhering to the following principles:
- Small stuff adds up. Think globally, act locally, as they say.
- Put the idea generation in the hands of the engineers. They know what they’re doing.
- Crazy ideas are fun and can spur further innovation.
- Centralize coordination and prioritization.
- Estimate. Measure. Report.
Thanks for reading!