Forest analytics company Bitcomp (part of Sitowise) recently teamed up with Amazon Web Services (AWS) expert NordHero to carry out a Well-Architected Review (WAR) and assess how well its workload measures up against current AWS best practices. The results revealed a number of high-risk issues to remediate, but also some positive surprises.
NordHero has long experience navigating the AWS cloud and has been performing this kind of workload review for so long that Amazon now recognizes and approves it as a certified reviewer. For Bitcomp, however, this was all relatively new. “We’d never done a review before,” said Mervi Kolehmainen, GIS Service Manager, Bitcomp, “so having NordHero by our side at every step was invaluable support.”
Teemu Niiranen, who led the review for NordHero, first guided the team through the curated list of questions that the framework provides. From there, the review generated a list of improvement issues, each assigned a risk level to indicate how urgently it needed remediation.
The Bitcomp team also scored highly on the review’s sustainability questions. Working in the forestry sector, the team already had sustainability high on the agenda. “We were already optimizing our cloud resources and their energy usage,” says Olli Ujanen, GIS Consultant on the Bitcomp team. This is achieved through the company’s green coding principles, which are designed to reduce energy and resource use. “Our systems already scale based on actual and predicted load, so we only use the computing resources we need,” Olli explains.
This was an eye-opener for the team, in a positive way. “Before we did the review, we hadn’t fully understood how large an impact small steps around energy use and data storage could have on our operations,” Mervi says.
Of the improvement issues generated by the review, 26 were high-risk, coming mostly from three pillars: operational excellence, security, and cost optimization. Aside from helping the Bitcomp team make sense of these issues, Teemu’s role was to help them cut through to the most critical high-risk issues for Bitcomp’s business continuity.
Bitcomp specializes in geographic information system (GIS) solutions, using satellite data from the European Space Agency to model its own data sets and smart forest analytics. The company’s maps, or layers, which mostly cover remote forest areas, number around 400 and include roads, waterways, topography, and photos of forest or infrastructure. Bitcomp applies pattern or image recognition technology and machine learning to this GIS data to monitor changes in forest ecosystems. Such changes could stem from illegal logging, extreme weather events, or forest health problems such as bark beetle outbreaks.
“GIS is the central piece in our company’s offering,” Mervi affirms. “So, if these maps aren’t working then we don’t have an offering!”
With Teemu’s help, the Bitcomp team eventually settled on the 14 issues they considered most crucial for business continuity and prioritised their efforts on those. “In those first stages, Teemu was a great help to us in narrowing the scope to maximise our time and resources,” Mervi recalls.
What most positively surprised Olli was the sheer scope of the review questions. “The framework goes well beyond technical assessments and improvements to look at processes and even cultural issues, like how we work in our teams.”
In fact, when it came to remediation, the team found that the hardest issues were those involving larger teams or multiple teams. “It was challenging not just to schedule a time to get us all together, but also to figure out the core roles and responsibilities for key personnel, and then to actually document them.”
The review is more than just troubleshooting a set of issues in the moment; it’s also about building up best practices for running workloads in the cloud over the long term. “At Bitcomp the impact continues,” Olli says, “because the review has shown us how to look more critically at all our operations and processes. We also have a better understanding of the overall picture of our business and our architecture.”
On cost awareness, the team now understands much better where costs are coming from and how to optimize spending. On process reliability, they now have an incident playbook, which has already been tested a few times and has proven to work. They have even set up early-warning alerts for glitches and other issues.
“Before the review, knowledge was inside people’s heads,” Olli says. “Now we have clear step-by-step instructions on what to do.” All the core responsibilities related to an incident have been mutually agreed and written down, along with communication protocols such as who messages whom and who is responsible for investigating faults. This structured approach helps ensure that incidents are dealt with promptly.
“When there’s an incident, the most important thing is finding what’s wrong before our customers do and figuring out how to fix it,” Mervi says. Bitcomp has also become better at self-reflection. For example, the team holds meetings after incidents to discuss how well they handled them. They are also continually adding new metrics, which puts them in a better position to stay on top of things and keep improving.
“The guys at Bitcomp are the heroes of their own show,” Teemu concludes. “What I try to do is chair the meetings and help keep them on track, concentrating on the areas that make the most sense for their business. And our work continues.”
“Teemu has the technical know-how and always has an answer for our questions,” Mervi says. “He’s also good at listening and keeping us calm.”