ONE UPGRADEABLE PLATFORM: STABILIZING AN AIR-GAPPED KUBERNETES PRODUCT FOR REPEATABLE CUSTOMER DELIVERIES

Written by Kinetive | Feb 23, 2026 2:53:37 PM

A Product That Ships in a Box and Must Run as Shipped

A global industrial company delivered “ready-to-run” environments into customers’ locations. These were not friendly cloud accounts with infinite bandwidth. They were controlled sites, often with strict network rules, and in this case: truly air-gapped. No internet access. What ships with the environment is what the environment has.

Over time, the platform grew the pragmatic way. Some components ran as Linux services, others as Docker services, and a Kubernetes cluster carried the newer workloads. That mix helped the company move fast early on, but it also created three different operational models to maintain—while still having to package the whole thing for repeatable customer deliveries.

When the business started pushing for smoother upgrades and a credible path to managed Kubernetes services later, the cracks showed. The platform needed to become more stable, more consistent, and easier to operate—without turning the product team into accidental platform engineers.

Upgrades Were Becoming a High-Risk Event

The main pain wasn’t a single outage or one bad release. It was that upgrades and day-two operations were harder than they needed to be.

There was no dedicated Kubernetes platform engineer to set standards and remove the rough edges. Developers did their best, but without time to dive deep into Kubernetes best practices, patterns varied across services. A few typical symptoms appeared:

  • Upgrades felt risky because installs and configurations differed depending on when and how something was added.
  • Some workloads still depended on underlying host configuration, which is fragile when you ship environments to customer sites.
  • Troubleshooting took longer than necessary because logs and metrics weren’t consistently structured or easy to access.
  • Air-gapped reality magnified every missing dependency and every “we’ll fix it later” shortcut.

Support teams felt it too. If each customer environment drifted even slightly, diagnosing issues became slower and more expensive. And when customers can’t “just check the vendor dashboard,” the platform itself must provide visibility.

Approach

The company wanted a low-friction engagement: clear scope, visible progress, and minimal risk to ongoing deliveries. They didn’t need a lecture on Kubernetes. They needed a platform that behaves predictably.

Kinetive joined as a small senior partner to work alongside the team and consolidate the environment into Kubernetes using best practices that would still make sense in a managed Kubernetes future. The guiding principles stayed practical:

  • One operational model: Kubernetes-first, no parallel runtime zoo.
  • Offline-first supply chain: everything needed for installs and upgrades must be available internally.
  • Reduce host coupling: the platform should not rely on “that one node setup.”
  • Improve observability: developers and customers should be able to spot bottlenecks in real time.
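What "offline-first supply chain" tends to mean in practice: every image reference in the install resolves to a registry that ships inside the environment, never the public internet. A hedged sketch of a Helm values overlay for an offline install (the registry hostname, chart keys, and versions here are illustrative, not the customer's actual setup):

```yaml
# values-airgapped.yaml — illustrative values overlay for an offline install.
# Every image points at the registry shipped with the environment;
# nothing depends on being able to reach the public internet.
global:
  imageRegistry: registry.internal.local:5000    # assumed internal registry
image:
  repository: registry.internal.local:5000/platform/api
  tag: "1.8.2"              # pinned, versioned together with the release bundle
  pullPolicy: IfNotPresent  # never force a pull the air gap cannot serve
```

Pinning tags rather than using `latest` matters doubly here: an air-gapped site cannot quietly fetch a newer image, so the bundle and the manifests must agree exactly.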

Turning Three Operational Models into One Kubernetes Playbook

The work was delivered in short loops, prioritising stability and supportability first, then portability.

1) Aligned on a target platform model and migration plan
A short discovery and mapping workshop clarified what was running where, what carried the most operational risk, and which components should move first. The team agreed on consistent patterns for how applications should be deployed, configured, and upgraded.

2) Consolidated Linux and Docker services into Kubernetes
Services moved in stages—starting with quick wins, then the ones causing the most operational friction. Each migration reduced variance and removed “special case” deployment paths. Over time, Kubernetes became the single place to run and operate the system.

3) Standardised installs with Helm and environment overlays with Kustomize
Helm provided versioned, repeatable packaging for platform components and applications. Kustomize provided clean, reviewable overlays for differences across environments and customer sites—without copy-pasting manifests. This made upgrades more systematic and less stressful.
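As a sketch of the pattern (directory layout and names are illustrative): a shared base holds the reviewed manifests, and each customer site gets a small overlay that records only its differences, instead of a copied-and-edited manifest set.

```yaml
# overlays/customer-a/kustomization.yaml — illustrative overlay.
# The base holds the shared, reviewed manifests; this file records
# only what is different at this customer site.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namespace: platform
patches:
  - path: resources-patch.yaml   # site-specific CPU/memory limits
images:
  - name: platform/api
    newTag: "1.8.2"              # version shipped in this delivery
```

A diff of the overlay is a complete, reviewable description of a site's deviation from the standard, which is exactly what makes upgrades systematic rather than archaeological.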

4) Adopted cloud-native operations patterns, including operators where they reduced toil
For components where lifecycle automation actually matters—installation, reconciliation, upgrades—operators reduced manual work and improved consistency. The key was selective use: operators were introduced where they lowered operational load, not as a blanket rule.
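The operating model behind that selectivity looks roughly like this: instead of hand-running install and upgrade steps, the team declares desired state in a custom resource and the operator reconciles towards it. A generic, illustrative example — the API group, kind, and fields below are hypothetical, not a specific product:

```yaml
# Illustrative custom resource: an operator watches objects of this kind
# and performs install, upgrade, and repair steps to match the spec.
apiVersion: platform.example.com/v1alpha1   # hypothetical API group
kind: MessageBroker
metadata:
  name: main-broker
spec:
  version: "3.4.1"   # bumping this triggers a managed, ordered upgrade
  replicas: 3
  storage:
    size: 50Gi
```

Where lifecycle automation like this saves real operational work, an operator earns its keep; where a plain Deployment plus a Helm chart suffices, adding one only adds moving parts.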

5) Removed dependencies on the underlying hosts
Hidden assumptions about node configuration were systematically eliminated. The goal was simple: the environment should behave like a portable product that can run on different Kubernetes distributions, including managed Kubernetes later.
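A typical change in this step, sketched with illustrative names: a workload that read configuration from a path on the node instead receives the same file from a ConfigMap, so it no longer cares which host it lands on.

```yaml
# Before (host-coupled): config mounted from a path that had to exist
# on the node, e.g.
#   volumes:
#     - name: app-config
#       hostPath:
#         path: /opt/platform/config
#
# After: the same file travels with the manifests as a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.properties: |
    log.format=json
    cache.size=256
---
# In the Deployment's pod spec, the volume becomes:
#   volumes:
#     - name: app-config
#       configMap:
#         name: app-config
```

Once no workload assumes a particular node layout, the cluster underneath becomes swappable — which is precisely what a later move to managed Kubernetes requires.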

6) Improved logging and monitoring so bottlenecks are visible in real time
A major outcome was better cluster observability. Logging and monitoring were strengthened so both developers and customers could see what the system is doing as it runs: error patterns, resource pressure, and performance bottlenecks—without waiting for a support ticket to escalate. This shifted troubleshooting from guesswork to evidence and made performance discussions concrete.
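Concretely, "visible in real time" in an air-gapped environment means the alerting has to run inside the cluster — there is no vendor dashboard to check. An illustrative alert rule, assuming a Prometheus-style monitoring stack (the metric name, threshold, and labels are examples, not the customer's actual configuration):

```yaml
# Illustrative alerting rule for a Prometheus-style stack.
# It is evaluated inside the cluster, so an air-gapped site
# still gets the signal without any external service.
groups:
  - name: platform-bottlenecks
    rules:
      - alert: HighRequestLatency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency above 500 ms for {{ $labels.service }}"
```

Rules like this are what turn "support became faster and calmer" from a slogan into a mechanism: the platform surfaces the bottleneck before the customer files a ticket.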

7) Enabled the team with documentation and coaching
Runbooks, upgrade notes, and practical “how we do things here” documentation were delivered alongside the changes. Developers didn’t need to become platform specialists overnight, but they did gain a paved road: consistent patterns they can follow confidently.

Results

The improvements landed where it matters for an industrial company shipping customer environments.

  • Stability improved through standardisation. Fewer special cases and one operational model reduced drift across customer deliveries.
  • Upgrades became routine instead of heroic. Versioned installs and consistent patterns made upgrade work far more predictable.
  • Support became faster and calmer. Better logging and monitoring meant issues and bottlenecks could be spotted in real time, often before they turned into customer-visible incidents.
  • A path to managed Kubernetes became realistic. By reducing host coupling and following best practices, the platform became portable by design.

Less Tribal Knowledge, More Paved Road

Developers got time back. Instead of remembering which service deploys where and how to debug it, they had consistent workflows and better visibility into system behaviour.

Operations and support got leverage. Real-time observability reduced “blind debugging,” and the platform became easier to run across multiple customer sites without each environment turning into its own special snowflake.

For a small-to-mid-sized Finnish company, that's the real win: less cognitive load, fewer surprises, and a platform that scales with the team you actually have.

Next steps

With consolidation and observability foundations in place, the company can build further without destabilising deliveries:

  • Strengthen policy and provenance controls for the offline supply chain
  • Standardise “golden paths” for new services so best practices are automatic
  • Continue improving reliability targets with consistent SLOs and alerting across customer environments
  • Plan a staged move to managed Kubernetes when the business case is right

Shipping Air-Gapped Environments? Let’s Make Upgrades Boring.

If you’re delivering offline customer environments and Kubernetes upgrades feel like a recurring pain point, Kinetive can help you consolidate, stabilise, and modernise without slowing the business down.

Reach out to discuss your current setup and what a low-risk path to a Kubernetes-first, upgrade-friendly platform could look like.