Platform Engineering: Building Developer Platforms Right
Most small companies don't need a platform team. Companies a tier up in size absolutely do. I've seen companies scale to 30 engineers on a single team without needing a formal platform. I've also seen companies with 100 engineers still trying to manage infrastructure through tribal knowledge and shared Slack channels.
The difference is: what's the tax on shipping features? If engineers spend 90% of their time on features and 10% on DevOps, you're fine. If it's 70/30, you need a platform team.
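If you want to put a number on that tax, here is a minimal sketch. The function names and the "watch closely" middle band are my own assumptions for illustration; the only thresholds taken from the text are the 90/10 and 70/30 splits.

```python
def platform_tax(feature_hours: float, devops_hours: float) -> float:
    """Fraction of total engineering time that goes to DevOps work."""
    total = feature_hours + devops_hours
    if total <= 0:
        raise ValueError("total hours must be positive")
    return devops_hours / total


def assess(tax: float, fine_at: float = 0.10, needed_at: float = 0.30) -> str:
    """Map a tax fraction to a rough verdict.

    fine_at and needed_at mirror the 90/10 and 70/30 splits above.
    The band in between is an assumption: the article does not say
    where the line falls, so this sketch just flags it.
    """
    if tax >= needed_at:
        return "need a platform team"
    if tax <= fine_at:
        return "fine"
    return "watch closely"
```

Feed it hours from whatever time tracking or survey data you trust. The exact cutoffs matter less than watching which way the ratio is moving.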
What Is a Platform?
An internal developer platform (IDP) is a curated set of tools, workflows, and services that let product teams deploy and operate software without needing to be infrastructure experts.
That's the boring definition. Here's what it actually means: Your platform team builds self-service capabilities so that a product engineer can, at 2 AM with no infrastructure knowledge, deploy a fix to production without involving an ops engineer or an on-call rotation. The platform provides guardrails (this code can't access that database, this service can't breach these resource limits), visibility (you can see your service's behavior in production), and controls (rollback, gradual rollout, alerting).
A platform is not Kubernetes. Kubernetes is a tool your platform might use. A platform is not "we have Terraform repos." That's infrastructure as code, which is great, but platforms are bigger than that.
Good platforms are invisible. Engineers don't think about them. They have an idea, write the code, run one command, and it's in production with monitoring, logging, and alerting. They don't think: "How do I deploy this?" They think: "This is deployed."
Bad platforms are the opposite. Engineers spend hours reading wiki pages, asking questions in Slack, and jumping through approval processes. The platform becomes a productivity tax, not a multiplier.
The Golden Path
The single best concept in platform engineering is the golden path. It's the boring, well-paved road to doing the common thing. For 90% of your services, there's a standard way to do it. That should be the easiest way.
The golden path means: You have a standard way to write services. (Most companies: a template repository or a starter project.) A standard way to deploy them. (Most companies: one command to the platform.) A standard way to operate them. (Most companies: built-in dashboards and alerting.)
This doesn't mean everyone uses the golden path. It means the golden path is so easy that not using it requires justification. If 95% of services are Node.js, the golden path is Node.js. If a team uses Rust, it's not because the golden path was hard to follow; it's because they have a specific reason (performance, correctness).
The golden path is how you avoid platform bloat. You don't support 15 programming languages, 10 deployment strategies, and custom infrastructure for every team. You support one thing, and you make it great. Want to deviate? You can. You're just choosing a harder path.
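One way to encode "deviation is allowed but needs a reason" is a small check on each service's declared choices. This is a sketch under assumptions: `GOLDEN_PATH`, `ServiceSpec`, and the specific keys and values are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass, field

# Hypothetical golden-path defaults; the keys and values are illustrative.
GOLDEN_PATH = {
    "language": "nodejs",
    "deploy": "platform-cli",
    "dashboards": "built-in",
}


@dataclass
class ServiceSpec:
    name: str
    choices: dict = field(default_factory=dict)  # only what the team overrides
    justification: str = ""


def deviations(spec: ServiceSpec) -> dict:
    """Return the choices that differ from the golden path."""
    return {k: v for k, v in spec.choices.items() if GOLDEN_PATH.get(k) != v}


def validate(spec: ServiceSpec) -> list:
    """Deviating is allowed, but only with a written justification."""
    off_path = deviations(spec)
    if off_path and not spec.justification.strip():
        return [f"{spec.name} deviates on {sorted(off_path)} without a justification"]
    return []
```

A service that declares nothing passes with no friction. A Rust service passes too, as long as someone wrote down why.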
Key Capabilities of a Mature Platform
Self-service environments. Engineers request an environment (dev, staging, prod) and it's provisioned automatically. No waiting for ops. No manual configuration. The environment matches your standard configuration. This one capability eliminates a massive category of "why isn't this working" bugs.
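A rough sketch of what "matches your standard configuration" can look like: every environment starts from one base config, and each environment overrides only what it must. The config keys (`replicas`, `log_level`, `monitoring`) and the `provision` function are assumptions for illustration; a real platform would hand the result to its provisioning tooling.

```python
# Hypothetical standard configuration shared by every environment.
BASE_CONFIG = {"replicas": 1, "log_level": "info", "monitoring": True}

# Per-environment overrides, kept deliberately small.
ENV_OVERRIDES = {
    "dev": {"log_level": "debug"},
    "staging": {},
    "prod": {"replicas": 3},
}


def provision(service: str, env: str) -> dict:
    """Build the environment description for a service.

    This only computes the config; actually creating infrastructure
    is out of scope for the sketch.
    """
    if env not in ENV_OVERRIDES:
        raise ValueError(f"unknown environment: {env!r}")
    return {**BASE_CONFIG, **ENV_OVERRIDES[env], "service": service, "env": env}
```

Because every environment derives from the same base, a difference between dev and prod is always something written down in `ENV_OVERRIDES`, not something someone configured by hand.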
Deployment pipelines. Code push triggers a series of automated steps: test, build, deploy. The pipeline is standardized but customizable. Teams don't write their own CI/CD logic (they don't have to); they configure the provided pipeline for their needs. Your platform team maintains the pipeline; product teams use it.
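"Standardized but customizable" can be sketched as a fixed list of steps owned by the platform, with teams supplying only option overrides. Everything here is an assumption for illustration: the default commands, the `rolling` strategy, and the `execute` callback that stands in for whatever actually runs a step.

```python
from typing import Callable

# The platform owns the step order; teams cannot add or reorder steps.
STANDARD_STEPS = ("test", "build", "deploy")

# Hypothetical defaults for each step.
DEFAULTS = {
    "test": {"command": "npm test"},
    "build": {"command": "npm run build"},
    "deploy": {"strategy": "rolling"},
}


def resolve(team_config: dict) -> dict:
    """Merge a team's overrides onto the platform defaults."""
    unknown = set(team_config) - set(STANDARD_STEPS)
    if unknown:
        raise ValueError(f"unsupported steps: {sorted(unknown)}")
    return {step: {**DEFAULTS[step], **team_config.get(step, {})} for step in STANDARD_STEPS}


def run_pipeline(team_config: dict, execute: Callable[[str, dict], bool]) -> list:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for step, options in resolve(team_config).items():
        if not execute(step, options):
            break
        completed.append(step)
    return completed
```

A team that wants canary deploys passes `{"deploy": {"strategy": "canary"}}` and nothing else. They configure the pipeline; they never rewrite it.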
Observability tooling. Every service automatically has logging, metrics, and tracing. Engineers don't have to set it up. They write code, and the platform captures what's happening. This is huge. Most platform teams skip this. Don't.
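As a toy illustration of "the platform captures what's happening," here is a decorator that a service template could apply to every handler so engineers never wire it up themselves. The in-memory `METRICS` store and the `instrumented` name are assumptions; a real platform would export to its metrics and tracing backend instead.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("platform")

# In-memory stand-in for a real metrics backend (assumption for the sketch).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(func):
    """Record call counts, error counts, and latency for a handler."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        stats["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            logger.exception("%s failed", func.__name__)
            raise  # never swallow the error; only observe it
        finally:
            stats["total_seconds"] += time.perf_counter() - start

    return wrapper
```

The point of the design is that the handler's author writes plain business logic. The wrapping happens in the template or framework layer the platform ships.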
Secrets management. How do services access databases, third-party APIs, and other services? Through secrets, managed securely. The platform provides a way to manage secrets that's easier than storing them in code (which engineers definitely will do if you make the secure way hard) but more secure than a shared Slack file.
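The "easier than storing them in code" bar is low if the platform injects secrets at deploy time and the service just asks for them by name. A minimal sketch, assuming secrets arrive as environment variables; `get_secret` and the variable names are hypothetical, and a real setup might read from a secrets manager instead.

```python
import os
from typing import Mapping, Optional


def get_secret(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Fetch a secret that the platform injected into the environment.

    The env parameter exists so tests can pass a plain dict; in a
    running service it defaults to os.environ.
    """
    source = os.environ if env is None else env
    try:
        return source[name]
    except KeyError:
        # Fail loudly at startup rather than falling back to a hardcoded value.
        raise RuntimeError(f"secret {name} is not configured for this service") from None
```

A service would call something like `get_secret("DB_PASSWORD")` once at startup. There is no default argument on purpose: a fallback value is exactly how secrets end up in code.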
Standardized networking and service discovery. Services need to talk to each other. The platform provides this. Services register themselves on deploy, and the platform handles discovery, load balancing, and retries.
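To make "register on deploy, then discovery, load balancing, and retries" concrete, here is a toy in-memory registry. It is a sketch, not how any particular service mesh works: the `Registry` class, round-robin selection, and retrying only on `ConnectionError` are all assumptions chosen to keep the example small.

```python
class Registry:
    """Toy service registry with round-robin balancing and retries."""

    def __init__(self):
        self._instances = {}  # service name -> list of addresses
        self._cursors = {}    # service name -> next index to hand out

    def register(self, service: str, address: str) -> None:
        """Called by the platform when an instance is deployed."""
        self._instances.setdefault(service, []).append(address)

    def pick(self, service: str) -> str:
        """Return the next address for a service, round-robin."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = self._cursors.get(service, 0) % len(instances)
        self._cursors[service] = index + 1
        return instances[index]

    def call(self, service: str, request, attempts: int = 3):
        """Invoke request(address), moving to the next instance on failure."""
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        last_error = None
        for _ in range(attempts):
            address = self.pick(service)
            try:
                return request(address)
            except ConnectionError as exc:
                last_error = exc
        raise last_error
```

The caller only knows the service name. Which instance answers, and how many tries it took, stays inside the platform.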
Common Platform Mistakes
Building a platform nobody uses. This happens when the platform team doesn't talk to product teams about what actually blocks them. They build "best practices" infrastructure that solves problems nobody has. Then they're surprised when teams run their own deployments in a way that terrifies ops. Talk to your users. Ask what three things would save them the most time. Build those.
Over-investing before you need it. I've seen companies with 10 engineers invest in platform infrastructure that would work great at 100 engineers. It's wasted effort. You don't need that level of sophistication until you actually have the scale problem you're solving for. Start simple. Scale the platform as the company scales.
Building a bureaucracy instead of a platform. If using the platform requires more approvals than not using it, you've built the wrong thing. The platform should be faster and easier than the alternative. If it's not, it won't be used.
Forgetting that platforms are products. Platforms are not "nice to have infrastructure work that happens in the background." They're products with users (your engineers). They need a PM. That PM should work with platform engineers to understand what product teams need, prioritize what to build, and decide what to deprecate. Without a PM, your platform team ends up building what they find technically interesting, not what engineers actually need.
The Platform PM Role
Many platform teams don't have a PM. This is a mistake.
A platform PM works with product teams, understands their pain, and helps translate that into platform investments. A platform PM also says no - "We're not supporting that, here's why." A platform PM thinks about adoption. "If we build this, will teams use it? How do we make migration from the old way painless?"
This is a different PM skill than managing user-facing features, but it's the same mindset: understand users, solve their problems, measure adoption.
Connecting to Codebase Intelligence
Here's where Glue helps platform teams. Platform teams need to understand how engineers actually work, not how they say they work.
Glue lets you ask: "Which teams are using the golden path? Which teams have custom deployments? Of those custom deployments, what's actually different? Are there patterns we can standardize?" You might discover that five teams have built nearly-identical deployment logic in their Makefiles. That's a platform candidate.
Or: "Which services don't have proper observability? Which ones are missing alerting?" This tells you which teams need onboarding to your observability tooling, or whether your observability is hard enough to set up that teams are skipping it.
Or: "Which services violate our platform's service size guidelines?" If you've decided services should be under 50k lines of code and you have 10 services over 200k, that's valuable visibility into where your golden path isn't being followed and why.
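Once you have line counts per service from any source, the check itself is trivial. This sketch is not a depiction of how Glue works; `oversized_services` is a hypothetical helper, and the 50k default is the guideline from the example above.

```python
def oversized_services(line_counts: dict, limit: int = 50_000) -> list:
    """Return (service, lines) pairs over the limit, largest first.

    line_counts maps service name -> lines of code, however you
    choose to measure them.
    """
    over = {name: count for name, count in line_counts.items() if count > limit}
    return sorted(over.items(), key=lambda item: item[1], reverse=True)
```

The list is the start of a conversation, not a verdict. The interesting question is why those services grew past the guideline.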
Codebase intelligence helps platforms stay aligned with actual practice, not doctrine.
TL;DR: Platform Engineering in 60 Seconds
Platform teams scale engineering productivity. They're not needed at 10 engineers; they're essential at 100. Platforms provide self-service capabilities: deploy without ops involvement, monitor automatically, manage secrets safely. The golden path is the boring, well-paved way to do the common thing (it should be the easiest thing). Platforms are products - they need a PM. Don't over-invest before you need it. Avoid building bureaucracy instead of productivity.
Frequently Asked Questions
Q: How big does a company need to be to have a platform team?
A: Typically 50-80 engineers is when you start seriously needing one. Before that, the overhead of organizing a separate team is worse than the pain of infrastructure being decentralized. Watch for the inflection point: if infrastructure questions are blocking product development regularly, it's time.
Q: Should platform engineering report to DevOps, Infrastructure, or Engineering?
A: To engineering or the CTO. If it reports to DevOps or Infrastructure, it stays infrastructure-focused. If it reports to engineering, it stays product-focused. Platforms are products for engineers. They should report alongside the teams they serve, not below the infrastructure org.
Q: How do we migrate existing services to the platform?
A: Incrementally. Pick your simplest service first. Work through onboarding it. Document what was hard. Improve the platform. Pick the next service. In parallel, new services use the platform from day one (make sure of this - don't let new services get built off the platform or work around it). After a year, most services should be on it.