Lessons from Building an AI Engineering Manager
Everyone’s talking about AI for software engineering: from Devin to Cursor, it’s clear that the future involves creating much more software than ever before. But if the underlying work becomes more accessible to AI, then maybe more value accrues at the management level.[1]
This was the observation that put me on a two-month quest to build AI for engineering managers (EMs). My starting premises were:
In the future, every EM will be responsible for managing larger, faster, human-machine hybrid teams. Raw velocity will increase.
Even today, being an EM isn’t easy: managing feedback, planning, meetings, performance, code reviews, etc. – it’s a ton of high-context work.
Many other stakeholders, like PMs and executives, don’t fully trust or understand what’s happening in their engineering teams, and want insight.
LLMs are really good at ingesting text from disparate sources and bringing it all together in a flexible, insightful way.
I thought I saw the contours of a useful product: integrate with GitHub and Linear to pull all the data in real time, dissect and assemble it in various useful ways, and then expose it via an LLM interface that lets a user make requests and get insights.
Two months later, this turned out to be much less viable than expected. In this post, I’m going to share what I learned: while engineering management looks like one discipline from afar, up close it’s a long tail of differing use cases that are hard to abstract over. You can build many small tools, but there are very few that are big enough to support a real business.
Context: My Needs
I’ve been building fully-remote software businesses for the past five years. As part of that, I manage small engineering teams working on various products around the clock. I need to know:
What are people working on? Are they getting stuff done? I don’t want Ghosts.
Are projects getting done on time? What’s running over in the current sprint?
How can I better plan our projects?
As the old YC adage goes, I started by building for myself. Version zero of engmanager.ai was a simple GitHub bot that sifted through every commit on every branch and gave me a daily overview of what was happening:
MVP
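To make the shape of that bot concrete, here is a minimal sketch of its digest-building step. All names are hypothetical (engmanager.ai’s internals aren’t public), and the commit data would in practice come from the GitHub API rather than a hardcoded list; the sketch just groups a day’s commits by author and assembles the text an LLM would summarize.

```python
# Hypothetical sketch of a daily-digest bot's prompt assembly.
# Assumes commits were already fetched from the GitHub API as dicts
# with "author", "branch", and "message" keys.
from collections import defaultdict


def group_commits(commits):
    """Group commit dicts by author, tagging each message with its branch."""
    by_author = defaultdict(list)
    for c in commits:
        by_author[c["author"]].append(f"[{c['branch']}] {c['message']}")
    return dict(by_author)


def build_digest_prompt(commits):
    """Assemble the text an LLM would turn into a daily overview."""
    sections = []
    for author, lines in sorted(group_commits(commits).items()):
        sections.append(f"{author}:\n" + "\n".join(f"  - {l}" for l in lines))
    header = "Summarize yesterday's engineering activity for a manager:\n\n"
    return header + "\n\n".join(sections)


commits = [
    {"author": "alice", "branch": "main", "message": "Fix login timeout"},
    {"author": "bob", "branch": "feat/billing", "message": "Add invoice model"},
    {"author": "alice", "branch": "main", "message": "Bump deps"},
]
print(build_digest_prompt(commits))
```

The summarization itself is the easy part; as the rest of this post argues, deciding which of these lines a manager actually needs to see is where the difficulty lives.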
The daily summaries were useful to me. Friends mentioned this could be useful to them for performance reviews, productivity analytics, and generally for getting a quick daily lay of the land. That’s when I tweeted out my thesis, which struck a chord with a larger audience. I got on dozens of calls with engineers and EMs. The problems people raised fit into three broad categories:
1. Productivity
Initially, I thought of my product’s value proposition mostly in terms of analyzing productivity: are people getting enough work done? Are they doing the most important things? That’s a pretty basic question, but many organizations struggle with it. When I started bringing my solution to EMs, I found that the reaction was generally:
Yes, productivity is important…
…But we don’t want to pay for it…
…Because analyzing it is not directly actionable.
In practice, productivity tools give managers lots of dashboards, but dashboards don’t drive outcomes unless someone pushes initiatives to raise productivity – a ton of work that frankly isn’t going to win them any friends.[2]
The proof is in the pudding: despite this kind of thing nominally having a big market size, there are no successful products at scale. DX seems to be the industry leader with a good product, but they’re just not very big. This is the kind of startup graveyard where you can get a big TAM by looking at the sum total of salaries, but what’s seizable is much smaller.
2. Digests
The summaries I was creating were useful daily digests to me. And that seemed to match an EM need: every morning, EMs spend time going through messages, updates, and notifications to see what’s changed and where to direct their attention. I thought there was an easy opportunity: LLMs can scan through GitHub, Linear, Slack, etc. and surface what the manager needs to see. I built an MVP for daily digests, but found it hard to get the experience right:
It’s easy to surface what happened yesterday, but that’s not actually very valuable. What managers want to know is what they need to pay attention to. Figuring this out is very hard, because it’s so subtle and contextual.[3]
Most information is posted in low-context ways: PR comments or Slack messages often aren’t addressed to anyone, and figuring out the properties critical to managers – things like urgency or controversy – is still error-prone with the tools we have today.
The key insight for me was that I wasn’t really being asked to automate summaries – that’s easy – but what people really wanted me to do was to automate figuring out what’s relevant. And that’s very difficult and subjective.
Potential customers wanted additional product variants in this space: changelogs, business updates, work summaries for performance reviews, etc. However, none of these were really urgent problems with any worthwhile willingness to pay.
3. Planning
Many EMs mentioned needing help planning: they need to understand velocity, see what current work is going to run over, and decide how best to schedule future work. The key question is always: how long should this task take? I spoke with EMs who showed me sophisticated homegrown systems for estimating story points and leveraging existing reporting from their ticketing systems. I heard three main concrete feature requests:
Detailed velocity estimates by team or sub-team;
Story-point prediction at ticket creation;[4]
Better forecasting for completion of work in flight.
The problem was that none of these features were sellable on their own. The feedback was clear: managers would need all of them together, combined into a comprehensive engineering planning platform. And that’s just a really big lift, because all three are tricky technical problems – easy to do poorly, hard to do well. The amount of work required before anyone showed real willingness to pay left me skeptical.
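To show why these features are “easy to do poorly,” here is the naive baseline for the third request, forecasting work in flight: a rolling-average velocity and a ceiling division of the remaining backlog. This is an illustrative sketch, not anything the EMs I spoke with actually shipped; doing it well means per-team calibration, uncertainty ranges, and handling scope changes mid-sprint.

```python
# Naive velocity-based forecast: the "easy to do poorly" baseline.
# Real systems need calibration and uncertainty estimates on top of this.
import math
from statistics import mean


def rolling_velocity(completed_points, window=3):
    """Average story points completed over the last `window` sprints."""
    return mean(completed_points[-window:])


def sprints_to_finish(remaining_points, completed_points, window=3):
    """Forecast how many sprints the remaining backlog will take."""
    v = rolling_velocity(completed_points, window)
    if v <= 0:
        return float("inf")  # no measured throughput: no forecast
    # Round up: a partially used sprint still occupies the calendar.
    return math.ceil(remaining_points / v)
```

With a history of 18, 22, and 20 points per sprint, velocity averages 20, so a 50-point backlog forecasts to 3 sprints. The hard part the sketch sidesteps is that story points themselves are inconsistent across teams, which is exactly why EMs wanted prediction at ticket creation too.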
Challenges: Who’s Your Customer?
As I dug into these product requests, I noticed a subtle tension. Tools for EMs don’t always just help them do their jobs. Sometimes they show when EMs are bad at their jobs.
For example, CTOs and EMs themselves are somewhat at odds with productivity analytics, because the product fundamentally exists to reveal deficiencies in their organizations. If it is revealed that Employee 23 hasn’t done any work in four months, then that’s not the only person getting fired.
The ideal customer appears to be not a CTO or EM, but a CEO who is skeptical about what’s actually going on in their engineering organization and wants a neutral party to provide insight. CEOs have the power to actually implement these tools, which would otherwise die on the vine if championed by other stakeholders. However, while I met quite a few CEOs who got excited about the idea of the product, the practical reality is that this just isn’t an urgent enough priority for them. They buy it but don’t look at it, and that won’t last as a product.
Challenges: Context and Accuracy
Most of the products I explored shared one fundamental difficulty: context. Engineering organizations are complicated beasts. The sheer amount of text you’ll find in Linear, GitHub comments, Slack, and Confluence – not to mention an entire codebase[5] – is massive, even for small companies. Actually parsing all of this and having an LLM figure out what specifically a Slack comment means or what a Linear ticket intends is very hard. You have to use RAG, because the volume of information is far beyond what you can squeeze into a context window, and running retrieval competently across all these disparate data sources and tiny text chunks[6] is just beyond what’s feasible today. We will get there in the coming years, no doubt – but not on the near-term timeline that’s relevant to me as a builder.
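The retrieval layer described above can be sketched in miniature. This toy uses a bag-of-words overlap as a stand-in for real embeddings (an assumption for readability; a production system would use a dense embedding model and a vector store), but it shows the core move: normalize items from disparate sources into one chunk schema, then rank them against a query.

```python
# Toy cross-source retrieval: bag-of-words cosine similarity stands in
# for real embeddings. Illustrative only; not a production RAG design.
import math
from collections import Counter


def embed(text):
    """Crude stand-in for an embedding: token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Chunks from Slack, Linear, and GitHub normalized into one schema.
chunks = [
    {"source": "slack", "text": "deploy blocked on failing billing tests"},
    {"source": "linear", "text": "ticket: migrate billing to new invoice model"},
    {"source": "github", "text": "PR comment: nit, rename variable"},
]


def retrieve(query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]
```

Even in this toy, the failure mode is visible: a terse, high-signal Slack message shares almost no vocabulary with the Linear ticket it relates to, so lexical or even embedding similarity alone misses the connection that a manager would make instantly.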
Conclusion
Over the past two months, I talked to dozens of talented engineers, managers, and executives. I ran dozens of trials for engmanager.ai. What I found is a space where the business theory looks great, but the practical reality is grim. My bullet-point takeaways are:
In some industries, building a product as a suite of AI point solutions can work well. But that only applies when everyone has similar workflows, and each one of them is valuable enough to be a sellable product.
Tread carefully in applications where LLMs have to figure out context or relevance. The subjectivity of that task can make it a very hard problem.
A market size can look very large because there’s a huge sum total of salaries and you can create a few percentage points of efficiency. But the true market size is limited by what’s actually seizable, what you can sell. In this case, it’s way less.
Many markets look like one concrete thing from far away, and as you get close, they’re a gazillion little problems, and every customer wants to solve some small subset of them. That’s a startup graveyard.
If VCs are more excited about what you’re working on than the customers – listen to the customers.
Orchestration of AI agents is probably still the next big frontier in (engineering) management, but we don’t yet know what that’s going to look like. I have some thoughts about what to build here, and will share those in the coming weeks.
Footnotes

[1] Or, if you take a purely technical view, at the orchestration layer.

[2] This might be the kind of thing that is very unpopular today and therefore not adopted, but at some point in the future that dynamic might change.

[3] Maybe with vastly larger models that can ingest a company’s entire Slack or GitHub history, this kind of information will become accessible in an accurate way. But that’ll take a while.

[4] Linear kind of does this by suggesting similar tickets when you’re creating new ones. You still need to check the time-to-completion, but it’s better than nothing.

[5] Especially if you need to take its version-control history into consideration!

[6] Consider the number of high-signal Slack messages that are <25 words…