Why do so many sprints fail?

This article is part of the Challenges with estimations and possible solutions series.

The productivity of a team is a random variable, represented as the amount of work a team can deliver within a window of time.

A common practice in Scrum teams is to measure productivity with story points: each work item receives a numeric value, and at the end of every sprint (a fixed time window, for example 2 weeks or 1 month) you sum the points of all items finished in that period. That sum represents the productivity for that sprint.

The Team Velocity is then determined as the average productivity of the most recent sprints, typically the last 10 or fewer. To safeguard the team against overburdening themselves, the velocity is used as a budget for the amount of work that can be fitted into a sprint.
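
As a minimal sketch of that calculation: the function name `team_velocity`, the default window of 10, and the sprint history below are all made up for illustration, not taken from a real team.

```python
from statistics import mean


def team_velocity(completed_points, window=10):
    """Average the story points finished in the most recent `window` sprints."""
    recent = completed_points[-window:]
    return mean(recent)


# Hypothetical history: story points finished per sprint, oldest first.
history = [18, 25, 14, 22, 19, 27, 16, 21, 20, 18, 23, 17]

velocity = team_velocity(history)
print(f"Velocity over the last 10 sprints: {velocity:.1f} story points")
```

The resulting number is what the team would use as its budget for the next sprint.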

The average is similar to the 50th percentile of the samples (and one could say the intent is to capture the 50th percentile, the median, of the productivity random variable). For the new sprint to be considered a success, all the work in the sprint has to be finished. Taken at face value, that has an intended 50% chance of succeeding: the actual productivity during that sprint needs to be at least the velocity to finish all the allocated work.
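
To make the "roughly a coin flip" reading concrete, here is a small sketch on invented sample data. It compares the average with the median and counts how many of the past sprints would have succeeded if the average had been the budget. The numbers are hypothetical and chosen only to illustrate the idea.

```python
from statistics import mean, median

# Hypothetical story points finished in the last 10 sprints.
last_sprints = [14, 22, 19, 27, 16, 21, 20, 18, 23, 17]

velocity = mean(last_sprints)    # the budget the team plans against
middle = median(last_sprints)    # the 50th percentile of the samples

# Share of past sprints whose productivity was at least the velocity.
successes = sum(points >= velocity for points in last_sprints)
success_rate = successes / len(last_sprints)

print(f"velocity={velocity:.1f}, median={middle:.1f}, success rate={success_rate:.0%}")
```

With skewed data the average and the median drift apart, and the success rate moves away from 50% accordingly.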

More pragmatically, the variance in the data (that is, the variance of the productivity random variable) also plays a role. Let's imagine productivity follows a Normal Distribution with a mean of 20 story points for a 2-week sprint and a standard deviation of 5 story points. (I'm appealing to the Central Limit Theorem here. Note: from my experience with metrics-based forecasting, I do not believe that summing a couple of weeks' worth of work turns a non-normally distributed productivity random variable into something that resembles a normal distribution.) Now let's do some rough calculations on how many story points we may fall short and how that affects the time needed to actually finish the work. Let X be the number of story points below the mean of our random variable (so below 20). To estimate how much time is still needed, assume we can extrapolate linearly from the 2-week sprint: X/20 * 10 workdays. That gives the following table:

Story points short (X)	Expected remaining time (work days)	Probability (%)
1	0.5	7.93
2	1.0	7.62
3	1.5	7.03
4	2.0	6.24
5	2.5	5.32
6	3.0	4.36
7	3.5	3.43
8	4.0	2.60
9	4.5	1.89
10	5.0	1.32
11	5.5	0.88
12	6.0	0.57
13	6.5	0.35
14	7.0	0.21
15	7.5	0.12
16	8.0	0.07
17	8.5	0.04
18	9.0	0.02
19	9.5	0.01
20	10.0	0.01
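
The table can be approximated with a few lines of Python. This is a sketch under one assumption of mine: row X is taken to cover a delivered amount between 20 - X and 20 - X + 1 story points, which is the reading that lands on the percentages above. The last row can differ in the final digit depending on how the tail below zero is handled.

```python
from statistics import NormalDist

MEAN_POINTS = 20       # mean productivity per sprint (from the text)
STD_POINTS = 5         # standard deviation (from the text)
SPRINT_WORKDAYS = 10   # a 2-week sprint

productivity = NormalDist(mu=MEAN_POINTS, sigma=STD_POINTS)


def shortfall_row(x):
    """Return (X, remaining work days, probability in %) for a shortfall of X.

    Assumption: row X covers delivered points in [20 - X, 20 - X + 1).
    """
    lower = MEAN_POINTS - x
    upper = lower + 1
    probability = productivity.cdf(upper) - productivity.cdf(lower)
    remaining_days = x / MEAN_POINTS * SPRINT_WORKDAYS
    return x, remaining_days, 100 * probability


table = [shortfall_row(x) for x in range(1, 21)]
for x, days, pct in table:
    print(f"{x:>2}\t{days:.1f}\t{pct:.2f}")
```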

If we're okay with being off by 10% of our original time budget (1 working day out of 10), then any outcome with an expected remaining time of more than 1 working day is problematic. Summing the rows for X of 3 and above, you can see this has a 34.47% chance of happening.

For those interested: once we are over budget, the average overrun is 2.93 working days. To me, that doesn't sound highly problematic. However, there is also an emotional component. People don't like 'failing the sprint', so an event that happens ~35% of the time for this hypothetical team drains some of its morale. And we humans tend to be affected more by the negative event than by the (supposedly equally likely) event of completely acing the sprint goal and having time left for some additional work.
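
Both figures, the chance of going more than 1 day over budget and the average overrun once that happens, can be checked with a short sketch. It uses the same one-point buckets as the table (my assumption about how the rows were built) and works with unrounded probabilities, so the results may differ from the quoted numbers in the last digit.

```python
from statistics import NormalDist

productivity = NormalDist(mu=20, sigma=5)   # model from the text
SPRINT_WORKDAYS = 10
TOLERANCE_DAYS = 1                          # 10% of the 10-day budget

rows = []
for x in range(1, 21):
    # Probability of delivering between 20 - x and 20 - x + 1 story points.
    probability = productivity.cdf(20 - x + 1) - productivity.cdf(20 - x)
    remaining_days = x / 20 * SPRINT_WORKDAYS
    rows.append((remaining_days, probability))

# Keep only the outcomes that need more than the tolerated extra time.
over_budget = [(days, p) for days, p in rows if days > TOLERANCE_DAYS]

p_over_budget = sum(p for _, p in over_budget)
mean_overrun = sum(days * p for days, p in over_budget) / p_over_budget

print(f"P(over budget) = {p_over_budget:.2%}")
print(f"Average overrun when over budget = {mean_overrun:.2f} work days")
```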

Let's model the emotional impact as a score that is decremented by 2 when the sprint goes over budget, stays the same when the sprint finishes according to expectation (anywhere from 1 day short to 1 day left for extra work), and is incremented by 1 when the team aces the sprint. We can tabulate that as:

Event	Probability	Score mutation
Need more than 1 day to finish the sprint	34.47%	-2
Finished sprint within expectations	31.06%	0
Aced sprint	34.47%	+1

To me, this sounds like a recipe for consistently demotivating your team over the long run. I also believe a demotivated team tends to be less productive, which causes a negative spiral: because Team Velocity is a moving average, a drop in motivation takes time to show up in it, so the velocity stays optimistic compared to reality and the likelihood of failing the sprint increases.
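
The long-run drift follows from the expected score change per sprint, which is each probability multiplied by its score mutation, summed over the three events. A small sketch using the values from the table above (the variable names are mine):

```python
# (event, probability, score mutation) as listed in the table.
events = [
    ("Need more than 1 day to finish the sprint", 0.3447, -2),
    ("Finished sprint within expectations", 0.3106, 0),
    ("Aced sprint", 0.3447, +1),
]

# Expected change of the morale score after one sprint.
expected_change = sum(probability * mutation for _, probability, mutation in events)

print(f"Expected morale change per sprint: {expected_change:+.2f}")
print(f"Expected morale change over 10 sprints: {10 * expected_change:+.1f}")
```

Under this model the score loses about a third of a point per sprint on average, so the negative trend is built into planning at the mean.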

If your productivity metric has hardly any variance, the likelihood of finishing the sprint within expectations increases, and the probabilities of failing or acing the sprint decrease accordingly. That means the emotional impact over time is smaller, yet it's not 0.
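
To see how the spread changes the picture, the sketch below recomputes the three probabilities for a few standard deviations. It assumes the sprint is planned at 20 story points and that 1 work day corresponds to 2 story points (from the X/20 * 10 extrapolation). It uses the continuous distribution directly, so the value for a standard deviation of 5 differs slightly from the rounded table sums. The function name and the chosen standard deviations are illustrative.

```python
from statistics import NormalDist

PLANNED_POINTS = 20     # sprint planned at the mean, as in the text
TOLERANCE_POINTS = 2    # 1 work day out of 10 equals 2 of 20 story points


def outcome_probabilities(std_points):
    """Return (P(fail), P(within expectations), P(ace)) for a given spread."""
    productivity = NormalDist(mu=PLANNED_POINTS, sigma=std_points)
    p_fail = productivity.cdf(PLANNED_POINTS - TOLERANCE_POINTS)
    p_ace = 1 - productivity.cdf(PLANNED_POINTS + TOLERANCE_POINTS)
    return p_fail, 1 - p_fail - p_ace, p_ace


for std_points in (5, 2, 1):
    p_fail, p_within, p_ace = outcome_probabilities(std_points)
    expected_change = -2 * p_fail + 1 * p_ace
    print(
        f"std={std_points}: fail={p_fail:.1%}, within={p_within:.1%}, "
        f"ace={p_ace:.1%}, expected morale change={expected_change:+.3f}"
    )
```

The expected morale change shrinks toward zero as the spread gets smaller, but it stays negative as long as failing and acing are equally likely.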

So what can be done about this? I imagine a couple of things if you want to act on this model:

Influence the score mutation. Make acing the sprint a celebratory moment, so that its positive impact becomes at least as large as the negative impact of failing the sprint. And make failing the sprint 'less of a big deal' to reduce its impact.
Decrease the probability of failing the sprint so that it happens less often (if you follow this 'team morale model' to the letter, failing should be at most half as likely as acing the sprint to reach equilibrium). You can do that by putting fewer work items in your sprint. First establish how likely you want a failed sprint to be, for example 10% (note that for this hypothetical scenario, ~17.4% would be the equilibrium point). Using the normal distribution, we can calculate that the likelihood of producing fewer than 13.6 story points in a sprint is 10%. So plan your sprint with 13 story points' worth of work instead of 20.
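
The 13.6 figure is the 10th percentile of the assumed distribution, which can be looked up with the inverse CDF. A minimal sketch, where `sprint_budget` is a hypothetical helper name and the distribution parameters are the ones from the example above:

```python
from math import floor
from statistics import NormalDist

productivity = NormalDist(mu=20, sigma=5)   # model from the text


def sprint_budget(failure_probability):
    """Story points the team delivers with probability 1 - failure_probability."""
    return productivity.inv_cdf(failure_probability)


budget = sprint_budget(0.10)
planned_points = floor(budget)   # round down to stay on the safe side

print(f"10th percentile of productivity: {budget:.1f} story points")
print(f"Plan the sprint with {planned_points} story points")
```

Passing a different target, for example `sprint_budget(0.20)`, shows how much work fits when a higher failure rate is acceptable.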