Estimating a backlog of 100 items in 1 hour? Challenge accepted.

The situation

The agile team was working in a still mostly project-based environment. Over several months they had created backlog items, some with clear business value and others meant to get rid of technical debt they had piled up. That left them with about 125 backlog items, of which only a dozen were estimated and refined, ready for the next 1.5 sprints, "as Scrum prescribes".

But the program/project organisation had hit a snag. For some reason decisions were being delayed, and the PO did not get any decently described work for the team. The customer did not know what they wanted. The team would have to "wait". Maybe even for a few months. And before long, as happens in organisations where only a minority of teams have flipped to an agile way of working, the "resource allocation" was being questioned. How about taking some "resources" out of the team and letting them "return once there is work again"?

No one was happy to even hear that question. And as the backlog showed: there was work! However, the organisation needed to know: just how much work is there? For a sprint? 2 sprints? 10 sprints? We needed to know with enough certainty. And fast. That same afternoon, it turned out, there was a meeting about the "resource allocation and planning".

What we did

First, we set the PO to work. The older backlog items had not been evaluated for a while, and some might no longer be needed. So a first cleanup was done. 24 items were deleted for several reasons.

Then we went into refinement with the entire team.

I created paper cards with the backlog items, 1 item per card, with the description and a few acceptance criteria. I added all previously estimated items as well, but without their estimates, as benchmark items the team was unaware of. I had come to know this team, and I knew they would want to see some objective "proof" that this system works. I also added the Jira reference to each card, for when the team was really lost about the meaning of an item: descriptions can be very vague, and the information needed to clear up uncertainty might already be in comments on earlier questions asked to the customer.

The team split up into small groups, and I let them start categorising the items by simply comparing them to each other and adding them to t-shirt-sized columns: Small, Medium or Large. A column "not now" or "stop" was also added for items that were no longer needed or no longer of any business value. Each group of 2-3 team members contained a mix of specialists, so they would get a quick view on e.g. combined back-end and front-end impact, but discussing items with the entire team was not allowed. Good enough was good enough. They were aware that to get 101 items estimated in just 60 minutes, there was less than a minute to decide on each item, and in practice even less, because we still had to translate the result into the story point system they were used to. If there was too much uncertainty caused by the need for expert domain knowledge or technical constraints, they could put the card back into a central pile and move on, but at the end all cards needed to be categorised.

Once this was done we took the column “small” and repeated the exercise for the S column only, but now with extra columns: XXS, XS, S. After that we looked at the column “medium” and continued the exercise but now with columns SM, MM, LM added to the sequence. And then column “large” got the same treatment, adding columns L, XL, XXL.

Then came the time to get it back to the story point system the team was already working with. The XXS column got the smallest estimate, 1, and the XS one got 2. Then the team had to compare the second and third columns: is the difference in work (taking into account risk, complexity and repetition) about 50% or at least double? If about 50%, the next column gets a 3; if not, it goes straight to 5.

We continued this way of questioning and adding the numbers of the Fibonacci sequence until all columns were covered.
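The step from columns to story points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `big_jumps` parameter and the example answers are hypothetical, and the rule "about 50% more means the next Fibonacci number, at least double means skip one" is a generalisation of the 3-versus-5 question to the later columns, not something the team wrote down.

```python
def fibonacci_scale(n):
    """Return the first n values of the story point scale 1, 2, 3, 5, 8, ..."""
    scale = [1, 2]
    while len(scale) < n:
        scale.append(scale[-1] + scale[-2])
    return scale[:n]


def assign_points(columns, big_jumps):
    """Map ordered columns (smallest first) to story points.

    columns:   column names in order, e.g. ["XXS", "XS", "S", ...]
    big_jumps: names of columns the team judged to be at least double
               the previous column (hypothetical input). Those skip one
               value on the scale; all others take the next value.
    The first two columns are fixed at 1 and 2, as in the exercise.
    """
    # Worst case: every column after the second skips one value.
    scale = fibonacci_scale(2 * len(columns))
    points = {}
    index = 0
    for position, column in enumerate(columns):
        if position == 0:
            index = 0
        elif position == 1:
            index = 1
        else:
            index += 2 if column in big_jumps else 1
        points[column] = scale[index]
    return points


columns = ["XXS", "XS", "S", "SM", "MM", "LM", "L", "XL", "XXL"]
# Hypothetical answer: only the step from XS to S felt like "at least double".
print(assign_points(columns, big_jumps={"S"}))
```

With an empty `big_jumps` set, every column simply gets the next number of the sequence.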

At this point, you usually see some items switching columns, but not many, and certainly by no more than one column. If you want to find out more about the why and how of estimating and forecasting, check out this earlier blog post, where a variation of this exercise is explained to create a reference catalogue, as well as a number of other techniques you can consider.

The conclusion

We finished within just about an hour. After the exercise we had 87 items left on the backlog. Three of those were newly created by the PO, a kind of natural refinement that happened while the team was looking at other items. The remaining items had ended up in the "not now" and "stop" piles, and were validated as such by the PO. Based on the velocity of the team, that was enough work to fill 16 sprints, including some unrefined epics.
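The forecast itself is simple arithmetic: add up the story points and divide by the velocity, rounding up. A minimal sketch, assuming a hypothetical `forecast_sprints` helper and made-up numbers (the team's real estimates and velocity are not reproduced here):

```python
import math


def forecast_sprints(points_per_item, velocity):
    """Return (total story points, number of sprints needed).

    points_per_item: story point estimate of each backlog item
    velocity:        story points the team finishes per sprint
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    total = sum(points_per_item)
    # Round up: a partly filled sprint is still a sprint.
    return total, math.ceil(total / velocity)


# Made-up example values, not the team's actual backlog.
estimates = [1, 2, 3, 5, 8, 13, 5, 3]
total, sprints = forecast_sprints(estimates, velocity=12)
print(f"{total} points, about {sprints} sprints")
```

Because unrefined epics are part of the total, the outcome is best read as a rough order of magnitude rather than a commitment.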

The benchmark items showed a very slight difference for a few items (especially in the 1-2-3 SP range). 2 items of the entire backlog were way off, both in the higher range, and in a short check afterwards it turned out they were both large items that had been categorised by team members with slightly less domain knowledge, and therefore a higher level of risk and complexity. A perfectly normal phenomenon.
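Checking the benchmark items comes down to comparing the old and the new estimate per item and counting how many steps on the scale they are apart. A rough sketch, assuming hypothetical item keys and values and a hypothetical `compare_benchmarks` helper:

```python
# Story point scale used for the comparison (assumed to stop at 55).
SCALE = [1, 2, 3, 5, 8, 13, 21, 34, 55]


def scale_distance(old, new):
    """Number of steps between two estimates on the scale."""
    return abs(SCALE.index(old) - SCALE.index(new))


def compare_benchmarks(previous, fresh):
    """Return {item: steps apart} for benchmark items whose estimate changed.

    previous: earlier estimates per item key
    fresh:    estimates that came out of the exercise
    Items missing from `fresh` (for example moved to "not now") are skipped.
    """
    report = {}
    for key, old in previous.items():
        if key in fresh:
            steps = scale_distance(old, fresh[key])
            if steps:
                report[key] = steps
    return report


# Made-up example values, not the team's actual items.
previous = {"ITEM-1": 2, "ITEM-2": 5, "ITEM-3": 13}
fresh = {"ITEM-1": 3, "ITEM-2": 5, "ITEM-3": 34}
print(compare_benchmarks(previous, fresh))
```

A difference of one step is the kind of slight shift described above; two or more steps is worth a short conversation with the people who placed the card.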

The aftermath

The cool thing is that we now have a reference catalogue. We can do away with endless planning poker or finger-showing estimation sessions and keep refinement for what it was meant to be: actually discussing, drawing and splitting backlog items, and then quickly putting them in the right category. The overhead of having the full backlog estimated is now decimated, and the team have found a renewed energy to tackle the backlog items in a refreshing way!

And they are spreading the word.

Oh, by the way, the "resource management" people were very satisfied and the meeting did not take long. The numbers were very clear and no further questions were asked. And the team lives on, untouched.