
Practical Experiments In An Agile World

In my previous blog post, I wrote about data-driven experiments.  Unfortunately, it lacked any practical application, and today I want to do something about that.  But let’s get one thing straight first.  This isn’t my favorite topic.  Why?  It focuses so heavily on data and so little on humans.  When we get too wrapped up in data, we lose sight of the human dimension, and that’s bad.  Never lose sight of what matters.  That’s not you.  That’s not data.  It’s your teams.  Keep this in mind when you run data-driven experiments.

Before I show you some of my own experiments, let me explain the questions I ask myself when building one:

  1. What problem (or behavior) are we attempting to solve (or highlight)? This is the heart of your experiment.  Never lose sight of this.  We will lose sight of it anyway, and when we do, we must immediately bring it back to center, as often as necessary.
  2. What data can we collect that addresses the question above? Collect this data and no more.  The more data we collect, the more time it takes, and the more we risk muddying the focus.  We also risk inundating our teams with too much data, so it’s imperative to pare it down to the absolute minimum.
  3. Who will we share this data with? Honesty in your data is important.  If the team feels that they will be judged by those in power, they may unconsciously game the system and manufacture the data we want or expect to see.
  4. How can we create a simple view into the data? The simpler, the better. Consider the burndown chart.  With just a few simple data points, it can inspire a great conversation.  It’s these conversations that make for a powerful experiment.

Work In Progress (WIP) Experiment

Multi-tasking has a cost, and it’s a cost that many teams overlook.  In fact, this was one of my top 10 tips in a previous blog post.  While I was a scrum master for three teams, I wanted to highlight how much work each team had open on each day of the sprint.  Once a day and at the same time, we recorded the percentage of stories in progress by each team and created the graph you see below.  Notice that I also included an “optimal” range.  This range isn’t based on any science.  It’s simply where my gut told me our teams should be.  Here’s what they looked like after the first sprint of the experiment.


Here they are after the next sprint.


Notice how the act of observation alone helped reduce multi-tasking.  It’s worth noting that with this experiment we weren’t attempting to adjust the teams’ behavior whatsoever.  We simply wanted to highlight what percentage of stories were open on each day of the sprint.  Of course, they knew what behavior I hoped to create, but I underemphasized this as much as possible over the course of the experiment.

Finally, here are the results of the entire experiment.


Some final thoughts on this experiment:

  • As previously mentioned, this experiment wasn’t intended to change any behaviors but to highlight an existing behavior.
  • Because this experiment wasn’t crafted to adjust behaviors, we were less concerned about creating any unintended side effects. I still made a point to underemphasize my intentions and instead focused them on the data so they could reach their own conclusions.
  • Teams were also interested in seeing how other teams compared, so we shared the data across all my teams.
  • Data was not shared with management or executives. This wasn’t for any particular reason.  They supported the experiment, but the data wasn’t especially interesting for those not on a team.
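The daily measurement behind this experiment is simple enough to sketch in code. Here is a minimal Python example (story IDs, statuses, and the bounds of the “optimal” band are all hypothetical, not taken from the actual teams) that computes the percentage of stories in progress from one day’s board snapshot:

```python
# Hypothetical sprint-board snapshot: story ID -> status, recorded once a day.
def wip_percentage(stories):
    """Percentage of stories currently in progress."""
    in_progress = sum(1 for status in stories.values() if status == "in progress")
    return round(100 * in_progress / len(stories), 1)

def in_optimal_range(pct, low=20, high=40):
    """The 'optimal' band is a gut-feel choice, as in the post -- not science."""
    return low <= pct <= high

day_3 = {"S-101": "done", "S-102": "in progress", "S-103": "in progress",
         "S-104": "to do", "S-105": "to do"}
pct = wip_percentage(day_3)
print(pct, in_optimal_range(pct))  # 40.0 True
```

Recording this once a day per team, at the same time, yields exactly the series plotted in the graphs.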

Confidence Experiment

This experiment is more complicated than most and is certainly time-intensive. Let’s talk about what inspired it:

  • There was a great deal of churn in the sprint backlog in every active sprint.
  • This change of the sprint backlog was done by the product owner with little to no input from the team.
  • Logically, teams understood work shouldn’t roll over from one sprint to the next, to the next, and so on, but that’s what consistently occurred.
  • This constant change in priorities due to a changing sprint backlog was becoming a burden on the team.

For this experiment, we asked team members how confident they were that we’d complete everything in the sprint backlog before sprint end.  We did so every day, and we did so privately so as not to bias other team members.  We also recorded how the sprint backlog changed on a daily basis to determine whether there were any correlations between the change in the backlog and the team members’ confidence.
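The correlation check itself is straightforward. Here is a minimal Python sketch, with hypothetical daily numbers standing in for the real series (average confidence per day, and the day-over-day change in committed story points):

```python
import statistics

# Hypothetical daily series for one sprint: average team confidence (%) and
# the absolute day-over-day change in committed story points (%).
confidence     = [90, 85, 60, 65, 70, 68, 72, 75, 74, 70]
backlog_change = [ 0,  0, 12,  8,  5,  6,  2,  0,  1,  3]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(confidence, backlog_change)
print(round(r, 2))  # a negative r suggests backlog churn tracks with lower confidence
```

A single sprint is a small sample, so any r value here is conversation fodder rather than proof, which fits the spirit of these experiments.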

Below are some charts from the experiment.  This first chart is the team’s confidence for each day of the sprint.  Because we feared team members would feel compelled to report near-100% confidence, we masked their names during the course of the experiment.  I found out later that only a handful preferred the anonymity, but it did emphasize to the team how important honesty was in their confidence values.

If a team member reported 100% confidence, it meant they believed 100% of the story points in the sprint would be complete by sprint end.  If a team member reported 60% confidence, it meant they believed 60% of the story points would be completed by sprint end.


I’m sure you noticed the drop in confidence on day 2. The team ran into a snag that day, realizing they were blocked by another team.  This was an important teaching point for the product owner: he later realized he could have foreseen the blocker, and this experiment helped him quantify its impact on the team.  Here’s another graph showing a few pieces of data:

  1. (Blue) Average confidence by day of the team.
  2. (Orange) Percentage story points in the sprint by day. If over 100%, then this means work was inserted into the sprint after sprint start. If under 100%, then work was removed from the sprint after sprint start.
  3. (Gray) Percentage of work complete. If equal to 100% at sprint end, then all sprint work was completed by the team.
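The orange and gray series are simple ratios against the sprint’s original commitment. Here is a minimal Python sketch with hypothetical numbers (the point totals are invented for illustration, not from the actual sprint):

```python
# Hypothetical sprint numbers. "committed" is the point total at sprint start;
# each day we track the current total in the sprint and the points completed.
committed = 40
current_points   = [40, 40, 46, 46, 44, 44, 44, 44, 44, 44]
completed_points = [ 0,  2,  2,  6, 10, 15, 20, 26, 30, 33]

# Orange series: points in the sprint as a % of the original commitment.
# >100% means work was added after sprint start; <100% means work was removed.
scope = [round(100 * p / committed) for p in current_points]

# Gray series: % of committed work complete; 100% at sprint end means the
# team finished everything it committed to.
done = [round(100 * p / committed) for p in completed_points]

print(scope[2], done[-1])  # scope grew on day 3; this sprint ended short of 100%
```

Plotting these two series against the blue average-confidence line is all the chart below requires.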


Some final thoughts on this experiment:

  • In the graphs above, the team did a subpar job completing its sprint backlog.  Today, this same team completes between 80% and 95% of its backlog every sprint.
  • We ran this experiment for a total of 3 sprints.
  • In retrospect, I wish we had explored other ways to represent the data since I don’t feel we did the data justice.
  • Teams were excited to see the outcome of this experiment. I believe they enjoyed the act of reflecting on their own confidence in completing their sprint backlog and comparing it to others on the team.
  • The conversations we had while looking at the charts were passionate and informative.  They also helped us make some impactful changes to better predict our capabilities, so we considered the experiment a tremendous success.
  • It highlighted to the product owner group the importance of gaining team buy-in any time the sprint backlog is adjusted, while simultaneously highlighting how often we were adding work to active sprints.
  • Teams began owning their sprint backlogs to a greater degree and leaning on product owners any time they asked to make a change.
  • Executives were surprised to see such low confidence values. I feel they were undervaluing the importance of collective ownership and team buy-in when it comes to the sprint backlog.

Each experiment we’ve crafted is different, and we’ve never run the same one twice.  With each, we started from a blank piece of paper and asked ourselves the questions that began this blog post.  We didn’t draw on experiments from others in the community, and I provide the examples above not to be recreated but to inspire something that’s uniquely yours.  Find data that your teams will enjoy. Measure it. Visualize it. Finally, talk about it with the team.  Use data as a tool to better your teams, and never as a weapon to control or judge.


9 thoughts on “Practical Experiments In An Agile World”

  1. I remember these graphs! I think I misunderstood the definition of “confidence” when we did this. I thought that was my confidence level of us completing all of the story points, instead of the percentage of story points I feel confident that we’ll finish. But either way I’m glad we learned from this 🙂

    I was a huge fan of the WIP experiment too. It sure looked like we reduced multi-tasking!

    1. Hey, Betty! I mentioned in my post above that this experiment was complicated. Your confusion was part of the reason. Many were in the same boat, and it required a great deal of explanation. It’s funny to look back at the results of that experiment now, considering how far the team has come since then.

  2. On a separate but related note, I noticed that in both experiments we were comparing our own “before” and “after”, which was probably the goal of the exercises, yet by nature also biased. Have you thought about crafting an A/B-testing-like experiment? With only a couple of teams it’s harder to do, but the data nerd in me would really like to see how things would play out!

  3. Hi Tanner – coming to the party a bit late. Listening to you on the Agile Uprising podcast caused me to check out your website. I’ve seen many interesting topics posted here and I aim to make trips here more frequently! A few months ago, I got inspired by this post, which I found after learning about Standup Poker from Kalpesh Shah. This is very interesting work. I came up with my own variations and tried it out over the last 3 sprints of a 7 sprint release. For example, I used work items, not points, and a different polling frequency. I thought the best part was getting the team to talk more about their current plan to attack the sprint goal and how to adjust. Next, I tried daily polling for a sprint and then running standup poker just on day 8 of 2-week sprints to see how the team felt about it. I am still unsure if all the teams I serve want to continue this practice, and I will soon find out what they want to do. I know at least one team wants to continue in some form. Do you still practice Standup Poker with your teams? Has it evolved in any way? Have you had any additional learnings?

    The internal paper I wrote about the 3 sprint experiment had four conclusions.

    1. Team conversations about how to meet the sprint goal correlated to the teams delivering more work than initially planned.
    2. Higher confidence occurred when the team had early deliveries in the sprint and a constant flow of delivery. The simplest way to create that constant flow and learning is to create more small stories that can be completed earlier in the sprint. Swarming on stories whenever possible is another way to create the constant flow.
    3. There was significant churn in the sprint goals towards the end of the release which correlated to reduced confidence.
    4. The confidence typically dropped over the sprint cycle. A personal observation, not captured in the data as collected, is that the team members who did QA typically had lower confidence than the developers who voted, especially in the back half of the sprint. This could be construed to mean the team was not thinking of the confidence vote at the “team level” of reaching their goal.

    1. Hi, Vic. I had no idea that standup poker was a thing and is akin to one of the experiments I mention above. Thank you for teaching me something I didn’t know.

      Are my teams still doing it? Not anymore. This experiment and a multitude of conversations and tweaks have led to our teams consistently delivering on their commitments, and in some cases over-delivering. Some might worry they’re sandbagging by deliberately undercommitting, but I’m confident that’s not the case.

      Now, we’ve been experimenting with innovative ways to use trends over time to inform our conversations. In fact, I’ll be telling that story in an upcoming meetup in the Valley.


      Finally, I’m glad to see you were taking your learnings to tweak how you were conducting your work and to see how the team reacted. As far as your conclusions go, they sound similar to my own experiences.

  4. Thanks for your reply Tanner. I’m still trying to get through your site as I see lots of interesting titles in your posts!

    One of my teams stopped the Sprint Confidence vote, and the other decided to do it on days 3, 6, and 8 with a quick throw and a look around to see if they need to find out why. It was their decision, with the funny part being that the team that stopped, in my mind, needed it and the other did not. Oh well, it is their choice. No charting of results, and I don’t even stick around when they do it, as I want the conversation to be theirs, and they can ask me for help if they need it.

    I wish I could be at your meetup. I’d really like to hear your ideas on metrics and results. For me, there have been only 2 helpful metrics: cycle time, which I have used to show the team that our focus on smaller stories is working and to help us estimate more consistently, and the Team Morale Survey – http://teammetrics.apphb.com/
