The Hidden Cost of Offshoring
Originally published 2011-01-21
In an effort to become more consistent with the way our company delivers estimates to clients and to each other, we decided to try an experiment aimed at discovering estimation variability and biases. The results were pretty surprising!
The experiment was divided into two parts:
This article presents the results of these experiments. A later article will go on to explore possible solutions to the software estimation problem.
For these experiments, participants were asked to estimate the build and unit-test portion of the effort only. Jonah’s standard estimation model focuses on the build effort because this reflects how delivery teams (who produce our estimates) actually think: build/unit test is the activity that delivery teams can estimate most accurately, as opposed to other activities like analysis, design, testing, deployment, management, and support.
Participants were presented with three successive estimation scenarios:
It should be noted that estimators were explicitly prohibited from considering scaffolding frameworks such as Rails, Grails, or Spring Roo, which would tend to skew results dramatically depending on assumptions. Instead, the primary goal here is to generate a baseline estimate for a simple software problem, and secondarily to determine what factors affect both internal estimates and reported estimates.
The following is a summary of the results from the first three scenarios:
As expected, implementation technology and domain knowledge both help to reduce build estimates, though technology familiarity has a lesser effect: on average, technology familiarity reduces estimates by about 8% relative to the baseline, and domain familiarity reduces them by a further 27%.
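To make the arithmetic concrete, here is a short sketch of how those averages combine. The 100-hour baseline is hypothetical, and we assume (as the figures above suggest) that both reductions are expressed in percentage points relative to the baseline:

```python
# Hypothetical baseline estimate, in hours.
baseline = 100.0

tech_reduction = 8     # technology familiarity: ~8 points off the baseline
domain_reduction = 27  # domain familiarity: a further ~27 points off the baseline

# Estimate when only the technology is familiar.
with_tech = baseline * (100 - tech_reduction) / 100

# Estimate when both technology and domain are familiar.
with_domain = baseline * (100 - tech_reduction - domain_reduction) / 100
```

Under these assumptions, a 100-hour baseline falls to 92 hours with technology familiarity, and to 65 hours once domain knowledge is added.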
Anecdotally, non-technical estimators did not show as much of a disparity between the technology and domain knowledge effects: on average, they reported similar reductions for both.
It should also be noted that some estimators increased their estimates for scenario 3 relative to baseline (scenario 1), indicating that they felt they had underestimated the complexity of the domain before they were given information about it. Though this wasn’t the case for the majority of people, it does indicate that estimators make very different assumptions about the complexity of a problem when they have incomplete information.
It’s also clear that estimates vary widely across individuals. Standard deviations range from 55% to 58% across the three scenarios. In normal distributions, a range of +/- 2 standard deviations accounts for about 95% of the observations. Assuming a normal distribution of results would imply that, with 95% confidence, an estimate delivered at this level of accuracy could vary by as much as 116%. This is problematic for both estimators and clients, especially where the cost of the average estimate is significant.
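The 116% figure follows directly from the highest observed deviation. A minimal sketch, assuming a normal distribution as above:

```python
# Observed standard deviations of estimates across the three scenarios,
# expressed as a fraction of the mean estimate.
sd_low, sd_high = 0.55, 0.58

# +/- 2 standard deviations covers roughly 95% of a normal distribution,
# so at the worst observed deviation the 95% band extends 2 * 58% = 116%
# away from the mean in either direction.
band_half_width = 2 * sd_high
```

In other words, an estimate delivered at this level of accuracy could plausibly be off by more than its own value.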
For scenario 4, estimators were asked to again consider the parameters of scenario 3, which we assumed would generate the most accurate estimate of the first three. In addition, they were asked, “Given your estimate for the work, what do you report it will take to the various people who might request the estimate?” In other words: how do you adjust your reported estimate based on who’s asking?
The requesters were revealed to be a team lead, a project manager, your sales guy, your boss, and the client. How long do you report it will take you to build the software?
These results show that reported estimates do in fact vary depending on who’s asking; up to 28% relative to baseline. The effect is not as smooth as one might interpret from the graph above: many estimators did not change their estimates at all based on who was asking. Others increased their reported estimates by up to 100% relative to baseline, depending on how they thought the estimate would be used or interpreted. This is why the standard deviations are so high.
Interestingly, no one decreased their reported estimates relative to what they thought it might take, regardless of the requester; it turns out that machismo is not a part of Jonah’s estimation culture!
Anecdotal discussion suggests that those who padded their estimates did so because they wanted to give themselves a little margin for error, and that the amount of padding depended on one or more of the following:
We noticed that padding for team leads, project managers, and bosses was similar, as was padding for salespeople and clients. Based on this, we (somewhat arbitrarily) segmented the above results into three bands, within each of which “similar” levels of padding were reported:
Recasting these and taking “average” padding values for each group gives us the following:
This segmentation attempts to quantify the effect of “contingency” and “perception management” on padding. “Contingency” padding was about 11% on average, and “perception management” padding was an additional 16%, for a total of 27%.
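The composition of the two padding effects can be sketched as follows. We assume, as stated above, that the two components simply add (a multiplicative model would give a slightly higher total of about 28.8%):

```python
contingency = 0.11   # average "margin for error" padding
perception = 0.16    # additional "perception management" padding

# Total padding, with the components added as reported above.
total_padding = contingency + perception
```

So a reported estimate to a client could carry roughly 27% of padding on top of the estimator’s own internal figure.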
The high standard deviations from the four experiment scenarios presented indicate that estimates are highly variable depending on who is doing the estimating. A guided discussion was conducted to help uncover the reasons for this variability. The following is a summary of this discussion.
Estimators were asked what the biggest influencers on their estimates were. We’ve already seen that knowledge of the technology stack, the domain, and who’s asking for the estimate affect reported estimates. There was also a general consensus that estimators wanted to be as honest as possible with their estimates – transparency of estimation was a stated goal of many involved in the experiment.
Among the other answers were the following:
In other words, each estimator carries a complex mental model with respect to estimates, especially with respect to the differences between the estimate itself and the reported estimate. It’s not surprising that people are somewhat reluctant to deliver estimates, even in the face of healthy specifications.
While the difference between the estimate and the “reported” estimate is interesting, it seems that the estimate we are really after is the one with no padding (“self” in the experiments above), especially since we want to make sure that the same padding isn’t added more than once to the same estimate, by different individuals making different assumptions as it’s passed up the chain. We’ve found the best way to uncover this estimate is to repeatedly ask the estimator for an unpadded estimate. They’ll usually protest, but give it up in the end (amidst a sea of caveats).
Without an established estimation process, we’ve seen that an estimate can be more than double another estimate, depending on the estimator alone! Who is responsible for this process, for reporting its outcome, and for its accuracy and fidelity?
From the perspective of our development staff, estimates generally originate from tech leads, client suggestions re: budget, sales engineers, or project templates. Many were concerned about the level of input into the estimates that they’d had before projects are sold. To the question “Who is responsible for the estimate?” the following were replies:
Curiously, no one reported that they felt individually responsible for estimates others made, or even for their own estimates when these were rolled into a larger package of estimates.
When asked “what would make you feel more responsible for your and your team’s estimates”, the following answers were reported:
Clearly, estimators feel more comfortable estimating for themselves, about domains they understand, using technologies with which they are familiar. They also want tools to help measure how well they are doing so that they can improve on their estimations over time. While these results are not surprising, the conditions that estimators need to improve on estimations over time are often either not possible or not enacted.
You might recall that the estimates from the quantitative experiment were for the build and unit test effort only. On any consulting project, there is also time set aside for discovery, analysis, design, integration testing, UAT, documentation, deployment, project management, and warranty support.
This raises the question “what percentage of the total effort is the build?”, as this has a major impact on the final estimate for all of the services to be delivered to the client.
Results were all over the map here: 25%, 30%, 40%, 60%, 70%, 80%, “not more than 50%”, “60% regardless.” I didn’t bother to graph these.
The group began to better understand the point of the question when a follow-up was asked: “If the federal government asked you how much the total effort would be, how would you mark up your build estimate relative to a scenario in which an individual asked you what the total effort was?”, the assumption being that the federal government would be a much more formal, slow-moving, and difficult client than an individual might be. This question garnered a lot of contemplative ceiling-staring.
Obviously, the answer to this question has a huge effect on estimates that are delivered to clients, and will be the subject of a follow-up article on the software estimation methods we use at Jonah.
With respect to software build estimates, asking different people the same question leads to wildly different results. Even asking the same question to the same person multiple times leads to different results, as the estimator is cajoled into reconsidering their assumptions and thinking more deeply about the problem. It is clear that there is no single “estimate”. The answer really is “it depends.”
Technology, domain knowledge, and who’s asking all have measurable effects on an estimate, but none of these compare to the effect of the estimator themselves, including not only how much time the estimator thinks it will take, but the vast array of assumptions that the estimator has considered in support of the estimate.
So where does this leave the estimator in a sales context, when the prospective client asks for the estimate for all of the work? How much should one squeeze the estimate in a competitive situation? Where does the impulse for a developer to disregard the estimate begin ("I never said I could get it done in that amount of time")? What effect does the type of contract have on the estimate? How does one integrate the notion of the "type of client" into an estimate?
Certainly, software estimation should not be done haphazardly, and one’s approach should be defensible to both the client and to the delivery team. We’ve found that more accurate estimates are delivered if people discuss their mental models and negotiate with one another, rather than just asking for a number, marking it up and reporting it. Negotiating increases the accuracy of the estimate by helping to drive out assumptions. It also helps the team to collectively feel that they are stakeholders, and that they implicitly share a common purpose. The effect of this negotiation on estimation accuracy is unmistakable.
In a follow-up article, we'll discuss two models we use to estimate software projects, examine how these deal with the effect that the client has on a software estimate, and try to answer some of the more sticky questions above.
Jonah Group is a digital consultancy that designs and builds high-performance software applications for the enterprise. Our industry is constantly changing, so we help our clients keep pace by making them aware of the possibilities of digital technology as it relates to their business.