We conducted user testing of Visma’s web site, and the results were far from conclusive. In fact, we had to scrap the quantitative data entirely. Was this user test a total waste of money and time? No, but it was a costly lesson. Since we believe that sharing is caring, here is our story.
The test setup
A while ago, a decision was taken to launch a new, modern web site, and of course the usability engineer waved the UX flag: “We must not forget about usability!” It was decided that a comparative usability test would be conducted: a first test on the old web site, and a second test once the new web site was launched.
Despite a tight time frame, the tests were planned and conducted. In total, twelve test persons were asked to “think aloud” while performing twelve tasks of varying complexity on the web sites. The tests were recorded with the Morae software, and each session was followed by a debrief with the test person, including a discussion and rating of the user experience of the respective web site. In order to measure the effect of changes later on, the success criteria for the tasks were chosen to be the number of clicks and the time per task.
Due to organisational changes, it so happened that the test drivers in the first test were not the same persons as in the second test. This, combined with an overly tight time frame, was a recipe for failure. Comparing the number of clicks and the time per task across the two tests was like comparing apples with oranges, since the test drivers had different views on when a task was actually completed. In the first test, for example, the test driver considered a task done, and hence stopped it, as soon as the test person reached the “right web page”, even if the test person did not realise that this was the “right web page”. In the second test, by contrast, the test driver did not stop the test person until the test person explicitly stated that the web page in question was his or her solution to the task. As a consequence, the recorded time and mouse clicks per task favoured the first test. Another factor that affected the test’s reliability was the test persons’ varying knowledge of Visma’s products and services. For these reasons, we decided to scrap the quantitative data.
Looking back, there are surely things that could have been done differently. The most obvious lesson is that we should have practised what we preach: the UCD process. Instead, we gave in to the temptation to satisfy our ‘web owners’ and the famous user called “Everyone” by conducting a cover-it-all test with an unrealistically large user scenario. We can only do so much without knowing who the users of our web site are.
Usability research takes time; there is a reason why Visma has an incorporated UCD process, and why it pays off as an investment when applied properly. There are no shortcuts. We are the first to admit that we should have started by understanding the user context and mapping out the users, and based on that, conducted a few user observations with a limited test scenario.
Valuable qualitative feedback
This blog post has focused on the negative aspects of the test, but it is worth mentioning that the test also produced a lot of valuable qualitative insights. Findings from the first test have already led to improvements on the web site. Read more about the usability behind Visma’s new web site in Fredrik’s blog post.
UX and quantitative data
This test started off as a quantitative comparative test but ended as a qualitative user observation. A failure in one sense, a success in another. Perhaps this suggests that we should not devote ourselves to quantitative studies, but rather focus on qualitative user observations. From a usability perspective, does quantitative data really bring any value beyond convincing management of needed changes? Or, as Einstein (or was it Cameron?) put it: “Not everything that can be counted counts, and not everything that counts can be counted.”