A quick and dirty content design user test
I saw a screen that broke pretty much every content design best practice I could think of. I wanted to change it but didn’t have any data to back up my hypothesis that making a change would pay off. All I knew was that the screen looked bad.
I made the case that we should align the screen to best practices even without data, because best practices are best practices for a reason: they get results the vast majority of the time. The team agreed and we replaced the existing screen (version A) with a new version (B). It seemed like the right thing to do, but I still wanted data, even after the fact, so I ran a quick and dirty test. It’s never too late to learn.
Use case
It was an onboarding screen where users choose the type of leave of absence they are taking. We show them different leave types and they choose one.
On the existing screen (A):
Leave type titles were not scannable
Actual leave type was not front loaded in the titles
Leave type titles were not concise
Details were hidden behind a click in a misuse of progressive disclosure
Clickable text for selecting a leave type was repetitive next to the title
Clickable text used HR jargon instead of language users use
Version A of the screen I was testing: Participants were asked to find specific leave types
My hypothesis was that aligning to best practices would help users move through the screen faster, the underlying assumption being that faster is better, especially in onboarding.
In the new version (B) I proposed:
Scannable leave type titles
Front loading the actual leave types
Concise leave type titles
Pulling details out from behind a click and into subtitles instead
Icon (chevron) instead of text to make a selection
Use language users use
Version B of the screen I was testing: Participants were asked to find specific leave types
Note: The leave types were in the same order in both versions for the test, even though they are not in these screenshots.
Test plan
I appreciated the team putting their faith in content design principles and moving ahead without data. But I had to know if it worked. Could I prove retroactively that indeed, aligning to best practices had a positive impact? If so, I would have a stronger case for doing it again in the future on other screens.
Task
I printed out versions A and B and asked participants to press with their finger on the paper wherever they would naturally click with a mouse if they were looking to take a specific leave. I literally walked around with these papers all day long asking people to play along.
Metrics
Quantitative: I timed how long it took participants to press on the leave type they were looking for, in version A and version B. (I also recorded whether they got it right because if one version was way faster, but users made the wrong selection, I would not call that success.)
Qualitative: I asked them which version they preferred and why.
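For anyone who wants to log a test like this in something more durable than a notebook, here is a minimal sketch of how the two quantitative metrics could be recorded together. The `Trial` fields, the `summarize` helper, and every value in the sample data are hypothetical illustrations, not the actual data from my test. The point it shows is that speed is only summarized for trials where the participant chose correctly.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Trial:
    participant: str
    leave_type: str
    version: str    # "A" or "B"
    seconds: float  # time until the participant pressed on a leave type
    correct: bool   # whether they pressed the leave type they were asked to find


def summarize(trials, version):
    """Accuracy over all trials of a version; median time over correct trials only."""
    rows = [t for t in trials if t.version == version]
    correct_times = [t.seconds for t in rows if t.correct]
    return {
        "trials": len(rows),
        "accuracy": sum(t.correct for t in rows) / len(rows),
        "median_seconds_correct": median(correct_times) if correct_times else None,
    }


# Hypothetical sample data, for illustration only.
trials = [
    Trial("P1", "parental", "A", 4.2, True),
    Trial("P1", "parental", "B", 3.1, True),
    Trial("P2", "medical", "A", 2.5, True),
    Trial("P2", "medical", "B", 3.0, False),
]

summary_a = summarize(trials, "A")
summary_b = summarize(trials, "B")
print(summary_a)
print(summary_b)
```

Keeping correctness next to the timing makes it hard to accidentally celebrate a version that is fast only because people are picking the wrong thing.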
Mitigating bias
For each leave type, half of the participants saw A first and half saw B first, to mitigate bias around seeing the first version with fresh eyes and the second version with some context under their belt.
I asked each participant to find no more than 2 leave types because after that, they would be too familiar with the experience for me to get a true impression of their speed as a user starting onboarding for the first time.
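The two rules above can be sketched in a few lines of code. This is only an illustration of the idea, assuming you know in advance which participant will get which leave scenario; I did it by hand on paper. The `assign_orders` function and the participant and leave type names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import cycle

MAX_SCENARIOS_PER_PARTICIPANT = 2


def assign_orders(scenarios):
    """Alternate A-first and B-first within each leave type.

    `scenarios` is a list of (participant, leave_type) pairs.
    Returns a list of (participant, leave_type, (first_version, second_version)).
    """
    # Each leave type gets its own alternating sequence, so the split
    # between A-first and B-first is balanced per leave type.
    next_order = defaultdict(lambda: cycle([("A", "B"), ("B", "A")]))
    seen = Counter()
    plan = []
    for participant, leave_type in scenarios:
        seen[participant] += 1
        if seen[participant] > MAX_SCENARIOS_PER_PARTICIPANT:
            raise ValueError(f"{participant} would be too familiar with the screen")
        plan.append((participant, leave_type, next(next_order[leave_type])))
    return plan


# Hypothetical participants and leave types, for illustration only.
scenarios = [
    ("P1", "parental"),
    ("P2", "parental"),
    ("P1", "medical"),
    ("P3", "medical"),
]
plan = assign_orders(scenarios)
for row in plan:
    print(row)
```

With an odd number of participants for a given leave type, the split can only be as close to half and half as the numbers allow.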
Participants
11 participants tested 19 leave scenarios. It was a representative cohort of the actual user base in that they were:
Tech workers
Ages 20–45
Split men and women
Had been/are currently/will be on these leaves
Results
I discovered 3 meaningful insights in a test that took less than an hour to prepare, a couple of hours to conduct, and less than an hour to analyze, with no budget at all.
Finding #1
The first finding that jumps out from the raw data is that 100% of the time, participants chose the right leave!
Interesting anecdote: There was 1 participant whom I asked to “take leave because your wife is giving birth”. He got it right on both version A and B. However, his pregnant wife who was standing right there said that she had expected him to choose the family care leave instead. Making the wrong choice in this scenario was indeed a concern the team had! So it was interesting that while a test participant had the same concern, the actual “user” had no trouble at all.
Raw data from the quantitative portion of the test
Finding #2
The second finding is that in half of the tests A was faster, and in half B was faster. My hypothesis that B would be significantly faster was wrong! But you know the great thing about being wrong? It means your research has integrity and isn’t just contrived support for what you want to be true.
Taking it one step further, the even split held across leave types. In other words, we did not find that A was faster for medical leaves and B was faster for military leaves; rather, for every leave type, A and B tied for speed.
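If you want a number to go with "A and B were each faster about half the time", one option is an exact sign test, which asks how surprising a given split of wins would be if neither version were really faster. I did not run this at the time; it is a sketch using only the Python standard library. The 10 vs 9 split below is an assumption for illustration, since 19 scenarios cannot divide exactly in half, and the `sign_test_p_value` function name is my own.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided exact sign test for paired comparisons, ignoring ties.

    Under the null hypothesis, each version is equally likely to be
    the faster one in any given scenario.
    """
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Probability of seeing k or fewer wins for the less frequent winner.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed split for illustration: A faster in 10 scenarios, B faster in 9.
p_even = sign_test_p_value(10, 9)
print(p_even)  # 1.0

# For contrast, a lopsided hypothetical result where B is faster every time.
p_lopsided = sign_test_p_value(0, 19)
print(p_lopsided < 0.001)  # True
```

A p-value of 1.0 means a split like this is exactly what you would expect if the two versions were equally fast, which matches what the raw data suggested.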
Finding #3
The third finding came from the qualitative data: more than 70% of participants preferred B! And they gave a lot of reasons, many of which repeated themselves from participant to participant.
Raw data from the qualitative portion of the test — Why participants prefer B
Even when B was not faster, people liked the experience of B better. For our product’s success, users’ subjective preferences matter.
The reason we expected B to win was wrong, but the hypothesis that B was better was right.
Recap of the versions I tested: A and B side by side
Insights
I am so glad I did this quick and dirty user research. I learned so much.
…about testing
Meaningful insights can come from cheap tests. Lack of access to sophisticated metrics is no excuse not to collect whatever data you can. It’s not all or nothing.
Tests can answer questions you never thought to ask. My original plan only included quantitative data around speed. But my first participant volunteered her preferences and why. It was gold, so I started asking everyone else. What I learned from the qualitative data will let me take the winning version and make it even better.
…about content design
Best practices are best practices for a reason. Version B aligned to best practices and version B won. It didn’t win for the reason I expected it to, but it won, which further supports applying best practices elsewhere, even without specific evidence around the potential impact.
Slower doesn’t have to be bad. Participants who took longer on version B still preferred version B. I was trying to prevent dropoff by moving them along quickly but ended up reaching the same end by moving them along confidently, without changing their speed at all.
I’ve talked a lot about the importance of sharing content design data with the community so that we all benefit from each other’s insights, so here ya go. Here is my contribution for today. I hope it helps you increase the impact of your content design practice and inspires you to share your own learnings with me :)
P.S. I was recently in an ice cream shop with my daughter and she pointed out essentially one of the same issues I was grappling with in the user test described in this post. She scanned the titles on the menu and saw cookies for 60 shekels and cookies for 14 shekels. “Why would anyone pay 60 when they could get the same item for 14??” she asked. It was only then that we read the subtitle of the more expensive cookies, which specified that you actually get 2 cookies and 3 scoops of ice cream for 60. The less expensive item by the same name had no subtitle, but I assume you get only 1 cookie and that’s it. There is definitely some copy work to be done here.