Kill the Pain Scale: Taking bias out of information - Notes From the Nerd Desk

[Image: a cat dressed as a nurse caring for another cat]
- by Elana Duffy, CEO, Pathfinder Labs

Since the ol' Nerd Desk can get a little cluttered with, well, nerd talk, here's a quick rundown of what's in - and why you should read - this month's blog:

Nonprofits, corporate employee support programs, government and other service providers... everyone has the same problem: at some point they need to report to whoever is funding their program about the real impact they have on the people receiving services. This should go beyond the number of people who came through the door, but discovering real, fundamental changes in your participants as a result of your service is really, really hard. Impact - real impact - means figuring out how people actually feel and whether that feeling is likely to last, but asking them directly about their feelings introduces a new set of problems for you to combat. So to help in your quest for real, usable information you can share with your funders, we broke down the two primary forms of self-report survey bias, why it's important to identify and eliminate them from your data, and what we do to help filter these biases out so you can get to the nitty-gritty of impact analysis.

Self-report bias? What even is that?

I would wager an MRE peanut butter packet that the last time you went to see a doctor you were asked to rate how you felt on a scale of 1 to 10. I hate that question. I know if I say a 7 I'll be seen faster, but I don't want to lie. I also know if I say a 3 I will probably just be given some Motrin and told to go away. Meanwhile, I have a high pain tolerance after years of injuries, so my 3 might be another person's 7 - so is it even a lie to begin with?

Well, I said 3 for too long, and by the time they realized it was actually a 7 and did an MRI, they couldn't save what was left of the leg. Now I'm an amputee. True story!

Every time this type of question - a "self-report" - is asked, it elicits a different response based on many factors: who is asking and why the question is being posed, the respondent's emotional state and past experiences, and a host of other considerations.

When rating your pain, for example, there is no way for the doctor to agree or disagree with what you say: it is your pain. But you want a particular outcome, so you might change your answer based on the result you are hoping to achieve. This is a cognitive process called outcome bias, and it can have serious downstream effects. In the case of nonprofits or services seeking honest feedback, this bias can keep problems from ever surfacing if the reviewer believes there might be some sort of retribution for a critique. Some restaurants have banned Yelp reviewers (Yelp reviews aren't anonymous!) after they left a bad review, so a reviewer might reasonably worry that a volunteer organization taking offense at 3 stars would similarly extend fewer opportunities. (Note, this is why all feedback posted on www.pathfinder.vet is anonymous!)

And outcome bias isn't the only thing to be cautious of. Consider the case above, where my high pain tolerance led me to report the pain of a severe internal injury as a 3 while someone else might rate it a 7. Pain is subjective: we compare how we feel now to our personal baseline and the worst pain we've ever felt. That difference in reporting, driven by differences in perception and experience, is subjectivity biasing the results. With self-report, how can a nonprofit know if great feedback is really a result of great programming, or if it's simply better than a reviewer's past experiences?

Subjectivity and outcome bias. Got it. But why do I care?

These cognitive biases are bad news: they don't give the analyst (or funder) a standard from which to make real decisions about what to improve and what is already effective.

A doctor can seek evidence and conduct tests, looking for a measurable statistic that explains the pain, whether the patient called it a 4 or an 8. If the test is repeatable (such as a blood test) and the statistic measurable (such as a cholesterol level), problems can be objectively identified, treated, and improved.

The program director seldom has this luxury: sure, a housing coordinator can say how many people they've put into a residence, but how can they answer a donor's question about the impact the housing had on the person's overall well-being? The coordinator can ask, on a scale of 1 to 5, how positive the person feels, and a person who was just housed will surely say 5, because their immediate problem was solved (a subjective response to their recent negative situation) and - in case they need housing again in the future - they don't want to offend the housing coordinator (outcome bias). So while the coordinator gets an immediate response that looks good, they miss simple things that could have a lasting, positive impact on the person: maybe at some level the person wants to feel like they are contributing something, or maybe the buildings could use a community area so residents could interact with their neighbors more. Bias limited the organization's potential to have greater impact.

Filtering bias and finding standard, objective measures for personality and potential behavior is particularly difficult. Subjectivity becomes a problem right away, thanks to emotion. "How comfortable are you in a room full of strangers?", for instance, could elicit a different response every day it is asked: on a day you didn't get enough sleep you might cringe at the thought of having to socialize, and even though that is not how you feel every day, you might rate these questions low enough to be classified as an introvert with high neuroticism. Outcome bias is also looming. Say the test is part of a job application or other evaluation, and the applicant knows the employer is seeking particularly extraverted individuals: they might answer every related question as though they are very friendly and open.

Try this experiment: take this Big 5 analysis as honestly as possible. The test determines your personality tendencies by comparing your answers against thousands of others - a generally good metric. Record your scores, then take it again with the intent to score high on a particular trait such as openness or emotional range. You can do this because, as validated as the test is, the process itself is easily manipulated and - just as with the doctor - it is impossible to determine whether someone is answering honestly. And while the doctor can run more tests, personality results generally lack standardized, objective options.
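To see in miniature why that manipulation is so easy, here is a toy sketch - hypothetical items and simplified scoring, not the actual Big 5 instrument. A self-report trait score is just an aggregate of whatever ratings the respondent chooses to give, so choosing different ratings changes the "personality" while nothing about the person changes.

```python
# Toy illustration of self-report scoring -- hypothetical items and scoring,
# NOT the real Big 5 instrument. The trait score is just the average of the
# 1-5 ratings the respondent chooses to give, so strategic answers shift the
# result with no change whatsoever in the underlying person.

def trait_score(ratings):
    """Average a list of 1-5 Likert ratings into a single trait score."""
    return sum(ratings) / len(ratings)

# Ratings for three hypothetical "openness" items (1 = strongly disagree, 5 = strongly agree)
honest = [2, 3, 2]     # how the respondent actually feels today
strategic = [5, 5, 5]  # the same respondent, gaming the result

print(f"honest:    {trait_score(honest):.2f}")     # 2.33 -> reads as low openness
print(f"strategic: {trait_score(strategic):.2f}")  # 5.00 -> reads as very high openness
```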

Reducing self-report bias to uncover real information

This is where natural language processing (NLP) provides a significant advantage to entities looking to solve challenges related to individual impact, personality, and interpersonal relationships (like a service provider or nonprofit). By comparing patterns in your syntax and word choice against tens of thousands of verified documents, a writing sample can show, for example, that you are 20% less extraverted than the average adult in the US.

With a large enough writing sample from an individual - 500 to 1000 words or more in most cases - NLP can identify enough patterns to provide accurate information about likely personality attributes, a much more durable measure than an emotional or subjective response. This is the same premise as the Big 5 test you took, but the difference is that you can't influence the results the way you did on the second pass. NLP effectively removes the self-reporting and self-selecting bias from personality assessments; it becomes the equivalent of the doctor's blood work and imaging when searching for the root of the pain.
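As a rough sketch of how that comparison works - with the word list, baseline rate, and single feature all invented for illustration, where a production system would use validated lexica and statistics built from those tens of thousands of verified documents - the logic is: measure a language pattern in the sample, then express it relative to the population average.

```python
# Minimal sketch of corpus-comparison scoring. The word list, baseline, and
# single feature are assumptions for illustration; a real system would use
# validated lexica, many features at once, and a baseline built from tens of
# thousands of verified writing samples.

import re

SOCIAL_WORDS = {"we", "friend", "friends", "party", "talk", "together", "people"}
POPULATION_MEAN_RATE = 0.025  # assumed baseline: 2.5 social words per 100 words
MIN_WORDS = 500               # per the post, shorter samples give less reliable estimates

def social_word_rate(text: str) -> float:
    """Fraction of words in the sample that come from the (assumed) social lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < MIN_WORDS:
        raise ValueError(f"need at least {MIN_WORDS} words for a stable estimate")
    return sum(w in SOCIAL_WORDS for w in words) / len(words)

def relative_to_average(text: str) -> float:
    """Percent difference between this sample's rate and the assumed population baseline."""
    return (social_word_rate(text) - POPULATION_MEAN_RATE) / POPULATION_MEAN_RATE * 100

# Usage: relative_to_average(journal_text) -> e.g. -20.0, which would read as
# "20% below average" on this one (hypothetical) extraversion-linked feature.
```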

Even more advantageous is the ability to objectively understand change, and to measure the effectiveness of a treatment or - in the case of a nonprofit - a service or program. After a course of statins and dietary changes, the doctor can take another blood test to see if cholesterol has dropped. Similarly, after the housing coordinator puts in a community room, NLP can assess a new writing sample and determine whether the programs are helping the person become more confident and resilient, ultimately helping them apply for employment. A writing sample run through NLP algorithms is very hard to influence, so the housing program now gets an effective measure of how well its efforts are producing real impact.
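Sketched the same way - again with an invented word list standing in for a real, validated feature set - the before-and-after measurement is just the same scoring applied to writing collected at two points in time:

```python
# Minimal sketch of before/after measurement -- illustrative only, not
# Pathfinder's actual pipeline. Score the same (hypothetical) language feature
# on writing collected before and after a program change, the way a doctor
# repeats a blood test after treatment.

import re

CONFIDENCE_WORDS = {"can", "will", "plan", "ready", "applied", "decided"}  # assumed lexicon

def confidence_rate(text: str) -> float:
    """Rate of assumed confidence-linked words per word of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in CONFIDENCE_WORDS for w in words) / max(len(words), 1)

def assess_change(before_text: str, after_text: str) -> dict:
    """Score both samples identically and report the shift between them."""
    before, after = confidence_rate(before_text), confidence_rate(after_text)
    return {"before": round(before, 3), "after": round(after, 3), "change": round(after - before, 3)}

before = "I don't know what comes next. Nothing has worked out and I can't see a way forward."
after = "I applied for two jobs this week and I plan to follow up on Monday. I am ready."
print(assess_change(before, after))  # a positive 'change' suggests more confident language
```

The point isn't the word list, which here is made up; it's the repeatability. The same measurement, taken twice, gives the program something much closer to the doctor's follow-up blood test than another 1 to 5 survey.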

We will likely never see the end of self-reported scales; they are easy to administer and give a quick glimpse into how a person feels in the moment. Even at Pathfinder Labs we still ask the 1 to 5 scale questions. But that's why we back them up with anonymized, analyzed math and tens of thousands of comparative samples. We want to change how feedback is used, providing a blood test for personality and motivation to help services increase their impact on our lives and our relationships.