In a popular TED Talk, Jen Golbeck reveals the so-called “curly fry conundrum.” She points to a study in which Facebook likes were tallied and analyzed in order to predict user traits – with “intelligence” being one of these. Eventually, the researchers landed on the five likes that could best predict high intelligence. Oddly enough, one of them was the page for curly fries. “How is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that’s being predicted?” Golbeck asks the audience.
As a computer scientist, Golbeck dissects the complicated (and sometimes creepy) data involved in social media – data that can provide insight into who we are and what we do (or will do), even before we realize it ourselves. In this interview, the scientist discusses our brave new world of numeric mirrors, one in which our personalities and future actions are reflected in code. And while there’s much to be gained from all this knowledge, as she notes, we should make sure that it isn’t used against us.
Innovation & Tech Today: I remember when you described how Target had been able to predict pregnancy based on customers’ shopping data. That kind of thing is a little disturbing.
Jen Golbeck: Yeah, so it gets way creepier. We just finished a paper – it’s under review now – where we can find people on the day they go to their first Alcoholics Anonymous meeting and predict if they’ll be sober or drinking again in 90 days with really high accuracy.
There’s a great paper from a colleague of mine that can predict, on the day a woman gives birth, whether she’ll develop postpartum depression. And so we’re getting really good at kind of teasing out this signal about what people will do in the future, even when the people don’t know. And there’s great potential to do wonderful things with that technology. But obviously there’s also great potential for it to be completely abused, which I think is more concerning in the current political environment, where there’s less restraint about those kinds of creepy intrusions than I feel like there used to be. And that might be naive of me…but I read a lot about it, so I feel like I spend half my time warning people and half my time building technology in the hope that it’ll be used in the good way.
I&T Today: Does this determinism make you cynical about your own online presence? Do you feel like you have less free will in terms of what you’re going to do when living virtually?
Jen Golbeck: Yeah, I can sort of feel like that. I don’t know. If I look at this study we did on alcoholism…. A lot of A.I. works like this: you dump data into a black box and it spits out an answer, and we really don’t understand the inside of the black box. But computer science cares about getting the answer, which is fine. When we did this alcoholism study, though, we actually built all the internals around what an addiction researcher might look at. So we can say, “Oh, you’re going to AA. It looks like it probably won’t work for you. Our algorithm says you’re likely to be drinking in 90 days. But here’s why: Because you have a social circle where everybody talks about getting drunk. You seem to have a poor ability to cope with stressful situations, and here’s a kind of therapy you can get for it.” So I think the algorithms are good at saying, “If everything keeps going the way it has, here’s what we think is going to happen.” But it doesn’t mean you can’t disrupt that. And I really like this work that we did, but I especially like it because it lets you know how to disrupt that prediction.
I&T Today: Sure. It’s possible that you can go, “Well, the Alcoholics Anonymous thing may not be wholly effective in preventing these habits in the future, but there are some other things you can do in addition to that – like changing your social circle – that may help your chances.” So that’s what you’re saying is the positive side of these online predictions.
Jen Golbeck: Yeah, that’s right, if we build them where they give insight and don’t just spit out the answer, right? I think it’s one of those things where you can offer people advice. I read all the tweets of the hundreds of people that were in this study that we did. It was all public stuff. And most of them really were close to hitting bottom. Their lives were screwed up; they really wanted to turn things around; they legitimately were trying to get better. And the fact is the majority of them don’t. Most people go back to drinking after their first time in AA. And so being able to say, “Look, if you really want to do this, yeah, go to AA, but also go see a therapist and work on this thing and build up your social support in this place and change this.” That’s really good…. At the same time, you might not want your boss running that algorithm on you. And so the kind of control over who can see these insights is a big problem with it going forward. But I think there’s so much potential for really good things too.
I&T Today: What would be a good practical legal step that we could take to ensure more online data privacy?
Jen Golbeck: One simple thing – we’re talking small steps: There has to be some requirement for much more explicit consent about my data being sold. There are now data brokers who gather data from all kinds of sources; they buy it from some places, aggregate it, and sell it around. I don’t even have a right to see what the data brokers have on me, let alone stop them from selling it. I think we can take a very straightforward legislative step that puts some of that control back in the hands of people. We kind of see that with credit reporting now. Companies that you have credit lines with can ping your credit report every now and then. But not just any regular person can make a query and get ahold of it.
I don’t know if we need something that strict for this kind of data, which is sometimes shared publicly. But a step in that direction – where I feel like I have more ownership, and where a third party that buys or uses my data can’t just keep selling it to someone else who sells it to someone else – would be a really straightforward move. For me, the core issue with all of it is consent. Lots of people would be fine with that data being used in a variety of ways. But there’s no mechanism now that requires consent for a company to do something with your data. And I just feel like if we can take some more steps so people are consenting – like, they really understand what’s going to happen and they can consent – that makes it a much better place.
Interview by P.K. French