Privacy paranoia at NSF?
Inside Higher Ed has an article with Orwellian-sounding overtones: Data on Minority Doctorates Suppressed. The gist of it is that NSF has tightened up its privacy rules and will no longer be reporting information on the ethnicity of doctorates when the cell size is 5 or smaller.
The trouble is that basic reporting on the demographics of small fields runs up against the new rules, particularly for groups such as Native Americans:
So while we know that in 2005, six black people earned doctorates in earth, atmospheric and marine sciences, the NSF won’t reveal how many earned the degrees in 2006 (covered by the most recent report). Information about the number of Latinos earning degrees in some engineering fields is gone, as are data about a number of categories for black Ph.D.’s. For Native Americans, where the base is smaller, the impact of the new policy is especially dramatic. The report was stripped of information on how many doctorates were awarded to all but 6 of the 35 subfields for which data were collected.
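For the curious, the rule as the article describes it amounts to something like the sketch below. It assumes the threshold of 5 reported in the story. The function name, the field and group labels, and the counts are all hypothetical, made up for illustration. They are not real Survey of Earned Doctorates figures, and this is not NSF's actual procedure.

```python
# Threshold as described in the article: cells of 5 or fewer are withheld.
SUPPRESSION_THRESHOLD = 5


def suppress_small_cells(counts, threshold=SUPPRESSION_THRESHOLD):
    """Return a copy of counts with small cells replaced by None (withheld).

    Hypothetical helper for illustration; not NSF's actual code or procedure.
    """
    return {
        cell: (n if n > threshold else None)
        for cell, n in counts.items()
    }


# Hypothetical counts, for illustration only (not real SED data).
example_counts = {
    ("field A", "group X"): 6,   # above the threshold, so it is reported
    ("field A", "group Y"): 5,   # at the threshold, so it is withheld
    ("field B", "group X"): 2,   # below the threshold, so it is withheld
    ("field B", "group Y"): 42,  # well above the threshold, so it is reported
}

published = suppress_small_cells(example_counts)

for cell, n in published.items():
    print(cell, "withheld" if n is None else n)
```

The smaller the group or the field, the more cells fall at or below the cutoff, which is why the reporting on small populations takes the biggest hit.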
Commenters on the story are aghast, and some rail about Bush administration-style information suppression going on in the NSF. I don't think the motives are quite so sinister. But I am wondering if I might be partly responsible for the changes.
In late 2006 I requested a big batch of Survey of Earned Doctorates data from NSF for the Graduate School Guide. It was expensive and a pain to get, largely because the existing privacy rules blocked access to lots of interesting things, but eventually I managed to get the numbers. The one thing I had no trouble getting was information on PhD demographics. Sex, ethnicity, and citizenship were considered public information, and no cell size restrictions applied. (It's not clear from the article whether the tightening applies only to ethnicity or to sex and citizenship as well, but I'd assume everything.)
Apparently nobody had ever requested institution- and discipline-level data before. Although my request was approved, I heard through the grapevine afterwards that the data set caused great consternation within NSF and provoked lots of meetings and arguments. I asked for some additional data later on, but never got an answer.
I suspect that at least part of what happened was that NSF decided after the fact that they were uncomfortable with some of the information they gave me, but that they had no rules in place at the time to prevent the request. I'm speculating that the tightening (at least in part) is to address the perceived problem. If so, I'd be curious to hear what else has been changed.
NSF has an important responsibility to protect the privacy of the people who participate in the Survey of Earned Doctorates. They are asking for data from individuals, and they want to make sure that those participating never have reason to be concerned about their information being released inappropriately. The NSF does a very good job of protecting the privacy of those participating; the rules for data access are quite draconian.
The trouble is that the NSF protects their information so zealously that, at least in my opinion, they compromise the larger mission of their organization. Suppressing things like the number of Native American geophysics PhDs granted is just silly. In what possible way does that reveal anything interesting about any individual? (The only thing I can think of is that you might be able to learn whether a particular individual participated. But you could learn the same thing for cell sizes larger than 5 when there is full participation by the subgroup.)
Computer security professionals have a maxim that the only truly secure computer is one that is off. Some of the new privacy changes sound like a step in the direction of pulling the plug to prevent software viruses.
A more productive approach would be to modify the terms of the survey's privacy policy to favor greater data access, not less. Sure, they might lose a handful of responses from the truly paranoid, but my guess is that people who are that concerned about their privacy aren't going to fill out the survey anyway. Greater access could allow NSF to do some truly useful things with their data.
There is some good news for any of you who are interested in PhD demographics: there is a better place to get the data. The IPEDS data set contains complete information on the demographics of doctorates, with no data suppression. What's more, IPEDS also has data on master's students and on undergraduates, their data are not limited to science and engineering, they have a much more detailed taxonomy of disciplines than NSF, and they make their data available at the institution rather than the national level. IPEDS data are much more timely than NSF's. And, best of all, you can get the full data set without having to pay NORC lots of money. The NSF may have just done you a big favor.
