October 12th, 2010

NRC Data Quality, Part 2


Geoff Davis

Over the weekend I received another irate email from a department chair bemoaning the grievous and irreparable harm the terrible errors on phds.org were causing his program. I investigated. It turned out that the data in question came from the NSF’s Survey of Graduate Students and Postdoctorates in Science and Engineering. This survey in turn got its information from… his department. Quite possibly from the chair himself.

The point? One of the big challenges the NRC faced in gathering its data is that departments are not always the most reliable sources of information on themselves. And if the departments don’t know what’s going on locally, who does?

While I do hear a fair amount of complaining about the quality of the NRC’s data, I don’t hear a lot of ideas for solutions. Say what you want about the NRC; the departments are the bigger problem.

For example, 23% of departments reported that they don’t track where their graduates get jobs. Imagine if this level of quality assurance existed in other industries: “No, we don’t track repair rates for the cars we manufacture, but one of our designers recently won an award!”

With the Survey of Earned Doctorates, the NSF makes a heroic effort to pick up the slack. They provide departments with surveys to give their graduating students, they follow up with non-respondents by mail and telephone, they do all the analysis, and they publish reports. Pretty much all the departments have to do is hand the surveys to their students at the proper time. The survey is well designed and run by NORC, one of the best survey research firms in the country. Given that several of the public complaints about the accuracy of the latest NRC report have focused on SED data about student placements, it appears that many departments are unable to correctly complete their one minor piece of the process.

What exacerbates these department-level problems is that departments receive little or no feedback on the numbers they provide to NSF / NRC / DoE. If they erred in what they reported to the NRC, they didn’t find out until five years later. If they mess up NSF reports, they don’t find out until somebody like me makes the data available in a more usable form than the dense spreadsheets the NSF issues (and then it’s somehow *my* problem).

One valuable service the NSF could provide would be to send programs a yearly summary of the information they submitted, ideally in the context of norms for other departments. That would help programs catch errors and also show them where they stand relative to peers on a regular basis. Under the current system, programs see their relative standings only every decade or two, so they are surprised and freak out. A five-year delay is much too long for any useful corrective process to be put in place: after five years, many of the people who reported the data in the first place have moved on or have long since forgotten the process.

The NRC would do better, too, by releasing its data in stages as it was generated. About two-thirds of the data in the report came from surveys administered in early 2006 and could have been released in 2006 or 2007. That would have given departments ample time to make corrections. The more challenging items, such as counts of citations, publications, and grants, could have been released later, and the rankings last of all.