GRADUATE SCHOOL GUIDANCE
POST DATE February 6, 2008, 2 AM
POSTED BY Geoff Davis
One key data set I used in building the Graduate School Guide was IPEDS, which is put out by the Department of Education's National Center for Education Statistics. IPEDS is incredibly useful for this kind of thing: the data set contains a near-complete list of all colleges and universities in the country together with detailed information about student enrollments and degrees granted. The data are well-documented, easy to access (you can download everything in CSV files or grab custom subsets of the data from an online tool), and relatively recent (2006).
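To make concrete what "easy to work with" means here, a downloaded IPEDS CSV file can be summarized in a few lines of code. This is a minimal sketch: the column names (UNITID, CIPCODE, AWLEVEL, CTOTALT) follow the layout of the IPEDS Completions survey, but the sample rows and the award-level code used for doctorates are assumptions for illustration, not the real data.

```python
import csv
import io
from collections import defaultdict

# A tiny IPEDS-style completions extract, invented for illustration.
# Real files come from the NCES download site.
SAMPLE = """UNITID,CIPCODE,AWLEVEL,CTOTALT
100001,27.0101,17,12
100001,26.0101,17,9
100002,27.0101,17,4
"""

def doctorates_by_institution(csv_text, doctoral_code="17"):
    """Sum degrees granted at one award level per institution.
    The doctoral award-level code is an assumption here."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["AWLEVEL"] == doctoral_code:
            totals[row["UNITID"]] += int(row["CTOTALT"])
    return dict(totals)

print(doctorates_by_institution(SAMPLE))
# {'100001': 21, '100002': 4}
```

The point is not this particular tally - it's that because the files are flat, documented CSV keyed on a stable institution ID, any third party can start asking questions of the data in an afternoon.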
NCES delivers tremendous value with IPEDS in several ways:
1) They provide a lot of summary statistics from the data to give an overview of the current state of post-secondary education.
2) They make it easy for others to get ahold of the data, and they provide thorough documentation so that once you have the data, it's easy to work with. As a result, third parties such as The Chronicle of Higher Education delve more deeply into the data and deliver additional interesting insights.
3) They help participating institutions use the data set to compare themselves to peer institutions. In so doing, they implicitly give guidance on good questions for institutions to ask about themselves (e.g., how much financial aid do we give relative to our peers? How well do we retain students?)
4) They provide data to third parties who help students choose colleges. It's virtually certain that US News, Peterson's, Fiske, etc., all draw upon IPEDS data.
5) They run College Opportunities On-Line (COOL), a site that provides useful information directly to prospective college students.
6) They combine multiple data sets in ways that increase the value of all the components. For example, on the COOL web site they combine their core data on universities and graduations with data on libraries and on campus crime.
The NSF has similarly interesting sets of data in their Survey of Earned Doctorates, Survey of Doctorate Recipients, and the Survey of Graduate Students and Postdoctorates in Science and Engineering, but they have not been anywhere near as effective at extracting value from them.
Here's how NSF stacks up to NCES:
1) The NSF provides data reports in the form of very basic annual (or biennial) summary statistics and in Science and Engineering Indicators.
2) NSF has a decent tool for extracting additional summary statistics in WebCaspar. Unfortunately, the data are aggregated at such a high level (nationally for most things) that you can't get at the most interesting information. At the level of a single institution, you basically can't get anything beyond a count of individuals.
It is possible to get ahold of more detailed data than what's in WebCaspar, but the process is difficult, expensive, and time-consuming. The data we used on our site cost $7,000 and took months for NORC to generate. There is a do-it-yourself alternative if you have a facility that satisfies NSF's security criteria (stringent), can handle NSF's audits, and have a license for SAS (expensive).
NSF has a responsibility to protect the privacy of survey participants, and to their credit, they take that responsibility very seriously. However, I think they tend to err so far on the side of caution that they detract from their own mission.
3) As far as I know, NSF doesn't make any effort to help universities use the data they generate in useful ways. I think this is an area in which NSF could take a real leadership role. For example, people have been expressing dismay for years about the length of time it takes to earn a PhD. The NSF has detailed data on exactly how long it has taken to earn pretty much every single PhD in the country. Providing institutions with stats on their own times to degree relative to their peers could be a powerful catalyst for inspiring improvements. They could provide similar motivation on placement rates, funding levels, and so on.
The value the NSF could provide here is not just in providing the data - it's in getting institutions to ask the right questions of themselves. I have talked to several deans who participated in the recent CGS study of attrition. One common thread I have heard is that they were able to make progress on reducing their attrition rates once they started measuring them - the big problem was simply that nobody had ever looked at the issue before.
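The peer benchmarking described above is not complicated to compute once the data exist. Here is a sketch of the basic comparison - all of the institution names and time-to-degree figures below are invented; real per-graduate data of this kind lives in NSF's Survey of Earned Doctorates.

```python
import statistics

# Hypothetical per-graduate time-to-degree records (in years),
# keyed by institution. These numbers are made up for illustration.
TIME_TO_DEGREE = {
    "Univ A": [5.1, 6.0, 5.5, 7.2],
    "Univ B": [6.8, 7.5, 8.1],
    "Univ C": [5.0, 5.2, 4.9, 5.6],
}

def peer_comparison(data, institution):
    """Compare one institution's median time to degree against the
    median across its peers (everyone else in the data set)."""
    own = statistics.median(data[institution])
    peer_times = [t for name, times in data.items()
                  if name != institution for t in times]
    return own, statistics.median(peer_times)

own, peers = peer_comparison(TIME_TO_DEGREE, "Univ B")
print(f"Univ B median: {own} years; peer median: {peers} years")
```

A dean who sees a gap like this in a report - our median is two years longer than our peers' - has a concrete question to investigate, which is exactly the kind of prompt the CGS attrition study provided.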
4) People love to hate US News's rankings. The chief complaint is the exclusive reliance on reputation. There has been sporadic talk over the years about institutions proposing alternatives, but the talk has never amounted to anything. As we have demonstrated here at phds.org, NSF's data can form part of a more balanced approach. Surely a set of US News rankings that incorporated outcome measures as well as reputational measures would be more helpful to students and less objectionable to faculty members than their current approach.
5) As far as I know, NSF does not use their data to provide any services targeted at prospective graduate students. I think that's a reasonable call on their part, since I don't think it's a great fit for the skill set that the organization possesses. However, if they were to make department level data sets available (aggregated over sufficient numbers of years to protect privacy) on a regular basis to third parties such as US News, Peterson's, phds.org, etc, they could ensure that would-be students would benefit from the NSF's hard work.
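The "aggregated over sufficient numbers of years" idea can be sketched simply. Below, counts are pooled across a multi-year window and any cell that remains too small is suppressed. The threshold of 5 and all the counts are assumptions for illustration - NSF's actual disclosure-control rules are their own.

```python
from collections import Counter

# Invented department-level PhD counts by year.
COUNTS = {
    ("Chemistry, Univ A", 2004): 3,
    ("Chemistry, Univ A", 2005): 4,
    ("Chemistry, Univ A", 2006): 2,
    ("Physics, Univ B", 2004): 1,
}

def multiyear_release(counts, years, threshold=5):
    """Pool counts over a window of years, then suppress (None) any
    department whose pooled total is still below the threshold.
    A simple stand-in for real disclosure-control rules."""
    totals = Counter()
    for (dept, year), n in counts.items():
        if year in years:
            totals[dept] += n
    return {dept: (n if n >= threshold else None)
            for dept, n in totals.items()}

released = multiyear_release(COUNTS, years={2004, 2005, 2006})
print(released)
# Chemistry's pooled total of 9 is releasable; Physics stays suppressed.
```

Even this crude scheme shows that privacy protection and department-level releases are not mutually exclusive - pooling years trades timeliness for disclosability.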
6) As graduate-school.phds.org demonstrates, the NSF data sets are much more interesting when combined. One frustration I had in working with the NSF's data sets is that they are not designed to work well with each other or with outside data such as IPEDS, so combining them took months of work.
Each of NSF's data sets uses a different, incompatible taxonomy of disciplines, all of which are different from the one used by IPEDS. The NSF uses an outdated list of institutions (FICE codes) that results in information for entire state university systems being lumped together and makes it difficult to combine with information from IPEDS, which uses an up-to-date list. Rethinking the NSF's field codes and updating their institution lists would make their data more valuable to end users and would simplify the lives of the people at the institutions who have to answer both NSF and NCES surveys.
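Much of those months of work boiled down to building crosswalks by hand - tables mapping each NSF code onto its IPEDS counterpart. A sketch of the mechanics, with entirely invented code values (a real crosswalk would have to be assembled from NSF and NCES documentation, one field and one institution at a time):

```python
# Toy crosswalks between incompatible code systems. All values here
# are invented for illustration.
FICE_TO_UNITID = {"001234": "100001", "005678": "100002"}
NSF_FIELD_TO_CIP = {"798": "27.0101", "610": "26.0101"}

nsf_rows = [
    {"fice": "001234", "field": "798", "phds": 12},
    {"fice": "005678", "field": "610", "phds": 7},
]

def to_ipeds_keys(rows):
    """Re-key NSF-style rows onto IPEDS identifiers so the two
    data sets can be joined on (UNITID, CIPCODE)."""
    return [{"UNITID": FICE_TO_UNITID[r["fice"]],
             "CIPCODE": NSF_FIELD_TO_CIP[r["field"]],
             "phds": r["phds"]}
            for r in rows]

print(to_ipeds_keys(nsf_rows))
```

The crosswalk tables themselves are the hard part, not the code - and if NSF adopted IPEDS's institution IDs and a compatible field taxonomy, nobody would need to build them at all.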
So here's what I think NSF could do to remedy things: act on the opportunities above - publish richer summary statistics, lower the barriers to getting detailed data, help institutions benchmark themselves against peers on time to degree and outcomes, feed aggregated department-level data to third parties who serve prospective students, and harmonize their field and institution codes with IPEDS.
I think the result would be universities receiving information they can use to make valuable improvements to their programs and prospective students getting better guidance on choosing suitable programs. That's a pretty big impact.