Graduate School
Fellow mathematician Jordan Ellenberg has an unusual take on the NRC's rankings: in Slate he compares the NRC's approach to ranking graduate programs to a new method psychologists are using for classifying mental illnesses.
The article is worth reading in full, but the gist of it is that there are two standard approaches to dealing with high dimensional data sets: you can cluster items into groups, and you can use statistical techniques to reduce dimensionality, typically by discarding dimensions that carry the least amount of information. The NRC uses one method, and Ellenberg thinks they might benefit from using the other.
The forthcoming Diagnostic and Statistical Manual of Mental Disorders (the DSM-V) is switching from a clustering-centric approach to a dimension reducing approach, replacing clusters like "narcissistic personality disorder" with a collection of 6 measurements ("negative emotionality, introversion, antagonism, disinhibition, compulsivity, and schizotypy"). This is apparently leading to grumblings from psychologists who find value in the clusters as opposed to the more abstract 6-dimensional vectors.
The NRC has also chosen a dimensionality reduction approach, boiling 20 program measurements down to a single quality dimension. Ellenberg suggests that a clustering approach might be more helpful, and cites a recent experiment:
The NRC, on the other hand, might have done better to toss the idea of rankings entirely, and just clustered the departments into natural groupings. The statistician Leland Wilkinson ran a quick and dirty clustering on the NRC data for math departments. He found that the departments broke up into five clusters: 10 elite departments, a big group of 59 upper-tier departments, 47 lower-tier departments, and two smaller clusters whose meaning, if any, isn't clear to me. This is much coarser information than a full ranking—but it has the advantage of not depending on politically contentious choices as to which criteria matter most.
It's an interesting idea, and I think there's some value to the approach. Indeed, the Carnegie Foundation already does something similar for universities, though probably not in a particularly statistically rigorous fashion. Having well chosen clusters would provide for saner comparisons - it doesn't really make sense to compare some kinds of programs directly, as they really cater to very different audiences with different goals.
That said, I very much doubt that the clustering approach would prove any more satisfactory than what the NRC actually did. Do you think that a prospective student or department chair would be any happier to learn that a program fell into a set of 59 "upper-tier departments" than to know that the program ranked between 16th and 27th on the NRC's quality scale?
While a clustering approach sidesteps the need to explicitly choose important criteria, there is very much a devil-in-the-details problem. Different clustering approaches can yield very different clusters. Even the simplest methods involve many choices - at the very least you have to choose a measure of similarity, and that in turn will emphasize and de-emphasize different program characteristics. You're essentially trading an explicit, principled choice about what's important for an implicit and opaque choice.
Regardless, I'd be curious to see more details of Wilkinson's approach. I imagine he just did some kind of k-means clustering - simple, but likely interesting.
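To make the point about similarity measures concrete, here is a minimal Python sketch. The three programs and their two measurements are invented for illustration (they are not NRC data), and this is not Wilkinson's method, which I haven't seen. It only shows that rescaling one measurement, which is all a choice of units or standardization amounts to, can change which programs look "similar" and so which ones any distance-based clustering would group together.

```python
import math

# Hypothetical programs: (publications per faculty, median years to degree).
# These numbers are made up for illustration; they are not NRC data.
programs = {
    "A": (1.0, 5.0),
    "B": (1.2, 8.0),
    "C": (3.0, 5.2),
}

def euclidean(p, q):
    """Plain Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rescale(data, weights):
    """Multiply each measurement by a weight (a stand-in for a choice of units)."""
    return {
        name: tuple(w * x for w, x in zip(weights, values))
        for name, values in data.items()
    }

def nearest_neighbor(data, name):
    """Return the program closest to `name` under Euclidean distance."""
    others = [n for n in data if n != name]
    return min(others, key=lambda n: euclidean(data[name], data[n]))

# On the raw scales, years-to-degree dominates and A looks most like C.
print(nearest_neighbor(programs, "A"))                        # C
# Weight publications 10x more heavily and A now looks most like B.
print(nearest_neighbor(rescale(programs, (10.0, 1.0)), "A"))  # B
```

Nobody "chose criteria" in either run, yet the weights decided the answer.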
A heartening holiday article in the NY Times this week: A Master’s for Science Professionals Sweeps U.S. Schools. The Professional Science Masters is catching on big time:
The degree, which a few universities quietly pioneered in the mid-1990s, combines graduate studies in science or mathematics and business management courses. In 2008, 58 universities were offering the professional science master’s degree, or P.S.M., according to the Council of Graduate Schools in Washington. By the start of this academic year, the number had nearly doubled to 103, and is set to climb further. The number is certain to grow because the professional science master’s degree is being adopted by at least six state university systems.
The great thing about the PSM is that interaction with industry plays a big role in the degree. Students spend time in internships so they learn skills that they can't get in universities, and industry gets technology transfer through students. More importantly, to run successful programs, universities have to build relationships with local companies, which is a great way for faculty members to get clued in about what kinds of skills working scientists outside of academia really need.
Kudos to the Sloan Foundation for getting the ball rolling and to the NSF for additional funding.
A PSM + a PhD sounds like a much more effective ticket to a great industry job than a regular PhD. Given the ratio of PhDs to faculty positions, we'll need a lot more PSMs.
So says the Measures of Effective Teaching Project, a major ($45M) effort to assess teacher quality funded by the Gates Foundation. Preliminary findings were released earlier in the week.
Key quote from the report:
When a teacher teaches multiple classes, student perceptions of his or her practice are remarkably consistent across different groups of students. Moreover, student perceptions in one class or one academic year predict large differences in student achievement gains in other classes taught by the same teacher, especially in math. In other words, when students report positive classroom experiences, those classrooms tend to achieve greater learning gains, and other classrooms taught by the same teacher appear to do so as well.
There's no reason to believe that graduate students (or undergraduates for that matter) would be any less able to assess the quality of their instruction. One important thing to note is that the questions asked were not about how much students liked their teachers:
Student feedback need not be a popularity contest. We asked detailed questions about various aspects of students’ experience in a given teacher’s classroom. Some questions had a stronger relationship to a teacher’s value-added than others. The most predictive aspects of student perceptions are related to a teacher’s ability to control a classroom and to challenge students with rigorous work.
Presumably control of the classroom is much less of an issue outside of K-12, but I would imagine it would not be too difficult to craft some useful questions about challenge in the classroom.
One interesting experiment: in their assessment of graduate programs, the National Academies asked students in a few subjects a set of questions about their perceptions of the quality of their education. I'd be curious to see (1) how much assessments varied from department to department, (2) to what extent they agree with external assessments of quality, and most importantly (3) what departmental attributes are most strongly associated with student perceptions of quality.
Quiet on the blog is often good news: it means that web site improvements are in the works. Today we've rolled out some nice speedups of the rankings site.
One of the hazards of working with great hardware and software when developing a site is that it's easy to forget that there are lots of people running old browsers on old machines. The rankings code we initially launched is very JavaScript-intensive. That's fine for modern browsers, but not so much for Internet Explorer, particularly IE 6 and 7. And about 10% of our visitors run IE 6 or 7 (Come on, people - IE 6 was released 9 years ago! It's time to upgrade!). I've been making some big optimizations to the code, and the JavaScript now runs much, much faster on IE. Enjoy!
If you're stuck with IE but want to experience the site properly as it was meant to be seen, try downloading Chrome Frame, an IE plugin written by Google that lets you render web pages using the much faster Chrome engine.
The Council of Graduate Schools has released its annual report on graduate enrollments.
Here's their key point:
“The strong growth in first-time graduate enrollment is an indication of the continued high value of graduate education,” said CGS President Debra W. Stewart. “In particular, the 6.0% gain in first-time U.S. enrollment reflects the increasing necessity of a graduate degree to successfully compete in a 21st-century knowledge-based economy,” she added.
I'm skeptical. I think it's more a reflection of the fact that the labor market is terrible, so the opportunity cost of graduate school is low. If you dig into the figures, you'll see a few interesting things:
- The field with the biggest 1-year gain in applications is health sciences. I imagine these are people seeking professional degrees in areas like nursing / physical therapy / etc. (Figure 3.2)
- The biggest longer term growth is in lower tier schools, especially those classified as "Other" (Figure 3.3). Maybe this is all the online schools?
- Health sciences have had big growth in part time enrollments (Figure 3.7), which reinforces my belief that lots of these people are in essentially professional degree programs in health care.
- Health sciences doctoral enrollments have had substantial growth over the last decade. That I imagine is largely the result of the NIH budget doubling and its aftermath.
Those of you in Communication Research may have noticed something strange about the NRC R-rankings for the field on our site: until yesterday, the confidence intervals for programs almost all started at 1. That means that in our simulated rankings, almost all programs ended up in first place at least 25 times. Why?
The reasons are a bit technical: the regressions used to generate ranking weights for Communication Research programs had little predictive power, and the NRC's algorithm set all the weights to 0 in 39 iterations of their simulated rankings. The result was that all programs received the same quality score. How do you rank programs in the case of a tie? The NRC's methodology document does not say. We chose to report all programs as being tied for 1st place. They chose to report all programs as being tied for 41st (there were 83 programs). Both are reasonable choices, but ours leads to some strange side effects in rare cases like this; theirs has problems in other cases. We've since changed our algorithm and have dropped the degenerate rankings altogether.
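Here is a minimal Python sketch of how much the tie-breaking convention matters in the degenerate case. The function name `rank_with_ties` and the two conventions shown ("min" and "average") are my own illustration, not the NRC's code. The average convention gives 42 for 83 tied programs, not the 41st the NRC reported, so their exact rule must differ slightly from this one.

```python
def rank_with_ties(scores, tie="min"):
    """Rank scores from best (highest) to worst; tied scores share a rank.

    tie="min":     ties get the best rank in their group.
    tie="average": ties get the mean of the ranks their group spans.
    """
    # Indices sorted by descending score (sorted() is stable).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover every score equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        first, last = pos + 1, end + 1  # 1-based ranks spanned by this tie group
        shared = first if tie == "min" else (first + last) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

# Degenerate iteration: all weights are 0, so all 83 programs score the same.
degenerate = [0.0] * 83
print(rank_with_ties(degenerate, "min")[0])      # 1
print(rank_with_ties(degenerate, "average")[0])  # 42.0
```

Same scores, same programs, and the reported rank moves from the top of the table to the middle depending on a rule the methodology document never states.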
The point is not to criticize the particular algorithmic choices the NRC made. Rather, it's that the NRC's methodology document, detailed as it may look, does not fully specify their algorithm. (This is not the only example, either.) While it's valuable to have the sort of document the NRC produced that explains the algorithm and the rationale for the various components, without access to their source code, it's very difficult to reproduce what they did.
Increasingly, science involves computation. If one wants others to be able to reproduce what one has done in an experiment (a core feature of scientific research), one must describe not only the methods, but also the computational environment used, including the source code.
From the NRC's web site:
Some institutions, when they examine the spreadsheet for the database of the National Research Council study A Data-Based Assessment of Research-Doctorate Programs, may find data that are or appear to be incorrect. Over the next several weeks following public release of the report and database, we wish to be informed about potential mistakes, misunderstandings, and possible errors.
...
If it is determined that the source of the error was in the processing by the NRC of institutional data supplied by universities, we will collate the needed revisions, and we will attempt to rerun the illustrative rankings and publish a revised master spreadsheet. We ask that any information regarding possible errors of this sort be sent to us by November 1, 2010.
I think this is a great step, and a big improvement over how they handled things back in 1995. (The fact that the rankings were all printed up in a big book probably limited their ability to make corrections the last time around.)
If we get corrected data sets from NRC, we will post them on the site as soon as we can.
Over the weekend I received another irate email from a department chair bemoaning the grievous and irreparable harm the terrible errors on phds.org were causing his program. I investigated. It turned out that the data in question came from the NSF's Survey of Graduate Students and Postdoctorates in Science and Engineering. This survey in turn got its information from... his department. Quite possibly from the chair himself.
The point? One of the big challenges the NRC faced in gathering its data is that departments are not always the most reliable sources of information on themselves. And if the departments don't know what's going on locally, who does?
While I do hear a fair amount of complaining about the quality of the NRC's data, I don't hear a lot of ideas for solutions. Say what you want about the NRC; the departments are the bigger problem.
For example, 23% of the departments reported that they don't track where their graduates get jobs when they graduate. Imagine if this level of quality assurance existed in other industries: "No, we don't track repair rates for the cars we manufacture, but one of our designers recently won an award!"
With the Survey of Earned Doctorates, the NSF makes a heroic effort to pick up the slack. They provide departments with surveys to give their graduating students, they follow up with non-respondents by mail and telephone, they do all the analysis, and they publish reports. Pretty much all the departments have to do is hand the surveys to their students at the proper time. The survey is well-designed and run by NORC, one of the best survey research firms in the country. Given that several of the public complaints about the accuracy of the latest NRC report have focused on SED data about student placements, it appears that many departments are unable to correctly complete their one minor piece of the process.
What exacerbates these department level problems is that departments receive little or no feedback on the numbers that they provide to NSF / NRC / DoE. If they erred in what they reported to NRC, they didn't find out until 5 years later. If they mess up NSF reports, they don't find out until somebody like me makes the data available in a more usable form than the dense spreadsheets the NSF issues (and then it's somehow *my* problem).
One valuable service the NSF could provide would be to send programs a summary of the information they provided each year, ideally in the context of norms for other departments. That would help programs catch errors and also show them where they stand relative to peers on a regular basis. The current system in which programs only see their relative standings every decade or two means that programs are surprised and freak out. A 5 year delay is much too long for any useful corrective processes to be put in place - after 5 years, many of the people who reported the data in the first place have moved on or have long since forgotten the process.
The NRC would do better, too, by releasing their data in stages as they generated it. The NRC could have released about 2/3 of the data in their report in 2006 or 2007, as it came from surveys administered in early 2006. That would have given departments ample time to make corrections. The more challenging items such as counts of citations, publications, and grants, could have been released later, and finally the rankings.
In the wake of the new graduate school rankings, we have received a handful of emails from department chairs who are angry about alleged errors in the NRC's data. The emails by and large follow a pattern: "This specific piece of information about my program is wrong, therefore the data from the NRC are highly flawed" (and often "you should pull your site down immediately" or something similarly hyperbolic).
I'm definitely sympathetic - were I in the chairs' positions, I wouldn't want inaccurate information about my program up on the web. My hands are tied, though - I can't change the NRC's data. That's up to the NRC. Should the NRC choose to release corrections, we will happily update what we have on the site.
I think the larger question of the quality of the NRC's data bears some examination. Is it, as these angry chairs insist, "highly flawed"?
I am certain that the NRC's data set contains errors. Not because I think the NRC is particularly error prone, but rather because the data set in question is enormous and has involved thousands of people in its assembly. Of course there are mistakes.
Consider: the full data set contains ~70 pieces of information on each of 5,000+ programs - on the order of 350,000 items all told. Furthermore, the counts of publications, citations, and awards aggregate dozens of data points about each of perhaps 250,000 faculty members.
The NRC's results have been scrutinized by tens or perhaps hundreds of thousands of people since their release (we've had 110,000 visitors to our site since the NRC data was released, and I imagine the NRC has received comparable traffic). With that many eyes, it's not surprising that mistakes have been found. Given that departments have a strong interest in the numbers, it's also not surprising that they have complained loudly. If the NRC maintained a 99% accuracy rate, we're still talking 3,500 mistakes in their data set, and that will doubtless engender 3,500 or more angry blog posts.
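The back-of-the-envelope arithmetic behind that figure, as a short Python sketch. The inputs are the rough numbers quoted above; the 1% error rate is the hypothetical from the text, not a measured value.

```python
# Rough figures from the text: ~70 items on each of 5,000+ programs.
items_per_program = 70
program_count = 5_000

# Hypothetical: a 99% accuracy rate, i.e. 1 item in 100 is wrong.
error_rate_percent = 1

total_items = items_per_program * program_count
expected_errors = total_items * error_rate_percent // 100

print(total_items)      # 350000
print(expected_errors)  # 3500
```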
So the NRC's data has some errors. Is it "highly flawed"? For concreteness, let's take a look at one of the more outspoken critics, the University of Washington. Note the pattern: based on a handful of alleged errors, they conclude, "The assessment is meaningless, and in no way representative of the accomplishments of UW CSE. Errors in the data affect (at least) UW CSE, many other computer science programs nationally, and many programs in other fields at the University of Washington."
The essence of UW's complaint is the following:
1) UW didn't follow the NRC's instructions and reported incorrect information to the NRC. The NRC reported what UW told them and hence is wrong.
2) There is a disagreement between UW's information on graduate outcomes and the NRC's.
3) There is a disagreement between UW's count of awards and the NRC's.
There's not too much to (1). Of course the NRC is going to report incorrect numbers if they get erroneous information from departments in the first place. Is it the NRC's fault? It's hard to say, since I haven't read the instructions for the item in question, but how sympathetic are you to students who blame their test mistakes on your instructions?
(2) is more interesting. UW says that 40% of its graduates took academic positions; the NRC (after correcting an initial error) says 25% "had academic plans". Why the difference? The NRC's data on outcomes comes from the NSF's Survey of Earned Doctorates. The SED works like this: the NSF gets departments to administer a survey to their graduating PhDs. The results are compiled by the National Opinion Research Center (some serious survey experts) and published.
There are a few likely reasons for the discrepancy. Some students may not yet have gotten jobs by the time they graduate, so they won't count as having "academic plans". Some students don't respond to the question about outcomes and also won't be counted. Finally, the department may have administered the survey too early (e.g. some time in the spring before graduation), which would further skew the numbers. The real issue here is that the NRC is reporting the fraction of students who had lined up academic jobs at graduation (or whenever UW administered its survey), while UW is reporting the fraction of students who eventually got academic jobs. It's an important distinction, as these are different quantities. I think the NSF/NRC numbers are probably fine here, especially since all other programs are measured in the same way.
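A small Python sketch shows how the same cohort can produce both numbers. The cohort of 20 graduates below is entirely made up, chosen only so the two shares come out at 25% and 40%. It also assumes that non-respondents stay in the denominator, which may not match how the SED figures are actually computed.

```python
from dataclasses import dataclass

@dataclass
class Graduate:
    responded: bool            # answered the placement question on the exit survey
    academic_at_survey: bool   # had an academic job lined up when surveyed
    academic_eventually: bool  # ended up in an academic job

# Hypothetical cohort of 20; these are not UW's or the NRC's numbers.
cohort = (
    [Graduate(True, True, True)] * 5        # job in hand at survey time
    + [Graduate(True, False, True)] * 2     # still searching when surveyed
    + [Graduate(False, False, True)] * 1    # skipped the question
    + [Graduate(True, False, False)] * 12   # non-academic outcomes
)

def survey_share(graduates):
    """Percent who reported academic plans on the survey (the SED-style number)."""
    hits = sum(g.responded and g.academic_at_survey for g in graduates)
    return 100 * hits / len(graduates)

def eventual_share(graduates):
    """Percent who eventually took academic jobs (the department-style number)."""
    hits = sum(g.academic_eventually for g in graduates)
    return 100 * hits / len(graduates)

print(survey_share(cohort))    # 25.0
print(eventual_share(cohort))  # 40.0
```

Neither number is wrong; they answer different questions about the same students.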
(3) looks like a genuine problem. UW indicates that the NRC's number is off by a factor of 10, which suggests the error might be a misplaced decimal point somewhere.
Bottom line: 1 mistake from UW, 1 from NRC, and 1 non-issue. Problems, to be sure, and I hope that the NRC issues a corrected data set, but not particularly strong evidence of a "highly flawed" study or that the numbers for UW are "meaningless".