Hi. I’m a Senior Researcher at Microsoft Research in New York City, and in the fall I’ll be joining the Management Science & Engineering Department at Stanford. I work in the area of computational social science, an emerging discipline at the intersection of computer science, statistics, and the social sciences. I’m particularly interested in applying modern computational and statistical techniques to study social and political policies. For example, I've recently been looking at stop-and-frisk, swing voting, filter bubbles, do-not-track, and media bias. I intermittently blog at Messy Matters. Before joining Microsoft Research, I studied at the University of Chicago (B.S. in mathematics) and at Cornell (M.S. in computer science; Ph.D. in applied mathematics), and worked at Yahoo Labs. If you would like to get in touch, please send me an email.
Precinct or Prejudice: Understanding Racial Disparities in New York City's Stop-and-Frisk Policy
With Justin Rao and Ravi Shroff. Under review.
Ideological Segregation and the Effects of Social Media on News Consumption
With Seth Flaxman and Justin Rao. Under review.
The Structural Virality of Online Diffusion
With Ashton Anderson, Jake Hofman, and Duncan J. Watts. Under review.
The Mythical Swing Voter
With David Rothschild, Andrew Gelman, and Doug Rivers. Under review.
Forecasting Elections with Non-Representative Polls
With Wei Wang, David Rothschild, and Andrew Gelman.
International Journal of Forecasting. To appear.
Political Ideology and Racial Preferences in Online Dating
With Ashton Anderson, Gregory Huber, Neil Malhotra, and Duncan J. Watts.
Sociological Science, Vol. 1, 2014.
Predicting Individual Behavior with Social Networks
With Daniel G. Goldstein.
Marketing Science, Vol. 33, 2014.
Sharding Social Networks
With Quang Duong, Jake Hofman, and Sergei Vassilvitskii.
Proceedings of the Fifth Conference on Web Search and Data Mining (WSDM 2012).
Respondent Driven Sampling—Where We Are and Where Should We be Going?
With Richard White, Amy Lansky, David Wilson, Wolfgang Hladik, Avi Hakim, and Simon DW Frost.
Sexually Transmitted Infections, Vol. 88, No. 6, 2012, 397-399.
The Structure of Online Diffusion Networks
With Duncan J. Watts and Daniel G. Goldstein.
Proceedings of the 13th ACM Conference on Electronic Commerce (EC 2012).
Who Does What on the Web: Studying Web Browsing Behavior at Scale
With Jake Hofman and M. Irmak Sirer.
Proceedings of the 6th International Conference on Weblogs and Social Media (ICWSM 2012).
Predicting Consumer Behavior with Web Search
With Jake Hofman, Sébastien Lahaie, David Pennock, and Duncan Watts.
Proceedings of the National Academy of Sciences, Vol. 107, No. 41, 2010, 17486-17490.
Real and Perceived Attitude Agreement in Social Networks
With Winter Mason and Duncan Watts.
Journal of Personality and Social Psychology, Vol. 99, No. 4, 2010, 611-621.
Prediction Without Markets
With Daniel Reeves, Duncan Watts, and David Pennock.
Proceedings of the 11th ACM Conference on Electronic Commerce (EC 2010).
Anatomy of the Long Tail: Ordinary People With Extraordinary Tastes
With Andrei Broder, Evgeniy Gabrilovich, and Bo Pang.
Proceedings of the Third Conference on Web Search and Data Mining (WSDM 2010).
Collective Revelation: A Mechanism for Self-Verified, Weighted, and Truthful Predictions
With Daniel Reeves and David Pennock.
Proceedings of the 10th ACM Conference on Electronic Commerce (EC 2009).
CentMail: Rate Limiting via Certified Micro-Donations
With Jake Hofman, John Langford, David Pennock, and Daniel Reeves.
Proceedings of the 6th Conference on Email and Anti-Spam (CEAS 2009).
Short version at WWW 2009, Developer's Track.
Respondent-Driven Sampling as Markov Chain Monte Carlo
With Matthew Salganik.
Statistics in Medicine, Vol. 28, No. 17, 2009, 2202-2229.
Social Search in “Small-World” Experiments
With Roby Muhamad and Duncan Watts.
Proceedings of the 18th International World Wide Web Conference (WWW 2009).
Predictive Indexing for Fast Search
With John Langford and Alex Strehl.
Advances in Neural Information Processing Systems (NIPS 2008).
Pricing Combinatorial Markets for Tournaments
With Yiling Chen and David Pennock.
Proceedings of the 40th ACM Symposium on Theory of Computing (STOC 2008).
Horseshoes in Multidimensional Scaling and Local Kernel Methods
With Persi Diaconis and Susan Holmes.
Annals of Applied Statistics, Vol. 2, No. 3, 2008, 777-807.
An Invisible Minority: Asian-Americans in Mathematics
Notices of the American Mathematical Society, Vol. 53, No. 8, 2006, 878-882.
Analysis of Top to Bottom-k Shuffles
Annals of Applied Probability, Vol. 16, No. 1, 2006, 30-55.
Mixing Time Bounds via the Spectral Profile
With Ravi Montenegro and Prasad Tetali.
Electronic Journal of Probability, Vol. 11, 2006, 1-26.
Eluding Carnivores: File Sharing with Strong Anonymity
With Emin Gün Sirer, Mark Robson, and Doğan Engin.
Proceedings of the 11th ACM SIGOPS European Workshop, 2004.
Modified Logarithmic Sobolev Inequalities for Some Models of Random Walk
Stochastic Processes and Their Applications, Vol. 114, 2004, 51-79.
With a vast amount of information now collected on our online and offline actions — from what we buy, to where we travel, to who we interact with — we have an unprecedented opportunity to study complex social systems. This opportunity, however, comes with scientific, engineering, and ethical challenges. In this hands-on course, we develop ideas from computer science and statistics to address problems in sociology, economics, political science, and beyond. We cover techniques for collecting and parsing data, methods for large-scale machine learning, and principles for effectively communicating results. To see how these techniques are applied in practice, we discuss recent research findings in a variety of areas.
MS&E 125: Introduction to Applied Statistics — Spring 2015
An increasing amount of data is now generated in a variety of disciplines, ranging from finance and economics, to the natural and social sciences. Making use of this information requires both statistical tools and an understanding of how the substantive scientific questions should drive the analysis. In this hands-on course, we learn to explore and analyze real-world datasets. We cover techniques for summarizing and describing data, methods for statistical inference, and principles for effectively communicating results.