Machine Learning: Applications and Opportunities in Social Science Research

Part of the ICPSR 2022 Summer Workshops at Berkeley.


Instructor: Christopher Hare, UC Davis

The field of machine learning is most commonly associated with “big data”: how we can use massive datasets to make better predictions about things like credit card fraud, Netflix recommendations, and the like. Though machine learning has been most influential in its commercial and medical applications, a growing number of social scientists are taking advantage of these methods for data of all types to: (1) uncover patterns and structure among variables, (2) test and improve model specification and predictions, and (3) perform data reduction. This course covers the mechanics underlying machine learning methods and discusses how these techniques can be leveraged by social scientists to gain new insight from their data. Specifically, the course will cover: decision trees, random forests, boosting, k-means clustering and nearest neighbors, support vector machines, kernels, neural networks, and ensemble learning. We will also discuss best practices concerning tuning, error estimation, and model interpretability.

Software: The course will use R to demonstrate the theoretical properties and empirical applications of these methods, so participants should have some basic familiarity with R or a similar statistical computing environment (such as Stata, SAS, or Python). An advanced programming background is neither required nor assumed.

Prerequisites: Participants should also have some prior exposure to linear regression models.

UC Berkeley faculty, students, and staff are eligible for ICPSR Member pricing.

All workshops will be held in person at Social Science Matrix, 8th floor, Social Sciences Building, on the UC Berkeley campus; you may also attend virtually.

To register and for further information, go to and choose the “Short Workshops” tab. Or contact Eva Seto, Associate Director Matrix via e-mail to
