Statistics is the Least Important Part of Data Science | Andrew Gelman, PhD
October 12th, 2020
57 mins 1 sec
Season 6
About this Episode
Andrew is an American statistician, professor of statistics and political science, and director of the Applied Statistics Center at Columbia University.
He frequently writes about Bayesian statistics, displaying data, and interesting trends in social science.
He’s also well known for writing posts sharing his thoughts on best statistical practices in the sciences, with a frequent emphasis on what he sees as the absurd and unscientific.
FIND ANDREW ONLINE
Website: https://statmodeling.stat.columbia.edu/
Twitter: https://twitter.com/StatModeling
QUOTES
[00:04:16] "We've already passed peak statistics..."
[00:05:13] "One thing that we sometimes like to say is that big data need big model because big data are available data. They're not designed experiments, they're not random samples. Often big data means these are measurements."
[00:22:05] "If you design an experiment, you want to know what you're going to do later. So most obviously, you want your sample size to be large enough so that given the effect size that you expect to see, you'll get a strong enough signal that you can make a strong statement."
[00:31:00] "The alternative to good philosophy is not no philosophy, it's bad philosophy."
SHOW NOTES
[00:03:12] How Dr. Gelman got interested in statistics
[00:04:09] How much more hyped has statistical and machine learning become since you first broke into the field?
[00:04:44] Where do you see the field of statistical machine learning headed in the next two to five years?
[00:06:12] What do you think the biggest positive impact machine learning will have in society in the next two to five years?
[00:07:24] What do you think would be some of our biggest concerns in the future?
[00:09:07] The three parts of Bayesian inference
[00:12:05] What's the main difference between the frequentist and the Bayesian?
[00:13:02] What is a workflow?
[00:16:21] Iteratively building models
[00:17:50] How does the Bayesian workflow differ from the frequentist workflow?
[00:18:32] Why is it that what makes this statistical method effective is not what it does with the data, but what data it uses?
[00:20:48] Why do Bayesians then tend to be a little bit more skeptical in their thought processes?
[00:21:47] Your method of evaluation can be inspired by the model or the model can be inspired by your method of evaluation
[00:24:38] What is the usual story when it comes to statistics? And why don't you like it?
[00:30:16] Why should statisticians and data scientists care about philosophy?
[00:35:04] How can we solve all of our statistics problems using P values?
[00:36:14] Is there a difference in the interpretation of P values between Bayesians and frequentists?
[00:36:54] Do you feel like the P value is a difficult concept for a lot of people to understand? And if so, why do you think it's a bit challenging?
[00:38:22] Why the least important part of data science is statistics.
[00:40:09] Why is it that Americans vote the way they do?
[00:42:40] What's the one thing you want people to learn from your story?
[00:44:48] The lightning round