Staff research scientist at Google Brain

I am interested in designing high-performance machine learning methods that make sense to humans. Quanta Magazine described well why I do what I do. Thank you John Pavlus for writing this piece!

My focus is building interpretability methods for already-trained models, or building inherently interpretable models. In particular, I believe the language of explanations should include higher-level, human-friendly concepts so that explanations can make sense to everyone.

I gave a talk at the G20 meeting in Argentina in 2019. One of my works, TCAV, received the UNESCO Netexplo Award 2019, was featured at Google I/O '19, and appears in Brian Christian's book The Alignment Problem. I gave a keynote at ECML 2020.

Stuff I help with:
        Workshop Chair at ICLR 2018
        Area chair NIPS 2017, NeurIPS 2018, 2019, 2020, ICML 2019, 2020, ICLR 2020, AISTATS 2020
        Senior program committee at AISTATS 2019
        Steering committee and area chair at FAT* conference
        Program committee at ICML 2017/2018, AAAI 2017, IJCAI 2016 (and many other conferences before that...)
        Executive board member of Women in Machine Learning.
        Co-organizer of the 1st (ICML 2016), 2nd (ICML 2017) and 3rd (ICML 2018) Workshops on Human Interpretability in Machine Learning (WHI), and the NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems.
I gave a couple of tutorials on interpretability:
        Deep Learning Summer School at University of Toronto, Vector Institute in 2018 (slides, video)
        CVPR 2018 (slides and videos)
        Tutorial on Interpretable Machine Learning at ICML 2017 (slides, video).

Google Scholar



On Completeness-aware Concept-Based Explanations in Deep Neural Networks

TL;DR: Let's find a set of concepts that is "sufficient" to explain predictions.

Chih-Kuan Yeh, Been Kim, Sercan O. Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
[NeurIPS 20]


Debugging Tests for Model Explanations

TL;DR: Sanity Checks, part 2.

Julius Adebayo, Michael Muelly, Ilaria Liccardi, Been Kim
[NeurIPS 20]


Concept Bottleneck Models

TL;DR: Build a model with concepts built in, so that you can control which concepts influence predictions.

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
[ICML 20] [Featured at Google Research review 2020]


Explaining Classifiers with Causal Concept Effect (CaCE)

TL;DR: Make TCAV causal.

Yash Goyal, Amir Feder, Uri Shalit, Been Kim [arxiv]


Towards Automatic Concept-based Explanations

TL;DR: Automatically discover high-level concepts that explain a model's prediction.

Amirata Ghorbani, James Wexler, James Zou, Been Kim
[NeurIPS 19] [code]


BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth

TL;DR: Ever thought "well, interpretability methods are hard to evaluate"? No more! We open-source a dataset, models and metrics so that you can quantitatively evaluate your interpretability method. We compare many widely used methods and publish their rankings.

Sharry Yang, Been Kim [arxiv] [code]


Visualizing and Measuring the Geometry of BERT

TL;DR: Studying the geometry of BERT to gain insights into its impressive performance.

Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg
[NeurIPS 19] [blog post]


Evaluating Feature Importance Estimates

TL;DR: One idea to evaluate attribution methods.

Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim
[NeurIPS 19]


Human Evaluation of Models Built for Interpretability

TL;DR: Which factors of an explanation matter for better interpretability, and in what settings? A large-scale study to answer this question.

Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Samuel Gershman and Finale Doshi-Velez
[HCOMP 19] (best paper honorable mention)


Human-Centered Tools for Coping with Imperfect Algorithms during Medical Decision-Making

TL;DR: A tool to help doctors navigate medical images using medically relevant similarities. This work uses part of the TCAV idea to sort images by concepts.

Carrie J. Cai, Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, Greg S. Corrado, Martin C. Stumpe, Michael Terry
CHI 2019 (best paper honorable mention)
[pdf] [bibtex]


Interpreting Black Box Predictions using Fisher Kernels

TL;DR: Answering "which training examples are most responsible for a given set of predictions?" A follow-up to MMD-critic [NIPS 16]. The difference is that now we pick examples informed by how the classifier sees them!

Rajiv Khanna, Been Kim, Joydeep Ghosh, Oluwasanmi Koyejo
[pdf] [bibtex]


To Trust Or Not To Trust A Classifier

TL;DR: A very simple method that tells you whether to trust your prediction or not, that happens to also have nice theoretical properties!

Heinrich Jiang, Been Kim, Melody Guan, Maya Gupta
NIPS 2018 (poster)
[pdf] [code] [bibtex]


Human-in-the-Loop Interpretability Prior

TL;DR: Ask humans which models are more interpretable DURING model training, to learn a more interpretable model for the end task.

Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez
NIPS 2018 (spotlight)
[pdf] [bibtex]


Sanity Checks for Saliency Maps

TL;DR: Saliency maps are a type of post-training interpretability method that explains the 'evidence' for predictions. But it turns out some of them have little to do with the model's prediction! Some saliency maps are visually indistinguishable before and after we randomize the weights of the network (i.e., make it produce garbage predictions).

Julius Adebayo, Justin Gilmer, Ian Goodfellow, Moritz Hardt, Been Kim
NIPS 2018 (spotlight)
[pdf] [bibtex]
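A minimal NumPy sketch of the paper's weight-randomization idea, on a toy linear "model" (the toy functions and data here are made up for illustration, not the paper's code): a saliency method that actually depends on the model should produce a different map once the weights are re-initialized, while a model-independent method (e.g. one that behaves like an edge detector on the input) produces the same map and fails the check.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=20)          # a toy input
w = rng.normal(size=20)          # "trained" weights of a linear model f(x) = w @ x
w_random = rng.normal(size=20)   # randomized (re-initialized) weights

def gradient_saliency(weights, x):
    # For f(x) = w @ x, the input gradient is w itself; saliency = |gradient|.
    return np.abs(weights)

def input_only_saliency(weights, x):
    # A method that ignores the model entirely (edge-detector-like behavior).
    return np.abs(x)

def rank_correlation(a, b):
    # Spearman-style rank correlation via ranks + Pearson correlation.
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Sanity check: compare maps before vs. after weight randomization.
print(rank_correlation(gradient_saliency(w, x),
                       gradient_saliency(w_random, x)))    # changes with the weights
print(rank_correlation(input_only_saliency(w, x),
                       input_only_saliency(w_random, x)))  # identical map: 1.0
```

The check is a necessary condition, not a sufficient one: passing it does not prove a method is faithful, but failing it shows the map cannot be explaining the model.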


Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

TL;DR: We can learn human concepts in any layer of an already-trained neural network, then do hypothesis testing with them to get quantitative explanations.

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
ICML 2018
[pdf] [code] [bibtex] [slides]

Sundar Pichai (CEO of Google) presented TCAV as a tool to build AI for everyone in his keynote at Google I/O 2019 [video]
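A rough illustration of the TCAV recipe in NumPy, on synthetic "activations" rather than a real network (the toy data, the least-squares probe standing in for the paper's linear classifier, and the fixed gradient vector are all assumptions for illustration): a linear probe separates concept examples from random counterexamples to get a concept activation vector (CAV), and the TCAV score is the fraction of inputs whose class logit increases when activations move along the CAV.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: activations at some layer are 10-d; concept examples
# cluster along a direction u, random counterexamples do not.
u = np.zeros(10); u[0] = 1.0
concept_acts = rng.normal(0.0, 0.1, (50, 10)) + u   # concept examples
random_acts  = rng.normal(0.0, 0.1, (50, 10))       # random counterexamples

# Step 1: the CAV is the normal of a linear separator between the two sets
# (a least-squares probe here, standing in for logistic regression).
X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
cav = w / np.linalg.norm(w)                         # concept activation vector

# Step 2: TCAV score = fraction of inputs whose class logit increases when
# activations move along the CAV (directional derivative > 0). Here the
# per-input logit gradients are a fixed stand-in vector g.
g = np.ones(10)                                     # stand-in for d(logit)/d(activation)
grads = np.tile(g, (50, 1))
tcav_score = np.mean(grads @ cav > 0)
print(tcav_score)
```

In the real method, the activations and gradients come from a chosen layer of the trained network, and the score is tested against CAVs trained on random example sets to reject chance.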


The (Un)reliability of saliency methods

TL;DR: Existing saliency methods could be unreliable. We should be careful using them.

Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, Been Kim
NIPS workshop 2017 on Explaining and Visualizing Deep Learning
[pdf] [bibtex]


SmoothGrad: removing noise by adding noise

Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, Martin Wattenberg
ICML Workshop on Visualization for Deep Learning 2017
[pdf] [code] [bibtex]
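The SmoothGrad idea fits in a few lines: average the gradient over noisy copies of the input. A minimal sketch, using a toy analytic gradient in place of a network's logit gradient (function names and parameters are illustrative, not the released code):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Gradient of a toy "model" f(x) = sum(x**2); in real use this is the
    # gradient of a class logit with respect to the input image.
    return 2.0 * x

def smoothgrad(x, grad_fn, n_samples=50, sigma=0.1):
    """Average the gradient over n_samples Gaussian-perturbed copies of x."""
    noisy = x + rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    return np.mean([grad_fn(z) for z in noisy], axis=0)

x = np.array([1.0, -2.0, 3.0])
sg = smoothgrad(x, grad_f)
print(sg)   # close to the clean gradient 2*x
```

The noise scale sigma and the sample count are the only knobs; averaging suppresses the high-frequency fluctuations that make raw gradient maps look noisy.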


QSAnglyzer: Visual Analytics for Prismatic Analysis of Question Answering System Evaluations

Nan-chen Chen and Been Kim
VAST 2017
[pdf] [bibtex]


Towards A Rigorous Science of Interpretable Machine Learning

Finale Doshi-Velez and Been Kim
[to appear] Springer Series on Challenges in Machine Learning: "Explainable and Interpretable Models in Computer Vision and Machine Learning", arxiv in 2017
[pdf] [bibtex]


Examples are not Enough, Learn to Criticize! Criticism for Interpretability

Been Kim, Rajiv Khanna and Sanmi Koyejo
Neural Information Processing Systems 2016
[pdf] [NIPS oral presentation talk slides] [talk video] [bibtex] [code]


Diff-clustering: Interpretable embedding example-based clustering

Been Kim, Peter Turney and Peter Clark
under review


Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction

Been Kim, Finale Doshi-Velez and Julie Shah
Neural Information Processing Systems 2015
[pdf] [variational inference in gory detail] [bibtex]


iBCM: Interactive Bayesian Case Model Empowering Humans via Intuitive Interaction

Been Kim, Elena Glassman, Brittney Johnson and Julie Shah
coming soon (see my thesis for details).


Bayesian Case Model:
A Generative Approach for Case-Based Reasoning and Prototype Classification

Been Kim, Cynthia Rudin and Julie Shah
Neural Information Processing Systems 2014
[pdf] [poster] [bibtex]

This work was featured on MIT news and MIT front page spotlight.


Scalable and interpretable data representation for
high-dimensional complex data

Been Kim, Kayur Patel, Afshin Rostamizadeh and Julie Shah
AAAI Conference on Artificial Intelligence 2015
[pdf] [bibtex]


A Bayesian Generative Modeling with Logic-Based Prior

Been Kim, Caleb Chacha and Julie Shah
Journal of Artificial Intelligence Research 2014
[pdf] [bibtex]


Learning about Meetings

Been Kim and Cynthia Rudin
Data Mining and Knowledge Discovery Journal 2014

[arxiv] [pdf] [bibtex]

This work was featured in Wall Street Journal.


Inferring Robot Task Plans from Human Team Meetings:
A Generative Modeling Approach with Logic-Based Prior

Been Kim, Caleb Chacha and Julie Shah
AAAI Conference on Artificial Intelligence 2013
[pdf] [bibtex] [video]

This work was featured in:
"Introduction to AI" course at Harvard (COMPSCI 182) by Barbara J. Grosz.
[Course website]
"Human in the loop planning and decision support" tutorial at AAAI15 by Kartik Talamadupula and Subbarao Kambhampati.
[slides from the tutorial]


Multiple Relative Pose Graphs for Robust Cooperative Mapping

Been Kim, Michael Kaess, Luke Fletcher, John Leonard, Abraham Bachrach, Nicholas Roy, and Seth Teller
International Conference on Robotics and Automation 2010
[pdf] [bibtex] [video]


Human-inspired Techniques for Human-Machine Team Planning

Julie Shah, Been Kim and Stefanos Nikolaidis
AAAI Technical Report - Human Control of Bioinspired Swarms 2013
[pdf] [bibtex]



Interactive and Interpretable Machine Learning Models for Human Machine Collaboration

Been Kim
PhD Thesis 2015
[pdf] [bibtex] [slides]