The book provides a timely coverage of the paradigm of knowledge distillation—an efficient way of model compression. Knowledge distillation is positioned in a general setting of transfer learning, which effectively learns a lightweight student model from a large teacher model. The book covers a variety of training schemes, teacher–student architectures, and distillation algorithms. The book covers a wealth of topics including recent developments in vision and language learning, relational architectures, multi-task learning, and representative applications to image processing, computer vision, edge intelligence, and autonomous systems. The book is of relevance to a broad audience including researchers and practitioners active in the area of machine learning and pursuing fundamental and applied research in the area of advanced learning paradigms.