Download or read online books in PDF, EPUB and Mobi Format. Click Download or Read Online button to get book now. This site is like a library, Use search box in the widget to get ebook that you want.

Core Concepts in Data Analysis Summarization Correlation and Visualization

Core Concepts in Data Analysis  Summarization  Correlation and Visualization Author Boris Mirkin
ISBN-10 0857292870
Release 2011-04-05
Pages 390
Download Link Click Here

Core Concepts in Data Analysis: Summarization, Correlation and Visualization provides in-depth descriptions of those data analysis approaches that either summarize data (principal component analysis and clustering, including hierarchical and network clustering) or correlate different aspects of data (decision trees, linear rules, neuron networks, and Bayes rule). Boris Mirkin takes an unconventional approach and introduces the concept of multivariate data summarization as a counterpart to conventional machine learning prediction schemes, utilizing techniques from statistics, data analysis, data mining, machine learning, computational intelligence, and information retrieval. Innovations following from his in-depth analysis of the models underlying summarization techniques are introduced, and applied to challenging issues such as the number of clusters, mixed scale data standardization, interpretation of the solutions, as well as relations between seemingly unrelated concepts: goodness-of-fit functions for classification trees and data standardization, spectral clustering and additive clustering, correlation and visualization of contingency data. The mathematical detail is encapsulated in the so-called “formulation” parts, whereas most material is delivered through “presentation” parts that explain the methods by applying them to small real-world data sets; concise “computation” parts inform of the algorithmic and coding issues. Four layers of active learning and self-study exercises are provided: worked examples, case studies, projects and questions.



Clusters Orders and Trees Methods and Applications

Clusters  Orders  and Trees  Methods and Applications Author Fuad Aleskerov
ISBN-10 9781493907427
Release 2014-06-11
Pages 404
Download Link Click Here

The volume is dedicated to Boris Mirkin on the occasion of his 70th birthday. In addition to his startling PhD results in abstract automata theory, Mirkin’s ground breaking contributions in various fields of decision making and data analysis have marked the fourth quarter of the 20th century and beyond. Mirkin has done pioneering work in group choice, clustering, data mining and knowledge discovery aimed at finding and describing non-trivial or hidden structures—first of all, clusters, orderings and hierarchies—in multivariate and/or network data. This volume contains a collection of papers reflecting recent developments rooted in Mirkin’s fundamental contribution to the state-of-the-art in group choice, ordering, clustering, data mining and knowledge discovery. Researchers, students and software engineers will benefit from new knowledge discovery techniques and application directions.



Concise Computer Vision

Concise Computer Vision Author Reinhard Klette
ISBN-10 9781447163206
Release 2014-01-04
Pages 429
Download Link Click Here

This textbook provides an accessible general introduction to the essential topics in computer vision. Classroom-tested programming exercises and review questions are also supplied at the end of each chapter. Features: provides an introduction to the basic notation and mathematical concepts for describing an image and the key concepts for mapping an image into an image; explains the topologic and geometric basics for analysing image regions and distributions of image values and discusses identifying patterns in an image; introduces optic flow for representing dense motion and various topics in sparse motion analysis; describes special approaches for image binarization and segmentation of still images or video frames; examines the basic components of a computer vision system; reviews different techniques for vision-based 3D shape reconstruction; includes a discussion of stereo matchers and the phase-congruency model for image features; presents an introduction into classification and learning.



Perspectives on Data Science for Software Engineering

Perspectives on Data Science for Software Engineering Author Tim Menzies
ISBN-10 9780128042618
Release 2016-07-14
Pages 408
Download Link Click Here

Perspectives on Data Science for Software Engineering presents the best practices of seasoned data miners in software engineering. The idea for this book was created during the 2014 conference at Dagstuhl, an invitation-only gathering of leading computer scientists who meet to identify and discuss cutting-edge informatics topics. At the 2014 conference, the concept of how to transfer the knowledge of experts from seasoned software engineers and data scientists to newcomers in the field highlighted many discussions. While there are many books covering data mining and software engineering basics, they present only the fundamentals and lack the perspective that comes from real-world experience. This book offers unique insights into the wisdom of the community’s leaders gathered to share hard-won lessons from the trenches. Ideas are presented in digestible chapters designed to be applicable across many domains. Topics included cover data collection, data sharing, data mining, and how to utilize these techniques in successful software projects. Newcomers to software engineering data science will learn the tips and tricks of the trade, while more experienced data scientists will benefit from war stories that show what traps to avoid. Presents the wisdom of community experts, derived from a summit on software analytics Provides contributed chapters that share discrete ideas and technique from the trenches Covers top areas of concern, including mining security and social data, data visualization, and cloud-based data Presented in clear chapters designed to be applicable across many domains



Data Mining Concepts and Techniques

Data Mining  Concepts and Techniques Author Jiawei Han
ISBN-10 0123814804
Release 2011-06-09
Pages 744
Download Link Click Here

Data Mining: Concepts and Techniques provides the concepts and techniques in processing gathered data or information, which will be used in various applications. Specifically, it explains data mining and the tools used in discovering knowledge from the collected data. This book is referred as the knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques of large data sets. After describing data mining, this edition explains the methods of knowing, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss the outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining. Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data



Sets Logic and Maths for Computing

Sets  Logic and Maths for Computing Author David Makinson
ISBN-10 9781447125006
Release 2012-02-27
Pages 283
Download Link Click Here

This easy-to-follow textbook introduces the mathematical language, knowledge and problem-solving skills that undergraduates need to study computing. The language is in part qualitative, with concepts such as set, relation, function and recursion/induction; but it is also partly quantitative, with principles of counting and finite probability. Entwined with both are the fundamental notions of logic and their use for representation and proof. Features: teaches finite math as a language for thinking, as much as knowledge and skills to be acquired; uses an intuitive approach with a focus on examples for all general concepts; brings out the interplay between the qualitative and the quantitative in all areas covered, particularly in the treatment of recursion and induction; balances carefully the abstract and concrete, principles and proofs, specific facts and general perspectives; includes highlight boxes that raise common queries and clear confusions; provides numerous exercises, with selected solutions.



Data Science for Business

Data Science for Business Author Foster Provost
ISBN-10 9781449374280
Release 2013-07-27
Pages 414
Download Link Click Here

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today. Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making. Understand how data science fits in your organization—and how you can use it for competitive advantage Treat data as a business asset that requires careful investment if you’re to gain real value Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way Learn general concepts for actually extracting knowledge from data Apply data science principles when interviewing data science job candidates



Algorithms for Data Science

Algorithms for Data Science Author Brian Steele
ISBN-10 9783319457970
Release 2016-12-25
Pages 430
Download Link Click Here

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data is indispensable and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses. This book has three parts:(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing is the subject of the Hadoop and MapReduce chapter.(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldly data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.(c) Predictive Analytics Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials. This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.



Data Mining for Business Analytics

Data Mining for Business Analytics Author Galit Shmueli
ISBN-10 9781118877524
Release 2016-05-11
Pages 464
Download Link Click Here

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® presents an applied and interactive approach to data mining. Featuring hands-on applications with JMP Pro®, a statistical package from the SAS Institute, the book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction. Topics include data visualization, dimension reduction techniques, clustering, linear and logistic regression, classification and regression trees, discriminant analysis, naive Bayes, neural networks, uplift modeling, ensemble models, and time series forecasting. Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® also includes: Detailed summaries that supply an outline of key topics at the beginning of each chapter End-of-chapter examples and exercises that allow readers to expand their comprehension of the presented material Data-rich case studies to illustrate various applications of data mining techniques A companion website with over two dozen data sets, exercises and case study solutions, and slides for instructors www.dataminingbook.com Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® is an excellent textbook for advanced undergraduate and graduate-level courses on data mining, predictive analytics, and business analytics. The book is also a one-of-a-kind resource for data scientists, analysts, researchers, and practitioners working with analytics in the fields of management, finance, marketing, information technology, healthcare, education, and any other data-rich field. Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University’s Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 70 journal articles, books, textbooks, and book chapters, including Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner®, Third Edition, also published by Wiley. Peter C. Bruce is President and Founder of the Institute for Statistics Education at www.statistics.com He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective and co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner ®, Third Edition, both published by Wiley. Mia Stephens is Academic Ambassador at JMP®, a division of SAS Institute. Prior to joining SAS, she was an adjunct professor of statistics at the University of New Hampshire and a founding member of the North Haven Group LLC, a statistical training and consulting company. She is the co-author of three other books, including Visual Six Sigma: Making Data Analysis Lean, Second Edition, also published by Wiley. Nitin R. Patel, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years. He is co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner®, Third Edition, also published by Wiley.



Spatial Analysis

Spatial Analysis Author Tonny J. Oyana
ISBN-10 9781498707640
Release 2015-07-28
Pages 323
Download Link Click Here

An introductory text for the next generation of geospatial analysts and data scientists, Spatial Analysis: Statistics, Visualization, and Computational Methods focuses on the fundamentals of spatial analysis using traditional, contemporary, and computational methods. Outlining both non-spatial and spatial statistical concepts, the authors present practical applications of geospatial data tools, techniques, and strategies in geographic studies. They offer a problem-based learning (PBL) approach to spatial analysis—containing hands-on problem-sets that can be worked out in MS Excel or ArcGIS—as well as detailed illustrations and numerous case studies. The book enables readers to: Identify types and characterize non-spatial and spatial data Demonstrate their competence to explore, visualize, summarize, analyze, optimize, and clearly present statistical data and results Construct testable hypotheses that require inferential statistical analysis Process spatial data, extract explanatory variables, conduct statistical tests, and explain results Understand and interpret spatial data summaries and statistical tests Spatial Analysis: Statistics, Visualization, and Computational Methods incorporates traditional statistical methods, spatial statistics, visualization, and computational methods and algorithms to provide a concept-based problem-solving learning approach to mastering practical spatial analysis. Topics covered include: spatial descriptive methods, hypothesis testing, spatial regression, hot spot analysis, geostatistics, spatial modeling, and data science.



Probability and Statistics for Computer Scientists Second Edition

Probability and Statistics for Computer Scientists  Second Edition Author Michael Baron
ISBN-10 9781498760607
Release 2015-09-15
Pages 449
Download Link Click Here

Student-Friendly Coverage of Probability, Statistical Methods, Simulation, and Modeling Tools Incorporating feedback from instructors and researchers who used the previous edition, Probability and Statistics for Computer Scientists, Second Edition helps students understand general methods of stochastic modeling, simulation, and data analysis; make optimal decisions under uncertainty; model and evaluate computer systems and networks; and prepare for advanced probability-based courses. Written in a lively style with simple language, this classroom-tested book can now be used in both one- and two-semester courses. New to the Second Edition Axiomatic introduction of probability Expanded coverage of statistical inference, including standard errors of estimates and their estimation, inference about variances, chi-square tests for independence and goodness of fit, nonparametric statistics, and bootstrap More exercises at the end of each chapter Additional MATLAB® codes, particularly new commands of the Statistics Toolbox In-Depth yet Accessible Treatment of Computer Science-Related Topics Starting with the fundamentals of probability, the text takes students through topics heavily featured in modern computer science, computer engineering, software engineering, and associated fields, such as computer simulations, Monte Carlo methods, stochastic processes, Markov chains, queuing theory, statistical inference, and regression. It also meets the requirements of the Accreditation Board for Engineering and Technology (ABET). Encourages Practical Implementation of Skills Using simple MATLAB commands (easily translatable to other computer languages), the book provides short programs for implementing the methods of probability and statistics as well as for visualizing randomness, the behavior of random variables and stochastic processes, convergence results, and Monte Carlo simulations. Preliminary knowledge of MATLAB is not required. Along with numerous computer science applications and worked examples, the text presents interesting facts and paradoxical statements. Each chapter concludes with a short summary and many exercises.



The R Book

The R Book Author Michael J. Crawley
ISBN-10 9781118448960
Release 2012-11-07
Pages 1080
Download Link Click Here

Hugely successful and popular text presenting an extensive and comprehensive guide for all R users The R language is recognized as one of the most powerful and flexible statistical software packages, enabling users to apply many statistical techniques that would be impossible without such software to help implement such large data sets. R has become an essential tool for understanding and carrying out research. This edition: Features full colour text and extensive graphics throughout. Introduces a clear structure with numbered section headings to help readers locate information more efficiently. Looks at the evolution of R over the past five years. Features a new chapter on Bayesian Analysis and Meta-Analysis. Presents a fully revised and updated bibliography and reference section. Is supported by an accompanying website allowing examples from the text to be run by the user. Praise for the first edition: ‘…if you are an R user or wannabe R user, this text is the one that should be on your shelf. The breadth of topics covered is unsurpassed when it comes to texts on data analysis in R.’ (The American Statistician, August 2008) ‘The High-level software language of R is setting standards in quantitative analysis. And now anybody can get to grips with it thanks to The R Book…’ (Professional Pensions, July 2007)



The Art of Data Analysis

The Art of Data Analysis Author Kristin H. Jarman
ISBN-10 9781118413340
Release 2013-04-17
Pages 190
Download Link Click Here

A friendly and accessible approach to applying statistics in the real world With an emphasis on critical thinking, The Art of Data Analysis: How to Answer Almost Any Question Using Basic Statistics presents fun and unique examples, guides readers through the entire data collection and analysis process, and introduces basic statistical concepts along the way. Leaving proofs and complicated mathematics behind, the author portrays the more engaging side of statistics and emphasizes its role as a problem-solving tool. In addition, light-hearted case studies illustrate the application of statistics to real data analyses, highlighting the strengths and weaknesses of commonly used techniques. Written for the growing academic and industrial population that uses statistics in everyday life, The Art of Data Analysis: How to Answer Almost Any Question Using Basic Statistics highlights important issues that often arise when collecting and sifting through data. Featured concepts include: • Descriptive statistics • Analysis of variance • Probability and sample distributions • Confidence intervals • Hypothesis tests • Regression • Statistical correlation • Data collection • Statistical analysis with graphs Fun and inviting from beginning to end, The Art of Data Analysis is an ideal book for students as well as managers and researchers in industry, medicine, or government who face statistical questions and are in need of an intuitive understanding of basic statistical reasoning.



File System Forensic Analysis

File System Forensic Analysis Author Brian Carrier
ISBN-10 9780134439549
Release 2005-03-17
Pages
Download Link Click Here

The Definitive Guide to File System Analysis: Key Concepts and Hands-on Techniques Most digital evidence is stored within the computer's file system, but understanding how file systems work is one of the most technically challenging concepts for a digital investigator because there exists little documentation. Now, security expert Brian Carrier has written the definitive reference for everyone who wants to understand and be able to testify about how file system analysis is performed. Carrier begins with an overview of investigation and computer foundations and then gives an authoritative, comprehensive, and illustrated overview of contemporary volume and file systems: Crucial information for discovering hidden evidence, recovering deleted data, and validating your tools. Along the way, he describes data structures, analyzes example disk images, provides advanced investigation scenarios, and uses today's most valuable open source file system analysis tools—including tools he personally developed. Coverage includes Preserving the digital crime scene and duplicating hard disks for "dead analysis" Identifying hidden data on a disk's Host Protected Area (HPA) Reading source data: Direct versus BIOS access, dead versus live acquisition, error handling, and more Analyzing DOS, Apple, and GPT partitions; BSD disk labels; and Sun Volume Table of Contents using key concepts, data structures, and specific techniques Analyzing the contents of multiple disk volumes, such as RAID and disk spanning Analyzing FAT, NTFS, Ext2, Ext3, UFS1, and UFS2 file systems using key concepts, data structures, and specific techniques Finding evidence: File metadata, recovery of deleted files, data hiding locations, and more Using The Sleuth Kit (TSK), Autopsy Forensic Browser, and related open source tools When it comes to file system analysis, no other book offers this much detail or expertise. Whether you're a digital forensics specialist, incident response team member, law enforcement officer, corporate security specialist, or auditor, this book will become an indispensable resource for forensic investigations, no matter what analysis tools you use.



An Introduction to Statistical Learning

An Introduction to Statistical Learning Author Gareth James
ISBN-10 9781461471387
Release 2013-06-24
Pages 426
Download Link Click Here

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.



R Data Analysis and Visualization

R  Data Analysis and Visualization Author Tony Fischetti
ISBN-10 9781786460486
Release 2016-06-24
Pages 1783
Download Link Click Here

Master the art of building analytical models using R About This Book Load, wrangle, and analyze your data using the world's most powerful statistical programming language Build and customize publication-quality visualizations of powerful and stunning R graphs Develop key skills and techniques with R to create and customize data mining algorithms Use R to optimize your trading strategy and build up your own risk management system Discover how to build machine learning algorithms, prepare data, and dig deep into data prediction techniques with R Who This Book Is For This course is for data scientist or quantitative analyst who are looking at learning R and take advantage of its powerful analytical design framework. It's a seamless journey in becoming a full-stack R developer. What You Will Learn Describe and visualize the behavior of data and relationships between data Gain a thorough understanding of statistical reasoning and sampling Handle missing data gracefully using multiple imputation Create diverse types of bar charts using the default R functions Familiarize yourself with algorithms written in R for spatial data mining, text mining, and so on Understand relationships between market factors and their impact on your portfolio Harness the power of R to build machine learning algorithms with real-world data science applications Learn specialized machine learning techniques for text mining, big data, and more In Detail The R learning path created for you has five connected modules, which are a mini-course in their own right. As you complete each one, you'll have gained key skills and be ready for the material in the next module! This course begins by looking at the Data Analysis with R module. This will help you navigate the R environment. You'll gain a thorough understanding of statistical reasoning and sampling. Finally, you'll be able to put best practices into effect to make your job easier and facilitate reproducibility. The second place to explore is R Graphs, which will help you leverage powerful default R graphics and utilize advanced graphics systems such as lattice and ggplot2, the grammar of graphics. You'll learn how to produce, customize, and publish advanced visualizations using this popular and powerful framework. With the third module, Learning Data Mining with R, you will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, association, and correlations while working with R programs. The Mastering R for Quantitative Finance module pragmatically introduces both the quantitative finance concepts and their modeling in R, enabling you to build a tailor-made trading system on your own. By the end of the module, you will be well-versed with various financial techniques using R and will be able to place good bets while making financial decisions. Finally, we'll look at the Machine Learning with R module. With this module, you'll discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs. You'll also learn to apply machine learning methods to deal with common tasks, including classification, prediction, forecasting, and so on. Style and approach Learn data analysis, data visualization techniques, data mining, and machine learning all using R and also learn to build models in quantitative finance using this powerful language.



Training Students to Extract Value from Big Data

Training Students to Extract Value from Big Data Author National Research Council
ISBN-10 9780309314404
Release 2015-01-16
Pages 66
Download Link Click Here

As the availability of high-throughput data-collection technologies, such as information-sensing mobile devices, remote sensing, internet log records, and wireless sensor networks has grown, science, engineering, and business have rapidly transitioned from striving to develop information from scant data to a situation in which the challenge is now that the amount of information exceeds a human's ability to examine, let alone absorb, it. Data sets are increasingly complex, and this potentially increases the problems associated with such concerns as missing information and other quality concerns, data heterogeneity, and differing data formats. The nation's ability to make use of data depends heavily on the availability of a workforce that is properly trained and ready to tackle high-need areas. Training students to be capable in exploiting big data requires experience with statistical analysis, machine learning, and computational infrastructure that permits the real problems associated with massive data to be revealed and, ultimately, addressed. Analysis of big data requires cross-disciplinary skills, including the ability to make modeling decisions while balancing trade-offs between optimization and approximation, all while being attentive to useful metrics and system robustness. To develop those skills in students, it is important to identify whom to teach, that is, the educational background, experience, and characteristics of a prospective data-science student; what to teach, that is, the technical and practical content that should be taught to the student; and how to teach, that is, the structure and organization of a data-science program. Training Students to Extract Value from Big Data summarizes a workshop convened in April 2014 by the National Research Council's Committee on Applied and Theoretical Statistics to explore how best to train students to use big data. The workshop explored the need for training and curricula and coursework that should be included. One impetus for the workshop was the current fragmented view of what is meant by analysis of big data, data analytics, or data science. New graduate programs are introduced regularly, and they have their own notions of what is meant by those terms and, most important, of what students need to know to be proficient in data-intensive work. This report provides a variety of perspectives about those elements and about their integration into courses and curricula.