Georsgis Blog


Thursday 28 February 2019

Mastering Data Analysis with R

Gain clear insights into your data and solve real-world data science problems with R – from data munging to modeling and visualization

Preface

R has become the lingua franca of statistical analysis, and it is already actively and heavily used in many industries besides the academic sector, where it originated more than 20 years ago. Nowadays, more and more businesses are adopting R in production, and it has become one of the most commonly used tools among data analysts and scientists, providing easy access to thousands of user-contributed packages.


Mastering Data Analysis with R will help you become familiar with this open source ecosystem and with some statistical background as well, though with only a minor focus on mathematical questions. We will primarily focus on how to get things done in practice with R. As data scientists spend most of their time fetching, cleaning, and restructuring data, most of the first hands-on examples concentrate on loading data from files, databases, and online sources.
Then, the book shifts its focus to restructuring and cleansing data, still without performing any actual data analysis. The later chapters describe special data types and cover classical statistical models along with some machine learning algorithms.

What this book covers 



Chapter 1, Hello, Data!, starts with the first, and very important, step of every data-related project: loading data from text files and databases. This chapter covers some problems of loading larger amounts of data into R using improved CSV parsers and pre-filtering data, and compares support for various database backends.

Chapter 2, Getting Data from the Web, extends your knowledge of importing data with packages designed to communicate with Web services and APIs, shows how to scrape and extract data from web pages, and gives a general overview of dealing with the XML and JSON data formats.



Chapter 3, Filtering and Summarizing Data, continues with the basics of data processing by introducing multiple ways of filtering and aggregating data, with a performance and syntax comparison of the deservedly popular data.table and dplyr packages.

Chapter 4, Restructuring Data, covers more complex data transformations, such as applying functions on subsets of a dataset, merging data, and transforming to and from long and wide table formats, so that your source data fits your desired data workflow.



Chapter 5, Building Models (authored by Renata Nemeth and Gergely Toth), is the first chapter that deals with real statistical models, and it introduces the concepts of regression and of models in general. This short chapter explains how to test the assumptions of a model and interpret the results by building a multivariate linear regression model on a real-life dataset.

Chapter 6, Beyond the Linear Trend Line (authored by Renata Nemeth and Gergely Toth), builds on the previous chapter and covers the problems posed by non-linear associations of predictor variables, providing further examples of generalized linear models, such as logistic and Poisson regression.



Chapter 7, Unstructured Data, introduces new data types that might not hold any information in a structured way. Here, you learn how to use statistical methods to process such unstructured data through some hands-on examples of text mining algorithms, and how to visualize the results.



Chapter 8, Polishing Data, covers another common issue with raw data sources. Most of the time, data scientists handle dirty-data problems, such as cleansing data of errors, outliers, and other anomalies. It is also very important to impute missing values or minimize their effects.

Chapter 9, From Big to Smaller Data, assumes that your data is already loaded, clean, and transformed into the right format. Now you can start analyzing the usually high number of variables, to which end we cover some statistical methods for dimension reduction and other data transformations on continuous variables, such as principal component analysis, factor analysis, and multidimensional scaling.



Chapter 10, Classification and Clustering, discusses several ways of grouping observations in a sample using supervised and unsupervised statistical and machine learning methods, such as hierarchical and k-means clustering, latent class models, discriminant analysis, logistic regression and the k-nearest neighbors algorithm, and classification and regression trees. 

Chapter 11, A Social Network Analysis of the R Ecosystem, concentrates on a special data structure and introduces the basic concepts and visualization techniques of network analysis, with a special focus on the igraph package.

Chapter 12, Analyzing a Time Series, shows you how to handle date-time objects and analyze related values by smoothing, seasonal decomposition, and ARIMA, including some forecasting and outlier detection as well.



Chapter 13, Data around Us, covers another important dimension of data, with a primary focus on visualizing spatial data with thematic, interactive, contour, and Voronoi maps. 

Chapter 14, Analyzing the R Community, provides a more complete case study that combines many different methods from the previous chapters to highlight what you have learned in this book and what kind of questions and problems you might face in future projects.

Appendix, References, lists the R packages used and some further suggested reading for each of the aforementioned chapters.



What you need for this book 
All the code examples provided in this book should be run in the R console, so R needs to be installed on your computer. You can download the software for free and find the installation instructions for all major operating systems at http://r-project.org. Although we will not cover advanced topics, such as how to use R in Integrated Development Environments (IDEs), there are excellent plugins and extensions for Emacs, Eclipse, vi, and Notepad++, among other editors. We also highly recommend that you try RStudio, a free and open source IDE dedicated to R, at https://www.rstudio.com/products/RStudio.
Besides a working R installation, we will also use some user-contributed R packages. In most cases, these can easily be installed from the Comprehensive R Archive Network (CRAN). The sources of the required packages and the versions used to produce the output in this book are listed in Appendix, References. To install a package from CRAN, you will need an Internet connection. To download the binary files or sources, use the install.packages function in the R console, like this:

> install.packages('pander')

Some packages mentioned in this book are not (yet) available on CRAN, but may be installed from Bitbucket or GitHub. These packages can be installed via the install_bitbucket and install_github functions from the devtools package. Windows users should first install Rtools from https://cran.r-project.org/bin/windows/Rtools.

After installation, the package must be loaded into the current R session before you can start using it. All the required packages are listed in the appendix, but the code examples also include the related R command for each package at its first occurrence in each chapter:

> library(pander)
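Because every chapter begins by installing and loading a handful of packages, it can be convenient to combine the two steps. The following is a minimal sketch, not taken from the book: `ensure_packages` is a hypothetical helper name, and it assumes that all requested packages are available on CRAN (packages hosted on Bitbucket or GitHub still need the devtools functions mentioned above).

```r
# Hypothetical helper (not part of base R or of the book's code):
# install any CRAN packages that are missing, then attach all of them.
ensure_packages <- function(pkgs) {
  # requireNamespace() quietly returns FALSE for packages that are not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    # assumes the missing packages are on CRAN and an Internet connection is available
    install.packages(missing)
  }
  # character.only = TRUE lets library() take the package name as a string
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}
```

For example, `ensure_packages('pander')` would install pander only on the first run and simply load it on later runs, so the same line can safely stay at the top of every script.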
We highly recommend downloading the code example files of this book (refer to the Downloading the example code section) so that you can easily copy and paste the commands in the R console without the R prompt shown in the printed version of the examples and output in the book.
If you have no experience with R, you should start with some of the free introductory articles and manuals on the R home page; a short list of suggested materials is also available in the appendix of this book.

Who this book is for


If you are a data scientist, engineer, analyst, or R developer who wants to explore and optimize your use of R's advanced features and tools, then this is the book for you. Basic knowledge of R is required, along with an understanding of database logic, although the book can get you up and running quickly by providing references to introductory materials.
