Summary

Intro to Big Data with R and R-Studio is a course that introduces participants to the field of Big Data, teaches some of its basic concepts, and teaches the R and R-Studio software packages. Participants will learn the difference between messy, clean, and tidy data, and learn techniques to transform messy data into tidy data. Participants will learn the fundamentals of the R programming language, along with statistics. They will learn basic statistical concepts and tests used to analyze data sets, including probability, distributions, inference, and regression. Participants will also learn basic visualization techniques in R using built-in plotting functions and packages like ggplot2, as well as how to create and disseminate information to others using the data visualization software Tableau.

To view class materials, please click here.

Learning Outcomes

● Describe the role of Big Data in society, and state how data is used in a real-world environment.
● Describe various tools a data scientist uses and demonstrate how to use an open source software package called R-Studio, a GUI (graphical user interface) for the CLI (command line interface) software R.
● Utilize the data visualization software Tableau, and create a dashboard and storyboard in Tableau which allows data to tell a story.
● Utilize basic statistical parameters related to normal distributions, and show how data which follows this pattern can be used and analyzed.
● Demonstrate how to turn unstructured data (messy and clean data) into structured data (tidy data).
● Demonstrate how to live link R, Excel and Tableau to a database, and update the software as the database updates in real time.
● Demonstrate how to search for online databases, find open data sources, and search the data for answers to questions.
● Utilize the open source R programming language to write functions, loops, and small programs that both clean data and present it in a visual format.
● Install and utilize various R libraries such as ggplot2, lubridate, dplyr, tidyr, stringr, XML, reshape2, etc.
● Show how to web scrape data, clean it, and present the data to a user in a readable, often visual, format which utilizes tools and techniques learned throughout the course.

What should I know?

It is assumed that the student is computer literate and has the ability to use a command line interface, but previous programming knowledge or experience is not needed.

Instructor

Michael Harris
Michael Harris is an assistant professor at Bunker Hill Community College. He has been teaching at BHCC as an adjunct since 2008, and full time since 2014. He is in the process of creating the first Big Data AS degree in the country for BHCC. His background is in mechanical engineering, and he has worked on projects such as the SUGV robot for the US Army at iRobot, and the AIA space telescope on the SDO satellite at the Harvard-Smithsonian Center for Astrophysics.