This course provides practical, foundation-level training that enables immediate and effective participation in big data and other analytics projects. It includes an introduction to big data and to the Data Analytics Lifecycle as a way to address business challenges that leverage big data. The course provides grounding in basic and advanced analytic methods and an introduction to big data analytics technology and tools, including MapReduce and Hadoop. Extensive labs throughout give students many opportunities to apply these methods and tools to real-world business challenges, as a practicing Data Scientist would. The course takes an “Open”, or technology-neutral, approach and includes a final lab in which students address a big data analytics challenge by applying the concepts taught in the course in the context of the Data Analytics Lifecycle. The course prepares the student for the Proven™ Professional Data Scientist Associate (EMCDSA) certification exam and establishes a baseline of Data Science skills that can be enhanced with additional training and further real-world experience.

Learning Outcomes

● Deploying the Data Analytics Lifecycle to address big data analytics projects
● Reframing a business challenge as an analytics challenge
● Applying appropriate analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
● Selecting appropriate data visualizations to clearly communicate analytic insights to business sponsors and analytic audiences
● Using tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions
● Explaining how advanced analytics can be leveraged to create competitive advantage and how the data scientist role and skills differ from those of a traditional business intelligence analyst

What Should I Know

To complete this course successfully and gain the maximum benefit from it, a student should have the following knowledge and skills:

● A strong quantitative background with a solid understanding of basic statistics, as would be covered in a Statistics 101-level course
● Experience with a programming or scripting language, such as Java, Perl, Python, or R
● Experience with SQL

Phil Kazanjian

Phil is an Adjunct Professor at Bunker Hill Community College in Boston, MA. He has a background in plastics engineering, but his passion is working in the field of IT. He holds more than 15 IT industry certifications in disciplines such as VoIP, security, wireless, networking, service provider, virtualization, data center, and, as one of his latest ventures, Big Data. He is also currently developing BHCC's security capstone lab for implementation in NetLab. At night he works as an on-site trainer, teaching telco employees how to provision secure, robust network types using live equipment (Juniper, Cisco, Linux, etc.). He is EMC Data Science Associate (EMCDSA) certified and loves hands-on tasks that involve cutting-edge technologies.