Built by Dirk Eddelbuettel using Hugo and Learn, and hosted in this GitHub org.
14 - Big Data / Parallel
Overview
R itself is single-threaded
Its math libraries (BLAS/LAPACK) may not be
The parallel package as a perfect starting point
mclapply
parLapply
simple benchmarking
mention foreach, future, …
big data / external memory / bigmemory
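The mclapply, parLapply, and simple-benchmarking items above can be sketched roughly as follows. This is a minimal illustration rather than the lecture's own code: the task `slow_square`, the `inputs` vector, and the worker count `n_workers` are hypothetical names chosen for the example, and the timings will vary by machine.

```r
library(parallel)  # ships with every R installation

# Hypothetical task for illustration: Sys.sleep() stands in for real work
slow_square <- function(x) {
  Sys.sleep(0.05)
  x^2
}

inputs <- 1:8
n_workers <- 2L  # assumed worker count; detectCores() reports what the machine has

# Serial baseline with lapply()
t_serial <- system.time(res_serial <- lapply(inputs, slow_square))

# mclapply() relies on forking, which is unavailable on Windows,
# where mc.cores must be 1
mc_cores <- if (.Platform$OS.type == "windows") 1L else n_workers
t_fork <- system.time(
  res_fork <- mclapply(inputs, slow_square, mc.cores = mc_cores)
)

# parLapply() uses a cluster of separate worker processes (all platforms)
cl <- makeCluster(n_workers)
t_sock <- system.time(res_sock <- parLapply(cl, inputs, slow_square))
stopCluster(cl)

# All three approaches should return the same list
stopifnot(identical(res_serial, res_fork), identical(res_serial, res_sock))

# Simple benchmark: elapsed wall-clock seconds per approach
elapsed <- c(serial    = t_serial[["elapsed"]],
             mclapply  = t_fork[["elapsed"]],
             parLapply = t_sock[["elapsed"]])
print(elapsed)
```

The cluster timing above excludes the cost of `makeCluster()` itself; for short jobs that start-up overhead can outweigh any gain, which is one reason to benchmark before parallelizing.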
Core Material
Slides
Video 18: parallel R Introduction and R Code
Vignette of R package ‘parallel’ (also included in every R installation)
Additional Resources
Shorter tutorial by Matt Jones
Longer comprehensive tutorial by Jonathan Dursi (with sources)
Textbook: Parallel Computing for Data Science, book by Norm Matloff