class: center, middle, inverse, title-slide

# The R Infrastructure
## How we build stuff
### Jeroen Ooms
### 2018/09/14

---
background-image: url(utrecht.jpg)
background-position: 50% 50%

# Infrastructure is never done...

---

# Hello World

About me: PhD Statistics UCLA 2014 (Jan de Leeuw, Mark Hansen).

Currently I am a postdoc at UC Berkeley with the [rOpenSci](https://ropensci.org/) group.

data:image/s3,"s3://crabby-images/eaada/eaada7690cbf5686b175d8bdb5e7353b38deb761" alt="team"

---
background-image: url(unconf.jpg)
background-position: 50% 50%

# The rOpenSci (extended) Family

---

# CRAN Packages

[data:image/s3,"s3://crabby-images/5899b/5899b4e0fad2fa8b069b56bb59226f5d99c515a2" alt="pkgscreen"](https://cran.r-project.org/web/checks/check_results_jeroen_at_berkeley.edu.html)

---
background-image: url(screen3.png)
background-position: 50% 50%

# Also this

---
class: inverse, center, middle

# PART I: Base Infrastructure

---

# Base Dependencies

To the R user, the dependency system looks mostly like this:

data:image/s3,"s3://crabby-images/bb7c9/bb7c906255c980b2726047133f0f8e662a47217d" alt="diagram"

---

# Base Dependencies

However, R itself also depends on other software:

data:image/s3,"s3://crabby-images/d0bf1/d0bf12cf6f7a7ee5252b437559ab2e599469ab04" alt="depends"

---

# Base Dependencies

So the reality is more like this:

data:image/s3,"s3://crabby-images/4c470/4c470745c800e1bb359c30a2f868c5fba15e8cc3" alt="diagram2"

---
class: inverse, center, middle

# What are all these libraries used for?

---

# BLAS / LAPACK: Linear Algebra

Most statistical methods involve matrix calculations (QR, Cholesky, SVD, etc.). R uses high-performance BLAS / LAPACK routines for linear algebra.
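
A minimal sketch of the decompositions mentioned above, using the base R functions `qr()`, `chol()` and `svd()`, which hand the work to compiled linear algebra routines. The matrix `X` and the seed are made up for illustration only.

```r
# Illustrative data: a random 5 x 4 matrix (not from the talk)
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)
A <- crossprod(X)                  # X'X, symmetric positive definite

# QR decomposition: X = Q R
qr_X <- qr(X)
Q <- qr.Q(qr_X)
R <- qr.R(qr_X)

# Cholesky decomposition: A = U'U with U upper triangular
U <- chol(A)

# Singular value decomposition: X = u diag(d) v'
s <- svd(X)

# Each factorization reproduces its input up to rounding error
qr_ok   <- isTRUE(all.equal(Q %*% R, X))
chol_ok <- isTRUE(all.equal(t(U) %*% U, A))
svd_ok  <- isTRUE(all.equal(s$u %*% diag(s$d) %*% t(s$v), X))
```
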
data:image/s3,"s3://crabby-images/1884a/1884a18d52e62d323a3266ffaa612ceac4c5e388" alt="blas"

---

# BLAS / LAPACK: Linear Algebra

For example, when R calculates `\(\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}\)`:

```r
# define X matrix and y vector
X <- as.matrix(cbind(1,cars$speed))
y <- as.matrix(cars$dist)
solve(t(X) %*% X) %*% t(X) %*% y
```

```
           [,1]
[1,] -17.579095
[2,]   3.932409
```

```r
# As done by lm()
coef(lm(dist~speed, data = cars))
```

```
(Intercept)       speed 
 -17.579095    3.932409 
```

---

# LIBCURL: Networking

R uses libcurl for downloading files over FTP/HTTP/HTTPS. This functionality is used in e.g. `download.file()` and `install.packages()`.

data:image/s3,"s3://crabby-images/5e095/5e0957117b0155e5aa12bef7017bec8d6447388f" alt="libcurl"

---

# LIBCURL: Networking

Note that HTTPS uses an SSL connection, which requires encryption support.

```r
install.packages("MASS", repos = 'https://cloud.r-project.org')
```

```
trying URL 'https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.5/MASS_7.3-50.tgz'
Content type 'application/x-gzip' length 1163764 bytes (1.1 MB)
==================================================
downloaded 1.1 MB
```

---

# LIBICU: Text and Encoding

ICU (International Components for Unicode) is used for converting text encodings and comparing strings.

data:image/s3,"s3://crabby-images/7d88d/7d88dfb720212809f3e791e81c90063a42fda33b" alt="icu"

---

# PCRE: Regular Expressions

R exposes several regular expression functions such as `grep` and `regexpr`, but also uses regular expressions internally.
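
A small sketch of these functions with `perl = TRUE`, which asks R to use the PCRE engine. The input strings and the version pattern are made up for illustration.

```r
# Illustrative input (not from the talk)
x <- c("R 3.5.1", "no version here", "Rtools 4.0")

# Digits separated by dots, e.g. "3.5.1"
pattern <- "\\d+(\\.\\d+)+"

# Which elements contain a version number?
has_version <- grepl(pattern, x, perl = TRUE)

# Locate the first match in each element, then extract the matched text;
# regmatches() drops the elements that did not match
m <- regexpr(pattern, x, perl = TRUE)
versions <- regmatches(x, m)
```
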
data:image/s3,"s3://crabby-images/911e5/911e586e547327a9bb91be6c7cb046ee61c9e3f7" alt="pcre"

---

# LIBICU: Text and Encoding

This should yield the same results as in other languages:

```r
validate_ip_address <- function(x){
  ip_addr_regex <- "\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b"
  grepl(ip_addr_regex, x)
}
validate_ip_address(c("127.0.0.1", "1.1.1.1", "1000.1.1.1"))
```

```
[1]  TRUE  TRUE FALSE
```

And results need to be consistent across platforms and locales:

```r
authors <- c("Hadley Wickham", "Gábor Csárdi", "谢益辉")
enc2native("Gábor Csárdi") == authors
```

```
[1] FALSE  TRUE FALSE
```

```r
grep("益", authors, value = TRUE)
```

```
[1] "谢益辉"
```

---

# CAIRO: graphical rendering

Cairo is used to render graphics, i.e. to convert the shapes, attributes and text from the R graphics device into a bitmap image.

data:image/s3,"s3://crabby-images/8bb70/8bb709aa5ed463f9b1f9f208965d02b70e545b7e" alt="cairo"

---

# FONTCONFIG: (via cairo) Finding Fonts

Want your plot axis to show sans-serif italic labels? Fontconfig finds the appropriate font that is available on your system.

data:image/s3,"s3://crabby-images/13384/133846ef7c5f094fa47d8b58cb316ecffd4d5ba5" alt="freetype"

---

# FREETYPE: (via cairo) Rendering Text

Freetype then combines the text with font data (font, size, style) to render the actual figures (glyph images) that form the readable characters in your graphic.

data:image/s3,"s3://crabby-images/c6881/c6881d601769ed689240d4a6552ccc836992fb72" alt="freetype"

---

# LIBPNG, LIBJPEG, LIBTIFF

A bitmap is merely a matrix of pixels. Additional libraries are needed to export the bitmap to various image formats that other software will understand.

data:image/s3,"s3://crabby-images/bf2c5/bf2c5e89d1c1b2da1776bcb54ceb21f254b3f665" alt="libpng"

---

# CAIRO, FREETYPE, FONTCONFIG, LIBPNG

Think for a second about the calculations required to determine the color of each pixel in a bitmap image based on a few simple shapes.
```r
plot.new()
plot.window(xlim = c(0, 100), ylim = c(0, 100))
polygon(c(10, 40, 80), c(10, 80, 40), col = 'hotpink')
text(40, 90, labels = 'My drawing', col = 'navyblue', cex = 3, family = "Times")
symbols(c(70, 80, 90), c(20, 50, 80), circles = c(10, 20, 10),
  bg = c('yellow', 'orange', 'red'), add = TRUE, lty = 'dashed')
```

data:image/s3,"s3://crabby-images/f8552/f855269a230e3c1bbbbb4ecf9cd77b46b2faf440" alt=""<!-- -->

---

# Why External Libraries

R relies on external libraries to do the heavy lifting. This is great because these libs are:

- Widely used
- Portable (work on all systems)
- Performant
- Thoroughly tested
- Well maintained
- Free

It would be impossible to implement all this functionality ourselves in R.

---
class: inverse, center, middle

# I don't remember installing these things?

---

# Static vs Dynamic Linking

The way these libraries are installed depends on your operating system.

On operating systems that install R via a package manager (Linux, Homebrew), R dynamically links to the shared libraries. The package manager automatically installs the dependencies when the user installs R.

<table>
<tr> <th></th> <th>Native Compiler</th> <th>Native Package Manager</th> <th>Linking</th> </tr>
<tr> <th>Linux</th> <td>yes</td> <td>yes</td> <td>dynamic</td> </tr>
<tr> <th>MacOS</th> <td>yes</td> <td>no</td> <td>static</td> </tr>
<tr> <th>Windows</th> <td>no</td> <td>no</td> <td>static</td> </tr>
</table>

On systems that do not have a native package manager (Windows, MacOS), we have to statically link the libs into the R binaries that we ship in the installer.

---

# Dynamic Linking

data:image/s3,"s3://crabby-images/ee95f/ee95feee74a0738db34cc00eede30b7079e85d18" alt="apt1"

---

# Dynamic Linking

data:image/s3,"s3://crabby-images/8687e/8687e8f4a2e40c79f3cf06ef8518f87006210a67" alt="apt2"

---

# Static Linking

With static linking, external libraries get embedded into the binaries (`R.dll` in this case).
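
Whichever way the libraries were linked, a running R session can report the versions it was built with. A minimal sketch using base R helpers; the set of libraries listed and the version strings depend on the platform and the build.

```r
# Versions of bundled or linked libraries such as zlib, PCRE, ICU and BLAS
ext <- extSoftVersion()
print(ext)

# LAPACK version used for linear algebra
lapack <- La_version()
print(lapack)

# libcurl version used for networking
curl <- libcurlVersion()
print(curl)

# Graphics libraries such as cairo, libpng, jpeg and libtiff
gr <- grDevices::grSoftVersion()
print(gr)
```
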
data:image/s3,"s3://crabby-images/6e822/6e82224574240a0ef90525d2642e0c68c9e2c907" alt="rbig"

---

# Building R for Windows

Windows has neither a native compiler nor a package manager. To build R for Windows:

--

1. Install a build environment with a compiler (e.g. Rtools)

--

2. Build the required libraries with this compiler

--

3. Build base R with this compiler and statically link it to the libs

--

4. To build R packages, we need the same compiler, the same R, and the same libs.

---

# Building R for Windows

The scripts and libs used to build base R for Windows and the installer are open source:

data:image/s3,"s3://crabby-images/156c5/156c5801fcec6c1f119eada05f0c5ba9c9bf120a" alt="base1"

---

# Building R for Windows

The readme explains how you can build R locally. Or you can look at the script.

data:image/s3,"s3://crabby-images/b7d2c/b7d2ca47db6b369263e7fc578d323174fe5c462a" alt="base2"

---

# Building R for Windows

The script runs every night on AppVeyor and, on success, the installers get uploaded to CRAN.

data:image/s3,"s3://crabby-images/16516/165167e7f18527fe4c693a53557c608383faa2fe" alt="base3"

---
class: inverse, center, middle

# PART II: CRAN and Scalability

---

# Building Packages

Just like R itself, many R packages take advantage of external libraries. A few of the older CRAN packages that use external libraries:

<table>
<tr> <th>R Package</th> <th>Required libs</th> <th>CRAN release</th> </tr>
<tr> <td>RMySQL</td> <td>libmysqlclient</td> <td>2000</td> </tr>
<tr> <td>XML</td> <td>libxml2</td> <td>2000</td> </tr>
<tr> <td>RCurl</td> <td>libcurl</td> <td>2004</td> </tr>
<tr> <td>gmp</td> <td>gmp</td> <td>2004</td> </tr>
<tr> <td>Rmpfr</td> <td>libmpfr</td> <td>2009</td> </tr>
</table>

--

On Linux, the user has to install the required libraries manually when installing the R package.

--

For Windows and MacOS, CRAN (or any other repo) can build so-called binary packages that include the __statically linked external library__, just like we did for base R.
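
A rough sketch for checking which kind of package a given R build installs natively. It only inspects `.Platform$pkgType`; the variable names are illustrative.

```r
# "source" on Linux; a binary type such as "win.binary" or "mac.binary"
# on the Windows / MacOS builds of R
pkg_type <- .Platform$pkgType

# TRUE when this build of R can install precompiled binary packages
uses_binaries <- grepl("binary", pkg_type, fixed = TRUE)

if (uses_binaries) {
  message("Binary packages available (", pkg_type, "): external libs are included")
} else {
  message("Source packages only: install the required system libraries first")
}
```
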
---

# Building Packages

Statically linked binary packages make installing R packages easy on Windows / MacOS:

data:image/s3,"s3://crabby-images/f3643/f36437c4861cb7f8511f15d68e39312a7b666fd4" alt="xml2"

---

# Building Packages

But: it is not trivial to make this work.

--

Somebody has to build (and occasionally update) the library and its dependencies using the same compiler and flags for static linking with the R package.

--

Building even a single library can be a lot of work. Unfortunately these libraries are also becoming increasingly complex and interdependent.

--

data:image/s3,"s3://crabby-images/0dea1/0dea1fe29ed6da30753ca53eaf77b81864b686ca" alt="fortran"

---

# CRAN growing

On top of that, the number of packages on CRAN is growing rapidly:

data:image/s3,"s3://crabby-images/991bc/991bc363bdc777558b64a9438b9a4dd57f0cc669" alt="cran"

Many of these packages require one or more external libraries...

---

# The rwinlib Organization

The GitHub organization 'rwinlib' is an archive of static libraries, built with Rtools on Windows, that are used by numerous CRAN packages.

data:image/s3,"s3://crabby-images/d4fc4/d4fc488116a0a521c197d2dcb3c9a23fd607f827" alt="rwinlib"

---

# Database Drivers

Most databases have specialized client libraries:

<table>
<tr> <th>R Package</th> <th>Required libs</th> </tr>
<tr> <td>RMySQL, RMariaDB</td> <td>libmariadb + openssl</td> </tr>
<tr> <td>RPostgres, RPostgreSQL</td> <td>libpq + openssl</td> </tr>
<tr> <td>RODBC, odbc</td> <td>unixodbc</td> </tr>
<tr> <td>redux, rredis, RcppRedis</td> <td>hiredis</td> </tr>
<tr> <td>mongolite</td> <td>mongo-c-driver + openssl + libsasl</td> </tr>
</table>

---

# GDAL: spatial abstraction library

A complex example is GDAL (Geospatial Data Abstraction Library), which can read and write 100+ different spatial data formats (think maps and satellite images).
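
One way to see which formats a given GDAL build supports is to list its drivers from R. A sketch assuming the `sf` package is installed; the number of drivers depends on how GDAL was built.

```r
# Only query GDAL when the sf package is available
if (requireNamespace("sf", quietly = TRUE)) {
  drivers <- sf::st_drivers()      # vector drivers by default
  n_drivers <- nrow(drivers)
  message("This GDAL build has ", n_drivers, " vector drivers")
  print(head(drivers[, c("name", "long_name")]))
} else {
  n_drivers <- NA_integer_
  message("Package 'sf' is not installed")
}
```
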
data:image/s3,"s3://crabby-images/f26b3/f26b33e31cae5a5002d0bafab3386098ab9abdf7" alt="gdal"

---

# GDAL: spatial abstraction library

The current rwinlib GDAL2 stack depends on no fewer than 35 additional driver libraries! It is used to build the R binary packages for `sf`, `rgdal`, and `rgeos` on Windows.

data:image/s3,"s3://crabby-images/2f946/2f9460741d3b133dcc90bc520c57db3b5ccd0dca" alt="gdallibs"

---

# GDAL: spatial abstraction library

Package authors sometimes request extra features from the libraries:

data:image/s3,"s3://crabby-images/a0763/a076321d438d23250d9fb877bc09550a7ff698ab" alt="gdaledzer"

---

# GDAL: spatial abstraction library

And now R users on Windows can access open-access EU satellite images in `sf` and `rgdal`!

data:image/s3,"s3://crabby-images/f6ef5/f6ef57a0e1037803d4413c4683485f294f9e1434" alt="tweet"

---

# Imaging, Graphics, and Vision

Another example: at rOpenSci we are developing a suite of packages to expose high-quality image libraries in R across various applications and fields:

- Spatial (as seen before)
- Medical (MRI)
- Graphics and post processing
- Vision
- OCR
- Animation and Video
- Rendering pdf, svg

All of these tools use high-quality open source libraries. We provide the R interfaces.

---

# OCR (TESSERACT)

```r
library(magick)
image_read("https://jeroen.github.io/images/receipt.png") %>%
  image_resize('50%')
```

<img src="index_files/figure-html/unnamed-chunk-6-1.png" width="113" />

```r
library(tesseract)
numbers <- tesseract(options = list(tessedit_char_whitelist = "$.0123456789"))
text <- ocr("https://jeroen.github.io/images/receipt.png", engine = numbers)
cat(text)
```

```
$90.52 $81.52 $9.00 $90.52
```

---

# Vision (OPENCV)

OpenCV has built-in filters for detecting human shapes...
data:image/s3,"s3://crabby-images/6ecc9/6ecc97232ca37eab5e286295f23e1e59c5f47745" alt="vision1"

---

# Vision (OPENCV)

Or faces:

data:image/s3,"s3://crabby-images/1f0b4/1f0b47946ece6a8efbde173fceaf3f5737cf35cf" alt="vision2"

---

# Animated Graphics

```r
library(gganimate)
p <- ggplot(airquality, aes(Day, Temp)) +
  geom_line(size = 2, colour = 'steelblue') +
  transition_states(Month, 4, 1) +
  shadow_mark(size = 1, colour = 'grey')
animate(p, fps = 25, width = 800, height = 350)
```

data:image/s3,"s3://crabby-images/573c5/573c5ae329950b1dcb37ee624db997db5e5f1a14" alt=""<!-- -->

---
class: inverse, center, middle

# PART III: Ongoing Developments

---
background-image: url(road.jpg)
background-position: 50% 50%

# Infrastructural Work

---

# RTOOLS 40

We are currently beta-testing a new version of Rtools that includes a full build environment and package manager.

data:image/s3,"s3://crabby-images/6d7b7/6d7b7990c6d7e589d22e4740a62fb925717f477c" alt="rtools40"

---

# RTOOLS 40

This will make it easier to build, distribute and install external libraries on Windows.

data:image/s3,"s3://crabby-images/26c25/26c250728eef1d3aa43066de799be05b4142fdbf" alt="rtools-packages"

---

# RTOOLS 40

The system will also make it possible to automate building libs on AppVeyor. This makes things more transparent, maintainable and reproducible.

data:image/s3,"s3://crabby-images/ff448/ff4487382f53342667472e02d1a5ce859720b213" alt="rtoolsav"

---

# RTOOLS 40

A beta version of Rtools 40 and a version of R that has been configured for Rtools 40 are available from CRAN: https://cloud.r-project.org/bin/windows/testing/rtools40.html

data:image/s3,"s3://crabby-images/ecc66/ecc66ac7625246c790b938153ab295a0f50d2514" alt="rtoolscran"

---

# RHUB

R-hub is a service for building and checking R packages.
Part of the project is indexing the system requirements (including libraries) for R packages, and exposing this via an API:

data:image/s3,"s3://crabby-images/4cbab/4cbab5e00bc5339b0cb36bb486324a48b5b399d4" alt="sysreqdb"

---

# RHUB

R-hub uses this to automatically install the correct software and libraries on each of the supported operating systems, before building the R package.

data:image/s3,"s3://crabby-images/5c274/5c274baaa609734ee3ea1a4dece816ce7e768d02" alt="rhubbuilders"

---

# VISION

The overall idea is to create an infrastructure that can support increasingly complex R packages with powerful system libraries, while reducing the maintenance work for the repository maintainers.

---
background-image: url(artist.jpg)
background-size: cover

# Artist Impression