class: center, middle, inverse, title-slide

# The R-universe Build Infrastructure
## Bioc Devel Forum
### Jeroen Ooms
### 2021-11-18

---
class: inverse, center, middle

# What is R-universe?

???

Welcome! This talk is about R-universe. I haven't talked to this group before and I wasn't quite sure what you would find interesting. I thought it would be interesting to talk a bit about the build infrastructure behind r-universe. Perhaps we can discuss how this compares to Bioconductor.

There are many other things we can discuss. I don't use Bioconductor myself, but I am aware that one of its unique features is the centralized release cycle, which is different from CRAN. Maybe we can talk a bit about that, and whether that is something that would work with r-universe repositories.

---
class: fullpage

# Research / software publishing for everyone!

![website2](images/website2.png)

???

R-universe is an ambitious new platform by rOpenSci for publishing research and research software written in R. The platform is free and anyone can sign up, whether you are a novice R user, student, researcher, or professional package developer. There are currently about 350 registered individuals or organizations, together maintaining several thousand R packages and articles.

The major difference with something like CRAN is that everyone manages their own personal repository, so there is no need for gate-keeping. If you have some R code or an R Markdown article that you think is worth sharing, regardless of what it is, you can sign up and publish it in your universe.

---
class: inverse, center, middle

# Previous talks / information about R-universe

???

Today I will mostly talk about infrastructure. But if you want to learn more about other aspects of the project, there are a few good resources:

---
class: fullpage

# Project Homepage

![homepage](images/runiverse-homepage.png)

???

The best starting point is the project homepage on the rOpenSci website. Here we also link to several relevant talks and blog posts.

---
class: fullpage

# Previous talks: useR 2021 keynote (introduction)

![rstudio-talk](images/user2021.png)

???

The recording of the keynote at useR 2021 gives a general introduction to the project. It gives an overview of what currently exists and what our plans are.

---
class: fullpage

# Previous talks: rstudio-conf 2021 (metrics)

![rstudio-talk](images/rstudio-talk.png)

???

This talk from rstudio-conf in January this year focuses a bit on the metrics part. The talk is also linked on our homepage. In it I try to explain why we believe it is important for organizations and potential users of software to get a sense of the health and the quality of the software that they want to build on.

---
class: fullpage

# More information: ropensci.org technotes

![commitstatus](images/setupblog.png)

???

From our homepage you can find several talks and blog posts that explain what is possible and how to get started.

---
class: fullpage

# More information: ropensci.org technotes

![homepage-build](images/homepage-build.png)

---
class: fullpage

# More information: ropensci.org technotes

![homepage-articles](images/homepage-articles.png)

---
class: inverse, center, middle

# What is R-universe?
---

# What is R-universe

Experimenting with new ideas for _publishing_ and _discovering_ research / software in R, based on 10 years of rOpenSci experience:

- Personal CRAN-like repos
- Extensible build system based on git
- Live reproducible Rmd articles
- Easily browsing research and software
- Dashboards
- API access
- Global feeds
- Software health metrics
- Dependency network analysis
- Finding software citations
- Etc.

These ideas are still developing as things take shape.

???

R-universe is an umbrella project under which we are experimenting with many ideas that we have developed over the past years at rOpenSci, to take open science with R to the next level. In essence, R-universe is an _open_ platform for publishing and discovering research and research software written in R.

The platform has many features and components. At its core, R-universe provides everyone with a personal CRAN-like package repository, based on git and backed by a modern build system. It allows you to publish packages with software but also other research material, such as automatically rendered R Markdown articles. On top of that, the R-universe platform provides extensive dashboards, feeds, APIs, metrics, and so on, to make this content accessible and discoverable, both for humans and programmatically.

---

# How it works (in a nutshell)

Every organization or user has a unique domain for publishing their content.

## `https://<user>.r-universe.dev`

Where `<user>` is a GitHub account name (user or organization), e.g.: https://ropensci.r-universe.dev

On this domain you can find:

???

How does it work? In R-universe, every _user_ or _organization_ has a unique subdomain under r-universe.dev for publishing their content. The subdomain is mapped to your GitHub account name, so it can be either an individual account or a team account. For example, my personal content is under jeroen.r-universe.dev and rOpenSci content is under ropensci.r-universe.dev.

--

- __CRAN-like repo__ with any R packages + win/mac binaries
- Live rendered __Rmd articles__
- Programmable __API access__
- Interactive __dashboard__ for browsing and monitoring activity
- Everything is __fully automated__
- Metrics and other meta functionality (planned later 2021)

The R-universe platform connects these different universes with each other, through global feeds, cross-referencing maintainers, etc. A bit like GitHub! But for R :-)

???

On this domain you can find a CRAN-like repository with the R packages owned by this user or organization. These include binaries for Windows and macOS, in the same structure as CRAN itself. On the domain you can also find R Markdown articles published by this user or organization, and a lot of metadata.

When you open the domain in a browser, you can browse the content visually using a dashboard, but the same data is also accessible programmatically using HTTP APIs. Everything is automatically generated and updated from this user's git repositories.

---
class: inverse, center, middle

# Example!

---
class: fullpage

# Example: the ggseg universe

![ggseg-builds2](images/ggseg-builds0.png)

???

Let's look at an example. Here we see the `ggseg` universe. ggseg is a suite of R packages developed at the University of Oslo for brain and cognition research. The owner of this universe is the ggseg organization on GitHub, hence the URL for the universe is ggseg.r-universe.dev.
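Because a universe is just a CRAN-like repository, you can already peek at it with plain base R. Below is a quick sketch; nothing is assumed here beyond the repository URL above and the standard CRAN repo layout.

```r
# List the packages and versions currently served by this universe,
# using only the base R package tooling.
pkgs <- available.packages(repos = 'https://ggseg.r-universe.dev')
pkgs[, c('Package', 'Version')]
```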
---
class: fullpage

# Example: the ggseg universe (packages)

![ggseg-builds3](images/ggseg-builds3.png)

???

The _builds_ tab shows an overview of recently updated packages, including the version, the maintainer, and the date of the most recent commit. It also shows a green badge when the package is available from CRAN. At the top it shows example code for installing a package from this universe in R, so users can easily copy-paste that.

---
class: fullpage

# Example: the ggseg universe (packages)

![ggseg-builds4](images/ggseg-builds4.png)

???

This column shows whether binary packages are available for Windows and macOS, which is the case for all packages here. If the icon is gray, it means there was some warning or error when running R CMD check, but the binary package is still available. If there is no binary package at all, you would see a red cross icon.

---

# Example: the ggseg universe (packages)

### Users install simply using `install.packages`.

```r
# Enable this universe
options(repos = c(
    ggseg = 'https://ggseg.r-universe.dev',
    CRAN = 'https://cloud.r-project.org'))

# Install some packages
install.packages('ggseg3d')
```

???

As said, the top of the page shows example code for how a user would install a package from this universe. The easiest way is to set the `repos` option in R as shown above. We also specify CRAN as the second repository, which may be needed for dependencies of the package. With this code, the dependencies that are available from the ggseg universe are taken from there, and the remaining dependencies are installed from CRAN.

--

## No `remotes` / `devtools` / `rtools` required!!

???

Note that all of this is done using the base R package manager, exactly as when you install from CRAN. On Windows and macOS, the user does not need any complicated tools and libraries to build packages from source.

---
class: fullpage

# Example: the ggseg universe (packages)

![ggseg-packages](images/ggseg-packages.png)

???

This shows more detailed information about the packages, mostly taken from the package DESCRIPTION files.

---
class: fullpage

# Example: the ggseg universe (articles)

![ggseg-articles1](images/ggseg-articles1.png)

???

Articles get automatically rendered on our build system using the vignette system in R. However, articles don't have to be limited to software documentation. As I will explain later, you can publish any R Markdown article in your universe, for example a research paper, a tutorial, or a homework assignment.

---
class: fullpage

# Example: the ggseg universe (articles)

![ggseg-article](images/ggseg-article.png)

???

You can easily browse these articles within the dashboard. The content of these articles is taken from vignettes, rendered with a consistent HTML theme that is the same across the system. Hopefully this makes it more pleasant to browse and read different articles.

---
class: fullpage

# Example: the ggseg universe (APIs)

![ggseg-api0](images/ggseg-api0.png)

???

Open science means that content is not locked into some platform, but accessible in standard formats. All of the data from the dashboard, and much more information about the packages and articles, can be retrieved through APIs. The API tab shows some documentation for the most important endpoints.

---
class: fullpage

# Example: the ggseg universe (APIs)

![ggseg-api0](images/ggseg-api2.png)

???

That looks like this. The APIs are still under development, but you can play around with them.

---
class: fullpage

# Example: the ggseg universe (APIs)

![ggseg-json](images/ggseg-json.png)

???
For example, the `/packages` endpoint gives all the artifacts and information from the database about a given version of a given package. In this example it returns all the data about ggseg. This is a JSON list where each entry is a package file, so in this case there will probably be 5 entries: one source package, two Windows binaries, and two macOS binaries.

---

# Example: the ggseg universe (APIs)

Use `jsonlite` to read data in R:

```r
library(jsonlite)
ggseg <- fromJSON('https://ggseg.r-universe.dev/packages/ggseg/1.6.3')
```

Aggregate data uses ndjson so you need `jsonlite::stream_in()`:

```r
library(jsonlite)

# All package description data
descriptions <- stream_in(url('https://ggseg.r-universe.dev/stats/descriptions'))

# All articles
vignettes <- stream_in(url('https://ggseg.r-universe.dev/stats/vignettes'))

# Check runs
checks <- stream_in(url('https://ggseg.r-universe.dev/stats/checks'))
```

???

And of course you can read all of this in R. I'll leave it up to you to try and run this code. One thing to note is that many endpoints in R-universe use the ndjson format, which is a streaming version of JSON. Therefore you need to use `stream_in` if you want to read it with jsonlite, or an equivalent function for reading ndjson from your favorite JSON library.

---
class: fullpage

# Global feed: builds

![globalbuilds](images/global-builds.png)

???

The homepage on r-universe.dev shows a global feed of all package commits across all universes. That is fun to look at to see what people are working on.

---
class: fullpage

# Global feed: articles

![globalarticles](images/global-articles.png)

???

Similarly, there is a global feed of all articles that have been recently created or updated. This is actually very cool to look at; I have discovered several cool new packages and features by looking at the feed of article updates.

---
class: fullpage

# Global feed: maintainers

![globalmaintainers](images/global-maintainers.png)

???

The homepage also shows a maintainer view. This is an aggregate across all universes, grouped by maintainer.

---
class: inverse, center, middle

# R-universe Infrastructure

---

# The R-universe build system

We distinguish 3 core parts of the infrastructure:

- __Source monorepos__: manage package repositories as monorepos with a central registry.
- __CI build workflow__: an extensible chain to build R package binaries, docs, and other things.
- __Deployment server__: a high-performance “cranlike” package server with APIs for metadata and frontends.

The user only provides a registry file:

```json
[
  {
    "package": "taxize",
    "url": "https://github.com/ropensci/taxize",
    "branch": "master"
  },
  {
    "package": "rplos",
    "url": "https://github.com/ropensci/rplos",
    "branch": "master"
  },
  ...
```

???

The owner of the universe only supplies a so-called registry file, which specifies the list of git repositories where the packages are hosted, and which branch we should be tracking.
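Since the registry is plain JSON, it can also be generated or updated programmatically. Below is a rough sketch of doing that from R; the file name `packages.json` and the exact fields are assumptions based on the example above, not an official specification.

```r
library(jsonlite)

# Two registry entries, following the structure of the example above
registry <- data.frame(
  package = c('taxize', 'rplos'),
  url     = c('https://github.com/ropensci/taxize',
              'https://github.com/ropensci/rplos'),
  branch  = 'master'
)

# Serialize to a JSON array of objects (file name assumed here)
write_json(registry, 'packages.json', pretty = TRUE)
```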
---
class: fullpage

# Step 1: Source monorepo

![monorepo](images/monorepo.png)

???

The system automatically creates a monorepo under the r-universe GitHub organization. Each package in your registry is a submodule in the monorepo. Our systems automatically sync with any changes in your registry file or in any of the package repositories.

This monorepo contains the canonical source *state* of the repository: it shows what currently is, or was, in your repository at any given point in time.

---
class: fullpage

# Step 1: Source monorepo

![monorepo2](images/monorepo2.png)

???

You can look at the commit log of the monorepo to see exactly when packages were updated.

---
class: fullpage

# Step 2: CI Workflow

![monorepo2](images/workflow.png)

???

Every time a package gets updated, it triggers the CI workflow on GitHub Actions. This consists of about ten steps. It currently runs on GitHub Actions; in theory we could also self-host it, but GitHub Actions is very convenient for this.

We build the source package, the pkgdown documentation, and binary packages for Windows and macOS. Along the way we also collect some metadata about the package, such as information about vignettes, system dependencies, whether the package passed checks, and so on. The final step is where we deploy it to the package server.

---
class: fullpage

# Step 3: The package server

![npm](images/npm.png)

???

I don't know exactly how Bioconductor works, but this is the piece I spent the most time on.

---
class: fullpage

# Step 3: The package server

![ssh](images/ssh.png)

---

# Process Overview

![diagram](images/r-universe-diagram.svg)
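???

To make the "cranlike" part of the package server concrete: it serves the standard CRAN repository layout, so plain base R can resolve it without any special client. The snippet below is only a sketch of the client side, illustrating the repo layout rather than the internal server implementation.

```r
# Where install.packages() looks for source packages in this repo
repo <- 'https://ggseg.r-universe.dev'
contrib.url(repo, type = 'source')

# The PACKAGES index that any CRAN-like server provides at that path
idx <- read.dcf(url(paste0(contrib.url(repo, type = 'source'), '/PACKAGES')))
idx[, c('Package', 'Version')]
```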