Systemic 2 Cookbook: Parallel computation with R

This post is the first in the “Systemic Cookbook” series, which will show how to use Systemic 2 for scientific purposes (especially its more advanced features not readily accessible from its user interface). More comprehensive documentation for Systemic is forthcoming.

All posts in this series will be collected in the “Systemic 2” category.

Systemic 2 automatically parallelizes so-called “embarrassingly parallelizable” workloads that do not need shared data or synchronization. Such workloads include bootstrap resampling and periodograms (each resampled data set is analyzed independently), Markov Chain Monte Carlo (each chain runs in parallel, and only synchronizes at infrequent intervals), jackknife, and other algorithms. Some algorithms can even be launched on Xgrid-connected clusters (Why, Apple?!).

The majority of embarrassingly parallelizable workloads, however, will be better defined by the user. In the rest of this article, I will show how to easily parallelize dynamical integration over a grid using an R script. (The following problem could be just as easily set up using C, but I will be showing how to accomplish it in R since the code will be most transparent in a high-level language.)

Setting up dynamical integration over a grid

The (somewhat artificial) problem is as follows. Imagine that we would like to investigate the survival time of a two-planet system, as we increase the eccentricity of the outer planet. Intuitively, we expect the system to become unstable faster as the eccentricity increases.

Let’s first set up a sample system and integrate it forward in time with Systemic2.

To run this script, type

Rscript scriptname.r

from the shell (edit the script beforehand to point it to the correct path to Systemic).

This script sets up a two-planet system around a 0.95 solar mass star, integrate it forward 50,000 years using the SWIFT backend and plot it to a PDF file. The “k” object is called a kernel within Systemic; it contains all the data and functions associated with the planetary system (internally, it is a handle returned by the C library).integration_1e5 Next, we create a grid of eccentricity and longitude of periastron angles. The rationale here is to average the survival times for each eccentricity, over the longitude of pericenter in order to remove the dependency on the phase angle (again, this is an artificial exercise!).

This part of the script involve some idiomatic R functions (not too dissimilar from the strategies used by Matlab, Numpy and other vectorized languages). To create a grid of eccentricity/phase angles (a cartesian product), I used the expand.grid function. Subsequently, I loop over the (e, lop) pairs using mapply. Mapply (m-apply) applies a function over each element of the two vectors and collects the return values of the function.

The function called by mapply sets up a kernel with the given eccentricity and longitude angle, and integrates the system forward in time using kintegrate. Kintegrate is the Systemic function that integrates a system forward in time,  returning an object containing information about the dynamical evolution of the system. One of the fields of the object is “survival.time”, how long the system survives (in this case, whether there is a close encounter or an ejection of one of the bodies).  Then, we collect the survival times, group them by eccentricity (using split) and take the median (using sapply).  Finally, we plot the eccentricity vs. the survival time.

Parallelizing the problem

Recent R builds include the parallel package. The parallel package includes various implementations of apply (map) functions that are drop-in replacements for the builtin R functions (e.g. the *apply family of functions). Parallelization is achieved by transparently launching a number of “worker” processes, executing the function on each worker and finally collecting the result (the workers are heavyweight processes rather than threads, so you will see several R processes if you inspect your running programs).

Screen Shot 2014-01-16 at 8.36.07 PM

Here is the revised script:

Note that the only change was using the “mcmapply” (multicore-mapply) function instead of the “mapply” builtin! Since this problem is eminently parallelizable, the speedup is linear in the number of physical cores.

Closing words

In this case, parallelization was trivial; in other cases, it might be more involved (for instance, when balancing is required).