When the demands of Big Data analytics surpass the core count and memory available on your biggest server, you’re usually left with three dismal options: spend money you don’t have on new hardware; devote time you can’t spare rewriting code to run across clusters; or delay insights you can’t put off by shrinking your problems to fit the limits of your hardware.
INTRODUCTION - R IN-MEMORY
When we started searching for large-scale open-source R benchmarks, we were surprised to find few good workloads for multi-terabyte TidalScale systems. We ended up writing our own R benchmark that let us scale R workloads to arbitrarily large in-memory sizes. Along the way we learned a few tips and tricks for running large workloads with open-source R, which we share below.
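The sizing arithmetic behind scaling a workload to an arbitrary in-memory footprint is standard R, even if our benchmark itself is not shown here: a numeric value occupies 8 bytes, so you can back into the row count needed to hit a target size. Below is a minimal sketch of that idea; the target size, column count, and variable names are illustrative, not the benchmark's actual parameters.

```r
# Minimal sketch: size a purely numeric data frame to a target footprint.
# A numeric value is 8 bytes, so rows = target_bytes / (8 * n_cols).
target_gb <- 4            # illustrative target; scale up as memory allows
n_cols    <- 10
n_rows    <- floor(target_gb * 2^30 / (8 * n_cols))

# Build columns individually rather than one giant matrix; converting a
# full matrix to a data frame would need a second full-size copy.
cols <- replicate(n_cols, runif(n_rows), simplify = FALSE)
names(cols) <- paste0("x", seq_len(n_cols))
df <- as.data.frame(cols)

print(object.size(df), units = "Gb")   # should report roughly 4 Gb
```

Turning the `target_gb` knob is all it takes to grow the same workload from laptop-sized to terabyte-sized, which is what made this approach useful for benchmarking.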
Like many statistical analytics tools, R can be incredibly memory-intensive. A simple GAM (generalized additive model) or k-nearest-neighbor routine can consume many multiples of the original dataset's size in memory. And R doesn't always behave gracefully when it runs out of memory.
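As a rough illustration of that blow-up (our example, using the mgcv package that ships with R; not a claim about any particular workload): even a one-smoother GAM fit retains the model frame, fitted values, residuals, and basis matrices alongside the data, so the fit object alone ends up several times the size of the input, before counting the peak memory used during fitting.

```r
# Rough illustration: a GAM fit object dwarfs its input data, and peak
# memory during the fit is higher still.
library(mgcv)

n   <- 1e6
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.1)

print(object.size(dat), units = "Mb")   # input: roughly 15 Mb

fit <- gam(y ~ s(x), data = dat)
print(object.size(fit), units = "Mb")   # several times the input size
```

Multiply that expansion factor by a dataset that already fills most of RAM, and it's easy to see how an innocent-looking model fit can push R over the edge.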