
The TidalScale Blog

    Scale Up Open Source R on TidalScale

    Posted by Chuck Piercey on Nov 22, 2016 3:50:38 PM

    Like many statistical analytic tools, R can be incredibly memory intensive. A simple GAM (generalized additive model) or k-nearest neighbor routine can devour many multiples of the starting dataset's size in memory. And R doesn't always behave nicely when it runs out of memory.
    There are techniques to get around memory limitations, like partitioning tools or sampling the data down, but they require extra work. It would be really nice to run elegantly simple R analytics without that hassle.
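    For example, a common workaround is to sample the data down before modeling. Here is a minimal sketch of that approach in R (the data frame, column names, and sampling fraction are illustrative assumptions, not taken from the whitepaper):

        # Workaround sketch: fit on a random subset because the full data
        # frame will not fit comfortably in RAM on a laptop.
        set.seed(1)
        full_df  <- data.frame(x = runif(1e6), y = rnorm(1e6))  # stand-in for the real dataset
        small_df <- full_df[sample(nrow(full_df), 1e5), ]       # keep only 10% of the rows
        fit      <- lm(y ~ x, data = small_df)                  # model the sample, not the full data
        summary(fit)

    Everything downstream then describes the sample rather than the whole dataset, which is exactly the extra work the rest of this post is trying to avoid.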

    Using a really big public dataset from CMS.gov, Chuck will show GAM, GLM, decision tree, random forest, and k-nearest neighbor routines that were prototyped and run on a laptop, then run unchanged against the entire dataset on a single, simple Linux instance with over a terabyte of RAM. That big computer is actually a collection of smaller off-the-shelf servers, combined by TidalScale into a single virtual server with several terabytes of RAM.
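    To give a flavor of what "run unchanged" means, here is a minimal R sketch of a GAM and a k-nearest neighbor fit (the toy data frame and column names are assumptions, not the actual CMS.gov schema). The same script runs against a small sample on a laptop and, unchanged, against the full dataset on the large-memory TidalScale instance:

        library(mgcv)   # gam()
        library(class)  # knn()

        set.seed(42)
        n  <- 10000                     # small here; the big machine points the same code at the full CMS.gov extract
        df <- data.frame(x1 = runif(n),
                         x2 = runif(n),
                         y  = rbinom(n, 1, 0.5))

        # Generalized additive model: the smooths are fit entirely in memory,
        # so peak usage can reach several multiples of the input size.
        gam_fit <- gam(y ~ s(x1) + s(x2), family = binomial, data = df)
        summary(gam_fit)

        # K-nearest neighbors: the whole training matrix is held in RAM.
        knn_pred <- knn(train = df[, c("x1", "x2")],
                        test  = df[, c("x1", "x2")],
                        cl    = factor(df$y),
                        k     = 5)
        table(knn_pred, df$y)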

    Read the R performance whitepaper


    Topics: R, TidalScale