Why You Need a BFC (Part 1)

If you’re familiar at all with TidalScale, then you know we believe people should fit the computer to the problem, rather than the other way around.  We believe in new technologies that can be adopted easily, in leveraging advances in cost-effective hardware, and in automation. We believe you shouldn’t have to invest in new hardware to solve large or difficult computational problems. We believe commodity, industry-standard technologies hold remarkable power and possibilities that are just waiting to be tapped.

TidalScale’s HyperKernel software is the real-world manifestation of these beliefs. You can use it to turn a set of commodity systems into a big flexible computer, or BFC. (Please don’t look up alternative definitions of BFC on Urban Dictionary or elsewhere. Those other definitions are…um, not the least bit relevant to our discussion here.)

For our purposes, a BFC is the result of aggregating large amounts of compute power, memory, networking bandwidth and I/O resources into what appears to be a single large traditional computer.  (In reality, it’s a single virtual machine that operates across multiple nodes and makes all the aggregated resources available to the application.) A BFC is ideal for customer applications that need a lot of memory or a high ratio of memory to processors.  A BFC reduces the time it takes to deliver solutions and services, while lowering IT costs.

We call this aggregated system a Software-Defined Server – the last piece in the software-defined datacenter puzzle. But if your preferred terminology is BFC, then go for it. We won’t judge. In fact, for the purposes of this blog, we’ll go right there with you.

With TidalScale’s HyperKernel software, you can expect certain characteristics from your BFC. Among them:

  • Virtualization of CPU, memory, network, and I/O resources
  • Mobility of these resources: they are free to flow across the distributed system
  • Distributed coherent shared memory.  The guest utilizes a single flat linear memory address space that spans many nodes.
  • Preserved x86 order of execution
  • Applications that do not need to be modified or recompiled
  • Operating systems that do not need to be modified or recompiled.  They may be downloaded and run unmodified.  No special drivers or tools are required.
  • Scaling from one node to many nodes, aggregating hundreds or thousands of processors and tens of terabytes of memory
  • Increasing or decreasing the scale of the guest as needed by the application
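To make the aggregation idea concrete, here is a minimal sketch of the arithmetic: it sums the cores and memory of several nodes into the totals a single guest would see, and reports the memory-to-processor ratio. The Node class, the aggregate function and the node sizes are hypothetical, chosen only for illustration; this is not TidalScale code and says nothing about how the HyperKernel does it internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One commodity server contributing resources (hypothetical figures)."""
    cores: int
    memory_gib: int


def aggregate(nodes):
    """Return the (cores, memory_gib) totals a single guest would see."""
    total_cores = sum(n.cores for n in nodes)
    total_memory_gib = sum(n.memory_gib for n in nodes)
    return total_cores, total_memory_gib


# Illustrative only: four identical nodes, sizes picked for the example.
nodes = [Node(cores=32, memory_gib=768) for _ in range(4)]
cores, memory_gib = aggregate(nodes)
print(f"guest view: {cores} cores, {memory_gib} GiB, "
      f"{memory_gib / cores:.0f} GiB per core")
```

Adding or removing an entry in the list changes the totals, which is the sense in which the guest can scale up or down.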

The F is for flexible

An essential aspect of any BFC is flexibility. When it comes to TidalScale, flexibility takes many forms, including:

  • The flexibility to size your system dynamically as workloads change.
  • The ability to arrange aggregated resources in virtually any configuration your workloads require, or in several configurations at once so different models or analyses can run simultaneously.
  • The flexibility to deploy existing server, storage and networking assets within the datacenter in ways that would be impossible with traditional virtualization.
  • The ability to extract more useful life out of aging systems.

Take dynamic system scaling – a key aspect of TidalScale Software-Defined Server technology and, by extension, BFCs.  The Linux community, for instance, has developed (and continues to develop) features that support hot plug capabilities for CPU, memory and I/O devices. TidalScale emulates the relevant hardware interfaces that are required for these Linux features.

The TidalScale HyperKernel can add, remove or move virtualized resources between nodes on demand or dynamically. These changes are invisible to Linux. Combined with Linux hot plug support, this may let customers automatically scale the amount of physical memory, the number of virtual processors, and the set of available I/O devices up and down.
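From inside the guest, the effect of hot plug can be observed through the standard Linux sysfs files. The sketch below is generic Linux, not a TidalScale interface: it reads /sys/devices/system/cpu/online and the memory block entries under /sys/devices/system/memory, which exist only on kernels built with memory hot plug support. The function names are our own, and we are assuming a guest that exposes these files in the usual way.

```python
from pathlib import Path

SYSFS = Path("/sys/devices/system")


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(sysfs=SYSFS):
    """CPUs the kernel currently reports as online."""
    return parse_cpu_list((sysfs / "cpu" / "online").read_text())


def online_memory_bytes(sysfs=SYSFS):
    """Sum the sizes of memory blocks whose state is 'online'."""
    mem = sysfs / "memory"
    # block_size_bytes is written in hexadecimal without a 0x prefix.
    block_size = int((mem / "block_size_bytes").read_text().strip(), 16)
    online = sum(
        1
        for state in mem.glob("memory[0-9]*/state")
        if state.read_text().strip() == "online"
    )
    return online * block_size


if __name__ == "__main__":
    try:
        print("online CPUs:", len(online_cpus()))
        print("online memory (GiB):", online_memory_bytes() / 2**30)
    except OSError as exc:
        # Not Linux, or memory hot plug is not exposed on this kernel.
        print("sysfs not available:", exc)
```

Running the script before and after a scaling event is a simple way to confirm that the guest actually sees the added or removed processors and memory.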

These capabilities allow us to offer features for dynamic scaling, replacement of nodes without shutting down the guest, and reconfiguration of resource pools among multiple TidalScale instances.

BFCs and Reliability

Today, Linux and other operating systems make no provision for the sudden failure of stateful components like a CPU or a bank of memory.  Transparent hardware fault tolerance has never fully succeeded without special support in the OS and its applications.  For example, when Tandem required a fault-tolerant DBMS, it didn’t just run DB2 or SQL Server in a fault-tolerant manner.  From an architectural point of view, TidalScale acts as an extension of the hardware, so we cannot provide transparent, full fault tolerance without new, very application-specific features in the OS and application layer.

However, if future Linux development enables a complete, successful approach to fault tolerance, our hardware emulation would support it.

Today, the state of the art for fault tolerance relies on N+1 architectures that support load balancing and transactional reliability.  TidalScale systems can provide “warm spare” takeover of functions, but each TidalScale system fails as a unit, just as servers do today.

TidalScale supports a single guest, which may be a very large single instance of Linux.  Customers should use the same resilience architecture for a TidalScale-hosted Linux system as they do for any other single critical Linux system.  For critical applications, that architecture may include the following:

  • Deploy multiple TidalScale-hosted Linux instances as necessary to support an N+1 resilience architecture for critical operations.
  • Configure multiple network ports, switches, and routers to protect against a single point of failure.
  • Incorporate storage and backup solutions appropriate to protect the data.
  • Monitor TidalScale-hosted Linux systems and applications as necessary.
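As a rough illustration of the N+1 sizing in the first item, the sketch below computes how many instances are needed so that peak load is still covered after any one instance fails as a unit. The function names and the load figures are hypothetical, and the model assumes identical instances behind a load balancer; real capacity planning would account for much more.

```python
import math


def instances_for_n_plus_1(peak_load, per_instance_capacity):
    """Instances needed so that peak load is still met after one failure."""
    if peak_load <= 0 or per_instance_capacity <= 0:
        raise ValueError("load and capacity must be positive")
    needed = math.ceil(peak_load / per_instance_capacity)  # the "N"
    return needed + 1  # the "+1" spare


def survives_single_failure(instances, peak_load, per_instance_capacity):
    """True if the remaining instances can carry peak load with one down."""
    return (instances - 1) * per_instance_capacity >= peak_load


# Hypothetical figures: 2,500 requests/s at peak, 1,000 requests/s per instance.
n = instances_for_n_plus_1(2500, 1000)
print(n, survives_single_failure(n, 2500, 1000))
```

With these example numbers, three instances cover the peak and a fourth acts as the spare, so the script prints "4 True".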

Next time, we’ll look at how TidalScale addresses performance, efficiency, network latency and other dimensions that impact the ability of BFCs to deliver all the benefits of a Software-Defined Server.
