Chandana Gopal, IDC’s Future of Intelligence Agenda Program Lead, contributed to this blog.
There can be no intelligence – organic or artificial – without a physical structure to process and manipulate information. What is true for people is equally true for the intelligence of an enterprise as a whole. The future enterprise’s intelligence will depend heavily on its ability to harness the power of artificial intelligence (AI). While far from the only capability an intelligent enterprise requires, AI is a key enabler of broader enterprise intelligence, which IDC defines as an organization’s ability to synthesize insights from data, its capacity to learn, and its ability to deliver insights at scale across all levels of the enterprise, to people and machines alike. In the modern enterprise, these capabilities are enabled by a mix of data culture, data literacy, and technology. In selecting infrastructure technology, organizations should consider the following:
- The ability to synthesize information requires infrastructure for data ingest and preparation.
- The capacity to learn demands infrastructure for AI model training and for model deployment and monitoring.
- And the delivery of insights at scale is only possible with infrastructure for inferencing – infrastructure that can run AI applications and AI-enabled applications at centralized and decentralized locations, stationary and mobile, and on ‘things’.
Let’s look a bit deeper at this last point.
Delivering insights at scale means that an enterprise can reliably run near real-time AI applications and AI-enabled applications that are subject to demand fluctuations. In other words: sometimes a few thousand users are tasking the infrastructure that runs these applications, and sometimes it is tens of thousands – or even hundreds of thousands – of users. Several factors determine what it takes to deliver insights at scale (a simple sizing sketch follows the list below).
- What are the data volumes? Does the application take a simple piece of text, a large data set, or a stream of video or audio as input? The more data, the heavier the task.
- How large and complex is the algorithm on which inferencing takes place? A large, complex algorithm has to do heavier lifting to finish an inferencing task than a small and lean one.
- Is the application meant to deliver a response in real time (in milliseconds), near real-time (in seconds), or at longer intervals (in minutes)?
- How many concurrent users (people or machines/things) must the application be able to handle?
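To make these factors concrete, here is a minimal back-of-the-envelope sizing sketch in Python. All of the figures in it (user counts, request rates, model latency) are illustrative assumptions, not IDC benchmarks; the point is simply how concurrency, latency, and per-replica throughput interact.

```python
# Hypothetical sizing sketch for inference at scale.
# All figures below are illustrative assumptions, not IDC data.
import math

def replicas_needed(concurrent_users: int,
                    requests_per_user_per_s: float,
                    model_latency_s: float,
                    batch_size: int = 1) -> int:
    """Estimate how many model replicas (e.g., GPU-backed servers) are needed
    to keep up with the offered load, ignoring queueing headroom."""
    offered_rps = concurrent_users * requests_per_user_per_s
    # One replica finishes batch_size requests every model_latency_s seconds.
    per_replica_rps = batch_size / model_latency_s
    return math.ceil(offered_rps / per_replica_rps)

# A few thousand users vs. hundreds of thousands, same model (50 ms per request):
for users in (5_000, 50_000, 500_000):
    n = replicas_needed(users, requests_per_user_per_s=0.2, model_latency_s=0.050)
    print(f"{users:>7} users -> ~{n} replicas")
```

In this hypothetical case, 5,000 users need roughly 50 replicas while 500,000 users need roughly 5,000 – which is exactly why demand fluctuations matter so much when planning inferencing infrastructure.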
And then there is what I call the “hop effect.” As AI applications become more interactive and deliver more complex services, they need to execute – in near real time – a sequential workflow in which they run inference on multiple AI models hosted on different parallelized systems in the datacenter or cloud. Essentially, they are “hopping” from one system to the next with the lowest possible latency, heavily tasking the infrastructure’s processors, co-processors, interconnects, and network, all of which must be coordinated to execute the application with parallel processing.
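A minimal sketch of the hop effect, assuming a hypothetical four-model conversational pipeline (the stage names and latencies are invented for illustration). Because each hop depends on the previous one, the calls are sequential, and end-to-end latency is roughly the sum of the per-hop latencies:

```python
# Hypothetical sketch: how the "hop effect" compounds latency when an application
# chains inference calls across several model-serving systems.
# Stage names and latency figures are illustrative assumptions, not IDC data.
import asyncio
import time

# Assumed per-hop inference latencies in seconds (network + queueing + compute).
PIPELINE = [
    ("speech-to-text", 0.040),
    ("intent-detection", 0.015),
    ("recommendation", 0.060),
    ("text-to-speech", 0.035),
]

async def call_model(name: str, latency_s: float, payload: str) -> str:
    """Stand-in for a remote inference call to one model-serving system (one "hop")."""
    await asyncio.sleep(latency_s)  # simulated network + inference time
    return f"{payload}->{name}"

async def handle_request(payload: str) -> str:
    # Each hop depends on the previous one, so the calls are sequential:
    # end-to-end latency is roughly the sum of all hop latencies.
    for name, latency_s in PIPELINE:
        payload = await call_model(name, latency_s, payload)
    return payload

if __name__ == "__main__":
    start = time.perf_counter()
    result = asyncio.run(handle_request("user-utterance"))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{result}\nend-to-end latency: {elapsed_ms:.0f} ms")  # ~150 ms for 4 hops
```

In this toy run the four hops add up to roughly 150 ms of end-to-end latency, so every additional hop eats directly into a near-real-time response budget.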
The fact that businesses are now moving from AI model training toward AI inferencing at production scale is driving some profound changes in computing:
- What we call performance-intensive computing, or PIC, is growing rapidly on-premises, at the edge, and in the cloud.
- Workload accelerators, such as GPUs, FPGAs, and ASICs, are becoming the norm.
- There is a growing trend toward using purpose-built datacenter and edge platforms, with systems that are specially designed and built for AI deployment.
- Another type of co-processor, the so-called function off-load accelerator, is gaining traction. This accelerator is designed to perform a specific function (for example, network acceleration or security) to free up the CPU and improve overall performance. It lays the foundation for composable infrastructure.
Purpose-built platforms for AI have evolved into a spectrum of AI systems, from workstations to quantum computers, with individual servers or cloud instances, tightly connected clustered servers, and supercomputers in between. And on top of these purpose-built infrastructure solutions, server OEMs, storage OEMs, and cloud service providers are also architecting more sophisticated stacks, with:
- A control plane, covering orchestration, how compute resources are delivered (for example, accelerated or as a service), and the type of compute.
- A data plane, covering data persistence and data management with their various components.
- An application plane with all the tools: frameworks, optimizers, libraries, and so on.
AI projects often fail, and infrastructure is sometimes to blame – or rather, the lack of appropriate, purpose-built infrastructure. Integration can be a big problem; processors, co-processors, interconnects, and networks can be bottlenecks; storage can introduce latency; or critical layers of the stack we just discussed may simply be missing. It is not too hard to come up with at least a dozen infrastructure-related pain points that can slow down or hamper an AI training or inferencing exercise.
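One way to catch the “missing layer” problem early is simply to inventory the stack against the three planes above. The sketch below does that with a hypothetical checklist; the component names are illustrative assumptions, not a prescribed IDC taxonomy.

```python
# Hypothetical sketch: inventorying an AI infrastructure stack against the three
# planes described above to spot missing layers before a project starts.
# Component names are illustrative assumptions, not a prescribed checklist.
REQUIRED_LAYERS = {
    "control plane": {"orchestration", "accelerated compute"},
    "data plane": {"data persistence", "data management"},
    "application plane": {"framework", "libraries"},
}

# What a hypothetical team actually has deployed today.
current_stack = {
    "control plane": {"orchestration"},
    "data plane": {"data persistence", "data management"},
    "application plane": {"framework"},
}

for plane, required in REQUIRED_LAYERS.items():
    missing = required - current_stack.get(plane, set())
    if missing:
        # Gaps flagged here often show up later as integration or latency problems.
        print(f"{plane}: missing {', '.join(sorted(missing))}")
```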
How has infrastructure technology responded to these challenges? The approach that is gaining a lot of traction is the convergence of three workloads – data analytics, AI, and modeling & simulation – onto one infrastructure design: the earlier-described PIC approach. PIC borrows a lot of elements from what is generally known as high-performance computing (HPC), but it is more focused on the fact that AI workloads must be chopped up into many smaller chunks and distributed in a parallelized fashion – within a processor or co-processor, within a server, between servers in the form of clusters, and between clusters across datacenters or clouds (a minimal sketch of this chunk-and-distribute pattern follows the list below). Some of the components of PIC are:
- Co-processors that deliver in-processor parallelization with thousands of cores
- Compute clusters, storage clusters, and Ethernet or Fibre Channel to connect servers and storage
- InfiniBand or Ethernet to connect clusters, and fast interconnects to connect processors and co-processors
- And things like OpenMP, an API for shared-memory parallel programming; distributed file systems; OpenACC, a programming standard for parallel computing; and a software-defined compute layer that presents all compute resources as a single element
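The sketch below shows the chunk-and-distribute pattern at its smallest scale: a large task split into many chunks and processed in parallel across local worker processes, using only Python’s standard library. In a real PIC deployment the same pattern extends across co-processors, servers, clusters, and clouds via technologies such as OpenMP, OpenACC, MPI, or distributed frameworks; this is only a minimal, single-node illustration.

```python
# Minimal, single-node sketch of the chunk-and-distribute pattern behind PIC.
# The workload and chunk count are illustrative assumptions.
from concurrent.futures import ProcessPoolExecutor

def score_chunk(chunk: list[float]) -> float:
    """Stand-in for an expensive per-chunk computation (e.g., batch inferencing)."""
    return sum(x * x for x in chunk)

def chunked(data: list[float], n_chunks: int) -> list[list[float]]:
    """Chop the workload into roughly equal-sized chunks."""
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = [float(i) for i in range(1_000_000)]
    chunks = chunked(data, n_chunks=8)
    # Distribute the chunks across worker processes and combine the partial results.
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(score_chunk, chunks))
    print(f"processed {len(chunks)} chunks, result = {total:.3e}")
```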
PIC has attracted a lot of workloads that benefit from it. IDC research (IT Infrastructure plans for 2021 Survey, IDC, December 2020) shows that more than 40% of respondents use PIC for data analytics; 38% use it for fraud detection and cybersecurity; 33% use PIC for deep learning training and 32% use it for deep learning inferencing. To sum it up, PIC is no longer confined to traditional workloads like modeling & simulation. It has been democratized and is breaking through as an infrastructure approach for some of the most critical workloads that enable businesses to evolve toward greater enterprise intelligence.
And as a result, the AI processor business has become very crowded, with many established and new players. We are seeing new technologies beyond digital enter this space. Some companies are building optical AI processors, for example, while others are working on analog AI processors, which promise processing that is much closer to that of a biological brain. Yet others are leaping to quantum computing as the next platform for AI. In other words, performance-intensive computing will see some dramatic new technology platforms in the next, say, five years. This is a remarkable time of technological progress, as the world is developing the actual “brain” for AI and ushering in the future of enterprise intelligence.
Further reading:
IDC’s Worldwide Performance-Intensive Computing Taxonomy, 2021 – Mar 2021
Performance-Intensive Computing (PIC) Market Trends – Mar 2021
Performance-Intensive Computing as a Service Adoption Trends – Apr 2021
AI Infrastructure Stack Review, 1H20: The Rapid and Varied Rise of the AI Stack – May 2020
If you would like to learn more about IDC’s “Future of X” practices, visit our website at https://www.idc.com/FoX