GPU Split on VMware

Introduction

In the modern data center, GPUs are no longer just for high-end rendering; they are the engines driving AI, Large Language Models (LLMs), and VDI (Virtual Desktop Infrastructure). However, a physical GPU is an expensive, high-capacity resource that is often underutilized if dedicated to a single Virtual Machine (VM).

In VMware vSphere 8.x (ESXi/vCenter), GPU Partitioning (specifically through NVIDIA vGPU technology) allows you to “carve” a physical GPU into multiple virtual instances, enabling high-density, cost-effective scaling.


1. The Methods of GPU Sharing

To understand partitioning, you must first distinguish it from other VMware GPU methods:

Method                      Best For                       Sharing?                    Performance
DirectPath I/O              Heavy AI Training              No (1:1 mapping)            100% (Bare Metal)
vGPU (Time-Slicing)         VDI & Mixed Workloads          Yes (Software Slices)       ~95%
MIG (Multi-Instance GPU)    Multi-tenant AI / Inference    Yes (Hardware Isolation)    100% per slice

2. Core Technology: NVIDIA vGPU

Most “GPU splitting” on VMware uses NVIDIA vGPU software: a manager (packaged as a VIB) is installed directly into the ESXi hypervisor, where it acts as a traffic cop, distributing GPU cycles across multiple VMs.
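
Once the manager is installed (the workflow in Section 4 covers this), you can confirm it is active from the ESXi shell. A quick sanity check; exact VIB names vary by driver release:

  # List the NVIDIA VIBs installed on the host
  esxcli software vib list | grep -i nvd

  # If the vGPU Manager is loaded, nvidia-smi on the host reports the physical GPU
  nvidia-smi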

How it Splits: vGPU Profiles

When you assign a GPU to a VM in vCenter, you choose a Profile (e.g., grid_a100-4c).

  • The Number (4): Represents the amount of Frame Buffer (VRAM) in GB allocated to that VM.
  • The Letter (C, Q, or B): Represents the workload type (C for Compute/AI, Q for Quadro/Design, B for Business/VDI).
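
Which profiles exist depends on the physical card and the installed vGPU software release. You can list them from the ESXi shell, assuming the vGPU Manager is in place:

  # Show every vGPU type the card supports
  nvidia-smi vgpu -s

  # Show only the types that can still be created alongside existing allocations
  nvidia-smi vgpu -c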

3. Advanced Splitting: MIG (Multi-Instance GPU)

For newer architectures (Ampere and later, like the A100 or H100), NVIDIA introduced MIG. Unlike standard vGPU which uses “time-slicing” (sharing the same cores over tiny fractions of a second), MIG physically partitions the GPU hardware into isolated “instances.”

  • Isolation: If one VM crashes its GPU driver, it doesn’t affect other VMs on the same physical card.
  • Predictability: Each partition has its own dedicated high-speed memory and compute cores.
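
MIG mode is toggled per GPU from the host shell. A minimal sketch, assuming GPU index 0; the change typically requires the GPU to be idle, and on ESXi a host reboot (or GPU reset) before it takes effect:

  # Enable MIG mode on GPU 0
  nvidia-smi -i 0 -mig 1

  # List the GPU instance profiles the card offers (e.g., 1g.10gb or 3g.40gb on an A100 80GB)
  nvidia-smi mig -lgip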

4. Implementation Workflow

Configuring this in a VMware environment follows a specific “Host-to-VM” path:

Step A: Host Preparation

  1. BIOS Settings: Ensure SR-IOV and Above 4G Decoding are enabled on your physical server.
  2. Install the VIB: Download the NVIDIA vGPU Manager for ESXi, upload the file to the host using WinSCP (or SCP), then SSH to the ESXi host and install it from the CLI: esxcli software vib install -v /tmp/NVD-vGPU_ESXi_8.0.vib
  3. Host Graphics Settings: In vCenter, navigate to Host > Configure > Graphics. Set the “Default graphics type” to Shared Direct.
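
Putting Step A together as a single ESXi shell session (the VIB filename and reboot reason string are illustrative; use the bundle that matches your driver release):

  # 1. Put the host into maintenance mode before installing drivers
  esxcli system maintenanceMode set --enable true

  # 2. Install the vGPU Manager VIB uploaded to /tmp, then reboot
  esxcli software vib install -v /tmp/NVD-vGPU_ESXi_8.0.vib
  esxcli system shutdown reboot -r "Installing NVIDIA vGPU Manager"

  # 3. After the reboot: "Shared Direct" in the vCenter UI maps to SharedPassthru here
  esxcli graphics host set --default-type SharedPassthru
  esxcli graphics host get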

Step B: VM Configuration

  1. Add PCI Device: Edit VM settings and add a “New PCI Device.”
  2. Select vGPU Profile: Select the NVIDIA GRID vGPU and choose your desired slice (e.g., 2GB or 4GB).
  3. Reserve Memory: vSphere requires you to “Reserve all guest memory” for any VM using a vGPU.
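
Under the hood, these settings land in the VM's .vmx file, which is handy for automation and troubleshooting. An illustrative excerpt, assuming the first passthrough device and the example profile from earlier (edit only while the VM is powered off):

  pciPassthru0.present = "TRUE"
  pciPassthru0.vgpu = "grid_a100-4c"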

5. New in 2026: vSphere 8 Update 3+ Enhancements

As of the latest updates in 2025 and early 2026, VMware has significantly smoothed the “splitting” experience:

  • Heterogeneous Profiles: You can now run different sizes of vGPU profiles (e.g., a 2GB slice and an 8GB slice) on the same physical GPU, provided they are the same series.
  • GPU Statistics in vCenter: You no longer need to jump into the ESXi shell (nvidia-smi) to see utilization. vCenter now provides native GPU performance charts.
  • Live Migration (vMotion): vGPU-enabled VMs can now be live-migrated between hosts without dropping the session, a massive win for maintenance windows.
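
For reference, the host-side view those vCenter charts replace is still available; per-vGPU details, including utilization, can be queried from the ESXi shell:

  # Query all active vGPUs and their per-VM statistics
  nvidia-smi vgpu -q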

Summary

GPU splitting via VMware ESXi transforms a static hardware asset into a flexible cloud resource. By moving from pass-through (one user) to vGPU/MIG (many users), organizations can cut hardware costs by as much as 60-80% while maintaining the performance required for modern AI workloads.
