is now the recommended stable driver for Linux x86_64 and arm64-sbsa platforms using CUDA 13.2.

Mandatory Driver Version

cudaMemcpyWithAttributesAsync for more flexible transfers and batch asynchronous copy APIs (cuMemcpyBatchAsync) for variable-sized transfers between multiple source and destination buffers.

What’s New and Important in CUDA Toolkit 13.0 - NVIDIA Developer
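As a rough illustration of the batch pattern described above, the sketch below groups several variable-sized copies into parallel arrays of destinations, sources, and byte counts and submits them in one call. It runs on the host with `std::memcpy` so it can execute without a GPU. `CopyBatch` and `submit_batch` are hypothetical names. The real `cuMemcpyBatchAsync` works on device pointers and also takes per-copy attributes and a stream, so its exact parameter list should be checked against the CUDA Driver API reference for your toolkit version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for a batched copy: parallel arrays of
// destinations, sources, and byte counts, mirroring the shape of the batch
// copy APIs. Real code would pass device pointers and a stream instead.
struct CopyBatch {
    std::vector<void*> dsts;
    std::vector<const void*> srcs;
    std::vector<std::size_t> sizes;

    // Records one copy of `bytes` bytes from `src` to `dst`.
    void add(void* dst, const void* src, std::size_t bytes) {
        dsts.push_back(dst);
        srcs.push_back(src);
        sizes.push_back(bytes);
    }
};

// Performs every recorded copy in a single call and returns the total
// number of bytes moved.
std::size_t submit_batch(const CopyBatch& batch) {
    assert(batch.dsts.size() == batch.srcs.size() &&
           batch.srcs.size() == batch.sizes.size());
    std::size_t total = 0;
    for (std::size_t i = 0; i < batch.sizes.size(); ++i) {
        std::memcpy(batch.dsts[i], batch.srcs[i], batch.sizes[i]);
        total += batch.sizes[i];
    }
    return total;
}
```

The motivation for batching is that submission overhead is typically paid once per batch rather than once per buffer, which matters most when there are many small transfers of different sizes.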

Previous drivers treated a kernel launch as a monolithic block: if a high-priority AI inference task arrived while a graphics or compute kernel was running, it had to wait for that kernel to finish, and latency spiked. R570 introduces per-warp priority queues. Early benchmarks show a 40% reduction in tail latency for real-time LLM token generation when the GPU is also handling background compute.
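The source does not describe how the per-warp queues are implemented, so the following is only a toy model, not NVIDIA's scheduler. It assumes a background kernel that can yield to a high-priority request only at evenly spaced boundaries: setting `slice_us` equal to the kernel's duration models the old monolithic behavior, and a small `slice_us` models finer-grained scheduling points. The function `wait_us` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model only: how long a high-priority request arriving at `arrival_us`
// waits before it can run, if a background kernel of `kernel_us` microseconds
// (started at t = 0) can yield only at boundaries spaced `slice_us` apart.
// slice_us == kernel_us models a monolithic launch.
std::int64_t wait_us(std::int64_t kernel_us, std::int64_t slice_us,
                     std::int64_t arrival_us) {
    assert(kernel_us > 0 && slice_us > 0 && arrival_us >= 0);
    if (arrival_us >= kernel_us) return 0;  // background kernel already done
    const std::int64_t into_slice = arrival_us % slice_us;
    const std::int64_t to_boundary = into_slice == 0 ? 0 : slice_us - into_slice;
    // The kernel may finish before the next boundary is reached.
    return std::min(to_boundary, kernel_us - arrival_us);
}
```

For a 1000 µs background kernel and a request arriving at 120 µs, `wait_us(1000, 1000, 120)` returns 880, while `wait_us(1000, 50, 120)` returns 30. In this model the worst-case wait shrinks from the whole kernel to a single slice, which is the kind of effect that would show up as lower tail latency.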

Developers can now express matrix-tile operations directly in native C++ constructs, according to NVIDIA Developer Docs. The driver dynamically resolves the lower-level details (parallelization, asynchronous register data transfers, and memory tiling), allowing code written for older architectures to scale to Hopper or Blackwell GPUs.
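The tile idea is easiest to see in plain loop form. The sketch below is ordinary C++ loop tiling, not NVIDIA's tile programming API (whose interfaces the text does not show): it computes C = A × B for square row-major matrices one output tile at a time, which is the kind of decomposition that, per the description above, the driver and compiler now map onto the hardware for you. `tiled_matmul` and its `tile` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for row-major n x n matrices, computed tile by tile.
// Each (i0, j0) output tile accumulates products of matching A and B tiles.
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t n, std::size_t tile) {
    assert(A.size() == n * n && B.size() == n * n && tile > 0);
    std::vector<float> C(n * n, 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile)
        for (std::size_t j0 = 0; j0 < n; j0 += tile)
            for (std::size_t k0 = 0; k0 < n; k0 += tile)
                // Multiply the A tile at (i0, k0) by the B tile at (k0, j0),
                // clamping at the matrix edge when n is not a multiple of tile.
                for (std::size_t i = i0; i < std::min(i0 + tile, n); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + tile, n); ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = j0; j < std::min(j0 + tile, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
    return C;
}
```

In this sketch, changing `tile` alters only the traversal order, not the result, because every output element still accumulates over k in ascending order. Picking the tile size that best fits a given memory hierarchy is the sort of tuning the paragraph above describes as being resolved automatically.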

The CUDA Toolkit provides the development environment (compilers, libraries, and tools) that programmers use to build GPU-accelerated applications.

The driver introduces native kernel optimizations for the FP8 and FP4 precision formats. By tightly integrating Transformer Engine libraries into the driver's runtime compiler, it handles mixed-precision math with better register efficiency: Tensor Cores execute matrix multiplications back-to-back with minimal register spilling, which speeds up token-generation loops.

🔬 High-Performance Computing (HPC)

Even if you don’t need new features, upgrade to R570.100 for this security fix.
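Checking whether a machine already meets that minimum needs a numeric comparison, not a string comparison: compared as strings, "570.86" sorts after "570.100". The sketch below assumes that R570.100 corresponds to a driver version string of `570.100` and that the installed version has been read separately, for example with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. `parse_version` and `at_least` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a dotted driver version string into numeric fields.
std::vector<int> parse_version(const std::string& v) {
    std::vector<int> parts;
    std::istringstream in(v);
    std::string field;
    while (std::getline(in, field, '.'))
        parts.push_back(std::stoi(field));
    assert(!parts.empty() && "empty version string");
    return parts;
}

// True if `installed` is at least `minimum`, comparing field by field and
// treating missing trailing fields as 0 (so "570.100" equals "570.100.0").
bool at_least(const std::string& installed, const std::string& minimum) {
    std::vector<int> a = parse_version(installed);
    std::vector<int> b = parse_version(minimum);
    const std::size_t n = std::max(a.size(), b.size());
    a.resize(n, 0);
    b.resize(n, 0);
    return a >= b;  // std::vector compares lexicographically, element by element
}
```

With the installed version in hand, `at_least(installed, "570.100")` returns true when no upgrade is needed for this fix.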

Fixed severe numerical regressions in which cublasLtMatmul() ignored certain scaling pointers during NVFP4 matrix multiplications.
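To show why ignoring the scales is a correctness bug rather than a small precision loss, the sketch below computes a reference dot product over block-scaled FP4 data. It assumes the E2M1 element layout (1 sign, 2 exponent, and 1 mantissa bit, exponent bias 1, largest magnitude 6) and one scale per block of elements, as in NVFP4's 16-element blocks. For simplicity the scales are plain `float`s rather than FP8 values, and any additional per-tensor scale is omitted. The `apply_scales = false` path mimics the regression described above. All names are hypothetical, and this is not cuBLASLt code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a 4-bit E2M1 code: bit 3 = sign, bits 2-1 = exponent, bit 0 = mantissa.
// Exponent bias is 1; exponent 0 encodes the subnormal values 0 and 0.5.
float decode_e2m1(std::uint8_t code) {
    assert(code < 16);
    const int exp = (code >> 1) & 3;
    const int man = code & 1;
    const float mag = exp == 0 ? 0.5f * man
                               : (1.0f + 0.5f * man) * std::ldexp(1.0f, exp - 1);
    return (code & 8) ? -mag : mag;
}

// Reference dot product of two block-scaled FP4 vectors. Element i belongs to
// block i / block and is multiplied by that block's scale before use.
// apply_scales = false ignores the scales, mimicking the regression.
float scaled_dot(const std::vector<std::uint8_t>& a, const std::vector<float>& a_scale,
                 const std::vector<std::uint8_t>& b, const std::vector<float>& b_scale,
                 std::size_t block, bool apply_scales = true) {
    assert(a.size() == b.size() && block > 0);
    assert(a_scale.size() * block >= a.size() && b_scale.size() * block >= b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float sa = apply_scales ? a_scale[i / block] : 1.0f;
        const float sb = apply_scales ? b_scale[i / block] : 1.0f;
        sum += (decode_e2m1(a[i]) * sa) * (decode_e2m1(b[i]) * sb);
    }
    return sum;
}
```

With two-element blocks, `a` holding the codes for {1, 1, 2, 2} with scales {1, 0.5} and `b` holding four 1s with scale 2, the scaled result is 8 while the unscaled one is 6: dropping the scales changes the answer outright.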

For developers, CUDA 13.3's C++ Tile programming and compiler autotuning deliver immediate performance gains with minimal code changes. For IT administrators, the steady driver updates ensure that GPU‑accelerated workloads remain stable and secure. And for anyone planning next‑gen hardware purchases, the early BOOT_42 patches signal that Rubin support is already in active development.