Amazon EC2 P4d Instances Are Powered by Nvidia A100 Tensor Core GPUs and AWS Petabit-Scale Networking
Amazon Web Services (AWS) recently announced the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P4d instances powered by Nvidia A100 Tensor Core GPUs. As a reminder, an instance is a virtual machine running in the cloud, offered under the infrastructure-as-a-service (IaaS) model, in which the user rents a standard resource with a defined set of capabilities. AWS estimates that, compared with previous-generation P3 instances, P4d instances deliver 3x the performance and 2.5x more GPU memory for machine learning and high-performance computing workloads. They also do so at a lower cost: depending on the configuration and pricing plan, savings can reach 60%.
AWS Announces Availability of Amazon EC2 P4d Instances
A P4d instance provides access to eight Nvidia A100 Tensor Core GPUs and 400 Gbps of network bandwidth (16x that of P3). Using AWS Elastic Fabric Adapter (EFA) and Nvidia GPUDirect RDMA (Remote Direct Memory Access), customers can combine P4d instances into EC2 UltraClusters. According to AWS, this gives access to supercomputer-class performance by scaling to more than 4,000 A100 GPUs (twice as many as any other cloud service provider), connected by AWS's non-blocking, petabit-scale network infrastructure and integrated with Amazon FSx for Lustre high-performance storage.
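The figures quoted above can be cross-checked with some simple arithmetic. The following is a minimal sketch in Python that uses only the numbers stated in the text; the function and constant names are illustrative and are not part of any AWS API. The 4,000-GPU figure is treated as a lower bound, since the text says "more than 4,000". The per-GPU bandwidth is a simple average of instance bandwidth over GPUs and says nothing about how traffic is actually routed.

```python
import math

# Figures taken from the article text.
GPUS_PER_P4D = 8           # eight A100 GPUs per P4d instance
P4D_NETWORK_GBPS = 400     # network bandwidth per P4d instance
NETWORK_RATIO_VS_P3 = 16   # "16x that of P3"
ULTRACLUSTER_GPUS = 4000   # "more than 4,000", used here as a lower bound


def instances_needed(total_gpus, gpus_per_instance=GPUS_PER_P4D):
    """Smallest number of instances that together provide total_gpus GPUs."""
    return math.ceil(total_gpus / gpus_per_instance)


def implied_p3_network_gbps():
    """P3 network bandwidth implied by the 16x ratio quoted in the text."""
    return P4D_NETWORK_GBPS / NETWORK_RATIO_VS_P3


def average_network_gbps_per_gpu():
    """Instance bandwidth divided evenly across its GPUs (a plain average)."""
    return P4D_NETWORK_GBPS / GPUS_PER_P4D


if __name__ == "__main__":
    print("P4d instances for 4,000 GPUs:", instances_needed(ULTRACLUSTER_GPUS))
    print("Implied P3 bandwidth (Gbps):", implied_p3_network_gbps())
    print("Average bandwidth per GPU (Gbps):", average_network_gbps_per_gpu())
```

Under these assumptions, an UltraCluster of 4,000 GPUs corresponds to at least 500 P4d instances, and the 16x ratio implies roughly 25 Gbps of network bandwidth for the P3 configuration used in the comparison.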
The higher performance of P4d shortens the training time of machine learning models, and the extra GPU memory helps customers train larger, more complex models. Customers can run containerized applications on P4d instances using AWS Deep Learning Containers with libraries for Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). For a more fully managed experience, customers can use P4d instances through Amazon SageMaker, which lets developers and data scientists quickly build, train, and deploy machine learning models. P4d supports all major machine learning frameworks, including TensorFlow, PyTorch, and Apache MXNet, so customers can choose the environment that best suits their application.