Deep Learning Training with Intel Habana Gaudi

AI is projected to have a lasting impact on just about every aspect of the world, from our everyday lives to infrastructure to finding cures for diseases. There are endless opportunities in this field: if you can imagine it, you can bet someone somewhere is working on it. I was very excited to join AI Field Day 3 (AIFD3) on May 18-20 to hear about some of the new technologies in this space. Tech Field Day events focus on enterprise IT, and many developments across different areas of enterprise IT are connected with AI.

The last session at AIFD3 was from Habana Labs, an Intel company. Habana was founded in 2016 to create AI processors, and in December 2019 it was acquired by Intel. As an Intel subsidiary, Habana Labs develops disruptive solutions that are shaping the future of AI and deep learning, with a focus on AI in the cloud, on premises, and in hybrid workloads.

Affordable Deep Learning Training with Intel Habana Gaudi 2

Sree Ganesan, Head of Software Products for Habana Labs, explained that Intel Habana is seeing huge demand for deep learning training, and at AIFD3 we got a sneak peek of the next generation of Intel Habana products. The first-generation Gaudi deep learning training accelerator is available in the cloud through AWS EC2 DL1 instances and on premises in a solution combining the Supermicro X12 Gaudi server with DDN AI400X2 storage. Its successor, Gaudi 2, was launched at Intel Vision in May 2022.

A recent study showed that 56% of respondents consider cost the most significant challenge to getting started with deep learning training. Intel Habana asked itself: how do we give more developers access to more affordable training? With Gaudi 2, Habana is tackling that challenge by targeting both cost and efficiency.

The accelerator is designed from the ground up to solve deep learning training problems. It has a heterogeneous compute architecture: a dedicated matrix math engine handles all of the matrix multiplication, while programmable Tensor Processor Cores (TPCs) handle the remaining operations.

One of the unique things about the Gaudi architecture is that it was one of the first to integrate 100 Gb Ethernet RoCE (RDMA over Converged Ethernet) ports directly on the chip: 10 ports on first-generation Gaudi and 24 on Gaudi 2. This lets Gaudi scale out very efficiently. Because the ports use standard Ethernet rather than a proprietary interface, customers are not locked into a single vendor. And because the NICs are integrated, customers also save the cost of separate network adapters.

If you're looking to build a cluster, you connect multiple servers using standard off-the-shelf Ethernet switches. This is a straightforward way to build the infrastructure, since the accelerator fabric is just Ethernet.

Getting started with Deep Learning training

As Ganesan put it, "All the architecture is cool, but the software stack that accompanies [it] makes it easy for the end user to take advantage of the hardware." Intel Habana has focused on developing the SynapseAI software suite, optimized for both performance and ease of use.

Intel supports the two most popular frameworks, PyTorch and TensorFlow, and has integrated its software stack with them so users can keep working with the tools they already know. Those users are primarily data scientists and ML developers, and Intel Habana is focused on making the platform easy to access and easy to get started with.
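To illustrate what "keep using the tools you know" can look like in PyTorch, here is a minimal sketch of a single training step. It assumes the Habana PyTorch bridge (the `habana_frameworks.torch` package that ships with the SynapseAI stack) is installed on the Gaudi machine, where importing `habana_frameworks.torch.core` makes an `hpu` device available and `mark_step()` triggers execution in Habana's lazy mode. On any other machine it falls back to CPU so the same script still runs. The tiny linear model, random data, and the helper names `mark_step_if_hpu` and `train_step` are hypothetical placeholders.

```python
import torch
import torch.nn as nn

# Assumption: on a Gaudi machine the Habana PyTorch bridge is installed, and importing
# its core module registers the "hpu" device. Anywhere else we fall back to CPU.
try:
    import habana_frameworks.torch.core as htcore
    DEVICE = torch.device("hpu")
except ImportError:
    htcore = None
    DEVICE = torch.device("cpu")


def mark_step_if_hpu() -> None:
    """Hypothetical helper: in Habana's lazy mode, mark_step() executes the accumulated graph."""
    if htcore is not None:
        htcore.mark_step()


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """One ordinary PyTorch training step; only the device and mark_step calls are Gaudi-specific."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs.to(DEVICE)), targets.to(DEVICE))
    loss.backward()
    mark_step_if_hpu()  # run forward and backward on the accelerator
    optimizer.step()
    mark_step_if_hpu()  # run the optimizer update
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 1).to(DEVICE)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 4), torch.randn(32, 1)  # placeholder data
    for step in range(3):
        print(f"step {step}: loss={train_step(model, optimizer, x, y):.4f} on {DEVICE}")
```

Apart from choosing the device and the two `mark_step` calls, the loop is plain PyTorch, which is the kind of low-friction migration the framework integration aims for. The Habana developer documentation is the place to confirm the current recommended setup, since details such as the default execution mode may change between releases.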

Intel Habana is at the beginning of its journey in building an interactive community around the technology, and it offers different kinds of support for people who want to get familiar with Habana. Self-driven training is one way to get up and running with the portfolio: the Gaudi developer site offers a wide variety of free resources, including documentation, videos, tutorials, and user guides, as well as the opportunity to try out different models. Ganesan made it very clear that Intel understands users will need time to ramp up their knowledge, with a lot of support along the way, and that Intel intends to support the community in many different ways.

The goal is to minimize the barrier to entry and make it as easy as possible to get started.

If you're curious to know more, visit the AI Field Day page, where all the sessions from AIFD3 are available on demand. If you have questions or comments about this blog post, you can connect with me on Twitter or LinkedIn.

About Tech Field Day events

Tech Field Day is a series of invite-only technical meetings between influencers and sponsoring enterprise IT companies. Companies share their products and innovations through presentations, demos, roundtables, and more. Over 2-3 days, a panel of delegates interacts with these tech companies, either on-site in areas like Silicon Valley or remotely. The sessions are live-streamed, and recordings are shared across Tech Field Day channels such as YouTube and Twitter.

*Disclaimer: I was invited to participate as a Tech Field Day delegate as a guest of Gestalt IT. I did not receive any compensation to write this post, nor was I asked to write it. The above post reflects my own opinion and not that of Gestalt IT.