By Simon Brandeis, Software Engineer – Hugging Face
By Philipp Schmid, Tech Lead – Hugging Face
By Julien Simon, Chief Evangelist – Hugging Face
By James Yi, Sr. AI/ML Partner Solutions Architect – AWS


The emergence of generative artificial intelligence (AI) has garnered significant attention on a global scale, fueling the proliferation of applications rooted in this cutting-edge technology.

Numerous enterprises, many possessing robust AI and machine learning (ML) capabilities, have promptly embraced this concept and embarked on the transformative journey of infusing their products with generative AI capabilities.

To achieve this, they often opt for foundation models (FMs) from Amazon SageMaker JumpStart or Amazon Bedrock, building end-to-end solutions with the MLOps tools available in the Amazon Web Services (AWS) ecosystem.

Organizations with limited expertise and personnel resources in this field express a keen interest in expeditiously evaluating and utilizing advanced FMs. Unfortunately, they frequently encounter challenges and complexities along this path.

The Hugging Face Platform provides no-code and low-code solutions to train, deploy, and publish state-of-the-art generative AI models for production workloads on managed infrastructure. In 2023, the Hugging Face Platform became available on AWS Marketplace, allowing AWS customers to subscribe directly and connect their AWS account with their Hugging Face account.

This allows customers to pay for Hugging Face usage directly with their AWS account, an integrated billing method that makes it easy to manage payment for all managed services used by all members of an organization. Hugging Face provides step-by-step guidance on how to subscribe and connect a Hugging Face account with an AWS account.

In this post, we delve into the premium features and managed services provided by the Hugging Face Platform and explain the value they bring to customers.

Hugging Face is an AWS Specialization Partner with Competencies in Generative AI and Machine Learning. Its mission is to democratize machine learning through open source, open science, and Hugging Face products and services.

Inference Endpoints

Inference Endpoints from Hugging Face offers an easy and secure way to deploy generative AI models for use in production, empowering developers and data scientists to create generative AI applications without managing infrastructure. It simplifies the deployment process to a few clicks, including handling large volumes of requests with auto scaling, reducing infrastructure costs with scale-to-zero, and offering advanced security.

Here are some of the most important features of Inference Endpoints:

  • Easy deployment: Deploy models as production-ready APIs with just a few clicks, eliminating the need to handle infrastructure or MLOps.
  • Cost efficiency: Automatic scale-to-zero reduces costs by scaling down the infrastructure when the endpoint is not in use, and you pay only for the uptime of the endpoint.
  • Enterprise security: Inference Endpoints prioritizes security and enables secure model deployment. It can deploy models in secure offline endpoints accessible only through direct virtual private cloud (VPC) connections, is backed by SOC2 Type 2 certification, and offers BAA and GDPR data processing agreements for enhanced data security and compliance. Customers can select from Public, Protected, and Private endpoints based on their security needs. An overview of security measures is provided in the Inference Endpoints Security & Compliance documentation.
  • LLM optimization: Inference Endpoints is optimized for large language models (LLMs), enabling high throughput with Paged Attention and low latency through custom transformers code and Flash Attention, all powered by Text Generation Inference.
  • Comprehensive task support: Out-of-the-box support for Transformers, Sentence Transformers, and Diffusers tasks and models, and easy customization to enable advanced tasks like speaker diarization or any machine learning task and library.
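
To make the cost-efficiency point concrete, here is a minimal sketch of the arithmetic behind uptime-based billing. The hourly rate, the 30-day month, and the usage pattern are hypothetical placeholders chosen for illustration; they are not Hugging Face prices.

```python
# Hypothetical illustration of uptime-based billing with scale-to-zero.
# The hourly rate is a made-up placeholder, not a real price.

HOURS_PER_MONTH = 30 * 24  # simplifying assumption: a 30-day month


def monthly_cost(hourly_rate: float, uptime_hours: float) -> float:
    """Cost of an endpoint that is billed only for the hours it is running."""
    if not 0 <= uptime_hours <= HOURS_PER_MONTH:
        raise ValueError("uptime_hours must fit within one 30-day month")
    return hourly_rate * uptime_hours


rate = 1.00  # hypothetical USD per hour

# Endpoint that never scales down.
always_on = monthly_cost(rate, HOURS_PER_MONTH)

# Endpoint that scales to zero outside 8 busy hours on 22 working days.
business_hours = monthly_cost(rate, 8 * 22)

print(f"always on:     ${always_on:.2f}")
print(f"scale-to-zero: ${business_hours:.2f}")
print(f"saving:        {1 - business_hours / always_on:.0%}")
```

The actual saving depends on your traffic pattern and on how quickly the endpoint is configured to scale down after the last request.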

Below are a few steps to deploy models for production:

Select Your Model

Select the model you want to deploy. You can deploy a custom model or any of the 300,000+ Transformers, Diffusers, or Sentence Transformers models available on the Hugging Face Hub for natural language processing (NLP), computer vision (CV), or speech tasks.

Instance Configuration

Select an AWS region close to your data in compliance with your requirements (such as Europe, North America, or Asia Pacific) and also select CPU/GPU instance types from the list.

Figure 1 – Instance configuration of Inference Endpoints.

Select Your Security Level

  • Protected Endpoints are accessible from the internet and require valid authentication.
  • Public Endpoints are accessible from the internet and do not require authentication.
  • Private Endpoints are only available through an intra-region secured AWS PrivateLink direct connection to a VPC and are not accessible from the internet.
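
The choice between the three levels can be sketched as a small decision function. This is only an illustration of the descriptions above; the function name and the lowercase labels it returns are assumptions made for this sketch.

```python
def choose_endpoint_type(internet_access: bool, require_auth: bool = True) -> str:
    """Pick one of the three security levels described above.

    The returned labels are illustrative names for this sketch.
    """
    if not internet_access:
        # Reachable only over an AWS PrivateLink connection to a VPC.
        return "private"
    if require_auth:
        # Reachable from the internet, but callers must authenticate.
        return "protected"
    # Reachable from the internet without authentication.
    return "public"


print(choose_endpoint_type(internet_access=False))
print(choose_endpoint_type(internet_access=True, require_auth=True))
print(choose_endpoint_type(internet_access=True, require_auth=False))
```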

Create and Manage Your Endpoint

Click Create and your new endpoint is ready in a couple of minutes. You can easily define auto scaling, access logs and monitoring, set custom metrics routes, manage endpoints programmatically with the API and command line interface (CLI), and roll back models.
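
As a rough sketch of the programmatic route, the same choices can be passed to `create_inference_endpoint` from the `huggingface_hub` Python library. The endpoint name, model repository, region, and instance type and size below are placeholder assumptions and may not match the current instance catalog; check the values offered in the Inference Endpoints UI before using them.

```python
from typing import Any, Dict


def endpoint_settings(name: str, repository: str) -> Dict[str, Any]:
    """Collect the choices from the steps above as keyword arguments.

    All concrete values are placeholders chosen for illustration.
    """
    return {
        "name": name,
        "repository": repository,      # model repository on the Hugging Face Hub
        "framework": "pytorch",
        "task": "text-generation",
        "vendor": "aws",
        "region": "us-east-1",         # assumed region
        "accelerator": "cpu",          # assumed hardware
        "instance_type": "intel-icl",  # assumed; depends on the catalog
        "instance_size": "x2",         # assumed; depends on the catalog
        "type": "protected",           # security level
        "min_replica": 0,              # 0 lets the endpoint scale to zero
        "max_replica": 1,
    }


def create_endpoint(settings: Dict[str, Any]) -> str:
    """Create the endpoint and return its URL once it is running."""
    # Imported lazily so the sketch loads without the library installed.
    from huggingface_hub import create_inference_endpoint

    endpoint = create_inference_endpoint(**settings)
    endpoint.wait()  # block until the endpoint is deployed
    return endpoint.url


settings = endpoint_settings("my-endpoint-name", "gpt2")
print(settings["vendor"], settings["region"], settings["type"])
```

Calling `create_endpoint(settings)` requires a Hugging Face token with access to Inference Endpoints and starts billable infrastructure, so it is left out of the sketch.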

Figure 2 – Deployed Inference Endpoint example.
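
Once an endpoint is running, it is called over HTTPS. The sketch below builds such a request with the Python standard library only, assuming a Protected endpoint that expects a bearer token and a JSON body with an `inputs` field. The URL and token are placeholders, and the exact payload depends on the task of the deployed model.

```python
import json
import urllib.request


def build_request(endpoint_url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated POST request for a deployed endpoint."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(endpoint_url: str, token: str, prompt: str):
    """Send the request and decode the JSON response."""
    request = build_request(endpoint_url, token, prompt)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; replace them with your endpoint URL and access token.
request = build_request("https://example.invalid", "hf_xxx", "Hello")
print(request.get_method(), request.full_url)
```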

Hugging Face Spaces

Hugging Face Spaces offer a simple way to host machine learning demo apps directly on your profile or your organization’s profile. This allows you to create your ML portfolio, showcase your projects at conferences or to stakeholders, and work collaboratively with other people in the ML ecosystem.

Hugging Face Spaces support a variety of frameworks to build your apps, the most popular being the Python frameworks Gradio and Streamlit. You can also use your favorite technologies and frameworks with the Docker SDK.

Figure 3 – List of machine learning apps in Hugging Face Spaces.

To make a new Space, visit the Spaces main page and click on Create New Space. Along with choosing a name for your Space, selecting an optional license, and setting your Space’s visibility, you’ll be prompted to choose the software development kit (SDK) for your Space.

The Hugging Face Hub offers four SDK options: Gradio, Streamlit, Docker, and static HTML. You can start to build and deploy a simple ML application by following the guidance in the Hugging Face Spaces Overview.
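
For a sense of how little code a Gradio Space needs, here is a minimal sketch of an app file. The `greet` function is a toy stand-in for a real model call, and `build_demo` is a helper name invented for this example.

```python
def greet(name: str) -> str:
    """Toy function standing in for a real model call."""
    return f"Hello, {name}!"


def build_demo():
    """Wrap the function in a simple text-in, text-out web interface."""
    # Imported here so greet() can be tried without Gradio installed.
    import gradio as gr

    return gr.Interface(fn=greet, inputs="text", outputs="text")


# In a Gradio Space, the app file would finish with:
# build_demo().launch()

print(greet("AWS"))
```

In a real Space you would replace `greet` with a function that runs your model, and list any extra Python dependencies alongside the app file.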


Hugging Face AutoTrain

Hugging Face AutoTrain (also known as AutoTrain Advanced) is a no-code tool for training state-of-the-art models for NLP, computer vision, and even tabular tasks. It is built on top of the awesome tools developed by the Hugging Face team, and it’s designed to be easy to use.

Getting started with AutoTrain is easy; all you need is a Hugging Face account. Grab your Hugging Face write token and create a new Docker-based Space with the AutoTrain template.

Figure 4 – Creating a Hugging Face Space with the AutoTrain template.

Remember to keep your Space’s visibility set to private.

Once you have created the space, you can choose from a variety of tasks, upload the data in proper format, choose appropriate hardware, choose hyperparameters, and start training.

Figure 5 – Creating an AutoTrain task.

Using AutoTrain (Advanced), you can train multiple jobs (different hyperparameter combinations) simultaneously, saving time and money. All training jobs run in Hugging Face Spaces, so you can monitor them as they progress. The Spaces shut themselves down automatically once training is over.
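
To illustrate what "different hyperparameter combinations" means, the sketch below expands a small search space into one configuration per job. AutoTrain handles this through its UI; the parameter names and values here are hypothetical and are not AutoTrain's own option names.

```python
from itertools import product
from typing import Any, Dict, List

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "learning_rate": [1e-5, 3e-5],
    "batch_size": [8, 16],
    "epochs": [3],
}


def expand_grid(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one configuration per combination of the listed values."""
    keys = list(space)
    combinations = product(*(space[key] for key in keys))
    return [dict(zip(keys, values)) for values in combinations]


jobs = expand_grid(search_space)
print(f"{len(jobs)} training jobs")
for job in jobs:
    print(job)
```

Each resulting dictionary corresponds to one training job, so the number of jobs grows multiplicatively with the number of values tried per hyperparameter.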

Once the models are trained, they’re saved to your Hugging Face account as private repositories and you are free to do whatever you want with them. The trained models are also compatible with other Hugging Face services like Inference Endpoints, so for example you can train a chat model using AutoTrain and deploy it using Inference Endpoints, without writing a single line of code.


Conclusion

Getting started with generative AI using the Hugging Face Platform on AWS opens a world of possibilities for businesses of all sizes. Throughout this post, we’ve explored the premium features that make this integration accessible and valuable for customers.

With Hugging Face tools like Inference Endpoints, Spaces, and AutoTrain, even organizations with limited resources can swiftly and effectively implement generative AI into their solutions. These features empower businesses to innovate, streamline processes, and stay competitive in an ever-evolving landscape.

The Hugging Face Platform, coupled with the AWS ecosystem, represents a dynamic synergy that helps usher in a future where cutting-edge AI technologies are within reach for everyone.

Learn more about Hugging Face in AWS Marketplace.

