Leveraging AI for Seamless Hair Color Modifications
By: Bushra J., Khagendra A., Michael T., Mansuba T., Montia L., Luis V.
As part of Fellowship.AI, we were tasked with creating a user-friendly tool that performs seamless hair color transformations while preserving the integrity of the original picture's style and quality. By addressing this challenge, we seek to empower users to effortlessly enhance their images with hair root touch-ups while keeping the final result high resolution and natural in appearance.
In the world of digital imaging, the possibilities seem limitless, especially with the continuous advancements in Machine Learning (ML) and Artificial Intelligence (AI). Hair color transformation and manipulation are two of the most fascinating applications of these technologies. In this blog post, we will journey through the innovative world of AI-powered hair color manipulation, exploring the intricate pipeline and technologies behind this transformative process.
Data Collection
In machine learning-driven hair color transformation, the quality of our dataset serves as the bedrock upon which our models are built. Hugging Face served as our primary source for acquiring image data. Through their platform, we collected 66 images in a variety of colors (both dyed and natural), styles, and textures. This diverse selection ensured that our dataset captured the full spectrum of hair color possibilities, laying the groundwork for robust model training.
The journey begins with an image waiting to be transformed. The first step in the pipeline is segmenting the hair and creating a hair mask that separates regions of the hair by color. This hair segmentation mask is the baseline for precise color manipulation: the minority hair color mask allows us to target only the areas that need transformation.
First, we use a pre-trained hair segmentation model to accurately isolate the hair pixels in front-facing smartphone pictures. Following this, the Selfie MultiClass model captures the entire hair region without omissions. Once the complete hair mask is generated, K-Means clustering is applied to separate the minority hair color from the majority, creating a distinct mask.
The pre-trained hair segmentation model is a lightweight model developed by Google researchers in 2019 (version 2019-01-14) that efficiently segments hair in selfies captured by smartphone cameras. It is optimized for mobile AR applications, prioritizing speed and accuracy for the close-up shots typical of selfies. However, it may not reliably segment hair if the subject is too far from the camera (beyond about 5 feet / 1.5 meters), if there are multiple people in the image, or if the hairstyle involves thin or long pieces (like mohawks or elongated braids). Additionally, large occlusions such as headwear may affect segmentation quality. While the model has been extensively trained and tested across various smartphone camera conditions, performance may vary on low-end devices, in low light, or with motion blur.
We utilized two Vision Transformer neural network models via the MediaPipe Image Segmenter API to predict real-time segmentation masks for human subjects in self-captured images. These models, designed with customized bottleneck and decoder architectures for real-time performance, classify each pixel into categories such as background, hair, body skin, face skin, and clothes. They support single or multiple people in the frame, selfies, and full-body images. By leveraging these models, we achieved more accurate results in generating whole-hair masks from various input images.
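As a rough sketch of this segmentation step, the snippet below shows how a whole-hair mask might be extracted with MediaPipe's Image Segmenter Python API. The model file name and the hair category index are assumptions based on MediaPipe's published multiclass selfie segmentation model, not our exact configuration.

```python
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Assumed model asset: MediaPipe's multiclass selfie segmentation model,
# downloaded locally as "selfie_multiclass_256x256.tflite".
options = vision.ImageSegmenterOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="selfie_multiclass_256x256.tflite"),
    output_category_mask=True,
)

with vision.ImageSegmenter.create_from_options(options) as segmenter:
    image = mp.Image.create_from_file("selfie.jpg")
    result = segmenter.segment(image)
    categories = result.category_mask.numpy_view()  # one class label per pixel

# In the multiclass model, hair is category 1 (0 = background, 2 = body skin, ...).
HAIR_CATEGORY = 1
hair_mask = (categories == HAIR_CATEGORY).astype(np.uint8) * 255  # white = hair pixels
```

The binary hair mask produced here is what the clustering stage below consumes.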
Minority Hair Color Mask Generation using K-Means Clustering
We attempt to isolate the undyed hair roots from the rest of the hair using a K-Means clustering model. We cluster the hair pixels into two groups based on their RGB (Red, Green, Blue) values. This results in a smaller, minority color pixel cluster (i.e., the attempted extraction of undyed hair root pixels) and the remaining majority color pixel cluster. Finally, we create an image segmentation mask from the pixels in the minority color region and feed it into the inpainting model as input. These pixels constitute the region that will be "touched up" by the AI model.
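A minimal sketch of this clustering step, assuming scikit-learn's KMeans and a binary whole-hair mask from the previous stage (the function and variable names are illustrative, not our exact implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def minority_color_mask(image_rgb: np.ndarray, hair_mask: np.ndarray) -> np.ndarray:
    """Split hair pixels into two color clusters and mask the smaller (minority) one."""
    hair_pixels = image_rgb[hair_mask > 0]                      # (N, 3) RGB values of hair pixels
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(hair_pixels)
    minority_label = np.argmin(np.bincount(labels))             # the smaller of the two clusters

    minority = np.zeros(hair_mask.shape, dtype=np.uint8)
    minority[hair_mask > 0] = (labels == minority_label).astype(np.uint8) * 255
    return minority                                             # white where roots need touch-up
```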
We utilize Stable Diffusion alongside ControlNet techniques in our image editing process. Stable Diffusion is a text-to-image generative AI model; its inpainting variant smooths changes in color and intensity across an image, ensuring a seamless transition between different regions. ControlNets, on the other hand, give us precise control over the adjustment process, allowing us to target specific areas, such as the minority hair color, and modify them to match the majority color tone accurately. This combination of Stable Diffusion and ControlNet empowers us to achieve natural-looking results, seamlessly integrating adjustments for a visually pleasing outcome.
Stable Diffusion Realistic Vision Inpaint Model
The Realistic Vision inpainting model is renowned for its ability to generate high-quality, realistic images and serves as a cornerstone of our color transformation pipeline. By harnessing this model, we can transform the minority hair color to match the rest of the hair.
ControlNet Inpaint Model
The ControlNet Inpaint model further enhances the performance of the Stable Diffusion Inpaint Model by accurately matching the minority hair color with the rest of the hair. With its ability to fill in missing or altered regions of an image, this model ensures smooth transitions and natural-looking results, enhancing the overall realism of our transformed hair color.
ControlNet Canny Model
Incorporating edge detection techniques, the ControlNet Canny model adds an extra layer of refinement to our color transformation process. By identifying and preserving the intricate contours and edges of the hair, this model enhances the fidelity of our transformations, resulting in polished and professional-looking outcomes. The ControlNet Canny model preserves the hairstyle and constrains the transformation so that only the hair color changes.
Results
The following picture shows the results of hair root touch-up through the pipeline, using the Stable Diffusion model "Uminosachi/realisticVisionV51_v51VAE-inpainting" and two ControlNet units, "lllyasviel/control_v11p_sd15_inpaint" and "lllyasviel/control_v11p_sd15_canny."
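For readers who want to reproduce a similar setup, the sketch below shows one way these checkpoints might be wired together with Hugging Face's diffusers library. The prompt text, inference settings, conditioning scales, and the inpaint-conditioning helper are illustrative assumptions rather than our exact configuration.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

def inpaint_condition(image: Image.Image, mask: Image.Image) -> torch.Tensor:
    """Conditioning image for control_v11p_sd15_inpaint: masked pixels are set to -1."""
    img = np.array(image.convert("RGB"), dtype=np.float32) / 255.0
    m = np.array(mask.convert("L"), dtype=np.float32) / 255.0
    img[m > 0.5] = -1.0
    return torch.from_numpy(img[None].transpose(0, 3, 1, 2))

# Illustrative inputs: the selfie and the minority (roots) mask from the K-Means stage.
image = Image.open("selfie.jpg").convert("RGB").resize((512, 512))
mask = Image.open("minority_hair_mask.png").convert("L").resize((512, 512))

# Canny edge map of the original image, stacked to 3 channels for the Canny ControlNet.
edges = cv2.Canny(np.array(image), 100, 200)
canny = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "Uminosachi/realisticVisionV51_v51VAE-inpainting",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

result = pipe(
    prompt="photo of a person with evenly colored hair, seamless roots",  # illustrative prompt
    image=image,
    mask_image=mask,                                   # only the minority (roots) region is repainted
    control_image=[inpaint_condition(image, mask), canny],
    num_inference_steps=30,
    controlnet_conditioning_scale=[1.0, 0.8],          # illustrative weights for the two units
).images[0]
result.save("touched_up.png")
```

The two ControlNet units play complementary roles: the inpaint unit keeps the repainted region consistent with the surrounding hair, while the Canny unit anchors the generation to the original edges so the hairstyle itself does not change.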
Loss Function
We created a loss function to support future, more focused experimentation.
To compute it, we convert a pipeline output image to grayscale and mask its minority color hair region along with the complementary majority color region. We then compute the Wasserstein distance, also known as the Earth Mover's Distance, between the probability distributions of the two regions' pixel intensity values. This metric expresses the difference between the two distributions as the amount of "work" it would take to convert one distribution into the other under an "optimal transport" scheme.
It represents how much the touched-up minority color pixels deviate from the "destination" majority color pixels in terms of color and texture. We want the hair in a touched-up image to have a single dyed color while preserving its original "naturalness" (i.e., minor variations of RGB values/grayscale intensities between nearby pixels) as much as possible.
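A minimal sketch of how this metric can be computed, assuming SciPy's wasserstein_distance and grayscale intensities sampled from each masked region (the function and argument names are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def root_touchup_loss(gray_image: np.ndarray,
                      minority_mask: np.ndarray,
                      majority_mask: np.ndarray) -> float:
    """Earth Mover's Distance between the intensity distributions of the two hair regions."""
    minority_vals = gray_image[minority_mask > 0].astype(np.float64)
    majority_vals = gray_image[majority_mask > 0].astype(np.float64)
    # Lower values mean the touched-up roots blend better with the rest of the hair.
    return wasserstein_distance(minority_vals, majority_vals)
```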
Future Improvements
- Collecting higher-resolution images
- Training a LoRA
- Leveraging the loss function to guide prompt engineering and hyperparameter tuning
Conclusion
As we conclude our journey through AI-powered hair color manipulation, we reflect on the remarkable strides made possible by modern technology. Each step in our pipeline exemplifies the fusion of artistry and innovation, from the intricate segmentation of hair to the nuanced color transformations achieved through advanced AI models. As we continue pushing the boundaries of digital imaging, the possibilities for creative expression are limitless. Join us as we explore new frontiers in hairstyling, guided by the transformative power of AI and machine learning.