Whole Body GANs for Full Body Model Rendering in Fashion and Design
The main goal of this research is to create a Generative Adversarial Network (GAN) model capable of producing realistic full-body images in which the user can tweak everything: cloth material, design, colour, or even the model altogether.
Background
Let’s give you a sneak peek into how the black box called a GAN works.
A generative adversarial network (GAN) is a machine learning (ML) model in which two neural networks compete with each other to become more accurate in their predictions. The two networks are referred to as the generator and the discriminator. The generator learns to generate plausible data instances, and these generated instances become negative training examples for the discriminator. The discriminator learns to distinguish the generator's fake data from real data, and it penalizes the generator for producing implausible results.
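To make the generator-discriminator competition concrete, here is a minimal, toy-scale training step in PyTorch. The two `nn.Sequential` networks are deliberately tiny placeholders, not a StyleGAN architecture; real models are far deeper.

```python
import torch
import torch.nn as nn

# Toy networks: the generator maps a 64-dim noise vector to a flat 784-pixel
# image; the discriminator maps an image to a single real/fake logit.
G = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):  # real: (batch, 784) tensor of real images
    z = torch.randn(real.size(0), 64)
    fake = G(z)

    # Discriminator step: real images are positives, generated ones negatives.
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) \
           + bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: it is penalised whenever the discriminator
    # correctly flags its output as fake.
    g_loss = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```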
Here’s Google’s developer documentation with a quick explanation of how GANs work.
Major Challenges
Our fellowship team faced two major challenges in this track: preparing a whole-body human dataset, and creating a model without a prior codebase. Let’s talk about them in more detail. We needed to prepare a whole-body human dataset because no pre-trained model was available for whole-body images.
Our past experience with GANs, specifically StyleGAN2-ADA, tells us that we need only a relatively small number of good-quality images (around 5k should suffice). The images should be front-facing, full-body shots that do not miss any key feature such as the face or the lower half of the body.
The second obstacle was a major one, as there were few to no implementations of GANs for whole-body images. Luckily, our team found StyleGAN-Human, which was trained on whole-body images and additionally includes an implementation of InsetGAN.
InsetGAN
While GANs can produce photo-realistic images in ideal conditions for certain domains, the generation of full-body human images remains difficult due to the diversity of identities, hairstyles, and clothing, and the variance in pose. Instead of modelling this complex domain with a single GAN, multiple pre-trained GANs can be combined: one GAN generates a global canvas (e.g., the human body) and a set of specialized GANs, or insets, focus on different parts (e.g., faces, shoes) that can be seamlessly inserted onto the global canvas. The problem is modelled as jointly exploring the respective latent spaces such that the generated images can be combined, by inserting the parts from the specialized generators onto the global canvas, without introducing seams.
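The following is a simplified sketch of that joint exploration, not the paper's exact objective: two latent codes are optimised together so the face generator's output agrees with the face region of the body canvas. `G_body`, `G_face`, and `crop_face` are assumed, pre-existing components, and a single L1 seam term stands in for InsetGAN's full set of appearance and boundary losses.

```python
import torch

def joint_optimise(G_body, G_face, w_body, w_face, crop_face, steps=200):
    # Optimise both latent codes jointly so the inset blends into the canvas.
    w_body = w_body.clone().requires_grad_(True)
    w_face = w_face.clone().requires_grad_(True)
    opt = torch.optim.Adam([w_body, w_face], lr=0.01)
    for _ in range(steps):
        body = G_body.synthesis(w_body)   # global canvas (full body)
        face = G_face.synthesis(w_face)   # specialised inset (face)
        inset = crop_face(body)           # face region of the canvas
        # L1 stand-in for the paper's combined seam/appearance losses.
        loss = (inset - face).abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return w_body, w_face
```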
Treasure Found
The notebook we discovered had these four major features that could help us:
1. Generating images using seeds – This can be used to generate random whole-body images.
2. Style mixing using generated images – This can be used to transform one image into another.
3. Image editing using generated images – This can help in changing the dress type of the same image.
4. Joint optimisation from the InsetGAN implementation – This can help with Face-Body Montage, generating images with a fixed user face, etc.
Thanks to these features, our team could shift its focus towards actually generating whole-body images with GANs instead of preparing a dataset.
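As a rough illustration of point 1, here is how seed-based generation typically looks with a StyleGAN2-ADA-style generator pickle; the checkpoint filename and the `G(z, c)` call are assumptions based on that codebase rather than code from our notebook.

```python
import pickle
import numpy as np
import torch

# Load a pre-trained generator (hypothetical checkpoint name).
with open('stylegan_human_v2_1024.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].eval()

def image_from_seed(seed: int) -> torch.Tensor:
    # The seed fully determines the latent vector, so the same seed
    # always reproduces the same whole-body image.
    z = torch.from_numpy(np.random.RandomState(seed).randn(1, G.z_dim)).float()
    img = G(z, None)                       # c=None: assuming an unconditional model
    return (img.clamp(-1, 1) + 1) * 127.5  # map [-1, 1] to [0, 255]
```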
Use your face
Our team was able to create a model capable of taking the user’s face shot and morphing it onto a very realistic fake body. The model was based on the fourth feature mentioned above, Face-Body Montage / Face Refinement. Here, the user uploads an image of their face, and the body is derived from a body seed (just a random number). Joint optimisation is then used to seamlessly join the face and the body, and a ReStyle encoder is used to obtain the face’s latent code.
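For intuition, here is roughly how a ReStyle-style encoder recovers a face's latent code: it starts from the average latent and iteratively predicts residual corrections. The `encoder`, `G_face`, and `w_avg` names are placeholders for the pre-trained components, not our exact code.

```python
import torch

def restyle_invert(encoder, G_face, w_avg, face_img, n_iters=5):
    w = w_avg.clone()            # start from the average latent code
    y = G_face.synthesis(w)      # current reconstruction
    for _ in range(n_iters):
        # The encoder sees the target alongside the current reconstruction
        # (channel-wise concat) and predicts a latent correction.
        delta = encoder(torch.cat([face_img, y], dim=1))
        w = w + delta
        y = G_face.synthesis(w)
    return w                     # latent code for the user's face
```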
As shown in the videos above, the issues with this implementation were that it could not handle mismatches in skin tone, and there were also some alignment mismatches.
Improvised Approach
Our team’s solution to improve on the above model involved a little improvisation. This time we used InsetGAN’s ‘Body Generation using an Existing Face’ implementation, which has two major functionalities that we tweaked to our advantage:
1. Dual optimiser – We removed loss_body from the optimisation criteria and added a new parameter, joint_optimisation, which is set to False by default. This allowed us to remove the background from the face portion of the image (generally uploaded by the user) and match it with a well-fitting body block.
2. Face inversion to get the latent code – using our previous ReStyle encoder implementation.
The intuition behind this approach is that we do not need joint optimisation, because its aim is to produce a better human face. Since the users themselves upload their face images, there is no need to improve those faces or generate them at all.
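A hedged sketch of the tweak: only the body latent is optimised against face-matching terms, `loss_body` is gone, and joint optimisation sits behind a flag defaulting to False. The loss helpers here are hypothetical stand-ins, not InsetGAN's actual functions.

```python
import torch

def optimise_body(G_body, w_body, w_face_fixed, face_losses, steps=150,
                  joint_optimisation=False):
    params = [w_body.requires_grad_(True)]
    if joint_optimisation:  # off by default in our variant, so the user's
        params.append(w_face_fixed.requires_grad_(True))  # face stays as uploaded
    opt = torch.optim.Adam(params, lr=0.01)
    for _ in range(steps):
        body = G_body.synthesis(w_body)
        # Only face-matching terms remain; with loss_body removed the
        # optimiser is free to pick any body that fits the user's face.
        loss = sum(l(body, w_face_fixed) for l in face_losses)
        opt.zero_grad(); loss.backward(); opt.step()
    return w_body
```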
The clear advantages of this approach are:
- It’s considerably faster than the previous approach (from about 3 minutes down to under a minute)
- Seamless join of face and body
- Skin tone match
The issues with this approach:
- The generated images are not photo-realistic
- It needs a body seed value (an initial point for image generation)
Proposed Solution:
- Define a loss_skintone term and use it in the optimisation, instead of simply removing loss_body outright (a minimal sketch follows this list)
- Ask the user to upload a whole-body image and use that as the initial image
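A minimal version of the first proposal, assuming skin masks are available from some segmentation step: `loss_skintone` penalises the gap between the mean skin colour of the face and of the body, so the optimiser matches tones without needing the full `loss_body`.

```python
import torch

def loss_skintone(face_img, body_img, face_skin_mask, body_skin_mask):
    # Images: (B, 3, H, W); masks: (B, 1, H, W) with 1 on skin pixels.
    face_tone = (face_img * face_skin_mask).sum(dim=(2, 3)) / face_skin_mask.sum()
    body_tone = (body_img * body_skin_mask).sum(dim=(2, 3)) / body_skin_mask.sum()
    # Mean absolute difference between the two average skin colours.
    return (face_tone - body_tone).abs().mean()
```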
Generated Results:
Dress change for User Photo (Use Case):
Here’s a scenario for a clothing store.
A user comes to the store and tries a virtual try-on. The first step is to upload a selfie of their face. Then 3-4 random body options are generated based on their input. The user can choose the body type that resembles them the most and set it as their default image, which is stored in their profile for future use. The selected image can then be used for a dress change via style mixing. Any time the user wants to do another virtual try-on later, they can dive straight in; uploading a selfie or choosing the body type closest to them is no longer needed.
Our team worked on the following pipeline for the above use case (a sketch of the glue code follows the list):
- Face inversion of the user photo: The user uploads their selfie (face photo), which is used to generate a face latent vector via face inversion.
- Whole-body image generation using the face latent code: Achieved in the previous (improvised) approach, where the face latent vector is used to generate a matching (scale/skin-tone) body image.
- Image inversion of both face and body: Generate the latent vectors for both face and body, to be used in the later steps.
- Style mixing to change the dress style: Use the latent vectors generated above to create a base latent vector, which is tweaked to change the dress style.
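Glue code for the four steps might look like the sketch below; every stage is passed in as a function, since the names and the exact dress-controlling layer indices are our assumptions, not StyleGAN-Human's API.

```python
def virtual_tryon(selfie, ref_model_img, invert_face, generate_body,
                  invert_body, synthesize, dress_layers=(4, 5, 6, 7)):
    w_face = invert_face(selfie)        # 1. face inversion
    body_img = generate_body(w_face)    # 2. body from the face latent
    w_user = invert_body(body_img)      # 3a. invert the user's body image
    w_ref = invert_body(ref_model_img)  # 3b. invert the reference model image
    w_mixed = w_user.clone()            # 4. style mixing: swap the W+ layers
    idx = list(dress_layers)            #    assumed to control the garment
    w_mixed[:, idx] = w_ref[:, idx]
    return synthesize(w_mixed)
```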
Points 1 and 2 have already been achieved by the team, so let’s move on to point 3.
For whole-body image inversion, our team used PTI inversion from StyleGAN-Human. The key challenge was adapting PTI inversion to faces, as StyleGAN-Human implements it only for body inversion. So our team built its own PTI inversion for faces and used it to replace the ReStyle-encoder face inversion.
Another challenge with the newly built PTI inversion was that it was too slow. It involves two steps, projection and optimisation, and takes about 5-7 minutes to run all three inversions in a row (one face inversion and two whole-body inversions). Our team tried using e4e pre-trained weights to skip the projection step. That improved performance a little (4-5 minutes per image), but it will not improve further, as the face inversion does not use any pre-trained weights.
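In outline, PTI's two steps look like this sketch: find a pivot latent for the target (the slow projection that e4e weights can replace), then fine-tune a copy of the generator around that pivot. `project` and `lpips_loss` are assumed helpers, not the exact StyleGAN-Human functions.

```python
import copy
import torch

def pti_invert(G, target, project, lpips_loss, tune_steps=350):
    # Step 1: projection - optimise (or predict, with e4e) a pivot latent.
    w_pivot = project(G, target)
    # Step 2: pivotal tuning - adapt the generator weights to the target
    # while the pivot latent stays frozen.
    G_tuned = copy.deepcopy(G)
    opt = torch.optim.Adam(G_tuned.parameters(), lr=3e-4)
    for _ in range(tune_steps):
        img = G_tuned.synthesis(w_pivot)
        loss = lpips_loss(img, target) + (img - target).square().mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return w_pivot, G_tuned
```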
A proposal from our team is to go back to ReStyle encoding for face inversion and drop PTI inversion there.
Style Mixing (Point 4)
Here we try to create a new image that combines certain qualities of a given image (the user image) and a reference image (the model image), by splicing the needed part of the reference image’s latent code into the user image’s latent code.
Refer to this blog.
We do not need to copy everything; only selective tweaks from the model’s image to the user’s image are needed. Broadly, we want to keep the identity from the user’s image, and take attributes such as pose, dress type, and dress colour from the model’s image.
Using style mixing from StyleGAN-Human, our team was able to transfer pose details and dress type, but not dress colour.
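The experiments behind this result amount to swapping layer bands of the W+ codes; here is a sketch under the usual (approximate) assumption that coarse layers drive pose, middle layers garment shape, and fine layers colour and texture. The band boundaries below are illustrative, not StyleGAN-Human's exact indices.

```python
import torch

# Illustrative W+ layer bands (assumed, not exact).
BANDS = {'pose': slice(0, 4), 'dress_type': slice(4, 8), 'colour': slice(8, 18)}

def style_mix(w_user: torch.Tensor, w_model: torch.Tensor, band: str):
    # Copy one band of W+ layers from the model's code into the user's code.
    w = w_user.clone()
    w[:, BANDS[band]] = w_model[:, BANDS[band]]
    return w

# Pose and dress type transferred well with this kind of scheme; swapping
# the fine band alone did not reliably change dress colour for us.
```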
We have a few proposals that we may try in the future:
- Implementing a whole-body ReStyle encoder for inversion. This would reduce the number of generators used from three to one.
- Experimenting with TryOnGAN, a virtual try-on room implementation.
Some more future work proposed by our fellowship team:
- Exploring the image-editing capabilities of StyleGAN-Human for real-world applications
- Shoe change using Style Mixing
- A ReStyle encoder for whole-body inversion
- Full-fledged, bug-free app development