AI and Image Data: From Fancy to Serious
Machine learning models have recently entered a new phase of their evolution. They started out as classifiers and predictors, but now they can also generate completely new data on their own.
This was made possible by extending unsupervised learning into generative modeling, which can be applied to visual data.
In this article, we explain what kind of artificial mind drives image generation and what its implications are, from popular culture to life-saving industries.
What Are Generative Adversarial Networks?
Generative Adversarial Networks (GANs) have proven to be among the most effective models for generating and processing images.
GANs are neural networks that work in pairs.
Each pair has a generator and a discriminator. The generator learns the characteristics of the training data and then produces new data that should have the same qualities as the original dataset and pass as natural to humans. For instance, a generator trained on portraits of cats should produce a completely new, natural-looking photograph of a cat.
However, it is not a human who evaluates how natural the generated data looks. That is the task of the second network, the discriminator. The discriminator receives both the training data and the generator’s output, and labels each sample as real or fake. The two networks are engaged in an arms race familiar from nature, where a predator becomes a better hunter while its prey becomes better at escaping.
Whenever the discriminator correctly flags a generated sample as fake, the generator adjusts its parameters so that its next samples are harder to tell apart from real data. The discriminator, in turn, is updated to get better at spotting fakes. The goal of training is a generator whose output the discriminator can no longer distinguish from real data.
Faking does not mean harm in this case, though. It helps to fill gaps in visual data for good purposes.
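The back-and-forth between the two networks described in this section can be illustrated with a deliberately tiny sketch. It is not a real image GAN: as an assumption for illustration, the "images" are single numbers drawn from a toy distribution, the generator is a linear function of random noise, and the discriminator is a logistic classifier. All names (train_toy_gan, sigmoid) and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=500, lr=0.01, batch=32, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Generator:      g(z) = a * z + b           (z is random noise)
    Discriminator:  D(x) = sigmoid(w * x + c)  (probability that x is real)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        # "Real" data: an assumed toy distribution standing in for images.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w -= (1.0 - d) * x
            grad_c -= (1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: minimize -log D(g(z)), i.e. try to be labeled real.
        grad_a = grad_b = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            grad_x = -(1.0 - d) * w  # gradient of the loss w.r.t. the sample
            grad_a += grad_x * z
            grad_b += grad_x
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generator: g(z) = {a:.2f} * z + {b:.2f}")
```

In the first steps, the discriminator learns that larger values are more likely to be real, and the generator responds by shifting its output upward. Adversarial training of this kind can oscillate rather than settle, so the sketch makes no claim about the final parameter values.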
GANs’ (Pop) Cultural Applications
Nowadays, you do not have to be part of the AI industry to come into contact with image processing technologies.
Remember the cat? Realistic synthetic images of this kind, especially of people, are often called deepfakes. Similar face-synthesis techniques are used, for example, in the movie industry. In the Star Wars spin-off Rogue One, filmmakers digitally recreated a young Princess Leia, the character played by Carrie Fisher, for a short scene.
The mobile and Internet industries are booming with applications that use style transfer to turn your selfie into a Van Gogh-style painting. Scientists and curious people alike colorize black-and-white images to reconstruct history, family memories, or important events of the past.
GANs have even taken a step towards more autonomous systems by making it possible to create images from written instructions. The opposite works as well: given a picture, a model can describe in natural language who or what it depicts.
Let’s take a deep dive into how it all works.
How GANs Work For Critical Industries
Domain Adaptation: Medicine and Pharma
We mentioned image-to-text and text-to-image translation in the previous section. Image-to-image translation is possible as well, and style transfer for entertainment purposes is only one part of it.
In a critical industry such as medical care, image-to-image translation can support both research and the actual treatment of patients. For instance, analyzing tissue makes it possible to track the progress of a disease and choose the most effective treatment. For the analysis, the tissue is stained with a reagent, and different reagents exist for this purpose. Collecting real images of every stage of a disease, with tissue stained by every available reagent, would take a lot of time and effort.
Instead, image-to-image translation can generate such images with a high level of accuracy. At AI Superior, we gained this experience in a stain transfer project: processing cancer tissue images for a pharmaceutical company. The biggest challenge of the project was the absence of any paired images showing what the same tissue looks like after being stained with different reagents. Nonetheless, the AI Superior team leveraged state-of-the-art techniques to overcome this issue and built a GAN that generates realistic images in another stain domain while preserving the characteristics critical for further analysis.
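The article does not say which method was used to cope with the missing paired images. One widely known idea for unpaired image-to-image translation is cycle consistency: translating an image to the other domain and back should return the original. The sketch below shows only that loss term, as an assumption about how such a setup could look. The function names and the two stand-in "translators" are hypothetical; in a real system they would be trained generator networks working on full images.

```python
def l1_distance(image_a, image_b):
    """Mean absolute difference between two flat lists of pixel values."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(image_a, image_b)) / len(image_a)


def cycle_consistency_loss(batch_a, batch_b, a_to_b, b_to_a):
    """Round-trip reconstruction error for two unpaired batches.

    batch_a holds images from domain A (e.g. one stain), batch_b images
    from domain B (another stain). No image in batch_a needs a matching
    partner in batch_b.
    """
    # A -> B -> A should give back the original domain-A image.
    loss_a = sum(l1_distance(b_to_a(a_to_b(x)), x) for x in batch_a) / len(batch_a)
    # B -> A -> B should give back the original domain-B image.
    loss_b = sum(l1_distance(a_to_b(b_to_a(y)), y) for y in batch_b) / len(batch_b)
    return loss_a + loss_b


# Stand-in translators (assumptions): simple intensity rescaling.
def stain_a_to_b(image):
    return [p * 0.5 for p in image]


def stain_b_to_a(image):
    return [p * 2.0 for p in image]


batch_a = [[0.2, 0.4, 0.8], [0.1, 0.3, 0.5]]  # images in stain A
batch_b = [[0.6, 0.6, 0.2]]                   # unrelated images in stain B
print(cycle_consistency_loss(batch_a, batch_b, stain_a_to_b, stain_b_to_a))
```

In a full training setup this term would be added to the usual adversarial losses, so that the translated images look realistic in the target domain and still preserve the structure of the input tissue.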
Domain Adaptation: Geospatial Analytics
Another domain of our services that involves GANs is geospatial analytics. For organizations interested in outdoor inspections, including development companies, national park operators, municipalities, and oil and gas companies, we help translate synthetic aperture radar (SAR) data from satellites into human-friendly visual data, in either greyscale or color. Our customers see a natural or industrial landscape much as it appears on the ground. This technology allows for faster decision-making, because the map views require little to no interpretation. Moreover, geospatial data can be overlaid with non-spatial data or tagged. Tags classify physical objects, so you can quickly spot discrepancies, such as an object that does not belong in the area, which reduces manual effort and cuts response time.
This works for both commercial and non-commercial facilities, helping to keep physical damage from spreading or causing further harm.
Image Processing and Enhancement
Apart from these two quite specific domains, various cross-domain applications of GANs are possible for different industries.
Basically, a GAN manipulates images based on its learned “understanding” of what a proper image should look like.
For example, incomplete images can get their missing parts back, and damaged parts of an image (quite often an old photograph or artwork) can be restored. This technique is called image inpainting. It serves scientists, restorers, private individuals, and anyone else interested in enhancing digital images, whether they are purely digital or copies of a physical artifact.
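A common final step in inpainting pipelines is to combine the two sources: pixels that are intact are kept from the original, and only the missing or damaged pixels are taken from the generated image. The sketch below shows that compositing step under simple assumptions: images are 2-D lists of numbers, the mask is given, and the generator that produces the filled-in image is not shown. The function name composite_inpainting is hypothetical.

```python
def composite_inpainting(original, generated, mask):
    """Keep known pixels, take hole pixels from the generated image.

    All three arguments are 2-D lists of the same shape. mask[r][c] is 1
    where the original pixel is intact and 0 where it is missing or damaged.
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    result = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("rows must have the same length")
        result.append(
            [o if m else g for o, g, m in zip(orig_row, gen_row, mask_row)]
        )
    return result


# A 2x3 "photograph" with two damaged pixels (marked 0 in the mask).
photo = [[10, 20, 0], [0, 50, 60]]
filled = [[11, 19, 30], [40, 51, 59]]  # assumed output of a trained generator
mask = [[1, 1, 0], [0, 1, 1]]
print(composite_inpainting(photo, filled, mask))
```

Compositing this way guarantees that the undamaged parts of the original stay untouched, whatever the generator produces for them.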
Next, GANs take techniques known from Photoshop and similar software to a new level. They can sharpen blurry pictures. GANs can also increase an image’s resolution artificially by synthesizing fine details, too small for the human eye to notice individually, based on patterns they have learned from similar images.
GANs can not only add to an image but also remove parts of it and replace them with something more appropriate. They can erase stains from digital copies of old photographs or, more importantly, denoise images. Crime investigators are among those who can benefit from this technique.
Synthetic Data Generation
By generating visual data, generative adversarial networks help other neural networks learn and give their creators inexpensive, realistic datasets. A prime example is the image recognition algorithms embedded in self-driving cars, which need a lot of visual data to learn to navigate safely among the vast number of real-life objects, living and artificial.
Synthetic visual data can also help anonymize real data. We all know the problem with Google Street View: it is a great thing when you want to get to know a distant place better, but unwelcome when it captures your own face. Currently, Google blurs faces and license plates, but in the future, real faces could be replaced with deepfakes: non-existent persons who have no privacy to protect.
Interestingly enough, one of the application domains for synthetic data used to be the strongest outpost of human creativity: fashion. Fashion seems to adopt AI algorithms willingly, though less for generating completely new ideas. Like video game creators, fashion designers turn to AI to imitate the natural behaviour of clothing on a human body. That helps to predict the popularity of new clothes, since customers care not only about how clothes look but also about how the fabric sits on their bodies.
Data Encoding and Decoding
Nvidia came up with a nice enhancement for users of video conferencing. You probably know how strange it feels not to meet the eyes of your counterpart during a video call, simply because webcams are never placed in the middle of the screen, the point we naturally look at. Nvidia developed a technique that transforms your camera footage on the other side of the call so that you appear to look into the eyes of your conversation partner.
This means not only better and more natural communication but also a lighter load on the communication equipment. Instead of transmitting the entire footage throughout the call, your image is transmitted only once. After that, only a few reference points on your face are tracked to capture your facial expressions, which are then reconstructed on the other end of the call. This method reduces the payload transmitted back and forth, which previously consumed a great deal of bandwidth.
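The bandwidth argument comes down to simple arithmetic, which the following sketch spells out. Every number and name in it is an assumption chosen for illustration, not a figure from Nvidia: the frame size, the number of tracked points, and the bytes per coordinate are all hypothetical. The comparison is also against uncompressed frames, while real video calls use compressed video, so the actual saving would be smaller.

```python
def raw_frame_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel


def keypoint_payload_bytes(num_keypoints, coords_per_point=2, bytes_per_coord=4):
    """Size of the tracked reference points for one frame in bytes."""
    return num_keypoints * coords_per_point * bytes_per_coord


def call_payload_bytes(num_frames, reference_frame_bytes, keypoint_bytes_per_frame):
    """One reference image sent once, then only keypoints for every frame."""
    return reference_frame_bytes + num_frames * keypoint_bytes_per_frame


# Illustrative assumptions: 1280x720 RGB frames, 68 tracked points,
# 30 frames per second for one minute.
frames = 30 * 60
frame_size = raw_frame_bytes(1280, 720)
keypoints_size = keypoint_payload_bytes(68)

full_video = frames * frame_size
keypoint_call = call_payload_bytes(frames, frame_size, keypoints_size)
print(f"uncompressed video: {full_video} bytes")
print(f"reference image plus keypoints: {keypoint_call} bytes")
```

Whatever the exact figures, the structure of the calculation stays the same: the per-frame cost shrinks from a full image to a handful of coordinates, while the image itself is paid for only once.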
At AI Superior, we are constantly watching the latest trends in machine learning. Our customers provide us with the greatest examples of how AI and image processing are reshaping our lives, making this world a safer place to live. We are ready to share our expertise in applying GANs for medical care and research as well as geospatial analytics and other industries.