AI upscaling and the future of content delivery

The rumor mill has recently been buzzing about Nintendo’s plans to launch a new version of their hugely popular Switch console in time for the holidays. A faster CPU, more RAM, and an improved OLED screen are all taken for granted, as you’d expect for a mid-generation refresh. Those upgraded specs will likely come with an inflated price tag as well, but given the incredible demand for the current Switch, a bump of $50 or even $100 is unlikely to discourage many prospective buyers.

But according to a report by Bloomberg, the new Switch may be a bit more cutting-edge than you’d expect from the technologically conservative Nintendo. According to their sources, the new system will use an NVIDIA chipset capable of Deep Learning Super Sampling (DLSS), a feature currently only available on high-end GeForce RTX 20 and RTX 30 series GPUs. The technology, which has already been used by several notable PC games over the last few years, uses machine learning to upscale rendered images in real time. Rather than tasking the GPU with producing a native 4K image, the engine can render the game at a lower resolution and have DLSS make up the difference.

The current Nintendo Switch model

The implications of this technology, especially for mobile devices, are huge. For the Switch, which doubles as a battery-powered handheld when removed from its dock, DLSS could allow it to produce visuals similar to the far larger and more expensive Xbox and PlayStation systems it competes with. If Nintendo and NVIDIA can prove DLSS is viable on something as small as the Switch, we’ll likely see the technology come to future smartphones and tablets to make up for their relatively limited GPUs.

But why stop there? If artificial intelligence systems like DLSS can upscale a video game, it stands to reason the same techniques could be applied to other forms of content as well. Rather than saturating your internet connection with a 16K video stream, will TVs of the future simply make the best of what they have using a machine learning algorithm trained on popular shows and movies?

How low can you go?

Obviously, you don’t need machine learning to resize an image. You can take a standard-resolution video and scale it up to high definition easily enough, and indeed, your TV or Blu-ray player does exactly that when you watch older content. But it doesn’t take a particularly keen eye to immediately spot the difference between a DVD that’s been blown up to fit an HD screen and modern content actually produced at that resolution. Taking a 720 x 480 image and pushing it up to 1920 x 1080, or even 3840 x 2160 in the case of 4K, will lead to fairly obvious deterioration of the image.
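To see why simple resizing can’t add detail, consider a toy nearest-neighbor upscaler, the crudest form of interpolation. Every output pixel is just a copy of the closest source pixel, so the image gets bigger but no new information is created. (Real players use smoother filters like bicubic, but the limitation is the same; this sketch is purely illustrative.)

```python
# Toy nearest-neighbor upscale: each output pixel copies the nearest
# source pixel, so no new detail is created -- the image only gets
# blockier as it grows.

def upscale_nearest(pixels, src_w, src_h, dst_w, dst_h):
    """pixels is a row-major list of length src_w * src_h."""
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h          # nearest source row
        for x in range(dst_w):
            sx = x * src_w // dst_w      # nearest source column
            out.append(pixels[sy * src_w + sx])
    return out

# A 2x2 "image" stretched to 4x4: every value is a copy, none are new.
src = [1, 2,
       3, 4]
big = upscale_nearest(src, 2, 2, 4, 4)
print(big)  # -> [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4]
```

Note that the enlarged image contains exactly the same four values as the original, just repeated, which is why upscaled DVDs look soft rather than sharp.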

To tackle this fundamental problem, AI-enhanced upscaling actually creates new visual data to fill in the gaps between the source and target resolutions. In the case of DLSS, NVIDIA trained their neural network by taking low- and high-resolution images of the same game and having their in-house supercomputer analyze the differences. To maximize the results, the high-resolution images were rendered at a level of detail that would be computationally impractical or even impossible to achieve in real time. Combined with motion vector data, the neural network was tasked not only with filling in the necessary visual information to make the low-resolution image better approximate the idealized target, but also with predicting what the next frame of animation might look like.
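The general recipe for building such training data can be sketched in a few lines. This is an assumption about how super-resolution networks are commonly trained, not NVIDIA’s actual pipeline: start from a pristine high-resolution frame, degrade it, and pair the two so the network can learn to reverse the degradation.

```python
# Sketch of building a super-resolution training pair: the "idealized
# target" is the full-detail frame, and the network's input is a
# downsampled copy. Training minimizes the difference between the
# network's reconstruction and the target.

def downsample_2x(pixels, w, h):
    """Average each 2x2 block of a row-major grayscale image."""
    out = []
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            block = (pixels[y * w + x] + pixels[y * w + x + 1] +
                     pixels[(y + 1) * w + x] + pixels[(y + 1) * w + x + 1])
            out.append(block / 4)
    return out

target = [10, 20, 30, 40,
          10, 20, 30, 40,
          50, 60, 70, 80,
          50, 60, 70, 80]          # hypothetical 4x4 high-res frame
low_res = downsample_2x(target, 4, 4)
print(low_res)                     # -> [15.0, 35.0, 55.0, 75.0]
training_pair = (low_res, target)  # (network input, idealized output)
```

DLSS goes further than this static picture, of course, by also feeding the network motion vectors so it can reason about the next frame, but the input/target pairing is the core idea.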

NVIDIA’s DLSS 2.0 architecture

Although fewer than 50 PC games supported the latest version of DLSS at the time of this writing, the results so far have been extremely promising. The technology enables current computers to run newer and more demanding games, and for existing titles, it can deliver considerably improved frames per second (FPS). In other words, if you have a computer powerful enough to run a game at 30 FPS at 1920 x 1080, the same machine could potentially hit 60 FPS if the game is rendered at 1280 x 720 and upscaled with DLSS.
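A quick back-of-the-envelope calculation shows where that headroom comes from: rendering at 1280 x 720 means the GPU has to shade far fewer pixels per frame than at 1920 x 1080.

```python
# Why rendering at 720p and upscaling helps: pixel counts per frame.

full_hd = 1920 * 1080      # 2,073,600 pixels
hd_720  = 1280 * 720       #   921,600 pixels
print(full_hd / hd_720)    # -> 2.25, i.e. 2.25x fewer pixels to render
```

The real-world speedup is less than that factor, since running the DLSS network itself costs some GPU time, but it illustrates why a 30 FPS title can plausibly reach 60 FPS.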

There’s been ample opportunity over the last year or two to measure the performance improvements DLSS offers in supported titles, and YouTube is filled with comparisons showing what the technology is capable of. In a particularly extreme test, 2kliksphilip ran 2019’s Control and 2020’s Death Stranding at just 427 x 240 and used DLSS to scale the image up to 1280 x 720. While the results weren’t perfect, both games ended up looking far better than they had any right to, considering they were rendered at a resolution we’re more likely to associate with the Nintendo 64 than a modern gaming PC.

AI-enhanced entertainment

While it may be early days, it seems pretty clear that machine learning systems like Deep Learning Super Sampling hold a lot of promise for gaming. But the idea isn’t limited to video games. There’s also a big push toward using similar algorithms to enhance older films and television shows for which no higher-resolution version exists. Both proprietary and open source software is now available that leverages the computing power of modern GPUs to upscale still images as well as video.

Of the open source tools in this arena, the Video2X project is among the best known and most actively developed. This Python 3 framework uses the waifu2x and Anime4K upscalers, which, as you may have gathered from their names, are designed primarily to work with anime. The idea is that you can take an animated film or series that was only ever released in standard definition, and by running it through a neural network specifically trained on visually similar content, bring it up to 1080p or even 4K resolution.

Getting the software up and running can be a bit tricky, as there are different GPU acceleration frameworks available depending on your operating system and hardware platform, but it’s something anyone with a reasonably modern computer can do on their own. As an example, I took a 640 x 360 frame from Big Buck Bunny and upscaled it to 1920 x 1080 using the default settings of the waifu2x upscaler backend in Video2X:

Compared to the original 1920 x 1080 image, we can see some subtle differences. The shading of the rabbit’s fur isn’t quite as nuanced, the eyes lack a certain luster, and most notably, the grass has gone from individual blades to something more akin to an oil painting. But would you have really noticed any of that if the two images weren’t side by side?

Some assembly required

In the previous example, AI was able to triple the resolution of an image with negligible graphical artifacts. But what’s perhaps more impressive is that the file size of the 640 x 360 frame is only one-fifth that of the original 1920 x 1080 frame. Extrapolating that difference over the length of a feature film, it’s clear how this technology could have a huge impact on the massive bandwidth and storage costs associated with streaming video.
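To make that extrapolation concrete, here is a rough calculation. The numbers are illustrative only: the 1/5 size ratio comes from the single-frame comparison above, and the 8 Mbps bitrate is an assumed figure for a typical 1080p stream.

```python
# Rough extrapolation of the storage/bandwidth savings over a film.
# All figures are assumptions for illustration, not measured values.

full_res_bitrate_mbps = 8          # assumed bitrate of a 1080p stream
size_ratio = 1 / 5                 # low-res frame was ~1/5 the size
runtime_hours = 2                  # typical feature film

# megabits -> gigabytes: divide by 8 (bits->bytes), then by 1000 (MB->GB)
full_gb = full_res_bitrate_mbps * runtime_hours * 3600 / 8 / 1000
low_gb = full_gb * size_ratio
print(round(full_gb, 2), round(low_gb, 2))  # -> 7.2 1.44
```

Even with these hand-wavy numbers, shaving a two-hour stream from roughly 7 GB down to under 2 GB per viewer would add up quickly at the scale of a streaming service.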

Imagine a future where, instead of streaming an ultra-high-resolution movie over the internet, your device is sent a video stream at one-half or even one-third of the target resolution, along with a neural network model trained on that particular piece of content. Your AI-enabled player could then take this “dehydrated” video and scale it up in real time to whatever resolution is appropriate for your display. Rather than saturating your internet connection, it would be a bit like rehydrating pizzas in Back to the Future Part II.
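The playback loop for such a scheme might look something like the following. To be clear, this is a purely conceptual sketch of the “dehydrated video” idea: no such delivery format exists today, and every name in it is hypothetical, with a trivial repeat-upscale standing in for the per-title neural network.

```python
# Conceptual sketch: a player receives a low-res stream plus a
# per-title model and reconstructs full resolution locally. The
# ToyModel below is a placeholder, NOT a real neural upscaler.

class ToyModel:
    """Stands in for a hypothetical per-title neural upscaler."""
    def upscale(self, frame, factor):
        # naive repeat-upscale as a placeholder for the network
        return [v for v in frame for _ in range(factor)]

def play_dehydrated(stream, model, factor):
    """Reconstruct full-resolution frames from a low-res stream."""
    return [model.upscale(frame, factor) for frame in stream]

stream = [[1, 2], [3, 4]]             # two tiny "frames" at 1/3 width
frames = play_dehydrated(stream, ToyModel(), 3)
print(frames)  # -> [[1, 1, 1, 2, 2, 2], [3, 3, 3, 4, 4, 4]]
```

The key point is that nothing full-resolution ever crosses the network; the heavy lifting happens on the viewer’s own hardware.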

The biggest technical challenge standing in the way is the time it takes to perform this kind of upscaling: when running Video2X on even reasonably high-end hardware, a rendering speed of 1 or 2 FPS is considered fast. It would take a huge leap in computational power to do real-time AI upscaling, but the progress NVIDIA has made with DLSS is certainly encouraging. Of course, film purists would argue that such a reproduction might not be in keeping with the director’s intent, but when people are watching movies 30 minutes at a time on their phones while commuting to work, it’s safe to say that ship has already sailed.
