ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) is software that tries to make an image larger by hallucinating additional detail with a neural network.

So one question is whether ESRGAN can be used to upscale movies, for example to make a DVD-quality image look like HD.

To examine this I decided to experiment with some films that I have on both DVD and Blu-Ray. But what to choose? I have several BD+DVD combo packs, but in that case they’re probably both downscaled from the same source, and I can rig that sort of test from any HD image. A harder test would be one where the DVD is clearly from a different/inferior mastering. I’ve got several of those as well, and quickly narrowed it down to only the most worthy masterpieces of modern cinema:

breakin breakin2

So let’s just take a frame and see what happens. ESRGAN is trained to scale by 4x and plugging a DVD frame into it actually makes something close to a 4K image, so I’ve shrunk the result back down to around the same size as the 1080p BD. Here’s a crop of most of the frame, first the DVD frame scaled up by ESRGAN:

ESRGAN frame cropped

And now the same frame from the Blu-Ray:

HD frame cropped

So yeah, no contest – even ESRGAN can’t pull enough of a magic trick to make this DVD look like a proper HD image. Here are some zoom-ins of a few spots. Each strip has the DVD image scaled with a more traditional cubic algorithm on the left, ESRGAN’s version in the middle, and then the actual HD on the right.

belt

Here we can easily see that Ozone’s studdy belts are missing a lot of detail and the metal bits just blur together. ESRGAN gives sharper edges but it also amplifies the compression artifacts at the bottom.

hair

Poor Kelly’s hair and face are still just a rosy-cheeked blur, and she doesn’t even seem to be looking in the same direction.

hand

One thing about movies is that the frames tend to have a lot of motion blur. We can see here that the DVD compression on Ozone’s sweet hand moves was just terrible to begin with, and ESRGAN only made it more so. Also, everybody’s teeth are much better in the HD version.

clown

Um, that guy.

ESRGAN obviously doesn’t do well with noise or glitches – they just cause it to hallucinate even more of them. Note that I’m using the ESRGAN model directly. The software has a second low-noise model available and provides a way to use that to smooth out some of the excess noise that ESRGAN introduces, but where’s the fun in that?

With the DVD you sometimes get a bunch of compression artifacts in a grid pattern, and ESRGAN makes it worse. In a few spots on Ozone’s arm where the compression was particularly rough, ESRGAN added a bunch of waves (the green lines are just here to highlight the grid):

grid

So I guess the answer to the original question is very much “NO”.









With that out of the way, what else to try? Well, ESRGAN was trained with images scaled down from high-quality starting points, so let’s see what it does if you give it something more like that. Since ESRGAN scales by 4x, if we downscale a 1080p (1920x1080) image to 480x270 and then feed it through ESRGAN, we should get back a 1080p image that we can directly compare pixel-for-pixel.
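The “compare pixel-for-pixel” part only works if the downscale divides evenly, so here’s a small sketch of that arithmetic. The function name is made up for illustration and isn’t part of ESRGAN.

```python
def downscaled_size(size, factor):
    """Divide both dimensions by factor, insisting on an exact fit."""
    width, height = size
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not divisible by {factor}")
    return width // factor, height // factor


HD = (1920, 1080)
small = downscaled_size(HD, 4)
print(small)                         # (480, 270)
print((small[0] * 4, small[1] * 4))  # (1920, 1080)
```

A factor of 8 also divides cleanly (240x135), but 16 would not, since 1080 / 16 isn’t a whole number.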

So here we have a totally not chosen on purpose shot of Jean-Claude Van Damme cheering Kelly’s dance moves:

NOTE: to save bandwidth, some frame images are scaled down and linked to full resolution versions.

dancing downscaled

Scale down, ENHANCE, and hey that doesn’t look too bad…

dancing enhanced

Oh.

Oh dear.

jcvd face

This zoom-in has the HD original, the 1/4 resolution image that was fed into the scaler, what you’d get with a more traditional cubic upscale, and finally what ESRGAN perpetrated.
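For context on what “traditional cubic” means, here’s a minimal one-dimensional sketch of cubic interpolation. It assumes the Keys kernel with a = -0.5, which is one common choice; image editors vary, and this is not necessarily what produced the strips here. A real image scaler would apply the same idea along rows and then columns.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a=-0.5 is one common choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def cubic_upscale_1d(samples, factor):
    """Upscale a row of values by an integer factor with cubic weights."""
    last = len(samples) - 1
    out = []
    for i in range(len(samples) * factor):
        # Position of this output sample in source coordinates (pixel centers).
        x = (i + 0.5) / factor - 0.5
        base = math.floor(x)
        value = 0.0
        for k in range(base - 1, base + 3):
            clamped = min(max(k, 0), last)  # repeat edge pixels
            value += samples[clamped] * cubic_kernel(x - k)
        out.append(value)
    return out


row = [0.0, 0.0, 1.0, 1.0]  # a hard edge
print([round(v, 3) for v in cubic_upscale_1d(row, 2)])
```

Every output value is just a weighted blend of four nearby input values, so nothing new gets invented. That’s why the cubic version looks soft, and also why it never grows extra fingers.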

There is so much going on here, such as what happened to this finger. Uh, exactly what sort of images was ESRGAN trained on again?

finger

Here we see JCVD’s fist get all Cronenberged:

fist

And while this guy is still smiling, now he looks like he just came from a fight:

smile

When subtle details are lost in the downscale, ESRGAN sometimes recovers something similar to the original, but this bike wheel now has crazy spokes and those Adidas socks didn’t survive:

bike-hd-quarter-cubic-esrgan

What’s really interesting is that the ESRGAN version sometimes seems to have more detail than the original, even though it started with a much lower-resolution image. Of course this extra detail is all fake but it “fits” with the image pretty well, such as this guy’s hair and the tree behind him:

hairtree-hd-quarter-cubic-esrgan

There are certainly spots in the image where it’s very impressive compared to the blurry cubic scaling, for example even some really fine lines such as brake cables and the band on Ozone’s hat came back. The buildings in the background also got a facelift but they don’t look obviously wrong, just a little different.

So it’s off to the club for a dance battle…

at-the-club-hd-960x540

at-the-club-quarter-esrgan-960x540

Most of that frame did okay and Ice-T even got spikier, though the text on his necklace got lost not to mention HIS FACE:

icet-hd-quarter-esrgan

At the auditions the panel is still impressed by TKO Crew’s performance:

impressing-the-judges-hd-960x540

impressing-the-judges-quarter-esrgan-960x540

However it seems that the snooty judges have been replaced by ravenous elder gods:

judges-hd-quarter-esrgan

Over at the hospital the surgeon still notices the lineup of dancing nurses:

dancing-doctor-hd-960x540

dancing-doctor-quarter-esrgan-960x540

But now he looks more like he’s just seen the Ark of the Covenant, and the patient’s heartrate has understandably become irregular:

doctor-face-hd-quarter-esrgan

Ozone’s rooftop dance gets grainier but doesn’t fare too badly:

roof-dance-hd-960x540

roof-dance-quarter-esrgan-960x540

Except that down on the ground it looks like there’s been a hurricane:

buildings-hd-quarter-esrgan

Will the zoning board save the community center from the greedy developer?

zoning-board-hd-960x540

zoning-board-quarter-esrgan-960x540

meeting-faces-hd-quarter-esrgan

W̖͉̯̫̰͎͘ͅE̺̥ H̱̤̲̤̲̳A͍̩̮̘̮̖̥͠V͍̱E̺͚͎̫ S̺̥͉̪͔̻͡U̘̼̜̺̦̗̗C̨͈͓̝̦̥Ḫ͢ ̰S̞̣I̧̞͙̱̦̻͙ͅG̛̪̟̟H͍̗͕̯T̴̺͔̳̭̲͖S̶ ̧̫T̛O͉̦̬͚ͅ ̩͡S̬̞͔̱H̜͔̼͍̠͖O̜̮̘̬̙͜ͅW̯̱ ̡̪͍̱̥̼̹YO͍̤̮̳͇͔U͔͙͞









OKAY, BUT CAN WE GO DEEPER? What happens if I cut the resolution to only 1/8 scale (240x135) and give ESRGAN only a thumbnail’s worth of pixels to work with?

dont-waste-my-time-hd-960x540

dont-waste-my-time-eighth-esrgan

Obviously the faces are a lot worse. Even medium-sized faces such as Franco’s don’t fare well:

franco-hd-eighth-cubic-esrgan

And in the background, Ozone looks like he’s just stepped into a painting:

ozone-hd-eighth-cubic-esrgan

It’s still able to add realistic details in some spots, though, such as Turbo’s hair:

turbohair-hd-eighth-cubic-esrgan

And some of the bricks are really just amazing:

bricks-hd-eighth-cubic-esrgan

How about a dance lineup from the Miracles montage?

montage-twist-hd-960x540

montage-twist-eighth-esrgan

Lil’ Wizard’s shirt is still readable but the font has changed, and the Vans have had their checkerboards swapped out for stripes. When ESRGAN finds lines or creases it seems to add more MORE MORE of them, which heavily affects clothing and faces:

montage-twist-zoom-eighth-esrgan

At the garden party there’s more face rearranging, and James’s hand gets turned into a rough sketch. As usual it manages to do something reasonable with hair, though:

party-confrontation-hd-960x540

party-confrontation-eighth-esrgan

party-confrontation-zoom-eighth-esrgan

Over to the club to take down Electro Rock:

tko-crew-victory-stomp-hd-960x540

tko-crew-victory-stomp-eighth-esrgan

I don’t even…

tko-crew-victory-stomp-zoom-eighth-esrgan

Turbo’s ceiling dance doesn’t look too bad if you can avoid looking at his face:

ceiling-dance-hd-960x540

ceiling-dance-eighth-esrgan

ceiling-dance-zoom-eighth-esrgan

Monster faces aside, it really is pretty remarkable what it’s able to construct from a few pixels, such as here where it somehow recovers Ozone’s collar bars and round snaps:

wild-eyed-crop-hd

wild-eyed-crop-zoom-eighth

wild-eyed-crop-zoom-eighth-esrgan

Unlike some other scaling algorithms ESRGAN isn’t just looking at a few neighboring pixels. I originally tried scaling some cropped regions on their own, and it produced noticeably different results compared to giving it the entire image to work with.

Before I started this I was wondering if someone had done an ESRGAN plugin for a photo editor and didn’t find anything. Now that I’ve actually used it I can see why – to run it you need a lot of other software installed such as Python 3.x and a pile of libraries. The library that handles the neural network also comes in several variants and you have to pick one specific to your OS and GPU/graphics driver.
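Once everything is installed, it’s worth checking whether the neural network library can actually see the GPU. The sketch below assumes that library is PyTorch, which the reference ESRGAN code is built on but which isn’t named above; `pick_device` is a made-up helper name.

```python
def pick_device():
    """Return "cuda" when a usable GPU is visible to PyTorch, else "cpu"."""
    try:
        import torch  # assumption: the PyTorch-based ESRGAN code
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine that has a suitable graphics card, the likely culprit is a mismatched library variant or driver.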

Even with all the training done ahead of time, applying the trained model to an image still involves a lot of computations. I originally tried to run it on a machine that doesn’t have a CUDA-capable driver, which meant doing all of the computations on the CPU. With 6 cores pegged at 100% it took over a minute for a really small test image, and trying to give it a DVD frame resulted in several minutes of churning followed by a crash. I switched over to a system with a 1080Ti GPU and installed the latest CUDA driver, and when using the graphics card to do the heavy lifting it only takes about a second to process a DVD frame.
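To put “about a second per frame” in movie terms, here’s a back-of-the-envelope estimate. The 90-minute running time and the 24 frames per second are assumptions chosen for illustration, not measurements.

```python
def hours_to_process(minutes_of_film, fps=24, seconds_per_frame=1.0):
    """Rough wall-clock hours to run every frame through the scaler."""
    frames = minutes_of_film * 60 * fps
    return frames * seconds_per_frame / 3600


# Assumed: a 90-minute film at 24 fps, one second per frame on the GPU.
print(hours_to_process(90))  # 36.0
```

So even with the graphics card doing the work, a whole film would be a day-and-a-half job, before counting the time to extract and re-encode the frames.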

Since it can create “interesting” distortions, that raises the question of what it does if you apply it again. What seems to happen is that the image gets noisier (no surprise there), and after several iterations it starts to acquire a cyan tint.

Here’s what happens when applying it repeatedly with a 4x reduction-enhancement cycle, so: start with 1920x1080, cubic downscale to 480x270, ESRGAN back up to 1920x1080, and so on. After 3 iterations things are already pretty bad, and by 5 the cyan is visible.
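The loop itself is simple. Here’s a sketch of its shape with both scalers passed in as plain functions. The nearest-neighbor functions are loudly labeled stand-ins so the example runs on its own; the real cycle used a cubic downscale and ESRGAN, and all of the names here are hypothetical.

```python
def nearest_downscale(image, factor):
    """Stand-in for the cubic downscale: keep every factor-th pixel."""
    return [row[::factor] for row in image[::factor]]


def nearest_upscale(image, factor):
    """Stand-in for ESRGAN: repeat every pixel factor times each way."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def run_cycles(image, iterations, factor=4,
               downscale=nearest_downscale, upscale=nearest_upscale):
    """Repeat downscale-then-upscale, keeping every full-size result."""
    results = [image]
    for _ in range(iterations):
        small = downscale(results[-1], factor)
        results.append(upscale(small, factor))
    return results


# A tiny 16x8 grayscale "frame" stored as a list of rows.
frame = [[(x + y) % 256 for x in range(16)] for y in range(8)]
steps = run_cycles(frame, iterations=3)
print(len(steps), len(steps[-1][0]), len(steps[-1]))  # 4 16 8
```

With these stand-ins the image stops changing after the first pass. The drift shown in the strips comes from ESRGAN adding new detail on every pass, which the next downscale then bakes in.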

NOTE: the full images were 150MB+ so this just shows a cropped slice of them with lossy compression.

putting-on-the-show-crop-quarter-esrgan-00

putting-on-the-show-crop-quarter-esrgan-01

putting-on-the-show-crop-quarter-esrgan-02

putting-on-the-show-crop-quarter-esrgan-03

putting-on-the-show-crop-quarter-esrgan-04

putting-on-the-show-crop-quarter-esrgan-05

putting-on-the-show-crop-quarter-esrgan-06

putting-on-the-show-crop-quarter-esrgan-07

putting-on-the-show-crop-quarter-esrgan-08

putting-on-the-show-crop-quarter-esrgan-09

putting-on-the-show-crop-quarter-esrgan-10

putting-on-the-show-crop-quarter-esrgan-11

Another:

dinner-with-the-parents-crop-quarter-esrgan-00

dinner-with-the-parents-crop-quarter-esrgan-01

dinner-with-the-parents-crop-quarter-esrgan-02

dinner-with-the-parents-crop-quarter-esrgan-03

dinner-with-the-parents-crop-quarter-esrgan-04

dinner-with-the-parents-crop-quarter-esrgan-05

dinner-with-the-parents-crop-quarter-esrgan-06

dinner-with-the-parents-crop-quarter-esrgan-07

dinner-with-the-parents-crop-quarter-esrgan-08

dinner-with-the-parents-crop-quarter-esrgan-09

dinner-with-the-parents-crop-quarter-esrgan-10

dinner-with-the-parents-crop-quarter-esrgan-11

And here’s starting with an 8x reduction from the original and then doing 4x up/down scaling after that, so 1920x1080 cubic down to 240x135, ESRGAN up to 960x540, cubic to 240x135, etc. It’s about the same, but the noise is more obvious because of the lower overall resolution:

caress-me-crop-eighth-esrgan-00

caress-me-crop-eighth-esrgan-01

caress-me-crop-eighth-esrgan-02

caress-me-crop-eighth-esrgan-03

caress-me-crop-eighth-esrgan-04

caress-me-crop-eighth-esrgan-05

caress-me-crop-eighth-esrgan-06

caress-me-crop-eighth-esrgan-07

caress-me-crop-eighth-esrgan-08

caress-me-crop-eighth-esrgan-09

caress-me-crop-eighth-esrgan-10

caress-me-crop-eighth-esrgan-11

I tried running one for a few more iterations but it only changed a little more in each step, and the only obvious difference was that the shading and details from the original image kept getting a bit fainter. I suspect that after a while it just turns into uniform black/white/cyan noise, perhaps with the blobs and lines shifting around each time.
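The tint was judged by eye, but it could be tracked with a number. Here’s one hypothetical way to do that, comparing the average red level against green and blue; the sample pixel values are made up.

```python
def channel_means(pixels):
    """Average R, G and B over an iterable of (r, g, b) tuples."""
    totals = [0, 0, 0]
    count = 0
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return tuple(total / count for total in totals)


def cyan_bias(pixels):
    """Positive when green and blue outweigh red, i.e. a cyan lean."""
    red, green, blue = channel_means(pixels)
    return (green + blue) / 2 - red


neutral = [(128, 128, 128)] * 4   # made-up gray pixels
tinted = [(110, 135, 135)] * 4    # made-up pixels with red pulled down
print(cyan_bias(neutral))  # 0.0
print(cyan_bias(tinted))   # 25.0
```

Computing this for each iteration’s output would show whether the drift levels off or keeps climbing.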

BTW a warning: searching for other sample ESRGAN images can turn out to be very not work-safe. It might not have been trained for porn, but there is apparently no shortage of dudes trying to use it to upscale their collections anyway.

blarglgll

I BELIEVEBLARGLGLGLLL IN THE BEAT!