4. Compare 2 videos with LPIPS, SSIM, PSNR, UVQ and VMAF metrics

The classic metrics PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity) are implemented in C for improved performance.

[1]:

import pathlib
import subprocess
import tempfile

import matplotlib.pyplot as plt

import cutcutcodec

4.1. Preparing videos to be compared

Here we compare the original video with a degraded version, obtained by transcoding it with ffmpeg.

[2]:

from cutcutcodec.utils import get_project_root
ref_file = get_project_root().parent / "media" / "video" / "intro.webm"
dis_file = pathlib.Path(tempfile.gettempdir()) / "distorted.mp4"

converter = subprocess.run(["ffmpeg", "-y", "-i", ref_file, dis_file], check=True, capture_output=True)  # transcode the video
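The transcode above relies on ffmpeg's default encoder settings. As a minimal sketch, assuming an ffmpeg build that includes `libx264`, the helper below (the name `build_transcode_cmd` is hypothetical, not part of cutcutcodec) builds a command where the strength of the distortion is set explicitly through the CRF value. It only builds the argument list and does not run anything.

```python
import pathlib


def build_transcode_cmd(src, dst, crf=35):
    """Return an ffmpeg argument list that re-encodes src to dst with libx264.

    A higher crf means stronger compression, so a more distorted output.
    Assumption: the local ffmpeg build provides the libx264 encoder.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(src),          # input file
        "-c:v", "libx264",       # video encoder (assumed to be available)
        "-crf", str(crf),        # constant rate factor, controls quality
        "-an",                   # drop the audio stream, only video is compared
        str(dst),
    ]


cmd = build_transcode_cmd(pathlib.Path("in.webm"), pathlib.Path("out.mp4"), crf=40)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True, capture_output=True)` in the same way as the command in the cell above.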

4.2. Simple comparison of 2 videos or images

The compare function handles color space conversion itself.
Comparisons are always made in YUV, not RGB.
The function is optimized and parallelized.

[3]:

metrics = cutcutcodec.compare(ref_file, dis_file, lpips_alex=True, psnr=True, ssim=True, uvq=True, vmaf=True)

/home/rrichard/.pyenv/versions/3.13.2/envs/cutcutcodec/lib/python3.13/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/rrichard/.pyenv/versions/3.13.2/envs/cutcutcodec/lib/python3.13/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
compare: 100%|██████████████████████████████████████████| 294/294 [01:02<00:00,  4.67img/s]
uvq: 100%|████████████████████████████████████████████████| 50/50 [00:34<00:00,  1.43img/s]
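To illustrate what comparing in YUV rather than RGB means, here is a minimal sketch of an R'G'B' to Y'CbCr conversion. It assumes the BT.709 luma coefficients and full-range values in [0, 1]. Which matrix and range cutcutcodec actually uses is not stated here, so the function `rgb_to_ycbcr_bt709` is purely illustrative.

```python
import numpy as np

# BT.709 luma coefficients (assumption: the library may use another standard)
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_ycbcr_bt709(rgb):
    """Convert a (..., 3) array of R'G'B' values in [0, 1] to Y'CbCr.

    Y' is in [0, 1], Cb and Cr are in [-0.5, 0.5].
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = KR * r + KG * g + KB * b          # luma: weighted sum of the channels
    cb = (b - y) / (2.0 * (1.0 - KB))     # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - KR))     # red-difference chroma
    return np.stack([y, cb, cr], axis=-1)


white = rgb_to_ycbcr_bt709([1.0, 1.0, 1.0])
red = rgb_to_ycbcr_bt709([1.0, 0.0, 0.0])
print(white, red)
```

Because luma carries most of the perceptually relevant detail, the same pixel error generally produces a different score when measured on Y'CbCr planes than on R'G'B' planes.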

[4]:

plt.scatter(range(len(metrics["lpips_alex"])), metrics["lpips_alex"])
plt.xlabel("frames")
plt.ylabel("lpips")
plt.show()

plt.scatter(range(len(metrics["psnr"])), metrics["psnr"])
plt.xlabel("frames")
plt.ylabel("psnr (dB)")
plt.show()

plt.scatter(range(len(metrics["ssim"])), metrics["ssim"])
plt.xlabel("frames")
plt.ylabel("ssim")
plt.show()

plt.scatter(range(len(metrics["uvq"])), metrics["uvq"])
plt.xlabel("frames (one every 200 ms)")
plt.ylabel("uvq")
plt.show()

plt.scatter(range(len(metrics["vmaf"])), metrics["vmaf"])
plt.xlabel("frames")
plt.ylabel("vmaf")
plt.show()

[Figures: five scatter plots, one per metric (LPIPS, PSNR in dB, SSIM, UVQ, VMAF), plotted against the frame index]
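Besides plotting, the per-frame scores can be reduced to a few summary numbers. The sketch below assumes that each entry of `metrics` is a flat sequence of floats, as the plotting code suggests. The helper `summarize` and the sample values are hypothetical stand-ins, not cutcutcodec output.

```python
import statistics


def summarize(scores, higher_is_better=True):
    """Return the mean score, the worst score and the index of the worst frame.

    For PSNR, SSIM and VMAF a higher score is better; for LPIPS, which is a
    distance, pass higher_is_better=False.
    """
    pick = min if higher_is_better else max
    worst = pick(range(len(scores)), key=lambda i: scores[i])
    return {
        "mean": statistics.fmean(scores),
        "worst": scores[worst],
        "worst_frame": worst,
    }


# hypothetical per-frame values, standing in for metrics["psnr"]
psnr_per_frame = [44.2, 43.9, 41.5, 44.0]
summary = summarize(psnr_per_frame)
print(summary)
```

The index of the worst frame is often more useful than the mean, since it points to the timestamp worth inspecting visually.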

4.3. Compare frames manually for greater control

This makes it possible to specify the hyperparameters specific to each metric.

[5]:

ref_frame = cutcutcodec.read(ref_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
ref_frame_numpy = ref_frame.numpy(force=True)
ref_frame.requires_grad = True
dis_frame = cutcutcodec.read(dis_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
dis_frame_numpy = dis_frame.numpy(force=True)
dis_frame.requires_grad = True

4.3.1. Compare learned perceptual image patch similarity (LPIPS)

This function uses the PyTorch lpips package as its backend.
This function is torch differentiable.

[6]:

print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="alex"))
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="vgg"))
print(cutcutcodec.lpips(ref_frame, dis_frame))

0.020773122

/home/rrichard/.pyenv/versions/3.13.2/envs/cutcutcodec/lib/python3.13/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

0.074869946
tensor(0.0208, grad_fn=<ViewBackward0>)

4.3.2. Compare peak signal to noise ratio (PSNR)

This function has a C implementation (used on CPU when no gradient is required).
This function is torch differentiable.

[7]:

print(cutcutcodec.psnr(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.psnr(ref_frame, dis_frame))

44.240646
tensor(44.2387, grad_fn=<ViewBackward0>)
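For reference, the textbook definition of PSNR is 10 * log10(peak² / MSE). The NumPy sketch below implements this plain definition, assuming pixel values in [0, 1]. It is not the cutcutcodec implementation, which may weight the color planes differently, so its values are not expected to match the output above exactly.

```python
import numpy as np


def psnr_sketch(ref, dis, peak=1.0):
    """Textbook PSNR in dB between two arrays of identical shape."""
    ref = np.asarray(ref, dtype=np.float64)
    dis = np.asarray(dis, dtype=np.float64)
    mse = np.mean((ref - dis) ** 2)  # mean squared error over all samples
    if mse == 0.0:
        return float("inf")          # identical inputs: no noise at all
    return float(10.0 * np.log10(peak**2 / mse))


# toy example: a constant error of 0.1 gives an MSE of 0.01
ref = np.zeros((3, 4, 4))
dis = np.full_like(ref, 0.1)
value = psnr_sketch(ref, dis)
print(value)
```

With an MSE of 0.01 and a peak of 1, the formula gives 10 * log10(100), which is 20 dB.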

4.3.3. Compare structural similarity index measure (SSIM)

This function has a C implementation (used on CPU when no gradient is required).
This function has an FFT implementation (used when stride = 1).
This function is torch differentiable.

[8]:

print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.ssim(ref_frame, dis_frame))

print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy, stride=2, sigma=1.2))

0.9906746
tensor(0.9907, grad_fn=<ViewBackward0>)
0.99125826
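To show what SSIM measures, here is a deliberately simplified sketch that evaluates the SSIM formula once over the whole image, with the usual constants K1 = 0.01 and K2 = 0.03. A real implementation evaluates it on local windows (presumably Gaussian windows whose width is set by `sigma`, placed every `stride` pixels, although that reading of the parameters is an assumption) and averages the results, so this is not what `cutcutcodec.ssim` computes.

```python
import numpy as np


def global_ssim(ref, dis, data_range=1.0):
    """SSIM formula applied once to the full image (no sliding window)."""
    ref = np.asarray(ref, dtype=np.float64)
    dis = np.asarray(dis, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2    # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2    # stabilizes the contrast/structure term
    mu_r, mu_d = ref.mean(), dis.mean()
    var_r, var_d = ref.var(), dis.var()
    cov = np.mean((ref - mu_r) * (dis - mu_d))
    num = (2.0 * mu_r * mu_d + c1) * (2.0 * cov + c2)
    den = (mu_r**2 + mu_d**2 + c1) * (var_r + var_d + c2)
    return float(num / den)


rng = np.random.default_rng(0)
image = rng.random((32, 32))
noisy = image + 0.1 * rng.standard_normal(image.shape)
same = global_ssim(image, image)
degraded = global_ssim(image, noisy)
print(same, degraded)
```

Identical images score 1, and added noise lowers the score because it inflates the variance of the distorted image without increasing the covariance.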

4.3.4. Compute the no-reference Universal Video Quality Model (UVQ)

It is the metric used by YouTube.
It does not depend on the image resolution.
It uses only one frame every 200 ms, on slices of 1 second.

[9]:

print(cutcutcodec.uvq([ref_frame.tolist()]*5))
print(cutcutcodec.uvq([dis_frame.tolist()]*5))

3.103020429611206
2.9377589225769043
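The sampling rule (one frame every 200 ms, grouped in 1-second slices) means 5 frames per slice, which is why the cell above passes a list of 5 frames. The sketch below computes such timestamps for a clip of a given duration. The helper `uvq_sample_times` is hypothetical, and the exact offsets used by the library are an assumption.

```python
from fractions import Fraction


def uvq_sample_times(duration, step=Fraction(1, 5), slice_len=1):
    """Return sampling timestamps in seconds, grouped by complete slices.

    Exact fractions are used so that timestamps such as 0.6 s do not suffer
    from floating point rounding. An incomplete trailing slice is dropped.
    """
    per_slice = int(Fraction(slice_len) / step)       # 5 frames per slice
    n_slices = int(Fraction(duration) // slice_len)   # complete slices only
    return [
        [s * slice_len + i * step for i in range(per_slice)]
        for s in range(n_slices)
    ]


# hypothetical 10-second clip
slices = uvq_sample_times(10)
print(len(slices), [float(t) for t in slices[1]])
```

For a 10-second clip this gives 10 slices of 5 timestamps, so 50 sampled frames in total.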

4.3.5. Compare Netflix Video Multi-Method Assessment Fusion (VMAF)

This function is only a safe interface to the vmaf package, so vmaf must be installed.

[10]:

print(cutcutcodec.vmaf(ref_frame_numpy, dis_frame_numpy))

41.258125