4. Compare 2 videos with LPIPS, SSIM, PSNR, UVQ and VMAF metrics

The classic metrics PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity) are implemented in C for improved performance.

[1]:
import pathlib
import subprocess
import tempfile

import matplotlib.pyplot as plt

import cutcutcodec

4.1. Preparing videos to be compared

Here we compare the original video with a degraded copy, obtained by transcoding it with ffmpeg.

[2]:
from cutcutcodec.utils import get_project_root

ref_file = get_project_root() / "media" / "video" / "intro.webm"
dis_file = pathlib.Path(tempfile.gettempdir()) / "distorted.mp4"

converter = subprocess.run(["ffmpeg", "-y", "-i", ref_file, dis_file], check=True, capture_output=True)  # transcode the video
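
The command above lets ffmpeg pick its default encoder settings. To control how strong the distortion is, one option is to set the quality explicitly. The sketch below only builds the argument list; `build_transcode_cmd` is a hypothetical helper, and it assumes an ffmpeg build that includes the libx264 encoder, whose `-crf` option ranges from 0 to 51 (higher means lower quality).

```python
import pathlib


def build_transcode_cmd(src, dst, crf=35):
    """Build an ffmpeg command that re-encodes `src` to `dst` with libx264.

    Hypothetical helper for illustration; assumes ffmpeg was built with libx264.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(src),          # input file
        "-c:v", "libx264",       # video encoder (assumption: available in this build)
        "-crf", str(crf),        # constant rate factor: higher value, lower quality
        str(dst),
    ]


cmd = build_transcode_cmd(pathlib.Path("in.webm"), pathlib.Path("out.mp4"), crf=40)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True, capture_output=True)` in the same way as the command above.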

4.2. Simple comparison of 2 videos or images

  • This function takes care of color spaces.

  • Image comparisons are always made in YUV, not RGB.

  • This function is optimized and parallelized.
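
To make the YUV point concrete, here is a minimal sketch of an RGB to Y'CbCr conversion for a single pixel. It assumes the BT.709 luma coefficients and values normalized to [0, 1]; the matrix actually used by cutcutcodec depends on the color space of each video and may differ. `rgb_to_yuv` is a hypothetical name, not part of the library.

```python
# Assumption: BT.709 luma coefficients; cutcutcodec may use another matrix.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_yuv(r, g, b):
    """Convert one gamma-encoded RGB pixel in [0, 1] to (Y, Cb, Cr).

    Y lies in [0, 1]; Cb and Cr lie in [-0.5, 0.5].
    """
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


print(rgb_to_yuv(1.0, 1.0, 1.0))  # white: full luma, no chroma
print(rgb_to_yuv(1.0, 0.0, 0.0))  # pure red: maximal Cr
```

Working in YUV separates brightness from color, so an error measured on the Y plane reflects luminance distortion rather than a mix of the three RGB channels.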

[3]:
metrics = cutcutcodec.video_metrics(ref_file, dis_file, lpips_alex=True, psnr=True, ssim=True, uvq=True, vmaf=True)

[4]:
plt.scatter(range(len(metrics["lpips_alex"])), metrics["lpips_alex"])
plt.xlabel("frames")
plt.ylabel("lpips")
plt.show()

plt.scatter(range(len(metrics["psnr"])), metrics["psnr"])
plt.xlabel("frames")
plt.ylabel("psnr (db)")
plt.show()

plt.scatter(range(len(metrics["ssim"])), metrics["ssim"])
plt.xlabel("frames")
plt.ylabel("ssim")
plt.show()

plt.scatter(range(len(metrics["uvq"])), metrics["uvq"])
plt.xlabel("frames 200ms")
plt.ylabel("uvq")
plt.show()

plt.scatter(range(len(metrics["vmaf"])), metrics["vmaf"])
plt.xlabel("frames")
plt.ylabel("vmaf")
plt.show()
[Figures: five scatter plots showing the LPIPS, PSNR (dB), SSIM, UVQ and VMAF values against the frame index]
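
Besides plotting, it is often convenient to reduce each per-frame series to a few numbers. The sketch below assumes, as the indexing in the plotting code suggests, that `metrics` maps each metric name to a sequence of per-frame values. `summarize` is a hypothetical helper, and the values used here are synthetic, not real measurements.

```python
import statistics


def summarize(metrics):
    """Return the mean, min and max of each per-frame metric series."""
    summary = {}
    for name, values in metrics.items():
        values = [float(v) for v in values]  # also accepts numpy or torch scalars
        summary[name] = {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Synthetic values for illustration only.
fake_metrics = {"psnr": [40.0, 42.0, 44.0], "ssim": [0.98, 0.99, 0.97]}
for name, stats in summarize(fake_metrics).items():
    print(f"{name}: mean={stats['mean']:.3f} min={stats['min']:.3f} max={stats['max']:.3f}")
```

The minimum is usually worth reporting next to the mean, because a single badly degraded frame can be visible even when the average looks good.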

4.3. Compare frames manually for greater control

Comparing frames manually lets you set the hyperparameters of each metric.

[5]:
ref_frame = cutcutcodec.read(ref_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
ref_frame_numpy = ref_frame.numpy(force=True)
ref_frame.requires_grad = True
dis_frame = cutcutcodec.read(dis_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
dis_frame_numpy = dis_frame.numpy(force=True)
dis_frame.requires_grad = True

4.3.1. Compare learned perceptual image patch similarity (LPIPS)

  • This function uses the PyTorch lpips package as its backend.

  • This function is torch differentiable.

[6]:
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="alex"))
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="vgg"))
print(cutcutcodec.lpips(ref_frame, dis_frame))
0.019391261
0.07449738
tensor(0.0194, grad_fn=<ViewBackward0>)

4.3.2. Compare peak signal to noise ratio (PSNR)

  • This function has a C implementation (used on CPU when no gradient is required).

  • This function is torch differentiable.

[7]:
print(cutcutcodec.psnr(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.psnr(ref_frame, dis_frame))
44.20279
tensor(44.1956, grad_fn=<ViewBackward0>)
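
For reference, the textbook definition of PSNR is 10·log10(peak² / MSE). The NumPy sketch below illustrates that formula only; it is not the cutcutcodec implementation, which also handles color planes and may weight them differently. `psnr_sketch` is a hypothetical name, and pixel values are assumed to lie in [0, peak].

```python
import numpy as np


def psnr_sketch(ref, dis, peak=1.0):
    """Textbook PSNR in dB between two arrays of the same shape."""
    ref = np.asarray(ref, dtype=np.float64)
    dis = np.asarray(dis, dtype=np.float64)
    mse = np.mean((ref - dis) ** 2)  # mean squared error over all samples
    if mse == 0.0:
        return float("inf")  # identical inputs: no noise at all
    return float(10.0 * np.log10(peak**2 / mse))


ref = np.zeros((4, 4))
dis = np.full((4, 4), 0.1)  # constant error of 0.1, so MSE = 0.01
print(psnr_sketch(ref, dis))
```

Because the error is squared and then passed through a logarithm, halving the error amplitude raises the PSNR by about 6 dB.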

4.3.3. Compare structural similarity index measure (SSIM)

  • This function has a C implementation (used on CPU when no gradient is required).

  • This function has an FFT implementation (used when stride = 1).

  • This function is torch differentiable.

[8]:
print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.ssim(ref_frame, dis_frame))

print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy, stride=2, sigma=1.2))
0.9905906
tensor(0.9906, grad_fn=<ViewBackward0>)
0.99117327
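
To show what SSIM measures, here is a deliberately simplified sketch that evaluates the SSIM formula once over the whole image. The real metric, including the cutcutcodec one with its `stride` and `sigma` parameters, evaluates it on local Gaussian-weighted windows and averages the results, so the numbers will not match. `global_ssim` is a hypothetical name, and the constants 0.01 and 0.03 are the usual defaults from the SSIM literature.

```python
import numpy as np


def global_ssim(ref, dis, peak=1.0):
    """SSIM formula applied to the whole image as a single window."""
    ref = np.asarray(ref, dtype=np.float64)
    dis = np.asarray(dis, dtype=np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stabilizing constants
    mu_x, mu_y = ref.mean(), dis.mean()
    var_x, var_y = ref.var(), dis.var()
    cov = ((ref - mu_x) * (dis - mu_y)).mean()
    num = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


rng = np.random.default_rng(0)
img = rng.random((16, 16))
print(global_ssim(img, img))        # identical images give the maximum score
print(global_ssim(img, 1.0 - img))  # inverted image: negative covariance
```

The first factor compares mean luminance, the second compares contrast and structure through the variances and the covariance.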

4.3.4. Compute the no-reference Universal Video Quality Model (UVQ)

  • It is the metric used by YouTube.

  • It does not depend on the image resolution.

  • It uses only one frame every 200 ms, on slices of 1 second.

[9]:
print(cutcutcodec.uvq([ref_frame.tolist()]*5))
print(cutcutcodec.uvq([dis_frame.tolist()]*5))
3.103020668029785
2.787476062774658
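
The sampling described above (one frame every 200 ms, grouped into 1-second slices) amounts to five frames per slice, which is consistent with the calls above passing a list of five frames. The sketch below lists the timestamps such a scheme would read. `sample_slices` is a hypothetical helper, and dropping an incomplete final slice is an assumption, not documented cutcutcodec behavior.

```python
from fractions import Fraction


def sample_slices(duration, step=Fraction(1, 5), slice_len=1):
    """Group sample timestamps (one every `step` seconds) into slices.

    Exact fractions avoid accumulating floating-point error on the timestamps.
    """
    per_slice = int(Fraction(slice_len) / step)      # 5 samples per 1-second slice
    n_slices = int(Fraction(duration) // slice_len)  # assumption: partial slices are dropped
    return [
        [float((s * per_slice + k) * step) for k in range(per_slice)]
        for s in range(n_slices)
    ]


for timestamps in sample_slices(2):
    print(timestamps)
```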

4.3.5. Compare Netflix Video Multi-Method Assessment Fusion (VMAF)

This function is only a safe wrapper around the vmaf package, so vmaf must be installed.

[10]:
print(cutcutcodec.vmaf(ref_frame_numpy, dis_frame_numpy))
42.950115