4. Compare 2 videos with LPIPS, SSIM, PSNR, UVQ and VMAF metrics¶
The classic metrics PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity) are implemented in C for improved performance.
[1]:
import pathlib
import subprocess
import tempfile
import matplotlib.pyplot as plt
import cutcutcodec
4.1. Preparing videos to be compared¶
Here we’ll compare the original video with a degraded version, obtained by transcoding it with ffmpeg.
[2]:
from cutcutcodec.utils import get_project_root
ref_file = get_project_root() / "media" / "video" / "intro.webm"
dis_file = pathlib.Path(tempfile.gettempdir()) / "distorted.mp4"
converter = subprocess.run(["ffmpeg", "-y", "-i", ref_file, dis_file], check=True, capture_output=True) # transcode the video
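The transcode above relies on ffmpeg's default settings, so the amount of distortion is not controlled. The following is a minimal sketch of one way to make it explicit, assuming the local ffmpeg build includes the `libx264` encoder. The helper name `build_transcode_cmd`, the placeholder paths, and the CRF value are hypothetical and only serve as illustration.

```python
import pathlib
import shutil
import subprocess
import tempfile


def build_transcode_cmd(src, dst, crf=35):
    """Return an ffmpeg command that re-encodes src to dst with H.264.

    Hypothetical helper: a higher crf means stronger compression, hence more distortion.
    """
    return [
        "ffmpeg", "-y",      # overwrite the output file if it already exists
        "-i", str(src),      # input video
        "-c:v", "libx264",   # assumption: ffmpeg was built with libx264
        "-crf", str(crf),    # constant rate factor (0 to 51 for 8-bit x264)
        "-an",               # drop audio, only the video stream is compared
        str(dst),
    ]


if __name__ == "__main__":
    src = pathlib.Path("intro.webm")  # placeholder path, adapt to your media
    dst = pathlib.Path(tempfile.gettempdir()) / "distorted_crf35.mp4"
    cmd = build_transcode_cmd(src, dst)
    if shutil.which("ffmpeg") is not None and src.exists():
        subprocess.run(cmd, check=True, capture_output=True)
    else:
        print("skipped:", " ".join(cmd))
```

Encoding the same reference at several CRF values gives a family of distorted files whose scores can then be compared against each other.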
4.2. Simple comparison of 2 videos or images¶
This function takes care of color spaces.
Image comparisons are always made in YUV, not RGB.
This function is optimized and parallelized.
[3]:
metrics = cutcutcodec.video_metrics(ref_file, dis_file, lpips_alex=True, psnr=True, ssim=True, uvq=True, vmaf=True)
[4]:
plt.scatter(range(len(metrics["lpips_alex"])), metrics["lpips_alex"])
plt.xlabel("frames")
plt.ylabel("lpips")
plt.show()
plt.scatter(range(len(metrics["psnr"])), metrics["psnr"])
plt.xlabel("frames")
plt.ylabel("psnr (dB)")
plt.show()
plt.scatter(range(len(metrics["ssim"])), metrics["ssim"])
plt.xlabel("frames")
plt.ylabel("ssim")
plt.show()
plt.scatter(range(len(metrics["uvq"])), metrics["uvq"])
plt.xlabel("frames 200ms")
plt.ylabel("uvq")
plt.show()
plt.scatter(range(len(metrics["vmaf"])), metrics["vmaf"])
plt.xlabel("frames")
plt.ylabel("vmaf")
plt.show()
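Besides plotting, it is often handy to reduce each per-frame series to a few summary numbers. The sketch below assumes that the result behaves like a mapping from metric name to a sequence of per-frame scores, which is how it is indexed in the plotting cell above. The helper `summarize` and the sample values are hypothetical, not part of cutcutcodec.

```python
import statistics


def summarize(metrics):
    """Return the mean, minimum and maximum of each per-frame metric."""
    summary = {}
    for name, values in metrics.items():
        values = [float(v) for v in values]  # accept lists, arrays or tensors of scalars
        summary[name] = {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# made-up values standing in for the output of the comparison above
fake_metrics = {"psnr": [40.0, 42.0, 44.0], "ssim": [0.98, 0.99, 0.97]}
for name, stats in summarize(fake_metrics).items():
    print(f"{name}: mean={stats['mean']:.3f} min={stats['min']:.3f} max={stats['max']:.3f}")
```

The minimum is worth reporting next to the mean, because a short burst of badly degraded frames can be hidden by an otherwise good average.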
4.3. Compare frames manually for greater control¶
This makes it possible to specify the hyperparameters specific to each metric.
[5]:
ref_frame = cutcutcodec.read(ref_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
ref_frame_numpy = ref_frame.numpy(force=True)
ref_frame.requires_grad = True
dis_frame = cutcutcodec.read(dis_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
dis_frame_numpy = dis_frame.numpy(force=True)
dis_frame.requires_grad = True
4.3.1. Compare learned perceptual image patch similarity (LPIPS)¶
This function uses the PyTorch lpips module as its backend.
This function is torch differentiable.
[6]:
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="alex"))
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="vgg"))
print(cutcutcodec.lpips(ref_frame, dis_frame))
0.019391261
0.07449738
tensor(0.0194, grad_fn=<ViewBackward0>)
4.3.2. Compare peak signal to noise ratio (PSNR)¶
This function has a C implementation (used on CPU when no gradient is required).
This function is torch differentiable.
[7]:
print(cutcutcodec.psnr(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.psnr(ref_frame, dis_frame))
44.20279
tensor(44.1956, grad_fn=<ViewBackward0>)
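For reference, the textbook definition is PSNR = 10 · log10(peak² / MSE), where MSE is the mean squared error between the two signals. The sketch below implements only that definition in pure Python on flat sequences of samples in [0, 1]. It is not cutcutcodec's implementation, which may weight the color channels differently, so the values are not expected to match the output above. The name `psnr_reference` is hypothetical.

```python
import math


def psnr_reference(ref, dis, peak=1.0):
    """Textbook PSNR in dB between two equally sized flat sequences of samples."""
    if len(ref) != len(dis):
        raise ValueError("both inputs must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(ref, dis)) / len(ref)
    if mse == 0.0:
        return math.inf  # identical inputs have no noise
    return 10.0 * math.log10(peak**2 / mse)


ref = [0.0, 0.5, 1.0, 0.25]
dis = [0.1, 0.4, 0.9, 0.35]
# every sample is off by about 0.1, so the MSE is about 0.01 and the PSNR about 20 dB
print(psnr_reference(ref, dis))
```

Because of the logarithm, halving the error amplitude raises the PSNR by about 6 dB.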
4.3.3. Compare structural similarity index measure (SSIM)¶
This function has a C implementation (used on CPU when no gradient is required).
This function has an FFT implementation (used when stride = 1).
This function is torch differentiable.
[8]:
print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.ssim(ref_frame, dis_frame))
print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy, stride=2, sigma=1.2))
0.9905906
tensor(0.9906, grad_fn=<ViewBackward0>)
0.99117327
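The SSIM formula combines the means, variances and covariance of the two signals. The sketch below evaluates it a single time over all samples, in pure Python, using the usual constants C1 = (0.01·L)² and C2 = (0.03·L)². This is a simplification: the `sigma` and `stride` arguments above suggest that the library evaluates the formula on local windows and averages the results (that reading is an assumption), so this global variant will not reproduce the values printed above. The name `ssim_global` is hypothetical.

```python
import statistics


def ssim_global(ref, dis, data_range=1.0):
    """SSIM formula evaluated once over all samples (no sliding window)."""
    if len(ref) != len(dis):
        raise ValueError("both inputs must have the same number of samples")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_r = statistics.fmean(ref)
    mu_d = statistics.fmean(dis)
    var_r = statistics.fmean([(r - mu_r) ** 2 for r in ref])
    var_d = statistics.fmean([(d - mu_d) ** 2 for d in dis])
    cov = statistics.fmean([(r - mu_r) * (d - mu_d) for r, d in zip(ref, dis)])
    numerator = (2.0 * mu_r * mu_d + c1) * (2.0 * cov + c2)
    denominator = (mu_r**2 + mu_d**2 + c1) * (var_r + var_d + c2)
    return numerator / denominator


ref = [0.1, 0.4, 0.7, 0.9]
print(ssim_global(ref, ref))        # identical signals score (numerically) 1
print(ssim_global(ref, ref[::-1]))  # reversed structure scores much lower
```

Unlike PSNR, the score depends on the covariance of the two signals, which is why a structurally different image is penalized even when its mean and variance are unchanged.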
4.3.4. Compute the no-reference Universal Video Quality Model (UVQ)¶
It is the metric used by YouTube.
It works regardless of the image resolution.
It uses only one frame every 200 ms, on slices of 1 second.
[9]:
print(cutcutcodec.uvq([ref_frame.tolist()]*5))
print(cutcutcodec.uvq([dis_frame.tolist()]*5))
3.103020668029785
2.787476062774658
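One frame every 200 ms over a 1 second slice amounts to 5 frames per slice, which is why the cell above repeats the same frame 5 times. The sketch below lists the timestamps such a sampling would pick for a video of a given duration. The exact offsets inside a slice and the handling of an incomplete trailing slice are assumptions, and `uvq_sample_times` is a hypothetical helper, not a cutcutcodec function.

```python
from fractions import Fraction


def uvq_sample_times(duration, step=Fraction(1, 5), slice_len=Fraction(1)):
    """List, for each complete slice, the timestamps (in seconds) of the sampled frames."""
    duration = Fraction(duration)
    per_slice = int(slice_len / step)  # 5 frames with the default values
    slices = []
    start = Fraction(0)
    while start + slice_len <= duration:  # assumption: an incomplete last slice is ignored
        slices.append([start + i * step for i in range(per_slice)])
        start += slice_len
    return slices


for times in uvq_sample_times(2):
    print([float(t) for t in times])
```

Exact fractions are used instead of floats so that repeated additions of 0.2 do not drift.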
4.3.5. Compare Netflix Video Multi-Method Assessment Fusion (VMAF)¶
This function is only a safe interface to the vmaf package, so vmaf must be installed.
[10]:
print(cutcutcodec.vmaf(ref_frame_numpy, dis_frame_numpy))
42.950115