4. Compare 2 videos with LPIPS, SSIM, PSNR, UVQ and VMAF metrics
The classic metrics PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity) are implemented in C for improved performance.
[1]:
import pathlib
import subprocess
import tempfile
import matplotlib.pyplot as plt
import cutcutcodec
4.1. Preparing videos to be compared
Here we’ll compare the original video with a degraded version, obtained by transcoding it with ffmpeg.
[2]:
from cutcutcodec.utils import get_project_root
ref_file = get_project_root().parent / "media" / "video" / "intro.webm"
dis_file = pathlib.Path(tempfile.gettempdir()) / "distorted.mp4"
converter = subprocess.run(["ffmpeg", "-y", "-i", ref_file, dis_file], check=True, capture_output=True) # transcode the video
4.2. Simple comparison of 2 videos or images
This function takes care of color spaces.
Image comparisons are always made in YUV, not RGB.
This function is optimized and parallelized.
[3]:
metrics = cutcutcodec.compare(ref_file, dis_file, lpips_alex=True, psnr=True, ssim=True, uvq=True, vmaf=True)
compare: 100%|██████████████████████████████████████████| 294/294 [01:02<00:00, 4.67img/s]
uvq: 100%|████████████████████████████████████████████████| 50/50 [00:34<00:00, 1.43img/s]
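To illustrate the note above that comparisons are made in YUV rather than RGB, here is a minimal NumPy sketch of an RGB to Y'CbCr conversion. It assumes BT.709 luma coefficients and full-range values in [0, 1]; the matrix and range that `cutcutcodec.compare` actually uses are not stated here, and `rgb_to_yuv` is a hypothetical helper, not part of the library.

```python
import numpy as np

# BT.709 luma coefficients (assumption: the library may use another standard).
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    """Convert gamma-encoded RGB in [0, 1], shape (..., 3), to Y'CbCr."""
    red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    luma = KR * red + KG * green + KB * blue
    # Chroma components are scaled so that they lie in [-0.5, 0.5].
    cb = (blue - luma) / (2.0 * (1.0 - KB))
    cr = (red - luma) / (2.0 * (1.0 - KR))
    return np.stack([luma, cb, cr], axis=-1)


white = rgb_to_yuv(np.array([1.0, 1.0, 1.0]))
pure_red = rgb_to_yuv(np.array([1.0, 0.0, 0.0]))
print(white, pure_red)
```

White carries no chroma, while pure red sits at the upper bound of the Cr axis. Working in this space lets a metric treat luma and chroma separately.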
[4]:
plt.scatter(range(len(metrics["lpips_alex"])), metrics["lpips_alex"])
plt.xlabel("frames")
plt.ylabel("lpips")
plt.show()
plt.scatter(range(len(metrics["psnr"])), metrics["psnr"])
plt.xlabel("frames")
plt.ylabel("psnr (dB)")
plt.show()
plt.scatter(range(len(metrics["ssim"])), metrics["ssim"])
plt.xlabel("frames")
plt.ylabel("ssim")
plt.show()
plt.scatter(range(len(metrics["uvq"])), metrics["uvq"])
plt.xlabel("frames 200ms")
plt.ylabel("uvq")
plt.show()
plt.scatter(range(len(metrics["vmaf"])), metrics["vmaf"])
plt.xlabel("frames")
plt.ylabel("vmaf")
plt.show()
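Besides plotting, the per-frame values can be condensed into a few summary statistics. The sketch below assumes that `metrics` behaves like a dictionary mapping each metric name to a sequence of per-frame numbers, as the indexing above suggests. The `summarize` helper and the sample values are hypothetical and only serve the illustration.

```python
import numpy as np


def summarize(metrics: dict) -> dict:
    """Return the mean, minimum and maximum of each per-frame metric."""
    summary = {}
    for name, values in metrics.items():
        arr = np.asarray(values, dtype=np.float64)
        summary[name] = {
            "mean": float(arr.mean()),
            "min": float(arr.min()),
            "max": float(arr.max()),
        }
    return summary


# Made-up values standing in for the result of cutcutcodec.compare.
fake_metrics = {"psnr": [40.0, 42.0, 44.0], "ssim": [0.98, 0.99, 1.0]}
summary = summarize(fake_metrics)
for name, stats in summary.items():
    print(name, stats)
```

Which extreme is the "worst" frame depends on the metric: higher is better for PSNR, SSIM and VMAF, while lower is better for LPIPS.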
4.3. Compare frames manually for greater control
This makes it possible to specify the hyperparameters specific to each metric.
[5]:
ref_frame = cutcutcodec.read(ref_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
ref_frame_numpy = ref_frame.numpy(force=True)
ref_frame.requires_grad = True
dis_frame = cutcutcodec.read(dis_file).out_select("video")[0].snapshot(5.0, (1080, 1920))
dis_frame_numpy = dis_frame.numpy(force=True)
dis_frame.requires_grad = True
4.3.1. Compare learned perceptual image patch similarity (LPIPS)
This function uses the PyTorch lpips module as its backend.
This function is torch differentiable.
[6]:
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="alex"))
print(cutcutcodec.lpips(ref_frame_numpy, dis_frame_numpy, net="vgg"))
print(cutcutcodec.lpips(ref_frame, dis_frame))
0.020773122
0.074869946
tensor(0.0208, grad_fn=<ViewBackward0>)
4.3.2. Compare peak signal to noise ratio (PSNR)
This function has a C implementation (used on CPU when no gradient is required).
This function is torch differentiable.
[7]:
print(cutcutcodec.psnr(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.psnr(ref_frame, dis_frame))
44.240646
tensor(44.2387, grad_fn=<ViewBackward0>)
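For reference, the textbook definition of PSNR is 10·log10(peak² / MSE). The sketch below is a plain NumPy version of that formula, assuming pixel values in [0, 1]. It is not the library's implementation: `cutcutcodec.psnr` may handle color spaces and channel weighting differently, so its results need not match this hypothetical `psnr_sketch` helper.

```python
import math

import numpy as np


def psnr_sketch(ref, dis, peak: float = 1.0) -> float:
    """Textbook PSNR in dB between two arrays of the same shape."""
    ref = np.asarray(ref, dtype=np.float64)
    dis = np.asarray(dis, dtype=np.float64)
    mse = float(np.mean((ref - dis) ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak**2 / mse)


reference = np.zeros((8, 8, 3))
distorted = reference + 0.1  # constant error of 0.1, so the MSE is 0.01
psnr_value = psnr_sketch(reference, distorted)
print(psnr_value)
```

With a mean squared error of 0.01 and a peak of 1, the result is 20 dB, up to floating-point rounding.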
4.3.3. Compare structural similarity index measure (SSIM)
This function has a C implementation (used on CPU when no gradient is required).
This function has an FFT implementation (for stride = 1).
This function is torch differentiable.
[8]:
print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy))
print(cutcutcodec.ssim(ref_frame, dis_frame))
print(cutcutcodec.ssim(ref_frame_numpy, dis_frame_numpy, stride=2, sigma=1.2))
0.9906746
tensor(0.9907, grad_fn=<ViewBackward0>)
0.99125826
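To make the SSIM formula concrete, here is a deliberately simplified NumPy sketch that evaluates it once over the whole image, using global means, variances and covariance. The real metric averages the same expression over local Gaussian-weighted windows (which is what the `sigma` and `stride` parameters above control), so this hypothetical `ssim_global` helper will not reproduce the values printed by `cutcutcodec.ssim`. The constants 0.01 and 0.03 are the usual defaults from the original SSIM paper.

```python
import numpy as np


def ssim_global(ref, dis, peak: float = 1.0) -> float:
    """SSIM formula applied to global statistics (single window)."""
    ref = np.asarray(ref, dtype=np.float64)
    dis = np.asarray(dis, dtype=np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stabilizing constants
    mu_r, mu_d = ref.mean(), dis.mean()
    var_r, var_d = ref.var(), dis.var()
    cov = np.mean((ref - mu_r) * (dis - mu_d))
    luminance = (2.0 * mu_r * mu_d + c1) / (mu_r**2 + mu_d**2 + c1)
    structure = (2.0 * cov + c2) / (var_r + var_d + c2)
    return float(luminance * structure)


rng = np.random.default_rng(0)
image = rng.random((32, 32))
noisy = np.clip(image + rng.normal(0.0, 0.1, image.shape), 0.0, 1.0)
same_score = ssim_global(image, image)
noisy_score = ssim_global(image, noisy)
print(same_score, noisy_score)
```

An image compared with itself scores 1, and any distortion lowers the score.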
4.3.4. Compute the no-reference Universal Video Quality Model (UVQ)
It is the metric used by YouTube.
It does not depend on the image resolution.
It uses only one frame every 200 ms, on slices of 1 second.
[9]:
print(cutcutcodec.uvq([ref_frame.tolist()]*5))
print(cutcutcodec.uvq([dis_frame.tolist()]*5))
3.103020429611206
2.9377589225769043
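The sampling scheme described above (one frame every 200 ms, grouped in slices of 1 second) gives 5 frames per slice, which matches the 5-frame lists passed to `cutcutcodec.uvq` in the cell above. The sketch below only computes the corresponding timestamps. The `uvq_sample_times` helper is hypothetical, and how the library handles a trailing incomplete slice is an assumption (it is simply dropped here).

```python
def uvq_sample_times(duration: float, step_ms: int = 200, slice_ms: int = 1000):
    """Return sampling timestamps in seconds, grouped by 1 second slice."""
    per_slice = slice_ms // step_ms  # 5 frames per slice by default
    n_slices = int(duration * 1000) // slice_ms  # incomplete last slice dropped
    return [
        # Integer milliseconds avoid accumulating floating-point error.
        [(s * slice_ms + k * step_ms) / 1000 for k in range(per_slice)]
        for s in range(n_slices)
    ]


times = uvq_sample_times(10.0)  # a hypothetical 10 second clip
print(len(times), times[0], times[1])
```

Each inner list holds the timestamps of the frames that would be scored together as one slice.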
4.3.5. Compare Netflix Video Multi-Method Assessment Fusion (VMAF)
This function is only a safe interface to the vmaf package, so vmaf must be installed.
[10]:
print(cutcutcodec.vmaf(ref_frame_numpy, dis_frame_numpy))
41.258125