cutcutcodec.core.analysis.video.complexity.dct

Compute differentiable, batched torch DCT complexity metrics (spatial and temporal).

Functions

compute_dct(tensor, dim)

Compute the DCT-II on the given axis.

dct_matrix(size, dtype)

Return the DCT-II matrix, including average coefficient.

spatial_dct(img[, threads, patch])

Compute the spatial dct complexity for the image.

temporal_dct(imgs[, threads, patch])

Compute the temporal dct complexity between two images.

Details

cutcutcodec.core.analysis.video.complexity.dct.compute_dct(tensor: Tensor, dim: int) → Tensor

Compute the DCT-II on the given axis.

The output vector \(\hat x_k\) is defined as \(\hat x_k = \sum\limits_{l=0}^{n-1} x_l \cos\left(\frac{\pi}{n}\left(l+\frac{1}{2}\right)k\right)\).

It is computed as a matrix product with the matrix returned by dct_matrix().
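The matrix-product formulation can be illustrated with a small NumPy sketch. NumPy stands in for torch here, and `dct_ii_matrix`, `dct_along_axis` and `dct_direct` are hypothetical helpers, not part of cutcutcodec. The sketch builds the matrix from the definition above, moves the transformed axis last so the remaining axes act as batch dimensions, and checks one slice against the direct sum.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Unnormalized DCT-II matrix: entry [k, l] is cos(pi / n * (l + 1/2) * k)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    l = np.arange(n)[None, :]  # sample index (columns)
    return np.cos(np.pi / n * (l + 0.5) * k)


def dct_along_axis(x: np.ndarray, axis: int) -> np.ndarray:
    """Apply the DCT-II along `axis`; every other axis is treated as a batch axis."""
    d = dct_ii_matrix(x.shape[axis])
    moved = np.moveaxis(x, axis, -1)   # bring the transformed axis last
    out = moved @ d.T                  # out[..., k] = sum_l x[..., l] * d[k, l]
    return np.moveaxis(out, -1, axis)  # restore the original axis order


def dct_direct(x_1d: np.ndarray) -> np.ndarray:
    """Reference: evaluate the sum definition term by term (1-D only, O(n^2))."""
    n = len(x_1d)
    return np.array([
        sum(x_1d[l] * np.cos(np.pi / n * (l + 0.5) * k) for l in range(n))
        for k in range(n)
    ])


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 5))
y = dct_along_axis(x, axis=1)  # same shape as x, transformed along axis 1
```

Building the matrix once and reusing it for every batch slice is what makes the transform a single (batched) matrix product, which is also what keeps it differentiable in torch.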

Parameters

tensor : torch.Tensor

An n-dimensional tensor of real values.

dim : int

The axis along which the DCT is computed. The other axes are treated as batch dimensions.

Returns

output : torch.Tensor

The DCT-II of the input tensor along dim. The input and output have the same shape.

Examples

>>> import torch
>>> from cutcutcodec.core.analysis.video.complexity.dct import compute_dct
>>> src = torch.randn((128, 16, 16))
>>> dct_2d = compute_dct(compute_dct(src, -1), -2)  # compute the 2d dct
>>>
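The example above applies compute_dct() twice because the 2D DCT-II is separable. In matrix terms, transforming along the last axis multiplies each patch \(X\) on the right by \(\boldsymbol{D}^\top\), and transforming along the second-to-last axis multiplies it on the left by \(\boldsymbol{D}\), so the result is \(\boldsymbol{D} X \boldsymbol{D}^\top\). The following NumPy sketch checks that equivalence. It is an illustration only: NumPy replaces torch and `dct_ii_matrix` is a hypothetical helper.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Unnormalized DCT-II matrix: entry [k, l] is cos(pi / n * (l + 1/2) * k)."""
    k = np.arange(n)[:, None]
    l = np.arange(n)[None, :]
    return np.cos(np.pi / n * (l + 0.5) * k)


rng = np.random.default_rng(0)
src = rng.standard_normal((128, 16, 16))  # a batch of 128 patches of 16 x 16
d = dct_ii_matrix(16)

# DCT along the last axis: X D^T.
along_last = src @ d.T
# Then along the second-to-last axis: swap axes, transform, swap back, giving D (X D^T).
dct_2d = np.swapaxes(np.swapaxes(along_last, -1, -2) @ d.T, -1, -2)
# Closed form of the separable 2-D transform, broadcast over the batch axis.
closed_form = d @ src @ d.T
```

Since the first row of \(\boldsymbol{D}\) is all ones, the coefficient at index (0, 0) of each transformed patch is the sum of its pixels.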
cutcutcodec.core.analysis.video.complexity.dct.dct_matrix(size: Integral, dtype: dtype) → Tensor

Return the DCT-II matrix, including average coefficient.

The square matrix \(\boldsymbol{D} \in \mathcal M_{n,n}(\mathbb R)\) is defined as \(d_{ij} = \cos\left(\frac{\pi}{n}\left(i-1\right)\left(j-\frac{1}{2}\right)\right)\).

For a given column vector \(\boldsymbol{x} \in \mathcal M_{n,1}(\mathbb R)\) in the original (“temporal”) domain, the column vector of its DCT-II coefficients \(\boldsymbol{\hat{x}} \in \mathcal M_{n,1}(\mathbb R)\) in the frequency domain is obtained with \(\boldsymbol{\hat{x}} = \boldsymbol{D}\boldsymbol{x}\).
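A NumPy sketch of the definition, with the 1-based indices \(i, j\) of the formula, may help clarify what “including average coefficient” means. The helper `dct_ii_matrix_1based` is hypothetical. Because this matrix is unnormalized, its rows are orthogonal but not orthonormal: \(\boldsymbol{D}\boldsymbol{D}^\top = \operatorname{diag}(n, n/2, \dots, n/2)\), so \(\boldsymbol{D}^{-1} = \boldsymbol{D}^\top \operatorname{diag}(1/n, 2/n, \dots, 2/n)\).

```python
import numpy as np


def dct_ii_matrix_1based(n: int) -> np.ndarray:
    """Build D with d_ij = cos(pi/n * (i - 1) * (j - 1/2)) for i, j in 1..n."""
    i = np.arange(1, n + 1)[:, None]  # row index, 1-based as in the formula
    j = np.arange(1, n + 1)[None, :]  # column index, 1-based as in the formula
    return np.cos(np.pi / n * (i - 1) * (j - 0.5))


n = 8
d = dct_ii_matrix_1based(n)
x = np.sin(0.5 * np.pi * np.arange(n))  # same test signal as in the doctest of this function
x_hat = d @ x                           # x_hat = D x

# The first row is all ones, so x_hat[0] is n times the mean of x (the average coefficient).
# The rows are mutually orthogonal: D D^T = diag(n, n/2, ..., n/2).
gram = d @ d.T

# Inverse transform: x = D^T diag(1/n, 2/n, ..., 2/n) x_hat.
scale = np.full(n, 2.0 / n)
scale[0] = 1.0 / n
x_back = d.T @ (scale * x_hat)
```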

Parameters

size : int

The matrix size \(n\).

dtype : torch.dtype

The torch dtype of the matrix: float16, float32 or float64.

Returns

dct_matrix : torch.Tensor

The 2d square matrix \(\boldsymbol{D}\) of the DCT-II coefficients.

Examples

>>> import torch
>>> from cutcutcodec.core.analysis.video.complexity.dct import dct_matrix
>>> dct_matrix(8, torch.float32)
tensor([[ 1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000],
        [ 0.9808,  0.8315,  0.5556,  0.1951, -0.1951, -0.5556, -0.8315, -0.9808],
        [ 0.9239,  0.3827, -0.3827, -0.9239, -0.9239, -0.3827,  0.3827,  0.9239],
        [ 0.8315, -0.1951, -0.9808, -0.5556,  0.5556,  0.9808,  0.1951, -0.8315],
        [ 0.7071, -0.7071, -0.7071,  0.7071,  0.7071, -0.7071, -0.7071,  0.7071],
        [ 0.5556, -0.9808,  0.1951,  0.8315, -0.8315, -0.1951,  0.9808, -0.5556],
        [ 0.3827, -0.9239,  0.9239, -0.3827, -0.3827,  0.9239, -0.9239,  0.3827],
        [ 0.1951, -0.5556,  0.8315, -0.9808,  0.9808, -0.8315,  0.5556, -0.1951]])
>>> _ @ torch.sin(0.5 * torch.pi * torch.arange(8))[:, None]
tensor([[ 0.0000e+00],
        [ 1.0616e+00],
        [ 2.6822e-07],
        [ 2.1727e+00],
        [-2.8284e+00],
        [-1.4518e+00],
        [-2.9802e-07],
        [-2.1116e-01]])
>>>
cutcutcodec.core.analysis.video.complexity.dct.spatial_dct(img: Tensor, threads: int = 0, patch: Integral = 32) → Tensor

Compute the spatial dct complexity for the image.

The dct spatial complexity \(C_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:

\[\begin{split}\begin{cases} C_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} H_m \\ H_m = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_m(i,j)\right| \\ \mathscr{D}_m(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_m(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]

where \(\mathscr{F}_m(i,j)\) is the 2D DCT-II of patch \(m\) of the image, computed with compute_dct(). The patches tile the full image without overlapping.
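To make the per-patch term concrete, here is a NumPy sketch of \(H_m\) for a single \(s \times s\) luma patch. It follows the formula term by term but is not the cutcutcodec implementation: `dct_ii_matrix` and `block_energy` are hypothetical names, NumPy stands in for torch, and the 2D transform is written as \(\boldsymbol{D} X \boldsymbol{D}^\top\). It zeroes the DC coefficient, applies the weight \(e^{(ij/s^2)^2-1}\) with 1-based \(i, j\), and averages over the \(s^2\) coefficients.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Unnormalized DCT-II matrix: entry [k, l] is cos(pi / n * (l + 1/2) * k)."""
    k = np.arange(n)[:, None]
    l = np.arange(n)[None, :]
    return np.cos(np.pi / n * (l + 0.5) * k)


def block_energy(patch: np.ndarray) -> float:
    """H_m of one s x s luma patch, following the formula term by term."""
    s = patch.shape[0]
    d = dct_ii_matrix(s)
    coeffs = d @ patch @ d.T                   # F_m(i, j): separable 2-D DCT-II
    coeffs[0, 0] = 0.0                         # D_m(1, 1) = 0: the DC (average) term is ignored
    i = np.arange(1, s + 1)[:, None]           # 1-based indices, as in the formula
    j = np.arange(1, s + 1)[None, :]
    weights = np.exp((i * j / s**2) ** 2 - 1)  # between exp(-1) and exp(0) = 1
    return float((weights * np.abs(coeffs)).sum() / s**2)


rng = np.random.default_rng(0)
flat = np.full((32, 32), 0.5)    # constant patch: only a DC coefficient
textured = rng.random((32, 32))  # noisy patch: energy spread over all frequencies
print(block_energy(flat), block_energy(textured))
```

Since only the DC coefficient depends on the patch mean, the flat patch gives a (numerically) zero energy, and adding a constant brightness to a patch leaves \(H_m\) unchanged. The weight grows with the product \(ij\), so high-frequency coefficients count more than low-frequency ones.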

Parameters

img : arraylike

The Y[UV] images, of shape ([*batch], [1], height, width, [channels]). Only the Y component is used. It has to be in the range [0, 1]. The image is sliced into non-overlapping squares of size \(s \times s\). If the height or width of the image is not a multiple of \(s\), the remaining edge rows or columns are cropped.
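Once the partial edges are cropped, slicing the image into patches is a pure reshape. Below is a minimal NumPy sketch for a single 2D luma plane. `split_patches` is a hypothetical helper, and it ignores the optional batch and channel axes that spatial_dct() accepts. For a 720 × 1080 image, as in the example of this function, \(s = 32\) gives 22 × 33 = 726 patches, and the last 16 rows and 24 columns are cropped.

```python
import numpy as np


def split_patches(luma: np.ndarray, s: int) -> np.ndarray:
    """Cut a (height, width) plane into non-overlapping s x s patches.

    Rows and columns beyond the last full patch are cropped, then the plane
    is rearranged into an array of shape (n_blocs, s, s), in row-major patch order.
    """
    if s < 1:
        raise ValueError("patch size must be >= 1")
    rows, cols = luma.shape[0] // s, luma.shape[1] // s
    cropped = luma[: rows * s, : cols * s]
    # (rows, s, cols, s) -> (rows, cols, s, s) -> (rows * cols, s, s)
    return cropped.reshape(rows, s, cols, s).swapaxes(1, 2).reshape(-1, s, s)


luma = np.random.default_rng(0).random((720, 1080))
patches = split_patches(luma, 32)
print(patches.shape)  # (726, 32, 32)
```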

threads : int, optional

Defines the number of threads. The value -1 means that the function uses as many computation threads as there are cores. The default value 0 behaves like -1 when the function is called from the main thread, and like 1 otherwise, to avoid nested threads. Any other positive value sets the number of threads used.
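The `threads` rule can be summarized by a small standard-library sketch. `resolve_threads` is a hypothetical helper that only mirrors the documented mapping; it does not show how cutcutcodec actually applies the value. Rejecting values below -1 is an assumption, since the documentation does not say what happens in that case.

```python
import os
import threading


def resolve_threads(threads: int = 0) -> int:
    """Hypothetical helper: turn the documented `threads` value into a thread count."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if threads == -1:
        return cores  # one thread per core
    if threads == 0:
        # Behave like -1 in the main thread and like 1 elsewhere (no nested threads).
        in_main = threading.current_thread() is threading.main_thread()
        return cores if in_main else 1
    if threads > 0:
        return threads  # explicit count
    # Assumption: values below -1 are treated as invalid.
    raise ValueError(f"threads must be -1, 0 or positive, got {threads}")


results = []
worker = threading.Thread(target=lambda: results.append(resolve_threads(0)))
worker.start()
worker.join()
print(resolve_threads(0) == resolve_threads(-1), results)  # True [1] when run from the main thread
```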

patch : int, default=32

The patch size \(s\). It has to be >= 1. The default value of 32 is the one proposed in the VCA paper.

Returns

spatial_dct : arraylike

One \(C_{\text{dct}}\) scalar per image, with the batch shape.

Notes

  • It comes from the paper A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION.

  • The VCA tool offers an optimized version of this metric. The result is close to the E column of the .csv file generated with ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv.

  • This function can be called by cutcutcodec metric video.mp4 --spatial-dct -o result.json.

Examples

>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import spatial_dct
>>> np.random.seed(0)
>>> img = np.random.random((720, 1080, 3))  # any array-like works, e.g. a torch tensor
>>> spatial_dct(img).round(2)
array([1.59])
>>>
cutcutcodec.core.analysis.video.complexity.dct.temporal_dct(imgs: Tensor, threads: int = 0, patch: Integral = 32) → Tensor

Compute the temporal dct complexity between two images.

The dct temporal complexity \(H_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:

\[\begin{split}\begin{cases} H_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} \left| H_{m,t} - H_{m,t-1} \right| \\ H_{m,t} = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_{m,t}(i,j)\right| \\ \mathscr{D}_{m,t}(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_{m,t}(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]

where \(\mathscr{F}_{m,t}(i,j)\) is the 2D DCT-II of patch \(m\) of image \(t\), computed with compute_dct(). The patches tile the full image without overlapping.
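The temporal metric compares the per-patch energies of two frames rather than transforming their pixel difference. The NumPy sketch below follows the formula under stated assumptions. It is not the library code: `block_energies` and `temporal_dct_sketch` are hypothetical, NumPy replaces torch, and both inputs are assumed to already be single 2D luma planes.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Unnormalized DCT-II matrix: entry [k, l] is cos(pi / n * (l + 1/2) * k)."""
    k = np.arange(n)[:, None]
    l = np.arange(n)[None, :]
    return np.cos(np.pi / n * (l + 0.5) * k)


def block_energies(luma: np.ndarray, s: int) -> np.ndarray:
    """H_{m,t} for every non-overlapping s x s patch of one luma plane."""
    rows, cols = luma.shape[0] // s, luma.shape[1] // s  # partial edge patches are cropped
    patches = (
        luma[: rows * s, : cols * s]
        .reshape(rows, s, cols, s)
        .swapaxes(1, 2)
        .reshape(-1, s, s)
    )
    d = dct_ii_matrix(s)
    coeffs = d @ patches @ d.T  # batched 2-D DCT-II, shape (n_blocs, s, s)
    coeffs[:, 0, 0] = 0.0       # D_{m,t}(1, 1) = 0
    i = np.arange(1, s + 1)[:, None]
    j = np.arange(1, s + 1)[None, :]
    weights = np.exp((i * j / s**2) ** 2 - 1)
    return (weights * np.abs(coeffs)).sum(axis=(1, 2)) / s**2


def temporal_dct_sketch(prev: np.ndarray, curr: np.ndarray, s: int = 32) -> float:
    """H_dct: mean over patches of |H_{m,t} - H_{m,t-1}|."""
    return float(np.mean(np.abs(block_energies(curr, s) - block_energies(prev, s))))


rng = np.random.default_rng(0)
frame0 = rng.random((720, 1080))
frame1 = rng.random((720, 1080))
print(temporal_dct_sketch(frame0, frame1))
```

Because \(\mathscr{D}_{m,t}(1,1)\) is forced to zero, a uniform brightness offset between the two frames leaves every \(H_{m,t}\) unchanged, so this sketch returns (numerically) zero for such a pair.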

Parameters

imgs : arraylike

The Y[UV] images, of shape ([*batch], 2, height, width, [channels]). Only the Y component is used. It has to be in the range [0, 1].

threads, patch

Same as spatial_dct().

Returns

temporal_dct : arraylike

One \(H_{\text{dct}} \in \mathbb{R}^+\) scalar per pair of images, with the batch shape.

Notes

  • It is inspired by the paper A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION.

  • The VCA tool offers an optimized version of a similar metric. The result is close to the h column of the .csv file generated with ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv.

  • This function can be called by cutcutcodec metric video.mp4 --temporal-dct -o result.json.

Examples

>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import temporal_dct
>>> np.random.seed(0)
>>> imgs = np.random.random((2, 720, 1080, 3))  # any array-like works, e.g. a torch tensor
>>> temporal_dct(imgs).round(2)
array([0.03])
>>>