cutcutcodec.core.analysis.video.complexity.dct¶

Compute a differentiable batched torch spatial DCT complexity.

Functions

`compute_dct`(tensor, dim)	Compute the DCT-II on the given axis.
`dct_matrix`(size, dtype)	Return the DCT-II matrix, including average coefficient.
`spatial_dct`(img[, threads, patch])	Compute the spatial dct complexity for the image.
`temporal_dct`(imgs[, threads, patch])	Compute the temporal dct complexity between 2 images.

Details

cutcutcodec.core.analysis.video.complexity.dct.compute_dct(tensor: Tensor, dim: int) → Tensor[source]

Compute the DCT-II on the given axis.

The output vector \(\hat x_k\) is defined as \(\hat x_k = \sum\limits_{l=0}^{n-1} x_l \cos\left(\frac{\pi}{n}\left(l+\frac{1}{2}\right)k\right)\).

It is calculated by a matrix product, computed by dct_matrix().

Parameters¶

tensor (torch.Tensor): An n-dimensional real-valued tensor.
dim (int): The axis along which the DCT is computed. The other axes are treated as batch dimensions.

Returns¶

output (torch.Tensor): The DCT-II of the input tensor along the axis dim. The input and output have the same shape.

Examples¶

>>> import torch
>>> from cutcutcodec.core.analysis.video.complexity.dct import compute_dct
>>> src = torch.randn((128, 16, 16))
>>> dct_2d = compute_dct(compute_dct(src, -1), -2)  # compute the 2d dct
>>>
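
The matrix-product formulation above can be sketched in a few lines of torch. This is a minimal illustration, not the library's implementation: the helper name `dct_ii_along_axis` is hypothetical, and the sketch assumes a CPU floating-point tensor and the unnormalized definition of \(\hat x_k\) given above.

```python
import math

import torch


def dct_ii_along_axis(tensor: torch.Tensor, dim: int) -> torch.Tensor:
    """Unnormalized DCT-II along `dim` (hypothetical helper, not the library code)."""
    n = tensor.shape[dim]
    k = torch.arange(n, dtype=tensor.dtype)[:, None]  # output index k, one per row
    l = torch.arange(n, dtype=tensor.dtype)[None, :]  # input index l, one per column
    mat = torch.cos(math.pi / n * (l + 0.5) * k)  # mat[k, l] = cos(pi/n * (l + 1/2) * k)
    # Move the target axis last, contract it against mat, then move it back.
    return (tensor.movedim(dim, -1) @ mat.T).movedim(-1, dim)


src = torch.randn((128, 16, 16), dtype=torch.float64)
# Applying the 1D transform on the last two axes gives a separable 2D DCT.
dct_2d = dct_ii_along_axis(dct_ii_along_axis(src, -1), -2)
```

Every axis other than `dim` is handled as a batch dimension by the broadcasting of the matrix product, which is why the output keeps the shape of the input.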

cutcutcodec.core.analysis.video.complexity.dct.dct_matrix(size: Integral, dtype: dtype) → Tensor[source]

Return the DCT-II matrix, including average coefficient.

The square matrix \(\boldsymbol{D} \in \mathcal M_{n,n}(\mathbb R)\) is defined as \(d_{ij} = \cos\left(\frac{\pi}{n}\left(i-1\right)\left(j-\frac{1}{2}\right)\right)\).

For a given “temporal” column vector \(\boldsymbol{x} \in \mathcal M_{n,1}(\mathbb R)\), the “spectral” column vector \(\boldsymbol{\hat{x}} \in \mathcal M_{n,1}(\mathbb R)\) is obtained with \(\boldsymbol{\hat{x}} = \boldsymbol{D}\boldsymbol{x}\).

Parameters¶

size (int): The matrix size \(n\).
dtype (torch.dtype): The torch dtype of the matrix: float16, float32 or float64.

Returns¶

dct_matrix (torch.Tensor): The 2D square matrix \(\boldsymbol{D}\) of the DCT-II coefficients.

Examples¶

>>> import torch
>>> from cutcutcodec.core.analysis.video.complexity.dct import dct_matrix
>>> dct_matrix(8, torch.float32)
tensor([[ 1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000],
        [ 0.9808,  0.8315,  0.5556,  0.1951, -0.1951, -0.5556, -0.8315, -0.9808],
        [ 0.9239,  0.3827, -0.3827, -0.9239, -0.9239, -0.3827,  0.3827,  0.9239],
        [ 0.8315, -0.1951, -0.9808, -0.5556,  0.5556,  0.9808,  0.1951, -0.8315],
        [ 0.7071, -0.7071, -0.7071,  0.7071,  0.7071, -0.7071, -0.7071,  0.7071],
        [ 0.5556, -0.9808,  0.1951,  0.8315, -0.8315, -0.1951,  0.9808, -0.5556],
        [ 0.3827, -0.9239,  0.9239, -0.3827, -0.3827,  0.9239, -0.9239,  0.3827],
        [ 0.1951, -0.5556,  0.8315, -0.9808,  0.9808, -0.8315,  0.5556, -0.1951]])
>>> _ @ torch.sin(0.5 * torch.pi * torch.arange(8))[:, None]
tensor([[ 0.0000e+00],
        [ 1.0616e+00],
        [ 2.6822e-07],
        [ 2.1727e+00],
        [-2.8284e+00],
        [-1.4518e+00],
        [-2.9802e-07],
        [-2.1116e-01]])
>>>
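
As a cross-check of the definition of \(d_{ij}\), the matrix can be rebuilt directly from the formula with 1-based indices. The function name `build_dct_matrix` is hypothetical and the code is only a sketch under that definition, not the library source.

```python
import math

import torch


def build_dct_matrix(size: int, dtype: torch.dtype = torch.float64) -> torch.Tensor:
    """Build D with d_ij = cos(pi/n * (i - 1) * (j - 1/2)), i and j starting at 1."""
    i = torch.arange(1, size + 1, dtype=torch.float64)[:, None]  # row index
    j = torch.arange(1, size + 1, dtype=torch.float64)[None, :]  # column index
    # Evaluate in float64, then cast to the requested dtype.
    return torch.cos(math.pi / size * (i - 1.0) * (j - 0.5)).to(dtype)


mat = build_dct_matrix(8, torch.float32)
signal = torch.sin(0.5 * torch.pi * torch.arange(8))[:, None]  # 0, 1, 0, -1, ...
coeffs = mat @ signal  # the product D x, as a column vector
```

The first row (\(i = 1\)) contains only ones, so the first coefficient is the plain sum of the signal. This is the "average coefficient" mentioned in the summary, up to a factor \(n\).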

cutcutcodec.core.analysis.video.complexity.dct.spatial_dct(img: Tensor, threads: int = 0, patch: Integral = 32) → Tensor[source]

Compute the spatial dct complexity for the image.

The spatial DCT complexity \(C_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:

\[\begin{split}\begin{cases} C_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} H_m \\ H_m = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_m(i,j)\right| \\ \mathscr{D}_m(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_m(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]

With \(\mathscr{F}_m(i,j)\) the DCT-II applied to the patch \(m\) of the image, calculated by the function compute_dct(). The patches cover the full image and are not overlapping.

Parameters¶

img (arraylike): The Y[UV] images, of shape ([*batch], [1], height, width, [channels]). Only the Y component is used. Values must lie in the range [0, 1]. The image is sliced into non-overlapping squares of size \(s \times s\). If the height or width of the image is not a multiple of \(s\), the edges are cropped.
threads (int, optional): The number of threads. The value -1 means that the function uses as many calculation threads as there are cores. The default value (0) behaves like -1 if the function is called in the main thread, and like 1 otherwise, to avoid nested threads. Any other positive value is the number of threads used.
patch (int, default = 32): The patch size \(s\). It has to be >= 1. The default value of 32 is the one proposed in the VCA paper.

Returns¶

spatial_dct (arraylike): The \(C_{\text{dct}}\) scalar for each image (of shape batch).

Notes¶

It comes from the paper A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION.
The VCA tool offers an optimized version of this metric. The result is close to the E column of the .csv file generated with ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv.
This function can be called by cutcutcodec metric video.mp4 --spatial-dct -o result.json.

Examples¶

>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import spatial_dct
>>> np.random.seed(0)
>>> img = np.random.random((720, 1080, 3))  # It could also be a torch array list...
>>> spatial_dct(img).round(2)
array([1.59])
>>>
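
To make the formula for \(C_{\text{dct}}\) concrete, here is a minimal single-image sketch in torch. It is an illustration written from the equations above, not the library's code: the name `spatial_dct_sketch` is hypothetical, the input is assumed to be a single luma plane of shape (height, width) at least one patch wide, and the DCT is assumed to be unnormalized, so the value is not guaranteed to match `spatial_dct` exactly.

```python
import math

import torch


def spatial_dct_sketch(luma: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """C_dct of one (height, width) luma plane in [0, 1]; hypothetical reference code."""
    s = patch
    height, width = luma.shape
    # Crop so that both sides are multiples of s, then cut non-overlapping s x s blocks.
    luma = luma[: height - height % s, : width - width % s]
    blocks = luma.unfold(0, s, s).unfold(1, s, s).reshape(-1, s, s)
    # Unnormalized DCT-II matrix, dct[k, l] = cos(pi/s * (l + 1/2) * k).
    idx = torch.arange(s, dtype=luma.dtype)
    dct = torch.cos(math.pi / s * (idx[None, :] + 0.5) * idx[:, None])
    coeffs = dct @ blocks @ dct.T  # separable 2D DCT of every block
    # Weight exp((i*j/s^2)^2 - 1) with 1-based indices i and j.
    one_based = idx + 1.0
    weight = torch.exp((one_based[:, None] * one_based[None, :] / s**2) ** 2 - 1.0)
    weight[0, 0] = 0.0  # same effect as zeroing the i + j = 2 (average) coefficient
    h_m = (weight * coeffs.abs()).sum(dim=(-2, -1)) / s**2  # one H_m per block
    return h_m.mean()  # average over the blocks


flat = torch.full((64, 96), 0.5, dtype=torch.float64)
textured = torch.rand((70, 100), dtype=torch.float64)
c_flat = spatial_dct_sketch(flat)
c_textured = spatial_dct_sketch(textured)
```

A constant image only has an average coefficient, which is discarded, so its complexity is numerically zero, while a noisy image yields a strictly positive value.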

cutcutcodec.core.analysis.video.complexity.dct.temporal_dct(imgs: Tensor, threads: int = 0, patch: Integral = 32) → Tensor[source]

Compute the temporal dct complexity between 2 images.

The temporal DCT complexity \(H_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:

\[\begin{split}\begin{cases} H_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} \left| H_{m,t} - H_{m,t-1} \right| \\ H_{m,t} = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_{m,t}(i,j)\right| \\ \mathscr{D}_{m,t}(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_{m,t}(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]

With \(\mathscr{F}_{m,t}(i,j)\) the DCT-II applied to the patch \(m\) of the image \(t\), calculated by the function compute_dct(). The patches cover the full image and are not overlapping.

Parameters¶

imgs (arraylike): The Y[UV] images, of shape ([*batch], 2, height, width, [channels]). Only the Y component is used. Values must lie in the range [0, 1].
threads, patch: Same as spatial_dct().

Returns¶

temporal_dct (arraylike): The \(H_{\text{dct}} \in \mathbb{R}^+\) scalar for each pair of images (of shape batch).

Notes¶

It is inspired by the paper A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION.
The VCA tool offers an optimized version of a similar metric. The result is close to the h column of the .csv file generated with ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv.
This function can be called by cutcutcodec metric video.mp4 --temporal-dct -o result.json.

Examples¶

>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import temporal_dct
>>> np.random.seed(0)
>>> imgs = np.random.random((2, 720, 1080, 3))  # It could also be a torch array list...
>>> temporal_dct(imgs).round(2)
array([0.03])
>>>
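
The temporal formula can be sketched the same way, by computing the block energies \(H_{m,t}\) of both frames in one pass and averaging their absolute difference. The names `block_energies` and `temporal_dct_sketch` are hypothetical. The sketch assumes a pair of luma planes of shape (2, height, width), an unnormalized DCT, and the same edge cropping as in the spatial case, so it is not guaranteed to reproduce the value returned by `temporal_dct`.

```python
import math

import torch


def block_energies(luma: torch.Tensor, patch: int) -> torch.Tensor:
    """H_{m,t} for frames of shape (frames, height, width); returns (frames, n_blocks)."""
    s = patch
    frames, height, width = luma.shape
    # Assumption: edges that do not fill a whole block are cropped.
    luma = luma[:, : height - height % s, : width - width % s]
    blocks = luma.unfold(1, s, s).unfold(2, s, s).reshape(frames, -1, s, s)
    idx = torch.arange(s, dtype=luma.dtype)
    dct = torch.cos(math.pi / s * (idx[None, :] + 0.5) * idx[:, None])
    coeffs = dct @ blocks @ dct.T  # 2D DCT of every block of every frame
    weight = torch.exp(((idx[:, None] + 1.0) * (idx[None, :] + 1.0) / s**2) ** 2 - 1.0)
    weight[0, 0] = 0.0  # ignore the average coefficient (i + j = 2)
    return (weight * coeffs.abs()).sum(dim=(-2, -1)) / s**2


def temporal_dct_sketch(pair: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """Mean over the blocks of |H_{m,t} - H_{m,t-1}| for a (2, height, width) pair."""
    energies = block_energies(pair, patch)
    return (energies[1] - energies[0]).abs().mean()


frame = torch.rand((64, 96), dtype=torch.float64)
still = torch.stack([frame, frame])  # two identical frames
change = torch.stack([torch.full_like(frame, 0.5), frame])  # flat frame, then noise
h_still = temporal_dct_sketch(still)
h_change = temporal_dct_sketch(change)
```

Two identical frames give exactly zero, since every block energy cancels, whereas going from a flat frame to a noisy one gives the mean block energy of the noisy frame.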