cutcutcodec.core.analysis.video.complexity.dct.spatial_dct

cutcutcodec.core.analysis.video.complexity.dct.spatial_dct(img: Tensor, threads: int = 0, patch: Integral = 32) → Tensor

Compute the spatial DCT complexity of an image.

The spatial DCT complexity \(C_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:

\[\begin{split}\begin{cases} C_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} H_m \\ H_m = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_m(i,j)\right| \\ \mathscr{D}_m(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_m(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]

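Here \(\mathscr{F}_m(i,j)\) is the DCT-II of patch \(m\) of the image, computed by the function compute_dct(). The condition \(i + j = 2\) holds only for \(i = j = 1\), so \(\mathscr{D}_m\) is \(\mathscr{F}_m\) with the DC coefficient set to zero. The patches tile the image and do not overlap.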
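The definition above can be sketched in plain NumPy. This is a minimal reference sketch, not the library's implementation: the helper names dct2_matrix and spatial_dct_sketch are hypothetical, the orthonormal DCT-II scaling is an assumption (the normalization used by compute_dct() is not stated here), and only a single 2D luma image is handled. Its values therefore need not match those of spatial_dct().

```python
import numpy as np


def dct2_matrix(size: int) -> np.ndarray:
    """Return the DCT-II matrix (orthonormal scaling is an assumption)."""
    n = np.arange(size)
    mat = np.cos(np.pi / size * (n[None, :] + 0.5) * n[:, None])
    mat *= np.sqrt(2.0 / size)
    mat[0] /= np.sqrt(2.0)
    return mat


def spatial_dct_sketch(luma: np.ndarray, patch: int = 32) -> float:
    """Hypothetical reference for C_dct on one 2D luma image in [0, 1]."""
    height, width = luma.shape
    rows, cols = height // patch, width // patch
    if rows == 0 or cols == 0:
        raise ValueError("the image is smaller than one patch")
    # Crop the edges so both sides are multiples of the patch size.
    cropped = luma[: rows * patch, : cols * patch]
    # Split into non-overlapping patches of shape (rows, cols, s, s).
    blocks = cropped.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    # Separable 2D DCT-II of every patch: F_m = M @ block @ M^T.
    mat = dct2_matrix(patch)
    coeffs = mat @ blocks @ mat.T
    # D_m: zero the DC coefficient (i + j = 2, that is i = j = 1).
    coeffs[..., 0, 0] = 0.0
    # Weights exp((i * j / s^2)^2 - 1) with 1-based indices i and j.
    idx = np.arange(1, patch + 1)
    weights = np.exp((np.outer(idx, idx) / patch**2) ** 2 - 1.0)
    # H_m for each patch, then the average over all patches.
    energy = (weights * np.abs(coeffs)).sum(axis=(-2, -1)) / patch**2
    return float(energy.mean())
```

With these weights, the highest-frequency coefficient (\(i = j = s\)) is weighted by \(e^0 = 1\), while the lowest frequencies are weighted by about \(e^{-1} \approx 0.37\). A constant image has no AC energy, so the sketch returns a value that is zero up to rounding error.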

Parameters

img (arraylike): The Y[UV] images, of shape ([*batch], [1], height, width, [channels]). Only the Y component is used. Values must be in the range [0, 1]. The image is sliced into non-overlapping squares of size \(s \times s\). If the height or width of the image is not a multiple of \(s\), the excess edges are cropped.
threads (int, optional): The number of threads. The value -1 means that the function uses as many computation threads as there are cores. The default value 0 behaves like -1 when the function is called from the main thread, and like 1 otherwise, to avoid nested threads. Any other positive value is the number of threads used.
patch (int, default = 32): The patch size \(s\). It must be >= 1. The default value of 32 is the one proposed in the VCA paper.
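The rule described for the threads parameter can be illustrated as follows. This is only a sketch of the documented behavior: resolve_threads is a hypothetical helper, not a cutcutcodec function, and rejecting negative values other than -1 is an assumption.

```python
import os
import threading


def resolve_threads(threads: int = 0) -> int:
    """Hypothetical helper mirroring the documented `threads` semantics."""
    if threads > 0:
        # Any positive value is used as is.
        return threads
    if threads == -1:
        # One thread per core (falls back to 1 if the count is unknown).
        return os.cpu_count() or 1
    if threads == 0:
        # Behaves like -1 in the main thread, like 1 elsewhere,
        # which avoids nested threads.
        if threading.current_thread() is threading.main_thread():
            return os.cpu_count() or 1
        return 1
    # Assumption: other negative values are treated as invalid.
    raise ValueError(f"invalid number of threads: {threads}")
```

Called from a worker thread, resolve_threads(0) returns 1, so a caller that already parallelizes over images does not spawn a second level of threads.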

Returns

spatial_dct (arraylike): The \(C_{\text{dct}}\) scalar for each image (of shape batch).

Notes

This metric comes from the paper "A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION".
The VCA tool offers an optimized version of this metric. The result is close to the E column of the .csv file generated with ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv.
This function can also be called from the command line with cutcutcodec metric video.mp4 --spatial-dct -o result.json.

Examples

>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import spatial_dct
>>> np.random.seed(0)
>>> img = np.random.random((720, 1080, 3))  # It could also be a torch tensor, a list...
>>> spatial_dct(img).round(2)
array([1.59])