cutcutcodec.core.analysis.video.complexity.dct
Compute the differentiable, batched torch spatial and temporal DCT complexities.
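Functions
| compute_dct(tensor, dim) | Compute the DCT-II on the given axis. |
| dct_matrix(size, dtype) | Return the DCT-II matrix, including average coefficient. |
| spatial_dct(img[, threads, patch]) | Compute the spatial DCT complexity for the image. |
| temporal_dct(imgs[, threads, patch]) | Compute the temporal DCT complexity between two images. |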
Details
- cutcutcodec.core.analysis.video.complexity.dct.compute_dct(tensor: Tensor, dim: int) → Tensor
Compute the DCT-II on the given axis.
The output coefficients \(\hat x_k\), for \(k \in \{0, \dots, n-1\}\), are defined as \(\hat x_k = \sum\limits_{l=0}^{n-1} x_l \cos\left(\frac{\pi}{n}\left(l+\frac{1}{2}\right)k\right)\), where \(n\) is the size of the axis dim.
It is calculated as a matrix product with the matrix returned by dct_matrix().
Parameters
- tensor : torch.Tensor
An n-dimensional tensor of real numbers.
- dim : int
The axis along which the DCT is computed. The other axes are treated as batch dimensions.
Returns
- output : torch.Tensor
The DCT-II of the input tensor along the axis dim. The input and output have the same shape.
Examples
>>> import torch
>>> from cutcutcodec.core.analysis.video.complexity.dct import compute_dct
>>> src = torch.randn((128, 16, 16))
>>> dct_2d = compute_dct(compute_dct(src, -1), -2)  # compute the 2d dct
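The matrix-product formulation can be sketched in a few lines of plain torch. This is not the library's code: dct_along is a hypothetical name, and the sketch assumes a floating-point input. It builds the matrix of the formula above, moves the axis dim to the last position, multiplies, and moves the axis back, so every other axis behaves as a batch dimension.

```python
import math

import torch


def dct_along(tensor: torch.Tensor, dim: int) -> torch.Tensor:
    """Hypothetical sketch (not cutcutcodec's code): unnormalized DCT-II along ``dim``.

    Assumes a floating-point tensor.
    """
    n = tensor.shape[dim]
    k = torch.arange(n, dtype=tensor.dtype)[:, None]  # frequency index (rows)
    l = torch.arange(n, dtype=tensor.dtype)[None, :]  # sample index (columns)
    matrix = torch.cos(math.pi / n * (l + 0.5) * k)  # matrix[k, l] = cos(pi/n (l + 1/2) k)
    moved = torch.movedim(tensor, dim, -1)  # bring the transformed axis last
    out = moved @ matrix.T  # out[..., k] = sum_l x[..., l] * matrix[k, l]
    return torch.movedim(out, -1, dim)  # restore the original axis order
```

Because the first row of the matrix is all ones, the \(k = 0\) coefficient is simply the sum along dim, that is \(n\) times the average of the input along that axis.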
- cutcutcodec.core.analysis.video.complexity.dct.dct_matrix(size: Integral, dtype: dtype) → Tensor
Return the DCT-II matrix, including average coefficient.
The square matrix \(\boldsymbol{D} \in \mathcal M_{n,n}(\mathbb R)\) is defined as \(d_{ij} = \cos\left(\frac{\pi}{n}\left(i-1\right)\left(j-\frac{1}{2}\right)\right)\).
For a given input column vector \(\boldsymbol{x} \in \mathcal M_{n,1}(\mathbb R)\), the column vector of its DCT-II coefficients \(\boldsymbol{\hat{x}} \in \mathcal M_{n,1}(\mathbb R)\) is obtained with \(\boldsymbol{\hat{x}} = \boldsymbol{D}\boldsymbol{x}\).
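Since compute_dct() applies \(\boldsymbol{D}\) along one axis at a time, the 2d DCT-II of a square patch \(\boldsymbol{X} \in \mathcal M_{n,n}(\mathbb R)\), as computed in the compute_dct() example, factors into two matrix products. A short derivation with the same matrix \(\boldsymbol{D}\):

```latex
% DCT-II along the last axis (each row of X):   X \mapsto X D^\top
% then along the second-to-last axis (each column): Y \mapsto D Y
\boldsymbol{\hat{X}} = \boldsymbol{D}\left(\boldsymbol{X}\boldsymbol{D}^\top\right)
                     = \boldsymbol{D}\boldsymbol{X}\boldsymbol{D}^\top,
\qquad
\hat{x}_{kl} = \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ki}\,x_{ij}\,d_{lj}
```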
Parameters
- size : int
The matrix size \(n\).
- dtype : torch.dtype
The torch dtype of the matrix: float16, float32 or float64.
Returns
- dct_matrix : torch.Tensor
The 2d square matrix \(\boldsymbol{D}\) of the DCT-II coefficients.
Examples
>>> import torch
>>> from cutcutcodec.core.analysis.video.complexity.dct import dct_matrix
>>> dct_matrix(8, torch.float32)
tensor([[ 1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000,  1.0000],
        [ 0.9808,  0.8315,  0.5556,  0.1951, -0.1951, -0.5556, -0.8315, -0.9808],
        [ 0.9239,  0.3827, -0.3827, -0.9239, -0.9239, -0.3827,  0.3827,  0.9239],
        [ 0.8315, -0.1951, -0.9808, -0.5556,  0.5556,  0.9808,  0.1951, -0.8315],
        [ 0.7071, -0.7071, -0.7071,  0.7071,  0.7071, -0.7071, -0.7071,  0.7071],
        [ 0.5556, -0.9808,  0.1951,  0.8315, -0.8315, -0.1951,  0.9808, -0.5556],
        [ 0.3827, -0.9239,  0.9239, -0.3827, -0.3827,  0.9239, -0.9239,  0.3827],
        [ 0.1951, -0.5556,  0.8315, -0.9808,  0.9808, -0.8315,  0.5556, -0.1951]])
>>> _ @ torch.sin(0.5 * torch.pi * torch.arange(8))[:, None]
tensor([[ 0.0000e+00],
        [ 1.0616e+00],
        [ 2.6822e-07],
        [ 2.1727e+00],
        [-2.8284e+00],
        [-1.4518e+00],
        [-2.9802e-07],
        [-2.1116e-01]])
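The definition of \(\boldsymbol{D}\) translates directly into a broadcasted torch.cos over 1-based indices. The sketch below is a hypothetical re-implementation (dct_matrix_sketch is not part of cutcutcodec) that computes in float64 before casting to the requested dtype. It also checks a property of this unnormalized DCT-II: its rows are orthogonal, with squared norm \(n\) for the first row and \(n/2\) for the others.

```python
import math

import torch


def dct_matrix_sketch(size: int, dtype: torch.dtype = torch.float64) -> torch.Tensor:
    """Hypothetical helper: d_ij = cos(pi/n (i-1)(j-1/2)) with 1-based i, j."""
    i = torch.arange(1, size + 1, dtype=torch.float64)[:, None]  # 1-based row index
    j = torch.arange(1, size + 1, dtype=torch.float64)[None, :]  # 1-based column index
    mat = torch.cos(math.pi / size * (i - 1) * (j - 0.5))
    return mat.to(dtype)  # compute in float64, then cast


d = dct_matrix_sketch(8)
gram = d @ d.T  # diagonal: diag(8, 4, 4, 4, 4, 4, 4, 4)
```

Since \(\boldsymbol{D}\boldsymbol{D}^\top = \operatorname{diag}(n, n/2, \dots, n/2)\), the matrix is not orthonormal as is, and its inverse is \(\boldsymbol{D}^\top \operatorname{diag}(1/n, 2/n, \dots, 2/n)\).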
- cutcutcodec.core.analysis.video.complexity.dct.spatial_dct(img: Tensor, threads: int = 0, patch: Integral = 32) → Tensor
Compute the spatial dct complexity for the image.
The DCT spatial complexity \(C_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:
\[\begin{split}\begin{cases} C_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} H_m \\ H_m = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_m(i,j)\right| \\ \mathscr{D}_m(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_m(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]
where \(\mathscr{F}_m(i,j)\) is the 2d DCT-II of the patch \(m\) of the image, computed with compute_dct(), and \(n_{\text{blocs}}\) is the number of patches. The patches tile the (cropped) image without overlapping.
Parameters
- img : arraylike
The Y[UV] images, of shape ([*batch], [1], height, width, [channels]). Only the Y component is used. Values must be in the range [0, 1]. The image is sliced into non-overlapping squares of size \(s \times s\). If the height or width of the image is not a multiple of \(s\), the edges are cropped.
- threads : int, optional
Defines the number of threads. The value -1 means that the function uses as many calculation threads as there are cores. The default value 0 behaves like -1 when the function is called from the main thread, and like 1 otherwise, to avoid nested threads. Any other positive value is the number of threads used.
- patch : int, default = 32
The patch size \(s\). It must be >= 1. The default value of 32 is the one proposed in the VCA paper.
Returns
- spatial_dct : arraylike
The \(C_{\text{dct}}\) scalar for each image (the output has shape batch).
Notes
It comes from the paper A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION. The VCA tool offers an optimized version of this metric. The result is close to the E column of the .csv file generated with:
ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv
This function can also be called from the command line with:
cutcutcodec metric video.mp4 --spatial-dct -o result.json
Examples
>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import spatial_dct
>>> np.random.seed(0)
>>> img = np.random.random((720, 1080, 3))  # it could also be a torch tensor or a list
>>> spatial_dct(img).round(2)
array([1.59])
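To make the formula concrete, here is a minimal sketch of \(C_{\text{dct}}\) for a single image, written directly from the equations above. It is an illustration under stated assumptions, not the library's implementation: spatial_dct_sketch is a hypothetical name, the input is assumed to be a floating-point (height, width) Y plane already in [0, 1], batching, channel handling and the threads option are left out, and it is not claimed to reproduce the exact values printed by spatial_dct().

```python
import math

import torch


def spatial_dct_sketch(luma: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """Hypothetical sketch of C_dct for one 2D floating-point luma image in [0, 1]."""
    s = patch
    height, width = luma.shape
    luma = luma[: height - height % s, : width - width % s]  # crop the edges
    # Slice into non-overlapping s x s patches: shape (n_blocs, s, s).
    patches = (
        luma.reshape(height // s, s, width // s, s)
        .permute(0, 2, 1, 3)
        .reshape(-1, s, s)
    )
    # Unnormalized DCT-II matrix, d[k, l] = cos(pi/s * k * (l + 1/2)), 0-based.
    k = torch.arange(s, dtype=luma.dtype)[:, None]
    l = torch.arange(s, dtype=luma.dtype)[None, :]
    d = torch.cos(math.pi / s * k * (l + 0.5))
    coeffs = d @ patches @ d.T  # 2d DCT-II of every patch: F_m
    # Weights exp((ij / s^2)^2 - 1) with 1-based i, j.
    idx = torch.arange(1, s + 1, dtype=luma.dtype)
    weights = torch.exp((idx[:, None] * idx[None, :] / s**2) ** 2 - 1)
    weights[0, 0] = 0.0  # D_m(1, 1) = 0: the DC coefficient is ignored
    energy = (weights * coeffs.abs()).sum(dim=(-2, -1)) / s**2  # H_m
    return energy.mean()  # C_dct, averaged over the patches
```

Zeroing the weight of the first coefficient is equivalent to setting \(\mathscr{D}_m(1,1) = 0\), since that coefficient only enters the sum through its product with the weight. As a sanity check, a constant image has only DC coefficients, so its complexity is 0.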
- cutcutcodec.core.analysis.video.complexity.dct.temporal_dct(imgs: Tensor, threads: int = 0, patch: Integral = 32) → Tensor
Compute the temporal DCT complexity between two images.
The DCT temporal complexity \(H_{\text{dct}} \in \mathbb{R}^+\) is defined as follows:
\[\begin{split}\begin{cases} H_{\text{dct}} = \frac{1}{n_{\text{blocs}}} \sum\limits_{m=1}^{n_{\text{blocs}}} \left| H_{m,t} - H_{m,t-1} \right| \\ H_{m,t} = \frac{1}{s^2} \sum\limits_{i=1}^s \sum\limits_{j=1}^s e^{\left(\frac{ij}{s^2}\right)^2-1} \left|\mathscr{D}_{m,t}(i,j)\right| \\ \mathscr{D}_{m,t}(i,j) = \begin{cases} 0 & \text{if } i + j = 2 \\ \mathscr{F}_{m,t}(i,j) & \text{otherwise} \\ \end{cases} \\ \end{cases}\end{split}\]
where \(\mathscr{F}_{m,t}(i,j)\) is the 2d DCT-II of the patch \(m\) of the image \(t\), computed with compute_dct(), and \(n_{\text{blocs}}\) is the number of patches. The patches cover the image and do not overlap.
Parameters
- imgs : arraylike
The Y[UV] images, of shape ([*batch], 2, height, width, [channels]). Only the Y component is used. Values must be in the range [0, 1].
- threads, patch
Same as in spatial_dct().
Returns
- temporal_dct : arraylike
The \(H_{\text{dct}} \in \mathbb{R}^+\) scalar for each pair of images (the output has shape batch).
Notes
It is inspired by the paper A NEW ENERGY FUNCTION FOR SEGMENTATION AND COMPRESSION. The VCA tool offers an optimized version of a similar metric. The result is close to the h column of the .csv file generated with:
ffmpeg -i video.mp4 -f yuv4mpegpipe - | vca --y4m --input stdin --no-lowpass --complexity-csv result.csv
This function can also be called from the command line with:
cutcutcodec metric video.mp4 --temporal-dct -o result.json
Examples
>>> import numpy as np
>>> from cutcutcodec.core.analysis.video.complexity import temporal_dct
>>> np.random.seed(0)
>>> imgs = np.random.random((2, 720, 1080, 3))  # it could also be a torch tensor or a list
>>> temporal_dct(imgs).round(2)
array([0.03])
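The temporal metric reuses the per-patch energy \(H_{m,t}\) and averages its absolute change between the two frames. The sketch below follows the equations above under the same assumptions as a plain single-image reading of them: patch_energies and temporal_dct_sketch are hypothetical names, each frame is a floating-point (height, width) Y plane in [0, 1], edges are cropped to a multiple of the patch size, batching and threads are left out, and it is not claimed to match the values printed by temporal_dct().

```python
import math

import torch


def patch_energies(luma: torch.Tensor, s: int) -> torch.Tensor:
    """Hypothetical helper: H_m for every non-overlapping s x s patch of a 2D image."""
    height, width = luma.shape
    luma = luma[: height - height % s, : width - width % s]  # crop the edges
    patches = luma.reshape(height // s, s, width // s, s).permute(0, 2, 1, 3).reshape(-1, s, s)
    k = torch.arange(s, dtype=luma.dtype)
    d = torch.cos(math.pi / s * k[:, None] * (k[None, :] + 0.5))  # DCT-II matrix
    coeffs = d @ patches @ d.T  # F_{m,t}
    idx = k + 1  # 1-based indices i, j
    weights = torch.exp((idx[:, None] * idx[None, :] / s**2) ** 2 - 1)
    weights[0, 0] = 0.0  # drop the DC coefficient
    return (weights * coeffs.abs()).sum(dim=(-2, -1)) / s**2


def temporal_dct_sketch(prev: torch.Tensor, curr: torch.Tensor, patch: int = 32) -> torch.Tensor:
    """Hypothetical sketch of H_dct between two 2D luma images in [0, 1]."""
    diff = patch_energies(curr, patch) - patch_energies(prev, patch)
    return diff.abs().mean()  # mean over patches of |H_{m,t} - H_{m,t-1}|
```

Because only the per-patch energies are compared, the metric is symmetric in the two frames and is 0 for two identical frames.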