[Review] Covariance Tensor For Convolutional Neural Networks

Oscar Guarnizo
11 min read · May 3, 2022

This post continues in the same vein as my previous post (Image2StyleGAN) and presents some of my own work. A year ago, I worked for an Artificial Intelligence (AI) company in Chile called DIGEVO, where I was mainly in charge of developing and deploying deep learning computer vision models. It was a fruitful experience because it helped me better understand the production side of AI.

Some things that work in concept do not always work in practice.

Although I worked on production tasks most of the time, I also had the chance to work on a research project during my time at DIGEVO. I accepted immediately because, to be honest, what I enjoy most is researching, learning new things, and figuring out challenging solutions.

As a result, we published our work as an open-access article in IEEE Access. The paper, entitled “Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition”, proposes a new method to extract features from image datasets using the Covariance Tensor and Tucker Decomposition.

The usefulness of our method lies in the possibility of leveraging these features to enhance Convolutional Neural Network (CNN) performance. Our paper explored rearranging these features as kernels for a CNN; specifically, we performed two tests:

  1. Direct inference using these kernels (freeze the conv layers and train only the fully connected layers).
  2. Initialization using these kernels (train the whole network with these kernels as initializers).

We found promising results. Nonetheless, I remained intrigued by more elaborate applications of these kernels, such as regularizers or contrastive learning approaches. For that reason, I wrote this post to promote further research on this topic.

Who am I? Hi there! 👋

I’m Oscar, a young computer scientist from Ecuador. During my bachelor’s program, I received an unconventional education in Artificial Intelligence and Machine Learning (AI/ML). Since then, I have been studying AI/ML on my own (recently with a more efficient plan).

My journey so far has been very fruitful and has allowed me to reach several milestones. I have published 4+ scientific articles. I have worked in industry on computer vision projects. I did an internship at KAUST, Saudi Arabia, working on StyleGAN projects. I was a member of the Scientific Computing Group at my college. Last but not least, I recently co-founded DeepARC, a non-profit research group that I am most proud of. It was created through the cooperation of alumni, professors, and students to encourage AI/ML research among undergraduate students.

1. Tensor Principles

In the following, I will explain some concepts that you will need to understand our article.

If you’re here, I’m guessing you already know about vectors, matrices, and tensors. But for those who don’t, a tensor is a multi-dimensional array. To put this into context, a vector is a one-dimensional array, while a matrix is a two-dimensional array. So, roughly speaking, we will call an array with three or more dimensions a tensor.

Strictly speaking, though, the term tensor generalizes arrays of any dimension, so a vector is a first-order tensor and a matrix is a second-order tensor.

Tensor Unfolding

A practical way to deal with tensors is to unfold them into matrices. A mode-n tensor unfolding (also called matricization) reduces a tensor to a matrix by keeping the n-th dimension and reshaping (merging) the remaining dimensions of the tensor into one.

The image below shows us a third-order tensor’s unfolding modes 1, 2, and 3. As you can see, we keep the n-th dimension and unify the remaining dimensions. For example, in a third-order tensor, mode 1 corresponds to the rows, mode 2 to the columns, and mode 3 to the depth.

Note: When I was trying to understand this project for the first time, it really helped me to track the input and output dimensions of every operation. Therefore, I added the dimensions to our figures.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.
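To make the unfolding concrete, here is a minimal NumPy sketch. The helper name `unfold` is mine, and I assume 0-based modes and NumPy's row-major ordering for the merged dimensions. Other references order the columns differently, which permutes the columns but leaves the shapes unchanged.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode unfolding (0-based): keep axis `mode` as rows, merge the other axes into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Third-order tensor with (I1, I2, I3) = (2, 3, 4)
X = np.arange(24).reshape(2, 3, 4)

for mode in range(X.ndim):
    print(f"mode {mode + 1}: {unfold(X, mode).shape}")
# mode 1: (2, 12)
# mode 2: (3, 8)
# mode 3: (4, 6)
```

In each case the kept dimension becomes the number of rows, and the product of the remaining dimensions becomes the number of columns.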

N-mode product

Additionally, we can perform an n-mode product operation over these unfolding representations. The n-mode product is calculated between a tensor and a matrix, where the tensor’s n-th dimension must coincide with the last dimension of the matrix.

The n-mode product is equivalent to pre-multiplying the mode-n unfolding of X by U and folding the result back into a tensor. This fact helps us understand how to perform the operation.

The image below shows the n-mode product between a third-order tensor and a matrix. First, we unfold the tensor along mode n. Then, we perform a regular matrix product and fold the result again.

Note: Remember that the last dimension of the matrix must match the n-th dimension of the tensor. For the mode-1 product, J2=I1; for the mode-2 product, J2=I2; and for the mode-3 product, J2=I3.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

Note: Keep in mind that, in the resulting tensor, the n-th dimension of the original tensor is replaced by the J1 dimension of the matrix.
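The unfold, multiply, fold recipe can be sketched in a few lines of NumPy. The helper names (`unfold`, `fold`, `mode_n_product`) are my own and modes are 0-based here, so `mode=1` corresponds to the mode-2 product in the text.

```python
import numpy as np

def unfold(tensor, mode):
    """Keep axis `mode` as rows and merge the remaining axes into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold`: rebuild a tensor of the given shape."""
    moved_shape = (shape[mode],) + tuple(s for i, s in enumerate(shape) if i != mode)
    return np.moveaxis(matrix.reshape(moved_shape), 0, mode)

def mode_n_product(tensor, matrix, mode):
    """Unfold along `mode`, pre-multiply by `matrix`, and fold the result back."""
    new_shape = list(tensor.shape)
    new_shape[mode] = matrix.shape[0]  # the n-th dimension becomes J1
    return fold(matrix @ unfold(tensor, mode), mode, tuple(new_shape))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 4))   # (I1, I2, I3)
U = rng.standard_normal((5, 3))      # (J1, J2) with J2 = I2

Y = mode_n_product(X, U, mode=1)
print(Y.shape)  # (2, 5, 4)
```

The second dimension (I2 = 3) has been replaced by J1 = 5, while the other two dimensions are untouched.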

Covariance Tensor

The covariance tensor, also called n-mode cross covariance, is a generalization of the covariance matrix. In our work, we computed the covariance tensor of a tensor with itself, but it is also possible to compute the covariance tensor between different tensors.

To understand how to calculate the covariance tensor, we will start from the simple case, the covariance matrix.

In the simple case of the covariance matrix (a second-order tensor), we can slice the matrix into column vectors and calculate the outer product of each vector with its transpose. After that, we sum all the resulting matrices to get the covariance matrix.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

In the same way, we can apply a similar procedure for higher-order tensors, where now the n-mode indicates the dimension that we have to slice.

The image below shows us the covariance tensor of a third-order tensor with dimensions (I1, I2, I3). We slice by the 3-mode, calculate the outer product and sum up all the resulting tensors.

Note: In some sense, in the covariance tensor, we use the mode n as a slicer to get the relationship between the slices.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

It is important to note the output dimensions of a covariance tensor. Since slicing eliminates the third dimension, the output dimensions are given by the outer product of the remaining dimensions, that is, (I1, I2, I2, I1).
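Here is a small NumPy sketch of both cases. It follows the sum-of-outer-products description above literally, so there is no mean-centering or normalization. The index order in the `einsum` string is my assumption, chosen so that the output has the (I1, I2, I2, I1) shape stated above; it corresponds to taking the outer product of each slice with its transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix case: sum the outer products of the column vectors.
M = rng.standard_normal((3, 5))
cov = sum(np.outer(col, col) for col in M.T)
print(np.allclose(cov, M @ M.T))  # True

def covariance_tensor_mode3(X):
    """Slice a third-order tensor along mode 3 and sum the outer products of the slices.

    C[a, b, d, c] = sum_k X[a, b, k] * X[c, d, k]
    """
    return np.einsum("abk,cdk->abdc", X, X)

X = rng.standard_normal((2, 3, 4))   # (I1, I2, I3)
C = covariance_tensor_mode3(X)
print(C.shape)  # (2, 3, 3, 2), i.e. (I1, I2, I2, I1)
```

The sliced dimension I3 disappears from the output because it is summed over.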

Tucker Decomposition

Tucker Decomposition is a tensor decomposition method that factorizes a tensor into a core tensor G and a set of orthogonal factor matrices U. The image below shows the Tucker Decomposition, where the core tensor G is multiplied by each factor matrix U through the corresponding n-mode product.

Note: Here, I am not writing the dimensions explicitly, but we will see what the dimensions are later in the main method.

As we will see later, the core tensor encapsulates some relevant features of the tensor X. That is why we are interested in finding it in our article. As the factor matrices U are orthogonal, we can get its core tensor by the following equation.
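Written out for a third-order tensor, and assuming the usual Tucker notation (the symbols $\times_n$ for the n-mode product and $U^{(n)}$ for the mode-n factor matrix are my choice here), the decomposition and the resulting expression for the core are:

```latex
\mathcal{X} = \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}
\quad\Longrightarrow\quad
\mathcal{G} = \mathcal{X} \times_1 U^{(1)\top} \times_2 U^{(2)\top} \times_3 U^{(3)\top}
```

The second expression follows because each $U^{(n)}$ is orthogonal, so multiplying by its transpose along mode n undoes the corresponding product.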

So, we first need the factor matrices U. Suppose we have a tensor X; then we can get the factor matrices U using Singular Value Decomposition (SVD). Specifically, we unfold the tensor X along mode n and calculate the SVD of the unfolding. The left singular vectors (the U matrix of the SVD) form our mode-n factor matrix.

Note: This fact has a theoretical explanation, but it’s enough to understand that we will apply SVD to the unfolding representations of tensor X.
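As a rough illustration, the sketch below computes the factor matrices from the SVD of each unfolding and then the core tensor, without any truncation. The function names are mine, and the n-mode product is written with `np.tensordot` for brevity.

```python
import numpy as np

def unfold(tensor, mode):
    """Keep axis `mode` as rows and merge the remaining axes into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Contract the matrix columns with axis `mode`, then put the new axis back in place."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def tucker_via_svd(tensor):
    """Factor matrices from the SVD of each unfolding, then the core tensor."""
    factors = [
        np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
        for n in range(tensor.ndim)
    ]
    core = tensor
    for n, U in enumerate(factors):
        core = mode_n_product(core, U.T, n)  # multiply by U^T along mode n
    return core, factors

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
G, Us = tucker_via_svd(X)

# Sanity check: multiplying the core by the factor matrices recovers X.
X_rec = G
for n, U in enumerate(Us):
    X_rec = mode_n_product(X_rec, U, n)
print(np.allclose(X, X_rec))  # True
```

Because no columns are dropped here, the reconstruction is exact up to floating-point error; keeping only the first columns of each U gives an approximation instead.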

2. Covariance Tensor Method

Once we understand the preliminaries above, we can start with our method.

Note: From this point, I will write new tensors, dimensions, and illustrations for our method specifically. Don’t get confused with the previous explanation.

The following image depicts our method, called the Covariance Tensor Method (CovTen). In the following sections, I will explain each of these modules’ behavior one by one.

The idea behind our method is to extract relevant features that characterize a given dataset of images.

In this way, we start with a given dataset X and extract image examples one by one. The image example is defined as

Next, we extract patches from the image, given a stride and a patch size (the same for height and width).

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

Then, the extracted patches are grouped into a tensor.
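A minimal sketch of the patch extraction and grouping step is shown below. I assume a (height, width, channels) image layout, no padding, and that the patch index goes on the last axis so that it plays the role of mode 4; the function name `extract_patches` is hypothetical.

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Return all patches stacked as (patch_size, patch_size, channels, num_patches)."""
    height, width, _ = image.shape
    patches = [
        image[r:r + patch_size, c:c + patch_size, :]
        for r in range(0, height - patch_size + 1, stride)
        for c in range(0, width - patch_size + 1, stride)
    ]
    return np.stack(patches, axis=-1)

image = np.random.default_rng(0).random((8, 8, 3))
P = extract_patches(image, patch_size=3, stride=1)
print(P.shape)  # (3, 3, 3, 36)
```

With an 8x8 image, a 3x3 patch, and stride 1, there are 6 positions along each axis, hence 36 patches.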

Next, we compute the covariance tensor (with mode 4) of this tensor of patches and keep the resulting tensor.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

The resulting tensor is super symmetric, a consequence of slicing along mode 4 and taking outer products internally.

Note: Roughly speaking, computing this covariance tensor with mode 4 means capturing the relationships between different patches or, in other words, between different regions of an image.

Subsequently, we get the covariance tensor for all the training examples and average them. Note that the mean covariance tensor is still super symmetric.
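The per-image covariance and the averaging step could look roughly like this. The mirrored axis order (patch height, patch width, channels, channels, patch width, patch height) is my assumption, mirroring the (I1, I2, I2, I1) layout from the third-order example, and the helper names are hypothetical.

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Return all patches stacked as (patch_size, patch_size, channels, num_patches)."""
    height, width, _ = image.shape
    patches = [
        image[r:r + patch_size, c:c + patch_size, :]
        for r in range(0, height - patch_size + 1, stride)
        for c in range(0, width - patch_size + 1, stride)
    ]
    return np.stack(patches, axis=-1)

def covariance_tensor_mode4(P):
    """Sum over patches: C[a, b, c, f, e, d] = sum_k P[a, b, c, k] * P[d, e, f, k]."""
    return np.einsum("abck,defk->abcfed", P, P)

def mean_covariance_tensor(images, patch_size, stride):
    """Average the mode-4 covariance tensors of all images."""
    total = None
    for image in images:
        C = covariance_tensor_mode4(extract_patches(image, patch_size, stride))
        total = C if total is None else total + C
    return total / len(images)

images = np.random.default_rng(0).random((4, 6, 6, 3))  # 4 toy images of 6x6x3
C_mean = mean_covariance_tensor(images, patch_size=2, stride=2)
print(C_mean.shape)  # (2, 2, 3, 3, 2, 2)

# The mean keeps the symmetry: reversing the axis order leaves it unchanged.
print(np.allclose(C_mean, C_mean.transpose(5, 4, 3, 2, 1, 0)))  # True
```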

The next step involves getting the factor matrices (U’s) from the mean covariance tensor. Here, unlike the standard Tucker decomposition, we decided to compute only the factor matrices U1, U2, and U3. This is because the covariance tensor is super symmetric, so we can get a good approximation of the core tensor with only these three matrices.

Note: We had some mistakes with the dimensions of unfoldings and the average covariance tensor in the original article. Here, these typos were corrected.

First, we have to get the unfolding modes 1, 2 and 3.

Then, we apply the singular value decomposition (SVD) to each unfolding.

Once we have the factor matrices, we can get an approximation of the core tensor as
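Using the same n-mode product notation as in the preliminaries, and writing $\bar{\mathcal{C}}$ for the mean covariance tensor (a symbol I am assuming here), this approximation takes the form:

```latex
\mathcal{G} \approx \bar{\mathcal{C}} \times_1 U^{(1)\top} \times_2 U^{(2)\top} \times_3 U^{(3)\top}
```

Only modes 1, 2, and 3 are multiplied; the last three modes of the covariance tensor are left as they are.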

Note: Applying these products, modes 1, 2, and 3, is similar to multiplying specific faces of the covariance tensor by the matrices U modes 1, 2, 3, as the following illustration indicates.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

The resulting core tensor dimensions are given by the first dimension of each factor matrix U and the patch dimensions.

The core tensor encapsulates some relevant features of our dataset. We propose rearranging these features into kernels of size (patch height, patch width, channels) so that they can then be plugged into convolutional layers.

Note: As the number of kernels is given by the dimensions J1, J2, and J3, you can select the number of kernels by choosing how many columns of U1, U2, or U3 to use when building the approximation of the core tensor G. But recall that the first columns are the most representative.
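To tie these steps together, here is a rough NumPy sketch that truncates the factor matrices, builds the approximate core, and reshapes it into kernels. The final rearrangement (one kernel per (J1, J2, J3) index, giving J1·J2·J3 kernels) is only my reading of the description above, not necessarily the exact layout used in the paper, and `covten_kernels` is a hypothetical name. The toy covariance tensor uses the same assumed axis order (patch height, patch width, channels, channels, patch width, patch height).

```python
import numpy as np

def unfold(tensor, mode):
    """Keep axis `mode` as rows and merge the remaining axes into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Contract the matrix columns with axis `mode`, then put the new axis back in place."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def covten_kernels(C_mean, ranks):
    """Approximate core from U1, U2, U3 (first `ranks` columns), reshaped into kernels."""
    core = C_mean
    for n, rank in enumerate(ranks):
        U = np.linalg.svd(unfold(C_mean, n), full_matrices=False)[0][:, :rank]
        core = mode_n_product(core, U.T, n)
    # core has shape (J1, J2, J3, channels, patch width, patch height)
    j1, j2, j3, c, pw, ph = core.shape
    # Assumed rearrangement: one (patch height, patch width, channels) kernel per J index
    kernels = core.reshape(j1 * j2 * j3, c, pw, ph).transpose(0, 3, 2, 1)
    return core, kernels

# Toy mean covariance tensor built from random 2x2x3 patches
rng = np.random.default_rng(0)
P = rng.standard_normal((2, 2, 3, 50))
C_mean = np.einsum("abck,defk->abcfed", P, P)

core, kernels = covten_kernels(C_mean, ranks=(2, 2, 1))
print(core.shape)     # (2, 2, 1, 3, 2, 2)
print(kernels.shape)  # (4, 2, 2, 3)
```

Changing `ranks` changes how many columns of each U are kept and therefore how many kernels come out.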

At this point, you just need to plug the kernels into a convolutional layer.

3. Cascade Style

Although we already have a method to extract features from an image dataset, we must consider that a convolutional neural network (CNN) incorporates multiple layers. Therefore, we need a technique to plug kernels into different layers efficiently.

We chose a cascade-style approach, similar to the one proposed in PCANet. The idea is to repeat our method after each convolution, using the previous layer’s output (feature maps) as the input to our method for the next layer.

Convolutional Neural Network Feature Extraction Using Covariance Tensor Decomposition by Fonseca et al.

4. Inference and Initializers

We performed two different tests to check the usefulness of our method.

  1. Inference Capacity: in this test, we check whether the extracted features are representative enough to perform discriminative tasks such as classifying images. To do so, we freeze the convolutional layers and train only the fully connected layers.
  2. Kernel Initialization: in this test, we check the initialization capacity of our features when a CNN is trained end to end (without frozen layers).

The results of both experiments were promising, indicating that our method succeeds in extracting relevant characteristics from image datasets.

Despite our results, we remained intrigued by other possible applications of these kernels. We came up with some ideas about regularizers, contrastive learning, or one-shot learning, but we didn’t develop them further. Additionally, we only performed experiments on relatively small, low-resolution datasets. We could probably get more insights from larger datasets. For all these reasons, I wrote this post to promote further research on this topic.

5. Final Thoughts

Thank you so much if you got to this point; I really appreciate that. I hope to receive any constructive advice. I’m always looking to improve the quality of my posts and your comments will be highly valuable to me.

Additionally, if you want to read more about this topic, I leave some literature for further review.

Have a nice day! :D

Written by Oscar Guarnizo

A young computer scientist with a great passion for machine learning. My page: https://zosov.github.io/ Github: https://github.com/ZosoV