The trace of a square matrix is the sum of the elements on the main diagonal. That is, for an *n* by *n* square matrix *A*, the trace of *A* is

$$\mathrm{tr}(A) = \sum_{i=1}^n A_{ii} .$$
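If you want to poke at this numerically, here's a minimal check in Python with numpy (the matrix is an arbitrary example, not from the original post):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# The trace is the sum of the main-diagonal entries: 1 + 5 + 9 = 15
print(np.sum(np.diag(A)))  # 15.0
print(np.trace(A))         # 15.0 -- numpy's built-in trace agrees
```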

This might not seem too exciting at first. However, the trace operator has a neat quasi-commutative property: for matrices *U* and *V* whose dimensions match both ways (say *U* is *m* by *n* and *V* is *n* by *m*, so that both products are square), it is true that

$$\mathrm{tr}(UV) = \mathrm{tr}(VU) .$$
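A quick numerical sanity check, using random rectangular matrices chosen just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# U is m-by-n and V is n-by-m, so UV (m-by-m) and VU (n-by-n) are both square
m, n = 3, 5
U = rng.standard_normal((m, n))
V = rng.standard_normal((n, m))

# The two traces agree, up to floating-point rounding
print(np.trace(U @ V))
print(np.trace(V @ U))
```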

The proof isn’t too hard so I’ll skip the details; writing out the sums, both sides equal $\sum_{i,j} U_{ij} V_{ji}$. If we have a third matrix *W* (again assuming the dimensions work out), then since matrix multiplication is associative, it is also true that

$$\mathrm{tr}(UVW) = \mathrm{tr}(WUV) = \mathrm{tr}(VWU) .$$

It’s not truly commutative, since you can only do cyclic shifts of the arguments. So, e.g., tr(*UVW*) is not equal to tr(*WVU*) in general.
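Here's the same kind of check for the three-matrix case, using square matrices so that every ordering is defined (again, an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
U, V, W = (rng.standard_normal((3, 3)) for _ in range(3))

# The three cyclic shifts all give the same trace...
print(np.trace(U @ V @ W), np.trace(W @ U @ V), np.trace(V @ W @ U))

# ...but a non-cyclic reordering generally gives a different value
print(np.trace(W @ V @ U))
```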

What can you do with this? For one thing, note that the trace of a scalar *a* is just the scalar itself: tr(*a*) = *a*. So if you have a matrix product that results in a scalar, you can use the trace to rearrange the factors.

For instance, let *U* be a 1 by *n* row vector, and let *V* be an *n* by *n* matrix. If $U^*$ is the transpose of *U*, then $UVU^*$ is a scalar. This kind of expression comes up pretty often when working with jointly Gaussian distributions.
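For real vectors the plain transpose plays the role of $U^*$, so a small sketch of this looks like the following (the choices of $U$ and $V$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
U = rng.standard_normal((1, n))  # 1-by-n row vector
V = rng.standard_normal((n, n))

quad = U @ V @ U.T               # (1-by-n)(n-by-n)(n-by-1) -> 1-by-1
print(quad.shape)                # (1, 1): effectively a scalar
print(quad.item())               # the scalar itself
print(np.trace(V @ U.T @ U))     # same value, via the cyclic shift
```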

Now say *U* is a zero-mean random vector with covariance matrix $E[U^*U]$, and I want to know $E[UVU^*]$. Using the trace trick, I can express this expectation in terms of $E[U^*U]$: first, we can write

$$UVU^* = \mathrm{tr}(UVU^*) = \mathrm{tr}(VU^*U) ,$$

and since the trace is just a sum, expectation and trace commute; *V* is a constant matrix, so it can also be pulled out of the expectation, giving

$$E[UVU^*] = E[\mathrm{tr}(VU^*U)] = \mathrm{tr}(E[VU^*U]) = \mathrm{tr}(V E[U^*U]) .$$

As a result, if you already know the covariance $E[U^*U]$, you can compute $E[UVU^*] = \mathrm{tr}(V\,E[U^*U])$ directly, with no new expectations to calculate.
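To close the loop, here's a Monte Carlo sketch of the whole trick (the covariance $\Sigma$ and the matrix $V$ are arbitrary choices made up for the demo; for real vectors, $U^*$ is just the transpose):

```python
import numpy as np

rng = np.random.default_rng(3)
n, num_samples = 3, 200_000

# Arbitrary covariance Sigma = E[U* U] and a fixed (deterministic) matrix V
A = rng.standard_normal((n, n))
Sigma = A @ A.T                  # symmetric positive semidefinite
V = rng.standard_normal((n, n))

# Draw zero-mean row vectors U with covariance Sigma
samples = rng.multivariate_normal(np.zeros(n), Sigma, size=num_samples)

# Brute-force Monte Carlo estimate of E[U V U*]
quad_forms = np.einsum('si,ij,sj->s', samples, V, samples)
print(quad_forms.mean())

# The trace-trick answer: tr(V E[U* U]) -- no sampling required
print(np.trace(V @ Sigma))
```

The two printed values should agree to a couple of decimal places at this sample size.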

*This is a repost: original is at http://andreweckford.blogspot.ca/2009/09/trace-tricks.html*