Differentiating tensor expressions

On this page, we will work with expressions of the form

\begin{aligned} \frac{\partial \underline{\boldsymbol{ u}}}{\partial \underline{\boldsymbol{ v}}} \end{aligned}

that is, differentiating a tensor-valued expression wrt. a tensor. In this case, \underline{\boldsymbol{ u}}=u_i\underline{\boldsymbol{ e}}_{ i} and \underline{\boldsymbol{ v}}=v_i \underline{\boldsymbol{ e}}_{ i}. Working with constant orthonormal coordinate systems, we use that

\begin{aligned} \frac{\partial \underline{\boldsymbol{ u}}}{\partial \underline{\boldsymbol{ v}}} = \frac{\partial u_i}{\partial v_j} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} \end{aligned}

Here, constant implies that the base vectors are constant in space. Hence, it suffices to differentiate the coefficients as the derivative of each base vector is zero.
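As a concrete illustration (my own sketch, not part of the original notes, assuming NumPy is available), the components \partial u_i/\partial v_j can be approximated by central finite differences and compared against an analytic result:

```python
# Sketch: approximate the components du_i/dv_j of du/dv in a fixed
# Cartesian basis by central finite differences (assumes NumPy).
import numpy as np

def derivative_components(u, v, h=1e-6):
    """Return D with D[i, j] ~= du_i/dv_j evaluated at the point v."""
    v = np.asarray(v, dtype=float)
    m = np.asarray(u(v)).size
    D = np.zeros((m, v.size))
    for j in range(v.size):
        dv = np.zeros(v.size)
        dv[j] = h
        D[:, j] = (np.asarray(u(v + dv)) - np.asarray(u(v - dv))) / (2 * h)
    return D

# Example (my own choice): u(v) = (v . v) v, for which
# du_i/dv_j = 2 v_i v_j + (v . v) delta_ij
v = np.array([1.0, 2.0, 3.0])
numeric = derivative_components(lambda w: np.dot(w, w) * w, v)
exact = 2.0 * np.outer(v, v) + np.dot(v, v) * np.eye(3)
print(np.allclose(numeric, exact, atol=1e-6))  # expected: True
```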

Consider the following in 2d: The tensor \boldsymbol{ b} is a function of the tensor \boldsymbol{ a}, such that \boldsymbol{ b}(\boldsymbol{ a}) = \boldsymbol{ a}\boldsymbol{ a} (i.e. b_{ij} = a_{in}a_{nj}). We would like to differentiate \boldsymbol{ a}\boldsymbol{ b} wrt. \boldsymbol{ a}.
\begin{aligned} &\frac{\partial a_{im}b_{mj}(\boldsymbol{ a})}{\partial a_{kl}} =\\ &= \frac{\partial \left[a_{i1}b_{1j}(\boldsymbol{ a}) + a_{i2}b_{2j}(\boldsymbol{ a})\right]}{\partial a_{kl}}, \quad \left(\begin{matrix}\text{Expand dummy index summation} \Rightarrow 2^4 \text{ individual}\\ \text{scalar expressions, one for each }i,j,k,l\end{matrix}\right)\\ &= \frac{\partial a_{i1}b_{1j}(\boldsymbol{ a})}{\partial a_{kl}} + \frac{\partial a_{i2}b_{2j}(\boldsymbol{ a})}{\partial a_{kl}}, \quad (\text{Sum rule})\\ &= a_{i1}\frac{\partial b_{1j}(\boldsymbol{ a})}{\partial a_{kl}} + \frac{\partial a_{i1}}{\partial a_{kl}}b_{1j}(\boldsymbol{ a}) + a_{i2}\frac{\partial b_{2j}(\boldsymbol{ a})}{\partial a_{kl}} + \frac{\partial a_{i2}}{\partial a_{kl}}b_{2j}(\boldsymbol{ a}), \quad (\text{Product rule}) \\ &= a_{i1}\frac{\partial a_{1n}a_{nj}}{\partial a_{kl}} + \delta_{ik}\delta_{1l} b_{1j}(\boldsymbol{ a}) + a_{i2}\frac{\partial a_{2n}a_{nj}}{\partial a_{kl}} + \delta_{ik}\delta_{2l} b_{2j}(\boldsymbol{ a}) \\ &= a_{i1}\left[a_{1n}\frac{\partial a_{nj}}{\partial a_{kl}}+\frac{\partial a_{1n}}{\partial a_{kl}}a_{nj}\right] + \delta_{ik}\delta_{1l} b_{1j}(\boldsymbol{ a}) + a_{i2}\left[a_{2n}\frac{\partial a_{nj}}{\partial a_{kl}}+\frac{\partial a_{2n}}{\partial a_{kl}}a_{nj}\right] + \delta_{ik}\delta_{2l} b_{2j}(\boldsymbol{ a}) \\ &= a_{i1}\left[a_{1n}\delta_{nk}\delta_{jl}+\delta_{1k}\delta_{nl}a_{nj}\right] + \delta_{ik}\delta_{1l} b_{1j}(\boldsymbol{ a}) + a_{i2}\left[a_{2n}\delta_{nk}\delta_{jl}+\delta_{2k}\delta_{nl}a_{nj}\right] + \delta_{ik}\delta_{2l} b_{2j}(\boldsymbol{ a})\\ &= a_{i1}\left[a_{1k}\delta_{jl}+\delta_{1k}a_{lj}\right] + \delta_{ik}\delta_{1l} b_{1j}(\boldsymbol{ a}) + a_{i2}\left[a_{2k}\delta_{jl}+\delta_{2k}a_{lj}\right] + \delta_{ik}\delta_{2l} b_{2j}(\boldsymbol{ a}) \\ &= a_{im}\left[a_{mk}\delta_{jl}+\delta_{mk}a_{lj}\right] + \delta_{ik}\delta_{ml} b_{mj}(\boldsymbol{ a}), \quad \left( \begin{matrix} \text{Identify as summation,} \\ \text{reinstate dummy indices} \end{matrix} \right)\\ &= a_{im}\left[a_{mk}\delta_{jl}+\delta_{mk}a_{lj}\right] + \delta_{ik} b_{lj}(\boldsymbol{ a}) \end{aligned}

The same result is achieved without expanding the \textcolor{blue}{ \text{dummy indices}}:

\begin{aligned} &\frac{\partial a_{i\textcolor{blue}{ m}}b_{\textcolor{blue}{ m}j}(\boldsymbol{ a})}{\partial a_{kl}} =\\ &= a_{i\textcolor{blue}{ m}}\frac{\partial b_{\textcolor{blue}{ m}j}(\boldsymbol{ a})}{\partial a_{kl}} + \frac{\partial a_{i\textcolor{blue}{ m}}}{\partial a_{kl}}b_{\textcolor{blue}{ m}j}(\boldsymbol{ a})\\ &= a_{i\textcolor{blue}{ m}}\frac{\partial a_{\textcolor{blue}{ mn}}a_{\textcolor{blue}{ n}j}}{\partial a_{kl}} + \delta_{ik}\delta_{\textcolor{blue}{ m}l} b_{\textcolor{blue}{ m}j}(\boldsymbol{ a})\\ &= a_{i\textcolor{blue}{ m}}\left[a_{\textcolor{blue}{ mn}}\frac{\partial a_{\textcolor{blue}{ n}j}}{\partial a_{kl}}+\frac{\partial a_{\textcolor{blue}{ mn}}}{\partial a_{kl}}a_{\textcolor{blue}{ n}j}\right] + \delta_{ik} b_{lj}(\boldsymbol{ a})\\ &= a_{i\textcolor{blue}{ m}}\left[a_{\textcolor{blue}{ mn}}\delta_{\textcolor{blue}{ n}k}\delta_{jl}+\delta_{\textcolor{blue}{ m}k}\delta_{\textcolor{blue}{ n}l}a_{\textcolor{blue}{ n}j}\right] + \delta_{ik} b_{lj}(\boldsymbol{ a})\\ &= a_{i\textcolor{blue}{ m}}\left[a_{\textcolor{blue}{ m}k}\delta_{jl}+\delta_{\textcolor{blue}{ m}k}a_{lj}\right] + \delta_{ik} b_{lj}(\boldsymbol{ a}) \end{aligned}

And for completeness, in tensor notation this result is \boldsymbol{ a}^2 \overline{\otimes} \boldsymbol{ I} + \boldsymbol{ a}\overline{\otimes}\boldsymbol{ a}^{\mathrm{T}} + \boldsymbol{ I}\overline{\otimes}\boldsymbol{ b}^{\mathrm{T}}
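Below is a small numerical check of this result (my own sketch, assuming NumPy; the random 2d tensor is arbitrary). It builds \boldsymbol{ a}^2 \overline{\otimes} \boldsymbol{ I} + \boldsymbol{ a}\overline{\otimes}\boldsymbol{ a}^{\mathrm{T}} + \boldsymbol{ I}\overline{\otimes}\boldsymbol{ b}^{\mathrm{T}} with the convention (\boldsymbol{ A}\overline{\otimes}\boldsymbol{ B})_{ijkl}=A_{ik}B_{jl} and compares it against finite differences of a_{im}b_{mj}(\boldsymbol{ a}):

```python
# Sketch: verify d(a_im b_mj)/da_kl with b = a a in 2d against
# a^2 (x) I + a (x) a^T + I (x) b^T, where (A (x) B)_ijkl = A_ik B_jl.
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 2))
b = a @ a
I = np.eye(2)

# Closed-form result from the derivation above
exact = (np.einsum('ik,jl->ijkl', a @ a, I)   # a_im a_mk delta_jl
         + np.einsum('ik,lj->ijkl', a, a)     # a_ik a_lj
         + np.einsum('ik,lj->ijkl', I, b))    # delta_ik b_lj

# Central finite differences of f(a) = a b(a) = a a a wrt each a_kl
h = 1e-6
numeric = np.zeros((2, 2, 2, 2))
for k in range(2):
    for l in range(2):
        da = np.zeros((2, 2)); da[k, l] = h
        fp = (a + da) @ (a + da) @ (a + da)
        fm = (a - da) @ (a - da) @ (a - da)
        numeric[:, :, k, l] = (fp - fm) / (2 * h)

print(np.allclose(numeric, exact, atol=1e-8))  # expected: True
```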

Differentiating tensor function wrt. scalar

If we consider \boldsymbol{ a} = f(x) = x\boldsymbol{ b}, then

\begin{aligned} \frac{\partial \boldsymbol{ a}}{\partial x} = \frac{\partial x b_{ij}}{\partial x} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} = b_{ij} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} = \boldsymbol{ b} \end{aligned}

because b_{ij} doesn't depend on x.
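A quick numerical sanity check of this (my own sketch, assuming NumPy; the tensor \boldsymbol{ b} and the point x are arbitrary):

```python
# Sketch: d(x b)/dx = b, approximated by a central difference in the scalar x.
import numpy as np

b = np.array([[1.0, 2.0], [3.0, 4.0]])  # arbitrary constant tensor
x, h = 0.7, 1e-6
deriv = ((x + h) * b - (x - h) * b) / (2 * h)
print(np.allclose(deriv, b))  # expected: True
```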

Differentiating tensor function wrt. tensor

Let's first consider differentiating a tensor wrt. itself. For a first-order tensor, we have

\begin{aligned} \frac{\partial \underline{\boldsymbol{ u}}}{\partial \underline{\boldsymbol{ u}}} &= \frac{\partial u_i}{\partial u_j} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j}\\ \frac{\partial u_i}{\partial u_j} &= \delta_{ij} \\ \frac{\partial \underline{\boldsymbol{ u}}}{\partial \underline{\boldsymbol{ u}}} &= \boldsymbol{ I} \end{aligned}

This holds since \partial u_i/\partial u_j is 1 if i=j and 0 if i\neq j.

If we now consider a 2nd order tensor, we have

\begin{aligned} \frac{\partial \boldsymbol{ a}}{\partial \boldsymbol{ a}} &= \frac{\partial a_{ij}}{\partial a_{kl}} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j}\otimes\underline{\boldsymbol{ e}}_{ k}\otimes\underline{\boldsymbol{ e}}_{ l} \\ \frac{\partial a_{ij}}{\partial a_{kl}} &= \delta_{ik}\delta_{jl}\\ \frac{\partial \boldsymbol{ a}}{\partial \boldsymbol{ a}} &= \textbf{\textsf{ I}} \end{aligned}

\partial a_{ij}/\partial a_{kl} is 1 only if i=k and j=l; otherwise, it is zero. In other words, \partial a_{ij}/\partial a_{kl}=\delta_{ik}\delta_{jl}.
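The fourth-order identity tensor can be constructed directly from \delta_{ik}\delta_{jl}; a short check (my own sketch, assuming NumPy) that double contraction with it returns the original tensor:

```python
# Sketch: build I_ijkl = delta_ik delta_jl and check that I : a = a.
import numpy as np

d = np.eye(3)
I4 = np.einsum('ik,jl->ijkl', d, d)              # fourth-order identity
a = np.random.default_rng(0).random((3, 3))
print(np.allclose(np.einsum('ijkl,kl->ij', I4, a), a))  # expected: True
```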

As a more complicated example, we look at

\begin{aligned} \frac{\partial \left[\underline{\boldsymbol{ v}}\boldsymbol{ a}\right]}{\partial \underline{\boldsymbol{ v}}} &= \frac{\partial v_k a_{ki}}{\partial v_j} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} \\ \frac{\partial v_k a_{ki}}{\partial v_j} &= \frac{\partial v_k}{\partial v_j} a_{ki} = \delta_{kj} a_{ki} = a_{ji} \\ \frac{\partial \left[\underline{\boldsymbol{ v}}\boldsymbol{ a}\right]}{\partial \underline{\boldsymbol{ v}}} &= \boldsymbol{ a}^{\mathrm{T}} \end{aligned}
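A numerical check of this result (my own sketch, assuming NumPy; \boldsymbol{ a} and \underline{\boldsymbol{ v}} are arbitrary random values):

```python
# Sketch: d(v a)_i / dv_j = a_ji, i.e. the derivative equals a^T.
import numpy as np

rng = np.random.default_rng(1)
a, v = rng.random((3, 3)), rng.random(3)

h = 1e-6
numeric = np.zeros((3, 3))
for j in range(3):
    dv = np.zeros(3); dv[j] = h
    numeric[:, j] = ((v + dv) @ a - (v - dv) @ a) / (2 * h)

print(np.allclose(numeric, a.T))  # expected: True
```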

Differentiating scalar function wrt. tensor

If we consider y = f(\boldsymbol{ a}) = \boldsymbol{ a}:\boldsymbol{ a}, then

\begin{aligned} \frac{\partial y}{\partial \boldsymbol{ a}} &= \frac{\partial a_{kl}a_{kl}}{\partial a_{ij}} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j}\\ &= \left[\frac{\partial a_{kl}}{\partial a_{ij}} a_{kl} + a_{kl} \frac{\partial a_{kl}}{\partial a_{ij}}\right]\underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} \\ &= \left[\delta_{ki}\delta_{lj} a_{kl} + a_{kl} \delta_{ki}\delta_{lj}\right]\underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} \\ &= \left[a_{ij} + a_{ij}\right]\underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} = 2a_{ij}\underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} = 2\boldsymbol{ a} \end{aligned}
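And a corresponding check for this scalar function (my own sketch, assuming NumPy; the random tensor is arbitrary):

```python
# Sketch: d(a : a)/da = 2 a, checked component-wise with central differences.
import numpy as np

a = np.random.default_rng(2).random((3, 3))
h = 1e-6
numeric = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        da = np.zeros((3, 3)); da[i, j] = h
        numeric[i, j] = (np.sum((a + da) * (a + da))
                         - np.sum((a - da) * (a - da))) / (2 * h)

print(np.allclose(numeric, 2 * a))  # expected: True
```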

Gradient

Some operations wrt. the coordinates are so common that they have their own name and notation. The concept of a gradient, \nabla f, of a scalar function, f(\underline{\boldsymbol{ x}}), is well known. In our notation, we would then have

\begin{aligned} \text{grad}(f) = \frac{\partial f}{\partial \underline{\boldsymbol{ x}}} = \nabla_i f(\underline{\boldsymbol{ x}}) \underline{\boldsymbol{ e}}_{ i} \end{aligned}

And we will define the vector operator \underline{\boldsymbol{ \nabla}} as

\begin{aligned} \underline{\boldsymbol{ \nabla}} = \nabla_i \underline{\boldsymbol{ e}}_{ i} = \frac{\partial }{\partial x_{i}} \underline{\boldsymbol{ e}}_{ i} \end{aligned}

The gradient of higher order tensors can then be expressed as, e.g., \underline{\boldsymbol{ v}}\otimes\underline{\boldsymbol{ \nabla}} and \boldsymbol{ a}\otimes\underline{\boldsymbol{ \nabla}}.
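For example, the components of \underline{\boldsymbol{ v}}\otimes\underline{\boldsymbol{ \nabla}} are \partial v_i/\partial x_j. A symbolic sketch of this (my own example, assuming SymPy; the field \underline{\boldsymbol{ v}}(\underline{\boldsymbol{ x}}) is arbitrary):

```python
# Sketch: the gradient v (x) nabla of a 2d vector field, components dv_i/dx_j.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
v = sp.Matrix([x1**2 * x2, sp.sin(x1) + x2])   # arbitrary example field v(x)

grad_v = sp.Matrix(2, 2, lambda i, j: sp.diff(v[i], (x1, x2)[j]))
sp.pprint(grad_v)   # row i, column j holds dv_i/dx_j
```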

To clarify which operand the gradient is acting on in a larger expression, it can be necessary to enclose the relevant part of the expression in brackets:

  • \boldsymbol{ a} \boldsymbol{ b}\otimes\underline{\boldsymbol{ \nabla}}: Not clear if the gradient is acting on \boldsymbol{ b} or the expression \boldsymbol{ a}\boldsymbol{ b}

  • \boldsymbol{ a}\left[ \boldsymbol{ b}\otimes\underline{\boldsymbol{ \nabla}}\right]: Gradient acting on \boldsymbol{ b}

  • \left[\boldsymbol{ a}\boldsymbol{ b}\right]\otimes\underline{\boldsymbol{ \nabla}}: Gradient acting on the expression \boldsymbol{ a}\boldsymbol{ b}

  • \boldsymbol{ c} \left[\boldsymbol{ a}\boldsymbol{ b}\right]\otimes\underline{\boldsymbol{ \nabla}}: Not clear if the gradient is acting on \boldsymbol{ a}\boldsymbol{ b} or \boldsymbol{ c}\left[\boldsymbol{ a}\boldsymbol{ b}\right]

  • \boldsymbol{ c}\left[ \left[\boldsymbol{ a}\boldsymbol{ b}\right]\otimes\underline{\boldsymbol{ \nabla}}\right]: Gradient is acting on \boldsymbol{ a}\boldsymbol{ b}

In some cases, brackets are also required for regular expressions, e.g. \textbf{\textsf{ C}}=\textbf{\textsf{ A}}:\left[\boldsymbol{ a}\overline{\otimes}\boldsymbol{ b}\right] \neq \textbf{\textsf{ D}}=\left[\textbf{\textsf{ A}}:\boldsymbol{ a}\right]\overline{\otimes}\boldsymbol{ b} (\textsf{ C}_{ ijkl}=\textsf{ A}_{ ijmn}a_{mk}b_{nl}\neq\textsf{ D}_{ ijkl}=\textsf{ A}_{ ikmn}a_{mn}b_{jl}). However, they are more often required when working with the \underline{\boldsymbol{ \nabla}} operator: it is always better to add an extra bracket to be clear and avoid mistakes.
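A small numerical illustration of why the bracket placement matters (my own sketch, assuming NumPy; the tensors are arbitrary random values):

```python
# Sketch: C_ijkl = A_ijmn a_mk b_nl generally differs from D_ijkl = A_ikmn a_mn b_jl.
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((3, 3, 3, 3))
a, b = rng.random((3, 3)), rng.random((3, 3))

C = np.einsum('ijmn,mk,nl->ijkl', A, a, b)   # A : [a (x) b]
D = np.einsum('ikmn,mn,jl->ijkl', A, a, b)   # [A : a] (x) b
print(np.allclose(C, D))  # expected: False
```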

Divergence

The divergence, \text{div}(\boldsymbol{ v}), can also be defined more generally by using the \underline{\boldsymbol{ \nabla}} operator, e.g.

  • Divergence of 1st order tensor: \underline{\boldsymbol{ v}}\cdot\underline{\boldsymbol{ \nabla}}

  • Divergence of 2nd order tensor: \boldsymbol{ a}\cdot\underline{\boldsymbol{ \nabla}}

Divergence of higher order tensors is not common. As for the gradient, brackets are crucial to ensure that we know which operand (tensor) \underline{\boldsymbol{ \nabla}} is operating on.
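As an illustration (my own sketch, assuming SymPy; the field is arbitrary), the divergence of a second order tensor field has components \partial a_{ij}/\partial x_j, i.e. \underline{\boldsymbol{ \nabla}} contracts with the last index of \boldsymbol{ a}:

```python
# Sketch: (a . nabla)_i = da_ij/dx_j for a 2d second-order tensor field a(x).
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
xs = (x1, x2)
a = sp.Matrix([[x1 * x2, x2**2],
               [sp.cos(x1), x1 + x2]])        # arbitrary example field a(x)

div_a = sp.Matrix([sum(sp.diff(a[i, j], xs[j]) for j in range(2)) for i in range(2)])
sp.pprint(div_a)   # component i is da_ij/dx_j
```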

Curl

The curl of a vector field, \underline{\boldsymbol{ v}}(\underline{\boldsymbol{ x}}), is defined as

\begin{aligned} \text{curl}(\underline{\boldsymbol{ v}}) &= - \underline{\boldsymbol{ v}}\times\underline{\boldsymbol{ \nabla}} \end{aligned}

This operation is common in fluid mechanics to find the rotation of a velocity field, v\underline{\boldsymbol{ v}}.
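In index form, the components of -\underline{\boldsymbol{ v}}\times\underline{\boldsymbol{ \nabla}} reduce to the familiar \varepsilon_{ijk}\,\partial v_k/\partial x_j. A symbolic sketch of this (my own example, assuming SymPy; the field is arbitrary):

```python
# Sketch: curl(v)_i = -(v x nabla)_i = -eps_jki dv_j/dx_k = eps_ijk dv_k/dx_j.
import sympy as sp

x = sp.symbols('x1 x2 x3')
v = sp.Matrix([x[1] * x[2], x[0]**2, sp.sin(x[0]) * x[2]])   # arbitrary field v(x)

curl_v = sp.Matrix([sum(-sp.LeviCivita(j, k, i) * sp.diff(v[j], x[k])
                        for j in range(3) for k in range(3)) for i in range(3)])
sp.pprint(curl_v)
```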

It is also possible to define the curl for higher order tensors. Here we use the definition from Rubin (2000):

\begin{aligned} \text{curl}(\boldsymbol{ a}) &= - \boldsymbol{ a}\times\underline{\boldsymbol{ \nabla}} \end{aligned}

which is the same as for vectors. An important property of the curl of the gradient of a vector is

\begin{aligned} - \left[ \underline{\boldsymbol{ u}}\otimes\underline{\boldsymbol{ \nabla}}\right]\times\underline{\boldsymbol{ \nabla}} = \boldsymbol{ 0} \end{aligned}
The curl of a second order tensor, \text{curl}(\boldsymbol{ a}), can be written in index notation as
\begin{aligned} \text{curl}(\boldsymbol{ a}) = \varepsilon_{opj} \frac{\partial a_{ip}}{\partial x_o} \underline{\boldsymbol{ e}}_{ i}\otimes\underline{\boldsymbol{ e}}_{ j} \end{aligned}
Different variations of this definition exist: it could have the opposite sign, \boldsymbol{ a} could be transposed, or the result could be transposed. In many use cases, the sign, and whether or not the result is transposed, is not critical. However, definitions that have \boldsymbol{ a} transposed do not fulfill the important identity - \left[ \underline{\boldsymbol{ u}}\otimes\underline{\boldsymbol{ \nabla}}\right]\times\underline{\boldsymbol{ \nabla}} = \boldsymbol{ 0} and should be avoided!
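The identity above can be checked symbolically using the index expression \text{curl}(\boldsymbol{ a})_{ij}=\varepsilon_{opj}\,\partial a_{ip}/\partial x_o with \boldsymbol{ a}=\underline{\boldsymbol{ u}}\otimes\underline{\boldsymbol{ \nabla}} (my own sketch, assuming SymPy; the field \underline{\boldsymbol{ u}} is arbitrary):

```python
# Sketch: check that curl(u (x) nabla) = 0, using curl(a)_ij = eps_opj da_ip/dx_o.
import sympy as sp

x = sp.symbols('x1 x2 x3')
u = sp.Matrix([x[0]**2 * x[2], sp.sin(x[1]) * x[0], x[0] * x[1] * x[2]])  # arbitrary field

# a_ip = du_i/dx_p, i.e. the gradient u (x) nabla
a = sp.Matrix(3, 3, lambda i, p: sp.diff(u[i], x[p]))

# curl(a)_ij = eps_opj da_ip/dx_o
curl_a = sp.Matrix(3, 3, lambda i, j: sum(sp.LeviCivita(o, p, j) * sp.diff(a[i, p], x[o])
                                          for o in range(3) for p in range(3)))
print(curl_a.applyfunc(sp.simplify))  # expected: the zero matrix
```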