How to compute hard-to-compute matrix norms

January 11th, 2016

There is a wide variety of norms of matrices and operators that are useful in many different contexts. Some matrix norms, such as the Schatten norms and Ky Fan norms, are easy to compute thanks to the singular value decomposition. However, the computation of many other norms, such as the induced p-norms (when p ≠ 1, 2, ∞), is NP-hard. In this post, we will look at a general method for getting quite good estimates of almost any matrix norm.

The basic idea is that every norm can be written as a maximization of a convex function over a convex set (in particular, every norm can be written as a maximization over the unit ball of the dual norm). However, this maximization is often difficult to deal with or solve analytically, so instead it can help to write the norm as a maximization over two or more simpler sets, each of which can be solved individually. To illustrate how this works, let’s start with the induced matrix norms.

Induced matrix norms

The induced p → q norm of a matrix B is defined as follows:

\(\displaystyle\|B\|_{p\rightarrow q}:=\max\big\{\|B\mathbf{x}\|_q : \|\mathbf{x}\|_p = 1\big\},\)

where

\(\displaystyle\|\mathbf{x}\|_p := \left(\sum_{i}|x_i|^p\right)^{1/p}\)

is the vector p-norm. There are three special cases of these norms that are easy to compute:

  1. When p = q = 2, this is the usual operator norm of B (i.e., its largest singular value).
  2. When p = q = 1, this is the maximum absolute column sum: \(\|B\|_{1\rightarrow 1} = \max_j\sum_i|b_{ij}|\).
  3. When p = q = ∞, this is the maximum absolute row sum: \(\|B\|_{\infty\rightarrow \infty} = \max_i\sum_j|b_{ij}|\).
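For example, MATLAB’s built-in norm function computes all three of these special cases directly (a quick illustration on an arbitrary example matrix):

B = [1 -2; 3 4];
norm(B,2)     % largest singular value: equals max(svd(B))
norm(B,1)     % maximum absolute column sum: equals max(sum(abs(B),1))
norm(B,Inf)   % maximum absolute row sum: equals max(sum(abs(B),2))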

However, outside of these three special cases (and some other special cases, such as when B only has real entries that are non-negative [1]), this norm is much messier. In general, its computation is NP-hard [2], so how can we get a good idea of its value? Well, we rewrite the norm as the following double maximization:

\(\displaystyle\|B\|_{p\rightarrow q}=\max\big\{|\mathbf{y}^*B\mathbf{x}| : \|\mathbf{x}\|_p = 1, \|\mathbf{y}\|_{q^\prime} = 1\big\},\)

where \(q^\prime\) is the positive real number such that \(1/q + 1/q^\prime = 1\) (and we take \(q^\prime = 1\) if \(q = \infty\), and vice-versa). The idea is then to maximize over \(\mathbf{x}\) and \(\mathbf{y}\) one at a time, alternately.

  1. Start by setting \(j = 1\) and fixing a randomly-chosen vector \(\mathbf{x}_0\), scaled so that \(\|\mathbf{x}_0\|_{p} = 1\).
  2. Compute

    \(\max\big\{|\mathbf{y}^*B\mathbf{x}_{j-1}| : \|\mathbf{y}\|_{q^\prime} = 1\big\},\)

    keeping \(\mathbf{x}_{j-1}\) fixed, and let \(\mathbf{y}_j\) be the vector attaining this maximum. By Hölder’s inequality, we know that this maximum value is exactly equal to \(\|B\mathbf{x}_{j-1}\|_{q}\). Furthermore, the equality condition of Hölder’s inequality tells us that the vector \(\mathbf{y}_j\) attaining this maximum is the one with complex phases that are the same as those of \(B\mathbf{x}_{j-1}\), and whose magnitudes are such that \(|\mathbf{y}_j|^{q^\prime}\) is a multiple of \(|B\mathbf{x}_{j-1}|^q\) (here the notation \(|\cdot|^q\) means we take the absolute value and the q-th power of every entry of the vector).

  3. Compute

    \(\max\big\{|\mathbf{y}_j^*B\mathbf{x}| : \|\mathbf{x}\|_{p} = 1\big\},\)

    keeping \(\mathbf{y}_j\) fixed, and let \(\mathbf{x}_j\) be the vector attaining this maximum. By an argument almost identical to that of step 2, this maximum is equal to \(\|B^*\mathbf{y}_j\|_{p^\prime}\), where \(p^\prime\) is the positive real number such that \(1/p + 1/p^\prime = 1\). Furthermore, the vector \(\mathbf{x}_j\) attaining this maximum is the one with complex phases that are the same as those of \(B^*\mathbf{y}_j\), and whose magnitudes are such that \(|\mathbf{x}_j|^p\) is a multiple of \(|B^*\mathbf{y}_j|^{p^\prime}\).

  4. Increment \(j\) by 1 and return to step 2. Repeat until negligible gains are made after each iteration.

This algorithm is extremely quick to run, since Hölder’s inequality tells us exactly how to solve each of the two maximizations separately, so we’re left only performing simple vector calculations at each step. The downside of this algorithm is that, even though it will always converge to some local maximum, it might converge to a value that is smaller than the true induced p → q norm. However, in practice this algorithm is fast enough that it can be run several thousand times with different (randomly-chosen) starting vectors \(\mathbf{x}_0\) to get an extremely good idea of the value of \(\|B\|_{p\rightarrow q}\).

It is worth noting that this algorithm is essentially the same as the one presented in [3], and reduces to the power method for finding the largest singular value when p = q = 2. This algorithm has been implemented in the QETLAB package for MATLAB as the InducedMatrixNorm function.
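To make the iteration concrete, here is a rough MATLAB sketch of it (the function name, starting point, and stopping rule are my own choices, and it assumes 1 < p, q < ∞; QETLAB’s InducedMatrixNorm handles the general case more carefully):

function nrm = induced_pq_norm_estimate(B, p, q)
% Rough sketch of the alternating Holder iteration described above.
% Assumes 1 < p,q < Inf so that the Holder conjugates below are finite.
qp = q/(q-1); pp = p/(p-1);            % Holder conjugates of q and p
x = randn(size(B,2),1) + 1i*randn(size(B,2),1);
x = x/norm(x,p);                       % random starting point with ||x||_p = 1
nrm = 0;
for j = 1:1000
    Bx = B*x;                          % fix x and maximize over y:
    y = sign(Bx).*abs(Bx).^(q/qp);     % phases of Bx; |y|.^q' prop. to |Bx|.^q
    y = y/norm(y,qp);                  % (sign(z) = z/|z| for complex z)
    By = B'*y;                         % fix y and maximize over x:
    x = sign(By).*abs(By).^(pp/p);     % phases of B'*y; |x|.^p prop. to |B'*y|.^p'
    x = x/norm(x,p);
    newnrm = norm(B*x,q);              % current lower bound on the norm
    if newnrm - nrm < 1e-12            % stop once gains are negligible
        nrm = newnrm; break;
    end
    nrm = newnrm;
end

Running this from several random starting points and keeping the largest value found mirrors the strategy described above.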

Induced Schatten superoperator norms

There is a natural family of induced norms on superoperators (i.e., linear maps \(\Phi : M_n \rightarrow M_n\)) as well. First, for a matrix \(X \in M_n\), we define its Schatten p-norm to be the p-norm of its vector of singular values:

\(\|X\|_p := \left(\sum_{i=1}^n \sigma_i(X)^p\right)^{1/p}.\)

Three special cases of the Schatten p-norms include:

  • p = 1, which is often called the “trace norm” or “nuclear norm”,
  • p = 2, which is often called the “Frobenius norm” or “Hilbert–Schmidt norm”, and
  • p = ∞, which is the usual operator norm.

The Schatten norms themselves are easy to compute (since singular values are easy to compute), but their induced counterparts are not.
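In MATLAB, for instance, a Schatten p-norm is a one-liner:

schatten = @(X,p) norm(svd(X),p);   % the p-norm of the vector of singular values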

Given a superoperator \(\Phi : M_n \rightarrow M_n\), its induced Schatten p → q norm is defined as follows:

\(\|\Phi\|_{p\rightarrow q} := \max\big\{ \|\Phi(X)\|_q : \|X\|_p = 1 \big\}.\)

These induced Schatten norms were studied in some depth in [4], and crop up fairly frequently in quantum information theory (especially when p = q = 1) and operator theory (especially when p = q = ∞). The fact that they are NP-hard to compute in general is not surprising, since they reduce to the induced matrix norms (discussed earlier) when \(\Phi\) acts only on the diagonal entries of \(X\) and zeros out the off-diagonal entries. It seems likely that this norm is difficult to compute even in the special cases p = q = 1 and p = q = ∞ (though it is straightforward to compute when p = q = 2).

Nevertheless, we can obtain good estimates of this norm’s value numerically using essentially the same method as discussed in the previous section. We start by rewriting the norm as a double maximization, where each maximization individually is easy to deal with:

\(\|\Phi\|_{p\rightarrow q} = \max\big\{ |\mathrm{Tr}(Y^*\Phi(X))| : \|X\|_p = 1, \|Y\|_{q^\prime} = 1\big\},\)

where \(q^\prime\) is again the positive real number (or infinity) satisfying \(1/q + 1/q^\prime = 1\). We now maximize over \(X\) and \(Y\), one at a time, alternately, just as before:

  1. Start by setting \(j = 1\) and fixing a randomly-chosen matrix \(X_0\), scaled so that \(\|X_0\|_p = 1\).
  2. Compute

    \(\max\big\{|\mathrm{Tr}(Y^*\Phi(X_{j-1}))| : \|Y\|_{q^\prime} = 1\big\},\)

    keeping \(X_{j-1}\) fixed, and let \(Y_j\) be the matrix attaining this maximum. By the Hölder inequality for Schatten norms, we know that this maximum value is exactly equal to \(\|\Phi(X_{j-1})\|_{q}\). Furthermore, the matrix \(Y_j\) attaining this maximum is the one with the same left and right singular vectors as \(\Phi(X_{j-1})\), and whose singular values are such that there is a constant \(c\) so that \(\sigma_i(Y_j)^{q^\prime} = c\sigma_i(\Phi(X_{j-1}))^q\) for all \(i\) (i.e., the vector of singular values of \(Y_j\), raised to the \(q^\prime\) power, is a multiple of the vector of singular values of \(\Phi(X_{j-1})\), raised to the \(q\) power).

  3. Compute

    \(\max\big\{|\mathrm{Tr}(Y_j^*\Phi(X))| : \|X\|_{p} = 1\big\},\)

    keeping \(Y_j\) fixed, and let \(X_j\) be the matrix attaining this maximum. By essentially the same argument as in step 2, we know that this maximum value is exactly equal to \(\|\Phi^*(Y_j)\|_{p^\prime}\), where \(\Phi^*\) is the map that is dual to \(\Phi\) in the Hilbert–Schmidt inner product. Furthermore, the matrix \(X_j\) attaining this maximum is the one with the same left and right singular vectors as \(\Phi^*(Y_j)\), and whose singular values are such that there is a constant \(c\) so that \(\sigma_i(X_j)^{p} = c\sigma_i(\Phi^*(Y_j))^{p^\prime}\) for all \(i\).

  4. Increment \(j\) by 1 and return to step 2. Repeat until negligible gains are made after each iteration.

The above algorithm is almost identical to the algorithm presented for induced matrix norms, but with absolute values and complex phases of the vectors \(\mathbf{x}\) and \(\mathbf{y}\) replaced by the singular values and singular vectors of the matrices \(X\) and \(Y\), respectively. The entire algorithm is still extremely quick to run, since each step just involves computing one singular value decomposition.

The downside of this algorithm, as with the induced matrix norm algorithm, is that we have no guarantee that this method will actually converge to the induced Schatten p → q norm; only that it will converge to some lower bound of it. However, the algorithm works pretty well in practice, and is fast enough that we can simply run it a few thousand times to get a very good idea of what the norm actually is. If you’re interested in making use of this algorithm, it has been implemented in QETLAB as the InducedSchattenNorm function.
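For the curious, here is a rough MATLAB sketch along the same lines (the function names, the stopping rule, and the representation of \(\Phi\) via arrays PhiA and PhiB of operators with \(\Phi(X) = \sum_i A_i X B_i\) are my own choices, and 1 < p, q < ∞ is assumed; QETLAB’s InducedSchattenNorm is the polished implementation):

function nrm = induced_schatten_estimate(PhiA, PhiB, p, q)
% Rough sketch: estimate the induced Schatten p->q norm of the map
% Phi(X) = sum_i PhiA(:,:,i)*X*PhiB(:,:,i). Assumes 1 < p,q < Inf.
snorm = @(X,r) norm(svd(X),r);           % Schatten r-norm
qp = q/(q-1); pp = p/(p-1);              % Holder conjugates of q and p
n = size(PhiB,1);
X = randn(n) + 1i*randn(n);
X = X/snorm(X,p);                        % random starting point, ||X||_p = 1
nrm = 0;
for it = 1:1000
    [U,S,V] = svd(apply_map(PhiA,PhiB,X));   % fix X and maximize over Y:
    Y = U*diag(diag(S).^(q/qp))*V';      % same singular vectors as Phi(X);
    Y = Y/snorm(Y,qp);                   % sigma(Y).^q' prop. to sigma(Phi(X)).^q
    PhiY = zeros(n);                     % dual map: Phi*(Y) = sum_i A_i'*Y*B_i'
    for i = 1:size(PhiA,3)
        PhiY = PhiY + PhiA(:,:,i)'*Y*PhiB(:,:,i)';
    end
    [U,S,V] = svd(PhiY);                 % fix Y and maximize over X:
    X = U*diag(diag(S).^(pp/p))*V';      % sigma(X).^p prop. to sigma(Phi*(Y)).^p'
    X = X/snorm(X,p);
    newnrm = snorm(apply_map(PhiA,PhiB,X),q);
    if newnrm - nrm < 1e-12              % stop once gains are negligible
        nrm = newnrm; break;
    end
    nrm = newnrm;
end

function PX = apply_map(PhiA, PhiB, X)
% Apply Phi(X) = sum_i A_i X B_i.
PX = zeros(size(PhiA,1),size(PhiB,2));
for i = 1:size(PhiA,3)
    PX = PX + PhiA(:,:,i)*X*PhiB(:,:,i);
end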

Entanglement Norms

The central idea used for the previous two families of norms can also be used to get lower bounds on the following norm on \(M_m \otimes M_n\) that comes up from time to time when dealing with quantum entanglement:

\(\|X\|_{S(1)} := \max\Big\{\big|(\mathbf{v}\otimes\mathbf{w})^*X(\mathbf{x} \otimes \mathbf{y})\big| : \|\mathbf{v}\| = \|\mathbf{w}\| = \|\mathbf{x}\| = \|\mathbf{y}\| = 1\Big\}.\)

(As a side note: this norm, and some other ones like it, were the central focus of my thesis.) This norm is already written for us as a double maximization, so the idea presented in the previous two sections is somewhat clearer from the start: we fix randomly-generated vectors \(\mathbf{v}\) and \(\mathbf{x}\) and then maximize over all vectors \(\mathbf{w}\) and \(\mathbf{y}\), which can be done simply by computing the left and right singular vectors associated with the maximum singular value of the operator

\((\mathbf{v} \otimes I)^*X(\mathbf{x} \otimes I) \in M_n.\)

We then fix \(\mathbf{w}\) and \(\mathbf{y}\) as those singular vectors and then maximize over all vectors \(\mathbf{v}\) and \(\mathbf{x}\) (which is again a singular value problem), and we iterate back and forth until we converge to some value.

As with the previously-discussed norms, this algorithm always converges, and it converges to a lower bound of \(\|X\|_{S(1)}\), but perhaps not its exact value. If you want to take this algorithm out for a spin, it has been implemented in QETLAB as the sk_iterate function.
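Here is a rough MATLAB sketch of this iteration (the function name is mine, and it assumes \(X \in M_m \otimes M_n\) with the first tensor factor m-dimensional; QETLAB’s sk_iterate is the full implementation):

function nrm = s1_norm_estimate(X, m, n)
% Rough sketch of the alternating singular value iteration for ||X||_{S(1)}.
Im = eye(m); In = eye(n);
v = randn(m,1) + 1i*randn(m,1); v = v/norm(v);
x = randn(m,1) + 1i*randn(m,1); x = x/norm(x);
nrm = 0;
for it = 1:1000
    % Fix v,x; maximize over w,y via the top singular vectors of
    % (v kron I)' * X * (x kron I), an n-by-n matrix.
    [W,~,Y] = svd(kron(v,In)'*X*kron(x,In));
    w = W(:,1); y = Y(:,1);
    % Fix w,y; maximize over v,x via the top singular vectors of
    % (I kron w)' * X * (I kron y), an m-by-m matrix.
    [V,S,Z] = svd(kron(Im,w)'*X*kron(Im,y));
    v = V(:,1); x = Z(:,1);
    newnrm = S(1,1);                  % current lower bound on ||X||_{S(1)}
    if newnrm - nrm < 1e-12           % stop once gains are negligible
        nrm = newnrm; break;
    end
    nrm = newnrm;
end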

It’s also worth mentioning that this algorithm generalizes straightforwardly in several different directions. For example, it can be used to find lower bounds on the norms \(\|\cdot\|_{S(k)}\) where we maximize on the left and right by pure states with Schmidt rank not larger than k rather than separable pure states, and it can be used to find lower bounds on the geometric measure of entanglement [5].

References:

  1. D. Steinberg. Computation of matrix norms with applications to robust optimization. Research thesis. Technion – Israel Institute of Technology, 2005.
  2. J. M. Hendrickx and A. Olshevsky. Matrix p-norms are NP-hard to approximate if p ≠ 1,2,∞. 2009. E-print: arXiv:0908.1397
  3. D. W. Boyd. The power method for ℓp norms. Linear Algebra and Its Applications, 9:95–101, 1974.
  4. J. Watrous. Notes on super-operator norms induced by Schatten norms. Quantum Information & Computation, 5(1):58–68, 2005. E-print: arXiv:quant-ph/0411077
  5. T.-C. Wei and P. M. Goldbart. Geometric measure of entanglement and applications to bipartite and multipartite quantum states. Physical Review A, 68:042307, 2003. E-print: arXiv:quant-ph/0212030

Norms and Dual Norms as Supremums and Infimums

May 26th, 2012

Let \(\mathcal{H}\) be a finite-dimensional Hilbert space over \(\mathbb{R}\) or \(\mathbb{C}\) (the fields of real and complex numbers, respectively). If we let \(\|\cdot\|\) be a norm on \(\mathcal{H}\) (not necessarily the norm induced by the inner product), then the dual norm of \(\|\cdot\|\) is defined by

\(\displaystyle\|\mathbf{v}\|^\circ := \sup_{\mathbf{w} \in \mathcal{H}}\Big\{ \big| \langle \mathbf{v}, \mathbf{w} \rangle \big| : \|\mathbf{w}\| \leq 1 \Big\}.\)

The double-dual of a norm is equal to itself (i.e., \(\|\cdot\|^{\circ\circ} = \|\cdot\|\)) and the norm induced by the inner product is the unique norm that is its own dual. Similarly, if \(\|\cdot\|_p\) is the vector p-norm, then \(\|\cdot\|_p^\circ = \|\cdot\|_q\), where \(q\) satisfies \(1/p + 1/q = 1\).
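As a quick numeric sanity check of this last fact in MATLAB, the vector attaining equality in Hölder’s inequality achieves the supremum in the definition of the dual norm:

p = 3; q = p/(p-1);             % a conjugate pair: 1/p + 1/q = 1
v = randn(5,1);
w = sign(v).*abs(v).^(q-1);     % equality vector from Holder's inequality
w = w/norm(w,p);                % normalize so that ||w||_p = 1
[abs(v'*w), norm(v,q)]          % these two values agree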

In this post, we will demonstrate that \(\|\cdot\|^\circ\) has an equivalent characterization as an infimum, and we use this characterization to provide a simple derivation of several known (but perhaps not well-known) formulas for norms such as the operator norm of matrices.

For certain norms (such as the “separability norms” presented at the end of this post), this ability to write a norm as both an infimum and a supremum is useful because computation of the norm may be difficult. However, having these two different characterizations of a norm allows us to bound it both from above and from below.

The Dual Norm as an Infimum

Theorem 1. Let \(S \subseteq \mathcal{H}\) be a bounded set satisfying \({\rm span}(S) = \mathcal{H}\) and define a norm \(\|\cdot\|\) by

\(\displaystyle\|\mathbf{v}\| := \sup_{\mathbf{w} \in S}\Big\{ \big| \langle \mathbf{v}, \mathbf{w} \rangle \big| \Big\}.\)

Then \(\|\cdot\|^\circ\) is given by

\(\displaystyle\|\mathbf{v}\|^\circ = \inf\Big\{ \sum_i |c_i| : \mathbf{v} = \sum_i c_i \mathbf{v}_i, \mathbf{v}_i \in S \ \forall \, i \Big\},\)

where the infimum is taken over all such decompositions of \(\mathbf{v}\).

Before proving the result, we make two observations. Firstly, the quantity \(\|\cdot\|\) described by Theorem 1 really is a norm: boundedness of \(S\) ensures that the supremum is finite, and \({\rm span}(S) = \mathcal{H}\) ensures that \(\|\mathbf{v}\| = 0 \implies \mathbf{v} = 0\). Secondly, every norm on \(\mathcal{H}\) can be written in this way: we can always choose \(S\) to be the unit ball of the dual norm \(\|\cdot\|^\circ\). However, there are times when other choices of \(S\) are more useful or enlightening (as we will see in the examples).

Proof of Theorem 1. Begin by noting that if \(\mathbf{w} \in S\) and \(\|\mathbf{v}\| \leq 1\) then \(\big| \langle \mathbf{v}, \mathbf{w} \rangle \big| \leq 1\). It follows that \(\|\mathbf{w}\|^{\circ} \leq 1\) whenever \(\mathbf{w} \in S\). In fact, we now show that \(\|\cdot\|^\circ\) is the largest norm on \(\mathcal{H}\) with this property. To this end, let \(\|\cdot\|_\prime\) be another norm satisfying \(\|\mathbf{w}\|_{\prime}^{\circ} \leq 1\) whenever \(\mathbf{w} \in S\). Then

\(\displaystyle \| \mathbf{v} \| = \sup_{\mathbf{w} \in S} \Big\{ \big| \langle \mathbf{w}, \mathbf{v} \rangle \big| \Big\} \leq \sup_{\mathbf{w}} \Big\{ \big| \langle \mathbf{w}, \mathbf{v} \rangle \big| : \|\mathbf{w}\|_{\prime}^{\circ} \leq 1 \Big\} = \|\mathbf{v}\|_\prime.\)

Thus \(\| \cdot \| \leq \| \cdot \|_\prime\), so by taking duals we see that \(\| \cdot \|^\circ \geq \| \cdot \|_\prime^\circ\). Since every norm is the dual of its own dual (i.e., \(\|\cdot\|^{\circ\circ} = \|\cdot\|\)), every norm that is at most 1 on \(S\) arises as such a \(\|\cdot\|_\prime^\circ\), so \(\|\cdot\|^\circ\) is indeed the largest norm with this property.

For the remainder of the proof, we denote the infimum in the statement of the theorem by \(\|\cdot\|_{{\rm inf}}\). Our goal now is to show that: (1) \(\|\cdot\|_{{\rm inf}}\) is a norm, (2) \(\|\cdot\|_{{\rm inf}}\) satisfies \(\|\mathbf{w}\|_{{\rm inf}} \leq 1\) whenever \(\mathbf{w} \in S\), and (3) \(\|\cdot\|_{{\rm inf}}\) is the largest norm satisfying property (2). The fact that \(\|\cdot\|_{{\rm inf}} = \|\cdot\|^\circ\) will then follow from the first paragraph of this proof.

To see (1) (i.e., to prove that \(\|\cdot\|_{{\rm inf}}\) is a norm), we only prove the triangle inequality, since positive homogeneity and the fact that \(\|\mathbf{v}\|_{{\rm inf}} = 0\) if and only if \(\mathbf{v} = 0\) are both straightforward (try them yourself!). Fix \(\varepsilon > 0\) and let \(\mathbf{v} = \sum_i c_i \mathbf{v}_i\), \(\mathbf{w} = \sum_i d_i \mathbf{w}_i\) be decompositions of \(\mathbf{v}, \mathbf{w}\) with \(\mathbf{v}_i, \mathbf{w}_i \in S\) for all i, satisfying \(\sum_i |c_i| \leq \|\mathbf{v}\|_{{\rm inf}} + \varepsilon\) and \(\sum_i |d_i| \leq \|\mathbf{w}\|_{{\rm inf}} + \varepsilon\). Then

\(\displaystyle \|\mathbf{v} + \mathbf{w}\|_{{\rm inf}} \leq \sum_i |c_i| + \sum_i |d_i| \leq \|\mathbf{v}\|_{{\rm inf}} + \|\mathbf{w}\|_{{\rm inf}} + 2\varepsilon,\)

where the first inequality holds because combining the two decompositions above gives a valid decomposition of \(\mathbf{v} + \mathbf{w}\).

Since \(\varepsilon > 0\) was arbitrary, the triangle inequality follows, so \(\|\cdot\|_{{\rm inf}}\) is a norm.

To see (2) (i.e., to prove that \(\|\mathbf{v}\|_{{\rm inf}} \leq 1\) whenever \(\mathbf{v} \in S\)), we simply write \(\mathbf{v}\) in its trivial decomposition \(\mathbf{v} = \mathbf{v}\), which gives the single coefficient \(c_1 = 1\), so \(\|\mathbf{v}\|_{{\rm inf}} \leq \sum_i |c_i| = |c_1| = 1\).

To see (3) (i.e., to prove that \(\|\cdot\|_{{\rm inf}}\) is the largest norm on \(\mathcal{H}\) satisfying condition (2)), begin by letting \(\|\cdot\|_\prime\) be any norm on \(\mathcal{H}\) with the property that \(\|\mathbf{v}\|_{\prime} \leq 1\) for all \(\mathbf{v} \in S\). Then using the triangle inequality for \(\|\cdot\|_\prime\) shows that if \(\mathbf{v} = \sum_i c_i \mathbf{v}_i\) is any decomposition of \(\mathbf{v}\) with \(\mathbf{v}_i \in S\) for all i, then

\(\displaystyle\|\mathbf{v}\|_\prime = \Big\|\sum_i c_i \mathbf{v}_i\Big\|_\prime \leq \sum_i |c_i| \|\mathbf{v}_i\|_\prime = \sum_i |c_i|.\)

Taking the infimum over all such decompositions of \(\mathbf{v}\) shows that \(\|\mathbf{v}\|_\prime \leq \|\mathbf{v}\|_{{\rm inf}}\), which completes the proof.

The remainder of this post is devoted to investigating what Theorem 1 says about certain specific norms.

Injective and Projective Cross Norms

If we let \(\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2\), where \(\mathcal{H}_1\) and \(\mathcal{H}_2\) are themselves finite-dimensional Hilbert spaces, then one often considers the injective and projective cross norms on \(\mathcal{H}\), defined respectively as follows:

\(\displaystyle \|\mathbf{v}\|_{I} := \sup\Big\{ \big| \langle \mathbf{v}, \mathbf{a} \otimes \mathbf{b} \rangle \big| : \|\mathbf{a}\| = \|\mathbf{b}\| = 1 \Big\} \text{ and}\)

\(\displaystyle \|\mathbf{v}\|_{P} := \inf\Big\{ \sum_i \| \mathbf{a}_i \| \| \mathbf{b}_i \| : \mathbf{v} = \sum_i \mathbf{a}_i \otimes \mathbf{b}_i \Big\},\)

where \(\|\cdot\|\) here refers to the norm induced by the inner product on \(\mathcal{H}_1\) or \(\mathcal{H}_2\). The fact that \(\|\cdot\|_{I}\) and \(\|\cdot\|_{P}\) are duals of each other is simply Theorem 1 in the case when S is the set of product vectors:

\(\displaystyle S = \big\{ \mathbf{a} \otimes \mathbf{b} : \|\mathbf{a}\| = \|\mathbf{b}\| = 1 \big\}.\)

In fact, the typical proof that the injective and projective cross norms are duals of each other is very similar to the proof of Theorem 1 provided above (see [1, Chapter 1]).

Maximum and Taxicab Norms

Use \(n\) to denote the dimension of \(\mathcal{H}\) and let \(\{\mathbf{e}_i\}_{i=1}^n\) be an orthonormal basis of \(\mathcal{H}\). If we let \(S = \{\mathbf{e}_i\}_{i=1}^n\) then the norm \(\|\cdot\|\) in the statement of Theorem 1 is the maximum norm (i.e., the p = ∞ norm):

\(\displaystyle\|\mathbf{v}\|_\infty = \sup_i\Big\{\big|\langle \mathbf{v}, \mathbf{e}_i \rangle \big| \Big\} = \max \big\{ |v_1|,\ldots,|v_n|\big\},\)

where \(v_i = \langle \mathbf{v}, \mathbf{e}_i \rangle\) is the i-th coordinate of \(\mathbf{v}\) in the basis \(\{\mathbf{e}_i\}_{i=1}^n\). The theorem then says that the dual of the maximum norm is

\(\displaystyle \|\mathbf{v}\|_\infty^\circ = \inf \Big\{ \sum_i |c_i| : \mathbf{v} = \sum_i c_i \mathbf{e}_i \Big\} = \sum_{i=1}^n |v_i|,\)

which is the taxicab norm (i.e., the p = 1 norm), as we expect.

Operator and Trace Norm of Matrices

If we let \(\mathcal{H} = M_n\), the space of \(n \times n\) complex matrices with the Hilbert–Schmidt inner product

\(\displaystyle \big\langle A, B \big\rangle := {\rm Tr}(AB^*),\)

then it is well-known that the operator norm and the trace norm are dual to each other:

\(\displaystyle \big\| A \big\|_{op} := \sup_{\mathbf{v}}\Big\{ \big\|A\mathbf{v}\big\| : \|\mathbf{v}\| = 1 \Big\} \text{ and}\)

\(\displaystyle \big\| A \big\|_{op}^\circ = \big\|A\big\|_{tr} := \sup_{U}\Big\{ \big| {\rm Tr}(AU) \big| : U \in M_n \text{ is unitary} \Big\},\)
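(To see that this supremum is attained, write \(A = W\Sigma V^*\) in its singular value decomposition and choose \(U = VW^*\): then \({\rm Tr}(AU) = {\rm Tr}(W\Sigma V^*VW^*) = {\rm Tr}(\Sigma) = \sum_i \sigma_i(A)\), and Hölder’s inequality for Schatten norms shows that no unitary can do better.)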

where \(\|\cdot\|\) is the Euclidean norm on \(\mathbb{C}^n\). If we let \(S\) be the set of unitary matrices in \(M_n\), then Theorem 1 provides the following alternate characterization of the operator norm:

Corollary 1. Let \(A \in M_n\). Then

\(\displaystyle \big\|A\big\|_{op} = \inf\Big\{ \sum_i |c_i| : A = \sum_i c_i U_i \text{ and each } U_i \text{ is unitary} \Big\}.\)
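In fact, the infimum in Corollary 1 is always attained using just two unitary matrices, via the singular value decomposition. Here is a quick MATLAB verification of that construction (a numerical sketch; the variable names are mine):

n = 4;
A = randn(n) + 1i*randn(n);            % a random complex matrix
[W,S,V] = svd(A);
op = S(1,1);                           % the operator norm of A
D = diag(exp(1i*acos(diag(S)/op)));    % chosen so that (D + D')/2 = S/op
U1 = W*D*V'; U2 = W*D'*V';             % two unitary matrices
norm(A - (op/2)*(U1 + U2))             % ~0: A = (op/2)*U1 + (op/2)*U2

The coefficients here are \(c_1 = c_2 = \|A\|_{op}/2\), so \(\sum_i |c_i| = \|A\|_{op}\), matching Corollary 1.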

As an application of Corollary 1, we are able to provide the following characterization of unitarily-invariant norms (i.e., norms \(\|\cdot\|_{\prime}\) with the property that \(\big\|UAV\big\|_{\prime} = \big\|A\big\|_{\prime}\) for all unitary matrices \(U, V \in M_n\)):

Corollary 2. Let \(\|\cdot\|_\prime\) be a norm on \(M_n\). Then \(\|\cdot\|_\prime\) is unitarily-invariant if and only if

\(\displaystyle \big\|ABC\big\|_\prime \leq \big\|A\big\|_{op}\big\|B\big\|_{\prime}\big\|C\big\|_{op}\)

for all \(A, B, C \in M_n\).

Proof of Corollary 2. The “if” direction is straightforward: if we let \(A\) and \(C\) be unitary, then

\(\displaystyle \big\|B\big\|_\prime = \big\|A^*ABCC^*\big\|_\prime \leq \big\|ABC\big\|_\prime \leq \big\|B\big\|_{\prime},\)

where we used the fact that \(\big\|A\big\|_{op} = \big\|C\big\|_{op} = 1\). It follows that \(\big\|ABC\big\|_\prime = \big\|B\big\|_\prime\), so \(\|\cdot\|_\prime\) is unitarily-invariant.

To see the “only if” direction, write \(A = \sum_i c_i U_i\) and \(C = \sum_i d_i V_i\) with each \(U_i\) and \(V_i\) unitary. Then

\(\displaystyle \big\|ABC\big\|_\prime = \Big\|\sum_{i,j}c_i d_j U_i B V_j\Big\|_\prime \leq \sum_{i,j} |c_i| |d_j| \big\|U_i B V_j\big\|_\prime = \sum_{i,j} |c_i| |d_j| \big\|B\big\|_\prime.\)

By taking the infimum over all decompositions of \(A\) and \(C\) of the given form and using Corollary 1, the result follows.

An alternate proof of Corollary 2, making use of some results on singular values, can be found in [2, Proposition IV.2.4].

Separability Norms

As our final (and least well-known) example, let \(\mathcal{H} = M_m \otimes M_n\), again with the usual Hilbert–Schmidt inner product. If we let

\(\displaystyle S = \{ \mathbf{a}\mathbf{b}^* \otimes \mathbf{c}\mathbf{d}^* : \|\mathbf{a}\| = \|\mathbf{b}\| = \|\mathbf{c}\| = \|\mathbf{d}\| = 1 \},\)

where \(\|\cdot\|\) is the Euclidean norm on \(\mathbb{C}^m\) or \(\mathbb{C}^n\), then Theorem 1 tells us that the following two norms are dual to each other:

\(\displaystyle \big\|A\big\|_s := \sup\Big\{ \big| (\mathbf{a}^* \otimes \mathbf{c}^*)A(\mathbf{b} \otimes \mathbf{d}) \big| : \|\mathbf{a}\| = \|\mathbf{b}\| = \|\mathbf{c}\| = \|\mathbf{d}\| = 1 \Big\} \text{ and}\)

\(\displaystyle \big\|A\big\|_s^\circ = \inf\Big\{ \sum_i \big\|A_i\big\|_{tr}\big\|B_i\big\|_{tr} : A = \sum_i A_i \otimes B_i \Big\}.\)

There’s actually a little bit of work to be done to show that \(\|\cdot\|_s^\circ\) has the given form, but it’s only a couple lines – consider it an exercise for the interested reader.

Both of these norms come up frequently when dealing with quantum entanglement. The norm \(\|\cdot\|_s^\circ\) was the subject of [3], where it was shown that a quantum state \(\rho\) is entangled if and only if \(\|\rho\|_s^\circ > 1\) (I use the above duality relationship to provide an alternate proof of this fact in [4, Theorem 6.1.5]). On the other hand, the norm \(\|\cdot\|_s\) characterizes positive linear maps of matrices and was the subject of [5, 6].

References

  1. J. Diestel, J. H. Fourie, and J. Swart. The Metric Theory of Tensor Products: Grothendieck’s Résumé Revisited. American Mathematical Society, 2008.
  2. R. Bhatia. Matrix Analysis. Springer, 1997.
  3. O. Rudolph. A separability criterion for density operators. J. Phys. A: Math. Gen., 33:3951–3955, 2000. E-print: arXiv:quant-ph/0002026
  4. N. Johnston. Norms and Cones in the Theory of Quantum Entanglement. PhD thesis, University of Guelph, 2012.
  5. N. Johnston and D. W. Kribs. A Family of Norms With Applications in Quantum Information Theory. Journal of Mathematical Physics, 51:082202, 2010.
  6. N. Johnston and D. W. Kribs. A Family of Norms With Applications in Quantum Information Theory II. Quantum Information & Computation, 11(1 & 2):104–123, 2011.

MATLAB Scripts for Computing Completely Bounded Norms via Semidefinite Programming

July 23rd, 2011

Update [March 3, 2015]: I have released a MATLAB package called QETLAB that can compute the completely bounded and diamond norms (and much more) much faster than the code contained in this blog post. I recommend using QETLAB instead of the code below.

In operator theory, the completely bounded norm of a linear map on complex matrices \(\Phi : M_m \rightarrow M_n\) is defined by \(\|\Phi\|_{cb} := \sup_{k \geq 1} \| id_k \otimes \Phi \|\), where \(\|\Phi\|\) is the usual norm on linear maps defined by \(\|\Phi\| := \sup_{X \in M_m} \{ \|\Phi(X)\| : \|X\| \leq 1\}\) and \(\|X\|\) is the operator norm of \(X\) [1]. The completely bounded norm is particularly useful when thinking of \(M_m\) and \(M_n\) as operator spaces.

The dual of the completely bounded norm is called the diamond norm, which plays an important role in quantum information theory, as it can be used to measure the distance between quantum channels. The diamond norm of \(\Phi\) is typically denoted \(\|\Phi\|_{\diamond}\). For properties of the completely bounded and diamond norms, see [1,2,3].

A method for efficiently computing the completely bounded and diamond norms via semidefinite programming was recently presented in [4]. The purpose of this post is to provide MATLAB scripts that implement this algorithm and demonstrate its usage.

Download and Install

In order to make use of these scripts to compute the completely bounded or diamond norm, you must download and install two things: the SeDuMi semidefinite program solver and the MATLAB scripts themselves.

  1. SeDuMi – Please follow the instructions on the SeDuMi website to download and install it. If possible, you should install SeDuMi 1.1R3, not SeDuMi 1.21 or SeDuMi 1.3, since there is a bug with the newer versions when dealing with complex matrices.
  2. CB Norm MATLAB Package – Once SeDuMi is installed, download the CB norm MATLAB scripts, unzip them, and place them in your MATLAB scripts directory. The zip file contains 10 MATLAB scripts.

Once the scripts are installed, type “help CBNorm” or “help DiamondNorm” at the MATLAB prompt to learn how to use the CBNorm and DiamondNorm functions. Several usage examples are provided below.

Usage Examples

The representation of the linear map \(\Phi\) that the CBNorm and DiamondNorm functions take as input is a pair of arrays of its left and right generalized Choi-Kraus operators. That is, arrays of operators \(\{A_i\}\) and \(\{B_i\}\) such that \(\Phi(X) = \sum_i A_i X B_i\) for all \(X\).

Basic Examples

If we wanted to compute the completely bounded and diamond norms of the map \(\Phi : M_2 \rightarrow M_2\) defined by

\(\displaystyle\Phi(X) = \begin{bmatrix}1 & 1 \\ 1 & 0\end{bmatrix}X\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix} + \begin{bmatrix}1 & 0 \\ 1 & 2\end{bmatrix}X\begin{bmatrix}1 & 2 \\ 1 & 1\end{bmatrix},\)

the MATLAB input and output would be as follows:

>> PhiA(:,:,1) = [1,1;1,0];
>> PhiA(:,:,2) = [1,0;1,2];
>> PhiB(:,:,1) = [1,0;0,1];
>> PhiB(:,:,2) = [1,2;1,1];
>> CBNorm(PhiA,PhiB)

ans =

    7.2684

>> DiamondNorm(PhiA,PhiB)

ans =

    7.4124

So we see that its completely bounded norm is 7.2684 and its diamond norm is 7.4124.

If we instead want to compute the completely bounded or diamond norm of a completely positive map, we only need to provide its Kraus operators – i.e., operators \(\{A_i\}\) such that \(\Phi(X) = \sum_i A_i X A_i^\dagger\) for all \(X\). Furthermore, in this case semidefinite programming isn’t used at all, since [1, Proposition 3.6] tells us that \(\|\Phi\|_{cb} = \|\Phi(I)\|\) and \(\|\Phi\|_{\diamond} = \|\Phi^\dagger(I)\|\), and computing \(\|\Phi(I)\|\) is trivial. The following example demonstrates the usage of these scripts in this case, via a completely positive map \(\Phi : M_3 \rightarrow M_2\) with four (essentially random) Kraus operators:

>> PhiA(:,:,1) = [1 0 0;0 1 1];
>> PhiA(:,:,2) = [-3 0 1;5 1 1];
>> PhiA(:,:,3) = [0 2 0;0 0 0];
>> PhiA(:,:,4) = [1 1 3;0 2 0];
>> CBNorm(PhiA)

ans =

   42.0000

>> DiamondNorm(PhiA)

ans =

   38.7303

Transpose Map

Suppose we want to compute the completely bounded or diamond norm of the transpose map on \(M_n\). A generalized Choi-Kraus representation is given by defining \(A_{ij} = B_{ij} = e_i e_j^\dagger\), where \(\{e_i\}\) is the standard basis of \(\mathbb{C}^n\) (i.e., \(A_{ij}\) and \(B_{ij}\) are the operators with matrix representation in the standard basis with a one in the \((i,j)\)-entry and zeroes elsewhere). It is known that the completely bounded and diamond norms of the n-dimensional transpose map are both equal to n, which can be verified in small dimensions as follows:

>> % 2-dimensional transpose
>> PhiA(:,:,1) = [1 0;0 0];
>> PhiA(:,:,2) = [0 1;0 0];
>> PhiA(:,:,3) = [0 0;1 0];
>> PhiA(:,:,4) = [0 0;0 1];
>> PhiB = PhiA;
>> CBNorm(PhiA,PhiB)

ans =

    2.0000

>> DiamondNorm(PhiA,PhiB)

ans =

    2.0000

>> % 3-dimensional transpose
>> I = eye(3);
>> for i=1:3
for j=1:3
PhiA(:,:,3*(i-1)+j) = I(:,i)*I(j,:);
end
end
>> PhiB = PhiA;
>> CBNorm(PhiA,PhiB)

ans =

    3.0000

>> DiamondNorm(PhiA,PhiB)

ans =

    3.0000

Difference of Unitary Channels

Now consider the map \(\Phi : M_2 \rightarrow M_2\) defined by \(\Phi(X) = X - UXU^\dagger\), where \(U\) is the following unitary matrix:

\(\displaystyle U = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1 \\ -1 & 1\end{bmatrix}.\)
We know from [2, Theorem 12] that the CB norm and diamond norm of \(\Phi\) are both equal to the diameter of the smallest closed disc containing all of the eigenvalues of \(U\). Because the eigenvalues of \(U\) are \((1 \pm i)/\sqrt{2}\), the smallest closed disc containing its eigenvalues has diameter \(\sqrt{2}\), so \(\|\Phi\|_{cb} = \|\Phi\|_{\diamond} = \sqrt{2}\). This result can be verified as follows:

>> PhiA(:,:,1) = [1 0;0 1];
>> PhiA(:,:,2) = [1 1;-1 1]/sqrt(2);
>> PhiB(:,:,1) = [1 0;0 1];
>> PhiB(:,:,2) = -[1 -1;1 1]/sqrt(2);
>> CBNorm(PhiA,PhiB)

ans =

    1.4142

>> DiamondNorm(PhiA,PhiB)

ans =

    1.4142

References

  1. V. I. Paulsen. Completely bounded maps and operator algebras. Cambridge University Press, 2003.
  2. N. Johnston, D. W. Kribs, and V. I. Paulsen. Computing stabilized norms for quantum operations via the theory of completely bounded maps. Quantum Inf. Comput., 9:16–35, 2009.
  3. J. Watrous. Theory of quantum information lecture notes.
  4. J. Watrous. Semidefinite programs for completely bounded norms. Theory Comput., 5:217–238, 2009.

Isometries of the Vector p-Norms: Signed and Complex Permutation Matrices

September 17th, 2010

Recall that in linear algebra, the vector p-norm of a vector \(\mathbf{x} \in \mathbb{C}^n\) (or \(\mathbf{x} \in \mathbb{R}^n\)) is defined to be

\(\displaystyle\|\mathbf{x}\|_p := \left(\sum_{i=1}^n |x_i|^p\right)^{1/p},\)

where \(x_i\) is the ith element of \(\mathbf{x}\) and 1 ≤ p ≤ ∞ (where the p = ∞ case is understood to mean the limit as p approaches ∞, which gives the maximum norm). By far the most well-known of these norms is the Euclidean norm, which arises when p = 2. Another well-known norm arises when p = 1, which gives the “taxicab” norm.

The problem that will be investigated in this post is to characterize what operators preserve the p-norms – i.e., what their isometries are. In the p = 2 case of the Euclidean norm, the answer is well-known: the isometries of the real Euclidean norm are exactly the orthogonal matrices, and the isometries of the complex Euclidean norm are exactly the unitary matrices. It turns out that if p ≠ 2 then the isometry group looks much different. Indeed, Exercise IV.1.3 of [1] asks the reader to show that the isometries of the p = 1 and p = ∞ norms are what are known as complex permutation matrices (to be defined). We will investigate those cases as well as a situation when p ≠ 1, 2, ∞.

p = 1: The “Taxicab” Norm

Recall that a permutation matrix is a matrix with exactly one “1” in each of its rows and columns, and a “0” in every other position. A signed permutation matrix (sometimes called a generalized permutation matrix) is similar – every row and column has exactly one non-zero entry, which is either 1 or -1. Similarly, a complex permutation matrix is a matrix for which every row and column has exactly one non-zero entry, and every non-zero entry is a complex number with modulus 1.
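Before moving on, here is a quick MATLAB sanity check of the easy direction discussed in the next paragraph (generating the complex permutation matrix via a random diagonal and a random row permutation is just one way to do it):

n = 5;
P = diag(exp(2i*pi*rand(n,1)));   % random modulus-1 diagonal entries
P = P(randperm(n),:);             % randomly permute the rows
x = randn(n,1) + 1i*randn(n,1);
for p = [1 1.5 2 3 Inf]
    disp(norm(P*x,p) - norm(x,p)) % each difference is ~0 (up to rounding)
end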

It is not difficult to show that if x ∈ Rn then the taxicab norm of x is preserved by signed permutation matrices, and if x ∈ Cn then the taxicab norm of x is preserved by complex permutation matrices. We will now show that the converse holds:

Theorem 1. Let \(P \in M_n\) be an \(n \times n\) matrix. Then

\(\displaystyle\|P\mathbf{x}\|_1 = \|\mathbf{x}\|_1 \quad \forall \, \mathbf{x} \in \mathbb{C}^n \ (\text{or } \mathbb{R}^n)\)

if and only if P is a complex permutation matrix (or a signed permutation matrix, respectively).

Proof. We only prove the “only if” implication, because the “if” implication is trivial (an exercise left for the reader?). So let’s suppose that P is an isometry of the p = 1 vector norm. Let \(\mathbf{e}_i\) denote the ith standard basis vector, let \(\mathbf{p}_i\) denote the ith column of P, and let \(p_{ij}\) denote the (j,i)-entry of P (i.e., the jth entry of \(\mathbf{p}_i\)). Then \(P\mathbf{e}_i = \mathbf{p}_i\) for all i, so

\(\displaystyle\sum_j |p_{ij}| = \|\mathbf{p}_i\|_1 = \|P\mathbf{e}_i\|_1 = \|\mathbf{e}_i\|_1 = 1 \quad \forall \, i.\)
Similarly, \(P(\mathbf{e}_i + \mathbf{e}_k) = \mathbf{p}_i + \mathbf{p}_k\) for all i,k, so

\(\displaystyle\sum_j |p_{ij} + p_{kj}| = \|\mathbf{p}_i + \mathbf{p}_k\|_1 = \|\mathbf{e}_i + \mathbf{e}_k\|_1 = 2 = \sum_j \big(|p_{ij}| + |p_{kj}|\big) \quad \forall \, i \neq k.\)
However, by the triangle inequality for the absolute value we know that the above equality can only hold if, for each j, there exists a non-negative real constant \(c_{ijk} \geq 0\) such that \(p_{ij} = c_{ijk}p_{kj}\) whenever \(p_{kj} \neq 0\). However, it is similarly the case that \(P(\mathbf{e}_i - \mathbf{e}_k) = \mathbf{p}_i - \mathbf{p}_k\) for all i,k, so

\(\displaystyle\sum_j |p_{ij} - p_{kj}| = \|\mathbf{p}_i - \mathbf{p}_k\|_1 = \|\mathbf{e}_i - \mathbf{e}_k\|_1 = 2 = \sum_j \big(|p_{ij}| + |p_{kj}|\big) \quad \forall \, i \neq k.\)
Using the equality condition for the complex absolute value again, we know that there exist non-negative real constants \(d_{ijk} \geq 0\) such that \(p_{ij} = -d_{ijk}p_{kj}\) whenever \(p_{kj} \neq 0\). Since each \(c_{ijk}\) and each \(d_{ijk}\) is non-negative, the two conditions can hold simultaneously only if \(p_{ij} = 0\) or \(p_{kj} = 0\), so each row of P contains at most one non-zero entry (and each row must indeed contain at least one non-zero entry, since the isometries of any norm must be nonsingular).

Thus every row has exactly one non-zero entry. By using (again) the fact that isometries must be nonsingular, it follows that each of the non-zero entries must occur in a distinct column (otherwise there would be a zero column). The fact that each non-zero entry has modulus 1 follows from simply noting that P must preserve the p = 1 norm of each \(\mathbf{e}_i\).

p = ∞: The Maximum Norm

As with the p = 1 case, it is not difficult to show that if x ∈ Rn then the maximum norm of x is preserved by signed permutation matrices, and if x ∈ Cn then the maximum norm of x is preserved by complex permutation matrices. We will now show that the converse holds in this case as well:

Theorem 2. Let \(P \in M_n\) be an \(n \times n\) matrix. Then

\(\displaystyle\|P\mathbf{x}\|_\infty = \|\mathbf{x}\|_\infty \quad \forall \, \mathbf{x} \in \mathbb{C}^n \ (\text{or } \mathbb{R}^n)\)

if and only if P is a complex permutation matrix (or a signed permutation matrix, respectively).

Proof. Again, we only prove the “only if” implication, since the “if” implication is trivial. So suppose that P is an isometry of the p = ∞ vector norm. As before, let \(\mathbf{e}_i\) denote the ith standard basis vector, let \(\mathbf{p}_i\) denote the ith column of P, and let \(p_{ij}\) denote the (j,i)-entry of P (i.e., the jth entry of \(\mathbf{p}_i\)). Then \(P\mathbf{e}_i = \mathbf{p}_i\) for all i, so

\(\displaystyle\max_j |p_{ij}| = \|\mathbf{p}_i\|_\infty = \|P\mathbf{e}_i\|_\infty = \|\mathbf{e}_i\|_\infty = 1 \quad \forall \, i.\)
In other words, each entry of P has modulus at most 1, and each column has at least one element with modulus equal to 1. Also, \(P(\mathbf{e}_i \pm \mathbf{e}_k) = \mathbf{p}_i \pm \mathbf{p}_k\) for all i,k, so

\(\displaystyle\max_j |p_{ij} \pm p_{kj}| = \|\mathbf{p}_i \pm \mathbf{p}_k\|_\infty = \|\mathbf{e}_i \pm \mathbf{e}_k\|_\infty = 1 \quad \forall \, i \neq k.\)
It follows that if \(|p_{ij}| = 1\) then \(p_{kj} = 0\) for all \(k \neq i\), since otherwise one of \(|p_{ij} + p_{kj}|\) or \(|p_{ij} - p_{kj}|\) would exceed 1. Since each column has an entry with modulus 1, and no two columns can place such an entry in the same row, each row and column has exactly one non-zero entry, which must have modulus 1, so P is a complex (or signed) permutation matrix.

Any p ≠ 2

When p = 2, the isometries are orthogonal/unitary matrices. When p = 1 or p = ∞, the isometries are signed/complex permutation matrices, which are a very small subset of the orthogonal/unitary matrices. One might naively expect that the isometries for other values of p somehow interpolate between those two extremes. Alternatively, one might expect that the signed/complex permutation matrices are the only isometries for all other values of p as well. It turns out that the latter conjecture is correct [2,3].

Theorem 3. Let \(P \in M_n\) be an \(n \times n\) matrix and let \(p \in [1,2) \cup (2,\infty]\). Then

\(\displaystyle\|P\mathbf{x}\|_p = \|\mathbf{x}\|_p \quad \forall \, \mathbf{x} \in \mathbb{C}^n \ (\text{or } \mathbb{R}^n)\)

if and only if P is a complex permutation matrix (or a signed permutation matrix, respectively).

References:

  1. R. Bhatia. Matrix Analysis. Volume 169 of Graduate Texts in Mathematics. Springer, 1997.
  2. S. Chang and C.-K. Li. Certain isometries on \(\mathbb{R}^n\). Linear Algebra Appl., 165:251–265, 1992.
  3. C.-K. Li and W. So. Isometries of \(\ell_p\) norm. Amer. Math. Monthly, 101:452–453, 1994.