Tạp chí Nghiên cứu KH&CN quân sự, Số 40, 12 - 2015
AN ITERATIVE GREEDY ALGORITHM FOR SPARSITY-
CONSTRAINED OPTIMIZATION
Nguyen Ngoc Minh1*, Nguyen Nam2
Abstract: This paper presents an iterative greedy algorithm, namely gradient
matching pursuit (GradMP), for solving a general sparsity-constrained optimization
problem. Our algorithm generalizes the idea of the CoSaMP algorithm to handle the
non-quadratic structure of the objective function together with the general form of
the sparsity constraint. The algorithm can thus handle a broader class of objective
functions and sparsity constraints. In addition, our simulations on various problems
demonstrate that our algorithm usually achieves performance comparable to, if not
better than, the convex minimization approach.
Keywords: Matching pursuit, GradMP, CoSaMP.
1. INTRODUCTION
Over the past decade, the problem of recovering low-dimensional structured models
from highly incomplete information has attracted increasing attention from researchers
in fields as diverse as signal processing, machine learning, and statistics. Work in
compressed sensing has shown that a sparse signal can be precisely reconstructed from a
small number of nonadaptive linear measurements, proportional to the signal's sparsity
level times a logarithmic factor. In high-dimensional statistics, it has been
demonstrated that linear regression parameters can be statistically well estimated from
a small set of observations, assuming that the parameter vector is sufficiently sparse [1].
Recently, these ideas have been extended to other models such as low-rank structure, in
which it has been shown that a low-rank data matrix can be recovered from a number of
samples proportional to the degrees of freedom of the matrix. In another direction,
Gaussian graphical model estimation can be formulated as a sparse recovery problem with
the log-likelihood loss, in which it has been shown that the graph can be recovered
provided the underlying graph is sufficiently sparse.
A common theme of these problems is the assumption of low-dimensional structure
imposed on the signal or the parameters to be estimated. In general, in order to
recover the signal or parameter of interest, the problems mentioned above can be
formulated as minimizing a loss function subject to a sparsity constraint:

$$\min_x f(x) \quad \text{subject to} \quad \|x\|_{0,D} \le k, \tag{1}$$
where $\|x\|_{0,D}$ is the norm which measures the "sparsity" of $x$ with respect to a
given set $D$. For example, if the set $D$ consists of the $n$ standard basis vectors in
$n$ dimensions, then $\|x\|_{0,D}$ is the $\ell_0$-norm of $x$, which counts the number
of nonzero coefficients of $x$. In this optimization, $k$ is a user-defined parameter
which controls the sparsity level of the estimator, and $f(x)$ is the objective function
which measures the consistency between the observations and the parameter to be
estimated. In this paper, $f(x)$ is smooth, but not necessarily convex. This estimation
problem is sufficiently general to cover various problems of interest. For instance, in
compressed sensing and sparse linear regression, we would like to recover a sparse
signal $x^* \in \mathbb{R}^n$ from limited linear observations $y \in \mathbb{R}^m$,
$y = A x^* + e$, where $A$ is the $m \times n$ measurement matrix and $e$ is
the noise vector. Naturally, the objective function $f(x)$ is the quadratic loss
$f(x) = \|y - Ax\|_2^2$ and the constraint is the $\ell_0$-norm. Similarly, in matrix
recovery, where one would like to recover a low-rank matrix $X^*$ from linear
observations $y = \mathcal{A}(X^*) + e$, the loss function is again quadratic,
$\|y - \mathcal{A}(X)\|_2^2$, and the constraint is now the matrix rank.
Recent years have witnessed a wealth of algorithms proposed to solve (1) or
variants of this optimization with specific loss functions and constraints. In general,
these algorithms can be categorized into two main approaches. The first is to recast
this nonconvex program as a convex counterpart by using the $\ell_1$-norm to relax its
$\ell_0$ counterpart. In particular, if $f(x)$ is a convex function and
$\|x\|_{0,D} \le k$ uses the standard $\ell_0$-norm of $x$, the regularized convex
relaxation of (1) is

$$\min_x f(x) + \lambda \|x\|_1, \tag{2}$$

where $\lambda$ is a positive regularization parameter and the $\ell_1$-norm $\|x\|_1$
is simply the sum of the absolute values of the coefficients (we note that when $x$ is a
matrix, the $\ell_0$-norm is replaced by the matrix rank and the $\ell_1$-norm of $x$ by
the nuclear norm, the sum of the singular values of $x$). The optimization (2) is convex
and can be solved efficiently by many existing algorithms.
Together with the $\ell_1$ minimization approach, iterative methods such as greedy
pursuits [1], [2] and iterative hard thresholding [3] have been shown to be very
efficient in recovering sparse signals from a very limited number of observations.
These methods are usually easier to implement and in many cases computationally
faster than their $\ell_1$ counterparts.
The greedy methods were initially introduced under the names matching pursuit (MP)
[1] and orthogonal matching pursuit (OMP). While MP and OMP are simple and easy to
implement, their lack of provable reconstruction quality may have prevented them from
being widely applied in practice. Recently, several researchers have developed
extensions of the MP and OMP algorithms that significantly improve the recovery
performance and provide optimal theoretical guarantees. Among them are compressive
sampling matching pursuit (CoSaMP) [2] and subspace pursuit [4]. However, these
algorithms only consider situations in which the loss function is quadratic.
In this paper, we propose an iterative greedy algorithm to solve the optimization (1).
Our algorithm generalizes the idea of the CoSaMP algorithm to handle non-quadratic
structures of the objective function as well as general forms of the constraint. Our
contribution is to provide strong theoretical bounds for the recovery error as well as
the linear convergence rate of the algorithm. Leveraging the notions of D-restricted
strong convexity (D-RSC) and D-restricted strong smoothness (D-RSS) imposed on the loss
function, we show that the $\ell_2$-norm error between the optimal solution $\hat{x}$
of (1) and the iterate $x^t$ obtained by the algorithm at the $t$-th iteration decays
exponentially. The two properties D-RSC and D-RSS are analogous to RSC and RSS, which
have been essential tools for showing efficient estimation and fast convergence of
convex programming [5], [6]. We apply this general theory to estimate the recovery
error of various standard recovery problems such as compressed sensing, matrix/tensor recovery,
and generalized linear regression. When the loss function is quadratic and the estimator
has vector (matrix) form, we obtain theoretical guarantees similar to those of
CoSaMP [2] (ADMiRA [7]). In addition, our simulations on several problems in statistics
demonstrate that our algorithm usually achieves performance comparable to, if not better
than, the convex minimization approach.
2. GRADIENT MATCHING PURSUIT ALGORITHM
We present in this section the algorithm for solving (1). The algorithm, called gradient
matching pursuit (GradMP), is described in Algorithm 1. The name comes from the first
step of the algorithm: at each iteration, the signal support is estimated by taking the
gradient of the loss function and choosing the support domain associated with the
directions along which the gradient is largest. In general, the algorithm follows a flow
similar to that of CoSaMP. However, we have made several changes to address the
non-quadratic structure of the loss function and the general form of the sparse
constraint. The algorithm can be seen as a generalization of CoSaMP [2] for sparse
vector recovery and of ADMiRA [7] for low-rank matrix recovery to nonlinear settings.
Compared to CoSaMP and ADMiRA in particular, GradMP differs in the proxy step: instead
of taking the correlation between the sensing matrix and the error residual, we take the
gradient of the loss function. In addition, in the identification step, the signal
support is estimated by projecting the gradient onto the spaces spanned by subsets of
the set $D$. In fact, when the loss function is quadratic and the set $D$ consists of
the standard basis, Algorithm 1 reduces to CoSaMP. Despite the algorithmic similarity,
the analysis of this more general algorithm is considerably different, because we do not
have the luxury of a closed-form solution to the restricted optimization in the
estimation step, as least squares provides in CoSaMP.
Intuitively, GradMP proceeds as follows: instead of solving the difficult nonconvex
problem (1) directly, at each iteration GradMP looks for a subspace based on the
previous estimate and then seeks a new solution by solving a sub-optimization problem
at the estimation step. This subproblem is often much easier to optimize due to its
low-dimensional structure. In particular, the optimization at the estimation step can
be written as

$$\min_x f(x) \quad \text{subject to} \quad x \in \mathcal{C}_{D_{\hat{\Gamma}}}, \quad |\hat{\Gamma}| \le 3k.$$

Provided $k$ is considerably smaller than the ambient dimension $n$, this optimization
is over a $3k$-dimensional space, as opposed to the $n$-dimensional space of the
original problem (1). In addition, in many practical problems in statistics, although
$f(x)$ is nonconvex, it is often convex on a restricted set.
Throughout the paper, we denote by $\hat{x}$ a feasible solution of the minimization (1):

$$\hat{x} = \arg\min_x f(x) \quad \text{subject to} \quad \|x\|_{0,D} \le k. \tag{3}$$

We take $T$ to be the support set of $\hat{x}$ with respect to $D$; that is, $\hat{x}$
can be decomposed as $\hat{x} = \sum_{i \in T} \alpha_i d_i$. For a given vector $x$,
let $P_{D_T} x$ be the orthogonal projection of $x$ onto the space spanned by the
columns of the matrix $D_T$:

$$P_{D_T} x = \arg\min_{u \in \operatorname{span}(D_T)} \|x - u\|_2, \tag{4}$$
where the norm is the $\ell_2$-norm for vectors or the Frobenius norm for matrices. We
also denote by $\mathcal{C}_D$ the space spanned by the columns of $D$:

$$\mathcal{C}_D = \{x : x = D\alpha, \ \alpha \in \mathbb{R}^{|D|}\}. \tag{5}$$

It is clear that $\mathcal{C}_D$ is a convex set.
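The projection (4) can be computed by least squares: $P_{D_T} x = D_T D_T^{+} x$, where $D_T^{+}$ is the pseudoinverse of $D_T$. A small sketch (the dictionary and vector here are arbitrary illustrations, not from the paper):

```python
import numpy as np

def project_onto_span(x, D_T):
    """Orthogonal projection of x onto span(D_T), as in (4),
    via least squares: P x = D_T * argmin_a ||x - D_T a||_2."""
    coef, *_ = np.linalg.lstsq(D_T, x, rcond=None)
    return D_T @ coef

rng = np.random.default_rng(1)
D = rng.standard_normal((8, 5))            # a dictionary of 5 atoms in R^8
x = rng.standard_normal(8)
p = project_onto_span(x, D[:, [0, 2]])     # project onto two selected atoms
r = x - p                                  # the residual is orthogonal to the span
```

The residual being orthogonal to every selected atom, and the projection being idempotent, are the two defining properties of $P_{D_T}$.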
Algorithm 1. GradMP algorithm
1. input: $k$ and stopping criterion
2. initialize: $\Lambda = \emptyset$, $x^0 = 0$, and $t = 0$
3. repeat
   proxy: $u = \nabla f(x^t)$
   identify: $\Gamma = \arg\max_{\Gamma : |\Gamma| \le 2k} \|P_{D_\Gamma} u\|_2$
   merge: $\hat{\Gamma} = \Gamma \cup \Lambda$
   estimate: $b = \arg\min_x f(x)$ s.t. $x \in \mathcal{C}_{D_{\hat{\Gamma}}}$
   prune: $\Lambda = \arg\max_{\Lambda : |\Lambda| \le k} \|P_{D_\Lambda} b\|_2$
   update: $x^{t+1} = P_{D_\Lambda} b$, $t = t + 1$
4. until: halting criterion true
5. output: $x^t$
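For the special case $f(x) = \|y - Ax\|_2^2$ with $D$ the standard basis, the estimation step is plain least squares and GradMP coincides with CoSaMP. The following sketch of Algorithm 1 for this case is our own toy illustration, not the authors' code:

```python
import numpy as np

def gradmp_quadratic(A, y, k, n_iter=20):
    """GradMP sketch for f(x) = ||y - A x||_2^2 with the standard basis as D;
    in this setting the iterations coincide with CoSaMP."""
    n = A.shape[1]
    x = np.zeros(n)
    support = np.array([], dtype=int)
    for _ in range(n_iter):
        u = A.T @ (A @ x - y)                    # proxy: gradient of the loss
        gamma = np.argsort(np.abs(u))[-2 * k:]   # identify: 2k largest entries
        merged = np.union1d(gamma, support)      # merge with the current support
        b = np.zeros(n)                          # estimate: restricted least squares
        b[merged] = np.linalg.lstsq(A[:, merged], y, rcond=None)[0]
        support = np.argsort(np.abs(b))[-k:]     # prune to the k largest
        x = np.zeros(n)
        x[support] = b[support]                  # update
    return x

# noiseless toy problem: a 3-sparse signal in R^100 from 40 measurements
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[7, 42, 90]] = [2.0, -3.0, 2.5]
y = A @ x_true
x_hat = gradmp_quadratic(A, y, k=3)
```

Once the true support is captured, the restricted least-squares step drives the residual to zero, so the iterates stabilize at the exact signal in the noiseless setting.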
3. SIMULATION AND RESULTS
3.1. Covariance matrix estimation
In this section, we apply the proposed algorithm to solve some important problems in
statistics in which the underlying loss functions are generally not quadratic. Our first
consideration is the covariance matrix estimation. We will demonstrate via extensive
simulations that GradMP is a potential algorithm for this problem.
Covariance matrix estimation has been a long-standing problem in statistics (e.g., [8]).
Given a Gaussian random vector $x = (x^{(1)}, \dots, x^{(p)})^T$ with mean $\mu$ and
unknown covariance $\Sigma$, the goal is to estimate $\Sigma$ from $n$ random samples
$\{x_i\}_{i=1,\dots,n}$ of $x$. If we denote a Gaussian graphical model as $G = (V, E)$
where the $i$-th vertex is associated with the random variable $x^{(i)}$, then the
structure of the graph is encoded in the covariance matrix. In particular, there is no
edge between the two vertices $i$ and $j$ if the $(i, j)$ entry of the concentration
matrix $\Sigma^{-1}$ is zero. Using maximum likelihood estimation, we need to solve the
following optimization:

$$\hat{\Theta} = \arg\max_{\Theta} \ \log\det(\Theta) - \operatorname{trace}(S\Theta), \tag{6}$$

where $S = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$ is the empirical
covariance matrix. Without imposing any constraint on the covariance matrix, it is clear
that the solution of (6) might not be
consistent, especially in the scenarios in which the number of observations n is
significantly less than the data dimension p .
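To see why, note that when $n \gg p$ the unconstrained maximizer of (6) is simply $\hat{\Theta} = S^{-1}$, since the gradient of the objective, $\Theta^{-1} - S$, vanishes there: the MLE merely inverts the empirical covariance and imposes no structure. A quick numerical check (toy numbers, our own illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])                  # true covariance
X = rng.multivariate_normal(np.zeros(2), Sigma, size=20000)
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / X.shape[0]                      # empirical covariance matrix
Theta_hat = np.linalg.inv(S)                    # unconstrained maximizer of (6)
```

With $n$ large relative to $p$ this works, but when $n < p$ the matrix $S$ is singular and the unconstrained problem has no well-behaved solution, which motivates constraining the estimator.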
Notice that for a multivariate Gaussian distribution, if the $(i, j)$ entry of
$\Sigma^{-1}$ is zero, then the two variables $x^{(i)}$ and $x^{(j)}$ are conditionally
independent given the other variables. Thus, it makes sense to enforce sparsity on the
concentration matrix $\Sigma^{-1}$:

$$\hat{\Theta} = \arg\min_{\Theta} \ \operatorname{trace}(S\Theta) - \log\det(\Theta) \quad \text{s.t.} \quad \|\Theta\|_{0,\text{off}} \le k, \tag{7}$$

where $\|\Theta\|_{0,\text{off}}$ denotes the number of nonzero off-diagonal entries of $\Theta$.
Over the last few years, there has been an extensive amount of work on recasting this
problem as a convex program (e.g., [9], [10]) and establishing strong statistical
guarantees for the resulting estimator (e.g., [11], [12]). In [9], Yuan and Lin proposed
to solve (7) by using the $\ell_1$ penalty:

$$\hat{\Theta} = \arg\min_{\Theta} \ \operatorname{trace}(S\Theta) - \log\det(\Theta) + \lambda \|\Theta\|_{1,\text{off}}, \tag{8}$$

where the off-diagonal $\ell_1$ regularizer is
$\|\Theta\|_{1,\text{off}} = \sum_{i \ne j} |\Theta_{ij}|$. Based on the work
of Banerjee et al. [13], Friedman et al. [10] developed a very efficient coordinate
descent algorithm to solve (8). On the theoretical side, Rothman et al. [11] provided an
upper bound in the $\ell_2$-norm for the estimation error. Under certain strong
conditions on the population covariance matrix, Ravikumar et al. [12] showed that (8)
can recover the exact graph structure when the number of observations is of the order
$O(k^2 \log p)$, where $k$ is the maximum degree of any vertex. Very recently,
leveraging the forward-backward greedy algorithm of [14], Johnson et al. [15] proposed
an efficient greedy approach to solve (7). They showed that, under a weak assumption on
$\Sigma$, only $O(k \log p)$ samples are necessary to achieve exact graph recovery. This
is a significant improvement over previous algorithms and results in the literature.
Despite these advantages, the computational complexity of that algorithm is
considerable: at each iteration, it must minimize
$\operatorname{trace}(S\Theta) - \log\det(\Theta)$ over a fixed support set twice.
Furthermore, at each iteration, the algorithm has to solve two other minimizations to
select the best entry to add, or to remove an unnecessary entry from the current
estimate.
In this section, we propose to solve (7) using our GradMP algorithm. Denote
$f(\Theta) = \operatorname{trace}(S\Theta) - \log\det(\Theta)$. It is clear that
$\nabla f(\Theta) = S - \Theta^{-1}$. We note that in the algorithm, $U_{\text{off}}$
denotes the matrix $U$ with its diagonal entries set to zero; similar notation is used
for $B_{\text{off}}$. Furthermore, $\operatorname{supp}(U_{\text{off}}, 2k)$ is the
operator that returns the set of locations of the $2k$ largest-magnitude entries of the
matrix $U_{\text{off}}$. As we can see, at each iteration, our GradMP requires only one
minimization (the estimation step). Thus, the computational complexity of our algorithm
is significantly less than that of [15].
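The gradient formula $\nabla f(\Theta) = S - \Theta^{-1}$ follows from $\partial \operatorname{trace}(S\Theta)/\partial \Theta = S$ and $\partial \log\det(\Theta)/\partial \Theta = \Theta^{-1}$, and is easy to sanity-check by finite differences (toy data, our own illustration):

```python
import numpy as np

def f(Theta, S):
    """Loss f(Theta) = trace(S Theta) - log det(Theta)."""
    return np.trace(S @ Theta) - np.linalg.slogdet(Theta)[1]

def grad_f(Theta, S):
    """Closed-form gradient: S - Theta^{-1}."""
    return S - np.linalg.inv(Theta)

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 4))
S = np.cov(X, rowvar=False, bias=True)          # empirical covariance matrix
Theta = np.eye(4)

# finite-difference check of the directional derivative along a symmetric E
eps = 1e-5
E = np.zeros((4, 4))
E[1, 2] = E[2, 1] = 0.5
fd = (f(Theta + eps * E, S) - f(Theta - eps * E, S)) / (2 * eps)
an = np.sum(grad_f(Theta, S) * E)               # analytic <grad f, E>
```

The central difference `fd` and the analytic directional derivative `an` agree to high precision, confirming the gradient used in the proxy step.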
The optimization in the estimation step is

$$\min_{\Theta} \ \operatorname{trace}(S\Theta) - \log\det(\Theta) \quad \text{s.t.} \quad \operatorname{supp}(\Theta) \subseteq \hat{\Gamma}, \tag{9}$$
which can be seen as a restricted version of (7). However, this optimization is
essentially easier to solve, since the support $\hat{\Gamma}$ of $\Theta$ is already
fixed. An algorithm for the minimization (9) can be found in Section 17.3.1 of [8].
Algorithm 2. Covariance matrix estimation algorithm
Initialization: $\Theta^0 = I$, $\Lambda = \emptyset$, and $t = 1$
repeat
   $U = S - (\Theta^{t-1})^{-1}$
   $\Gamma = \operatorname{supp}(U_{\text{off}}, 2k)$
   $\hat{\Gamma} = \Gamma \cup \Lambda$
   $B = \arg\min_{\Theta} f(\Theta)$ subject to $\operatorname{supp}(\Theta) \subseteq \hat{\Gamma}$
   $\Lambda = \operatorname{supp}(B_{\text{off}}, k)$
   $\Theta^t_\Lambda = B_\Lambda$, $\Theta^t_{\Lambda^c} = 0$
   $t = t + 1$
until halting criterion true
Output: $\hat{\Theta} = \Theta^t$
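The support-selection operator $\operatorname{supp}(U_{\text{off}}, s)$ used above can be implemented by ranking off-diagonal magnitudes and selecting entries in symmetric pairs. A minimal sketch (the function name and tie-handling are our own choices):

```python
import numpy as np

def supp_off(U, s):
    """Indices of the s largest-magnitude off-diagonal entries of a symmetric U,
    each selected together with its mirror entry (i, j) <-> (j, i)."""
    V = np.abs(np.triu(U, k=1))                     # upper-triangular off-diagonal part
    flat = np.argsort(V, axis=None)[::-1][:s]       # flat indices of the s largest
    pairs = {divmod(int(f), U.shape[1]) for f in flat}
    return pairs | {(j, i) for (i, j) in pairs}

U = np.array([[9.0,  0.5, -3.0],
              [0.5,  9.0,  2.0],
              [-3.0, 2.0,  9.0]])
sel = supp_off(U, 2)   # the two strongest off-diagonal interactions
```

Selecting mirrored pairs keeps the candidate support symmetric, matching the symmetry of the concentration matrix.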
3.2. Results
In this section, we compare the graph recovery performance of our GradMP algorithm
with the algorithm proposed in [15]. As shown in the experiments of [15], the greedy
algorithm significantly outperforms the $\ell_1$ approach in estimating Gaussian graph
structures. In particular, the authors of [15] demonstrated that the probability of
success in recovering the graph with the forward-backward greedy approach is much
higher than that of the $\ell_1$ counterpart. In the following experiments, we show that
our GradMP offers the same, if not better, graph recoverability while substantially
reducing the computational cost.
We conduct similar experiments to those in [15], in which three different graph
structures are studied: chain, star, and grid, as shown in Fig. 1. For these graphs,
the number of nodes varies as $p \in \{36, 64, 100\}$. The maximum node degree is set
to $k = 2$ for the chain graph, $k = 4$ for the grid graph, and $k = 0.1p$ for the star
graph. We varied the number of samples via a scaling parameter $x$, setting
$n = 70\,x\,k \log p$ with $x \in (0, 3)$. For each fixed $n$ and $p$, we perform 50
trials, where in each trial we declare success if the algorithm recovers the exact graph
edges. A more detailed experimental setup and the construction of the covariance
matrices can be found in [15].
As demonstrated in [15], the greedy forward-backward (Fo-Ba) algorithm outperforms the
graphical lasso (glasso), which implements the relaxation (8), in terms of the sample
size needed for perfect edge recovery. In this simulation, we show that our proposed
GradMP performs comparably to, if not better than, the Fo-Ba algorithm while its
computational complexity is significantly smaller.
Fig. 2 compares our proposed GradMP algorithm with the greedy forward-backward (Fo-Ba)
algorithm investigated in [15]. Figures in the left column show the probability of
success with respect to the scaling parameter $x$ controlling the sample size $n$.
Figures in the right column show the average computational time used to estimate the
graph over the 50 trials. As can be seen from Fig. 2, our proposed GradMP attains
recoverability similar to the greedy Fo-Ba algorithm for the chain and grid graphs,
while GradMP performs better on the star graph. Furthermore, our proposed algorithm has
a significantly lower computational cost. For all three graph structures, the
computational time of GradMP does not increase considerably as the sample size $n$ and
the graph size $p$ grow, whereas that of the algorithm in [15] increases significantly.
Figure 1. Graph structures. Left: chain graph; middle: star graph; right: grid graph.
Figure 2. Probability of success and computational complexity. Top: chain graph, middle:
star graph, and bottom: grid graph.
4. CONCLUSION
In this paper, an iterative greedy algorithm, namely gradient matching pursuit
(GradMP), has been presented and analyzed. The algorithm can be used to solve general
sparsity-constrained optimization problems. It generalizes the idea of the CoSaMP
algorithm to handle the non-quadratic structure of the objective function together with
the general form of the sparsity constraint. Simulations on several problems in
statistics also demonstrate that the algorithm usually achieves performance comparable
to, if not better than, the convex minimization approach. The algorithm can also be
extended for use in various other fields.
REFERENCES
[1]. S. Mallat and Z. Zhang. “Matching pursuits with time-frequency dictionaries”.
IEEE Trans. Signal Process., 41(12):3397-3415, Dec. 1993.
[2]. D. Needell and J. A. Tropp. “CoSaMP: Iterative signal recovery from incomplete
and inaccurate samples”. Appl. Comput. Harmon. Anal., 26(3):301-321, 2009.
[3]. T. Blumensath and M. E. Davies. “Iterative hard thresholding for compressed
sensing”. Appl. Comput. Harmon. Anal., 27(3):265-274, 2009.
[4]. W. Dai and O. Milenkovic. “Subspace pursuit for compressive sensing signal
reconstruction”. IEEE Trans. Inf. Theory, 55(5):2230-2249, May 2009.
[5]. A. Agarwal, S. Negahban, and M. J. Wainwright. “Fast global convergence of
gradient methods for high-dimensional statistical recovery”. Ann. Statist.,
40(5):2452-2482, 2012.
[6]. S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. “A unified framework
for high-dimensional analysis of M-estimators with decomposable regularizers”.
Statist. Sci., 27(4):538-557, Dec. 2012.
[7]. K. Lee and Y. Bresler. “ADMiRA: Atomic decomposition for minimum rank
approximation”. IEEE Trans. Inf. Theory, 56(9):4402-4416, 2010.
[8]. T. Hastie, R. Tibshirani, and J. Friedman. “The Elements of Statistical Learning:
Data Mining, Inference, and Prediction”. Springer, 2nd edition, 2009.
[9]. M. Yuan and Y. Lin. “Model selection and estimation in the Gaussian graphical
model”. Biometrika, 94(1):19-35, 2007.
[10]. J. Friedman, T. Hastie, and R. Tibshirani. “Sparse inverse covariance estimation
with the graphical lasso”. Biostatistics, 9(3):432-441, 2008.
[11]. A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu. “Sparse permutation invariant
covariance estimation”. Electron. J. Statist., 2:494-515, 2008.
[12]. P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. “High-dimensional
covariance estimation by minimizing ℓ1-penalized log-determinant divergence”.
Electron. J. Statist., 5:935-980, 2011.
[13]. O. Banerjee, L. E. Ghaoui, and A. d'Aspremont. “Model selection through sparse
maximum likelihood estimation for multivariate Gaussian or binary data”. J.
Mach. Learn. Res., 9:485-516, 2008.
[14]. T. Zhang. “Adaptive forward-backward greedy algorithm for learning sparse
representations”. IEEE Trans. Infor. Theory, 57(7):4689-4708, July 2011.
[15]. C. C. Johnson, A. Jalali, and P. Ravikumar. “High-dimensional sparse inverse
covariance estimation using greedy methods”. In Proc. 15th Int. Conf. Artificial
Intelligence and Statistics (AISTATS), La Palma, Canary Islands, Apr. 2012.
SUMMARY
AN ITERATIVE GREEDY ALGORITHM FOR SPARSITY-CONSTRAINED OPTIMIZATION
This paper presents an iterative greedy algorithm, namely gradient matching
pursuit (GradMP), for solving a general sparsity-constrained optimization problem. Our
algorithm generalizes the idea of the CoSaMP algorithm to handle the non-quadratic
structure of the objective function together with the general form of the sparsity
constraint. The algorithm can therefore handle a broader class of objective functions
and sparsity constraints. In addition, our simulations on various problems also show
that our algorithm usually achieves performance comparable to, if not better than, the
convex minimization approach.
Keywords: Matching pursuit, GradMP, CoSaMP.
Received 29 September 2015
Revised 13 November 2015
Accepted 25 December 2015
Address: 1 Department of Electronic Engineering, Posts and Telecommunications Institute
of Technology, 122 Hoang Quoc Viet, Hanoi, Vietnam;
2 Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139,
United States
*Email: minhnn@ptit.edu.vn