{"title": "Bayesian Source Localization with the Multivariate Laplace Prior", "book": "Advances in Neural Information Processing Systems", "page_first": 1901, "page_last": 1909, "abstract": "We introduce a novel multivariate Laplace (MVL) distribution as a sparsity promoting prior for Bayesian source localization that allows the specification of constraints between and within sources. We represent the MVL distribution as a scale mixture that induces a coupling between source variances instead of their means. Approximation of the posterior marginals using expectation propagation is shown to be very efficient due to properties of the scale mixture representation. The computational bottleneck amounts to computing the diagonal elements of a sparse matrix inverse. Our approach is illustrated using a mismatch negativity paradigm for which MEG data and a structural MRI have been acquired. We show that spatial coupling leads to sources which are active over larger cortical areas as compared with an uncoupled prior.", "full_text": "Bayesian Source Localization with the\n\nMultivariate Laplace Prior\n\nMarcel van Gerven1,2 Botond Cseke1 Robert Oostenveld2 Tom Heskes1,2\n\nRadboud University Nijmegen\nNijmegen, The Netherlands\n\n1Institute for Computing and Information Sciences\n\n2Donders Institute for Brain, Cognition and Behaviour\n\nAbstract\n\nWe introduce a novel multivariate Laplace (MVL) distribution as a sparsity pro-\nmoting prior for Bayesian source localization that allows the speci\ufb01cation of con-\nstraints between and within sources. We represent the MVL distribution as a scale\nmixture that induces a coupling between source variances instead of their means.\nApproximation of the posterior marginals using expectation propagation is shown\nto be very ef\ufb01cient due to properties of the scale mixture representation. The com-\nputational bottleneck amounts to computing the diagonal elements of a sparse ma-\ntrix inverse. 
Our approach is illustrated using a mismatch negativity paradigm for which MEG data and a structural MRI have been acquired. We show that spatial coupling leads to sources which are active over larger cortical areas as compared with an uncoupled prior.\n\n1 Introduction\n\nElectroencephalography (EEG) and magnetoencephalography (MEG) provide an instantaneous and non-invasive measure of brain activity. Let q, p, and t denote the number of sensors, sources, and time points, respectively. Sensor readings Y ∈ R^{q×t} and source currents S ∈ R^{p×t} are related by\n\nY = XS + E  (1)\n\nwhere X ∈ R^{q×p} is a lead field matrix that represents how sources project onto the sensors and E ∈ R^{q×t} represents sensor noise.\nUnfortunately, localizing distributed sources is an ill-posed inverse problem that only admits a unique solution when additional constraints are defined. In a Bayesian setting, these constraints take the form of a prior on the sources [3, 19]. Popular choices of prior source amplitude distributions are Gaussian or Laplace priors, whose MAP estimates correspond to minimum norm and minimum current estimates, respectively [18]. Minimum norm estimates produce spatially smooth solutions but are known to suffer from depth bias and smearing of nearby sources. In contrast, minimum current estimates lead to focal source estimates that may be scattered too much throughout the brain volume [9].\nIn this paper, we take the Laplace prior as our point of departure for Bayesian source localization (instead of using just the MAP estimate). The obvious approach is to assume univariate Laplace priors on individual sources. Here, in contrast, we assume a multivariate Laplace distribution over all sources, which allows sources to be coupled. 
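To make the forward model of Eq. (1) and its ill-posedness concrete, here is a minimal numerical sketch (all sizes, the random lead field, and the source configuration are illustrative, not taken from the paper): with far more sources than sensors, the minimum norm estimate Xᵀ(XXᵀ)⁻¹y reproduces the data exactly yet need not recover the true sources.

```python
import numpy as np

rng = np.random.default_rng(0)
q, p = 5, 50                       # sensors, sources (p >> q: ill-posed)
X = rng.standard_normal((q, p))    # toy lead field matrix
s_true = np.zeros(p)
s_true[[3, 17]] = 1.0              # two active sources
y = X @ s_true                     # noiseless sensor readings, Eq. (1) with E = 0

# Minimum norm estimate: the smallest-l2-norm solution of y = X s
s_mn = X.T @ np.linalg.solve(X @ X.T, y)

assert np.allclose(X @ s_mn, y)    # fits the data exactly...
assert not np.allclose(s_mn, s_true)  # ...yet differs from the true sources
```

By construction, the minimum norm solution has the smallest l2 norm among all source configurations consistent with the data, which is why additional prior constraints are needed to single out physiologically plausible estimates.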
We show that such a distribution can be represented as a scale mixture [2] that differs substantially from the one presented in [5]. Our representation allows the specification of both spatio-temporal and sparsity constraints. Since the posterior cannot be computed exactly, we formulate an efficient expectation propagation algorithm [12] which allows us to approximate the posterior of interest for very large models. Efficiency arises from the block-diagonal form of the approximate posterior covariance matrix due to properties of the scale mixture representation. The computational bottleneck then reduces to computation of the diagonal elements of a sparse matrix inverse, which can be solved through Cholesky decomposition of a sparse matrix and application of the Takahashi equation [17]. Furthermore, moment matching is achieved by one-dimensional numerical integrations. Our approach is evaluated on MEG data that was recorded during an oddball task.\n\n2 Bayesian source localization\n\nIn a Bayesian setting, the goal of source localization is to estimate the posterior\n\np(S | Y, X, Σ, Θ) ∝ p(Y | S, X, Σ) p(S | Θ)  (2)\n\nwhere the likelihood term p(Y | S) = ∏t N(yt | X st, Σ) factorizes over time and Σ represents sensor noise. The prior p(S | Θ), with Θ acting as a proxy for the hyper-parameters, can be used to incorporate (neuroscientific) constraints. For simplicity, we assume independent Gaussian noise with a fixed variance σ², i.e., Σ = σ²I. Without loss of generality, we will focus on one time-point (yt, st) only and drop the subscript when clear from context.¹\nThe source localization problem can be formulated as a (Bayesian) linear regression problem where the source currents s play the role of the regression coefficients and rows of the lead field matrix X can be interpreted as covariates. 
In the following, we define a multivariate Laplace distribution, represented in terms of a scale mixture, as a convenient prior that incorporates both spatio-temporal and sparsity constraints.\nThe univariate Laplace distribution\n\nL(s | λ) ≡ (λ/2) exp(−λ|s|)  (3)\n\ncan be represented as a scale mixture of Gaussians [2], the scaling function being an exponential distribution with parameter λ²/2. The scale parameter λ controls the width of the distribution and thus the regularizing behavior towards zero. Since the univariate exponential distribution is a χ²₂ distribution, one can alternatively write\n\nL(s | λ) = ∫∫ du dv N(s | 0, u² + v²) N(u | 0, 1/λ²) N(v | 0, 1/λ²).  (4)\n\nEltoft et al. [5] defined the multivariate Laplace distribution as a scale mixture of a multivariate Gaussian given by √z Σ^{1/2} s where s is a standard normal multivariate Gaussian, Σ is a positive definite matrix, and z is drawn from a univariate exponential distribution. The work presented in [11] is based on similar ideas but replaces the distribution on z with a multivariate log-normal distribution.\nIn contrast, we use an alternative formulation of the multivariate Laplace distribution that couples the variances of the sources rather than the source currents themselves. This is achieved by generalizing the representation in Eq. (4) to the multivariate case. For an uncoupled multivariate Laplace distribution, this generalization reads\n\nL(s | λ) = ∫∫ du dv ∏i N(si | 0, ui² + vi²) N(ui | 0, 1/λ²) N(vi | 0, 1/λ²)  (5)\n\nsuch that each source current si gets assigned scale variables ui and vi. 
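As a quick numerical check of the scale mixture representation in Eq. (4), one can sample u and v from N(0, 1/λ²), then draw s given (u, v) from N(0, u² + v²), and compare the sample variance with that of a Laplace(λ) variable, which is 2/λ². The seed, λ, and sample size below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n = 2.0, 200_000

# Scale mixture of Eq. (4): u, v ~ N(0, 1/lam^2), then s | u, v ~ N(0, u^2 + v^2)
u = rng.normal(0.0, 1.0 / lam, size=n)
v = rng.normal(0.0, 1.0 / lam, size=n)
s = rng.normal(0.0, np.sqrt(u**2 + v**2))

# u^2 + v^2 is exponential with rate lam^2/2, so Var(s) = E[u^2 + v^2] = 2/lam^2,
# which equals the variance of a Laplace(lam) distribution.
print(s.var(), 2 / lam**2)
```

This also previews the interpretation used below: large sampled u² + v² produces large-magnitude s, so the scales act as per-source relevance variables.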
We can interpret the scale variables corresponding to source i as indicators of its relevance: the larger (the posterior estimate of) ui² + vi², the more relevant the corresponding source. In order to introduce correlations between sources, we define our multivariate Laplace (MVL) distribution as the following scale mixture:\n\nL(s | λ, J) ≡ ∫∫ du dv (∏i N(si | 0, ui² + vi²)) N(u | 0, J⁻¹/λ²) N(v | 0, J⁻¹/λ²),  (6)\n\nwhere J⁻¹ is a normalized covariance matrix. This definition yields a coupling in the magnitudes of the source currents through their variances. The normalized covariance matrix J⁻¹ specifies the correlation strengths, while λ acts as a regularization parameter. Note that this approach defines the multivariate Laplace with the help of a multivariate exponential distribution [10]. As will be shown in the next section, apart from having a semantics that differs from [5], our scale mixture representation has some desirable characteristics that allow for efficient approximate inference. Based on the above formulation, we define the sparse linear model as\n\np(y, s | X, σ², λ, J) = N(y | Xs, σ²I) L(s | λ, J).  (7)\n\n¹Multiple time-points can be incorporated by vectorizing Y and S, and augmenting X.\n\nFigure 1: Factor graph representation of Bayesian source localization with a multivariate Laplace prior. The factor f represents the likelihood term N(y | Xs, σ²I). Factors gi correspond to the coupling between sources and scales. Factors h1 and h2 represent the (identical) multivariate Gaussians on u and v with prior precision matrix J. The gi are the only non-Gaussian terms and need to be approximated.\n\nThe factor graph in Fig. 
1 depicts the interactions between the variables in our model.\n\n3 Approximate inference\n\nOur goal is to compute posterior marginals for sources si as well as scale variables ui and vi in order to determine source relevance. These marginals are intractable and we need to resort to approximate inference methods. In this paper we use a deterministic approximate inference method called expectation propagation (EP) [12]. For a detailed analysis of the use of EP in case of the decoupled prior, which is a special case of our MVL prior, we refer to [16]. EP works by iterative minimizations of the Kullback–Leibler (KL) divergence between appropriately chosen distributions in the following way.\nWe introduce the vector of all latent variables z = (sᵀ, uᵀ, vᵀ)ᵀ. The posterior distribution on z given the data y (which is considered fixed and given and therefore omitted in our notation) can be written in the factorized form\n\np(z) ∝ t0(z) ∏i ti(z),  (8)\n\nwhere t0(z) ∝ N(y | Xs, σ²I) N(v | 0, J⁻¹/λ²) N(u | 0, J⁻¹/λ²) and ti(z) = ti(si, ui, vi) = N(si | 0, ui² + vi²). The term t0(z) is a Gaussian function, i.e., it can be written in the form exp(zᵀh0 − zᵀK0z/2). It factorizes into Gaussian functions of s, u, and v such that K0 has a block-diagonal structure. Using EP, we will approximate p(z) with q(z) ∝ t0(z) ∏i ¯ti(z), where the ¯ti(z) are Gaussian functions as well.\nOur definition of the MVL distribution leads to several computational benefits. Equation (6) introduces 2p auxiliary Gaussian variables (u, v) that are coupled to the si's by p non-Gaussian factors; thus, we have to approximate p terms. The multivariate Laplace distribution defined in [5] introduces one auxiliary variable and couples all the si sj terms to it; therefore, it would lead to p² non-Gaussian terms to be approximated. Moreover, as we will see below, the a priori independence of u and v and the form of the terms ti(z) results in an approximation of the posterior with the same block-diagonal structure as that of t0(z).\nIn each step, EP updates ¯ti by defining q\\i ∝ t0(z) ∏{j≠i} ¯tj, minimizing KL(ti q\\i ∥ q*) with respect to q* and setting ¯t*i ∝ q*/q\\i. It can be shown that when ti depends only on a subset of variables zi (in our case on zi = (si, ui, vi)) then so does ¯ti. The minimization of the KL divergence then boils down to the minimization of KL(ti(zi) q\\i(zi) ∥ q*(zi)) with respect to q*(zi) and ¯ti is updated to ¯t*i(zi) ∝ q*(zi)/q\\i(zi). Minimization of the KL divergence corresponds to moment matching, i.e., q*(si, ui, vi) is a Gaussian with the same mean and covariance matrix as qi(zi) ∝ ti(zi) q\\i(zi). So, to update the i-th term in a standard application of EP, we would have to compute q\\i(zi) and could then use a three-dimensional (numerical) integration to compute all first and second moments of qi(zi). Below we will explain how we can exploit the specific characteristics of the MVL to do this more efficiently. For stability, we use a variant of EP, called power EP [13], where q\\i ∝ ¯ti^(1−α) ∏{j≠i} ¯tj and KL(ti^α q\\i ∥ q*) with α ∈ (0, 1] is minimized. The above explanation of standard EP corresponds to α = 1. In the following we will give the formulas for general α.\nWe will now work out the EP update for the i-th term approximation in more detail to show by induction that ¯ti(si, ui, vi) factorizes into independent terms for si, ui, and vi. Since ui and vi play exactly the same role, it is also easy to see that the term approximation is always symmetric in ui and vi. Let us suppose that q(si, ui, vi) and consequently q\\i(si, ui, vi) factorizes into independent terms for si, ui, and vi, e.g., we can write\n\nq\\i(si, ui, vi) = N(si | mi, σi²) N(ui | 0, νi²) N(vi | 0, νi²).  (9)\n\nBy initializing ¯ti(si, ui, vi) = 1, we have q(z) ∝ t0(z) and the factorization of q\\i(si, ui, vi) follows directly from the factorization of t0(z) into independent terms for s, u, and v. That is, for the first EP step, the factorization can be guaranteed. To obtain the new term approximation, we have to compute the moments of the distribution qi(si, ui, vi) ∝ N(si | 0, ui² + vi²)^α q\\i(si, ui, vi), which, by regrouping terms, can be written in the form qi(si, ui, vi) = qi(si | ui, vi) qi(ui, vi) with\n\nqi(si | ui, vi) ∝ N(si | mi(ui² + vi²)/(ασi² + ui² + vi²), σi²(ui² + vi²)/(ασi² + ui² + vi²))  (10)\n\nand\n\nqi(ui, vi) ∝ (ui² + vi²)^((1−α)/2) N(√α mi | 0, ασi² + ui² + vi²) N(ui | 0, νi²) N(vi | 0, νi²).  (11)\n\nSince qi(ui, vi) only depends on ui² + vi² and is thus invariant under sign changes of ui and vi, we must have E[ui] = E[vi] = 0, as well as E[ui vi] = 0. Because of symmetry, we further have E[ui²] = E[vi²] = (E[ui² + vi²])/2. Since qi(ui, vi) can be expressed as a function of ui² + vi² only, this variance can be computed from (11) using one-dimensional Gauss-Laguerre numerical quadrature [15]. The first and second moments of si conditioned upon ui and vi follow directly from (10). Because both (10) and (11) are invariant under sign changes of ui and vi, we must have E[si ui] = E[si vi] = 0. Furthermore, since the conditional moments again depend only on ui² + vi², also E[si] and E[si²] can be computed with one-dimensional Gauss-Laguerre integration. Summarizing, we have shown that if the old term approximations factorize into independent terms for si, ui, and vi, the new term approximation after an EP update, ¯t*i(si, ui, vi) ∝ q*(si, ui, vi)/q\\i(si, ui, vi), must do the same. Furthermore, given the cavity distribution q\\i(si, ui, vi), all required moments can be computed using one-dimensional numerical integration.\nThe crucial observation here is that the terms ti(si, ui, vi) introduce dependencies between si and (ui, vi), as expressed in Eqs. (10) and (11), but do not lead to correlations that we have to keep track of in a Gaussian approximation. This is not specific to EP, but a consequence of the symmetries and invariances of the exact distribution p(s, u, v). That is, also when the expectations are taken with respect to the exact p(s, u, v) we have E[ui] = E[vi] = E[ui vi] = E[si ui] = E[si vi] = 0 and E[ui²] = E[vi²]. The variance of the scales E[ui² + vi²] determines the amount of regularization on the source parameter si such that large variance implies little regularization.\nLast but not least, contrary to conventional sequential updating, we choose to update the terms ¯ti in parallel. That is, we compute all q\\i's and update all terms simultaneously. Calculating q\\i(si, ui, vi) requires the computation of the marginal moments q(si), q(ui) and q(vi). For this, we need the diagonal elements of the inverse of the precision matrix K of q(z). 
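The reduction of the scale moments to one-dimensional quadrature can be sketched numerically. Since the tilted distribution depends on (ui, vi) only through w = ui² + vi², a change to polar coordinates turns the 2-D integral into a 1-D integral over w, which Gauss-Laguerre quadrature handles after substituting w = 2ν²x. The sketch below takes α = 1 for simplicity; the parameter values and function name are illustrative, not from the paper.

```python
import numpy as np

def scale_moment(m, sigma2, nu2, n_nodes=64):
    """E[w] with w = u^2 + v^2 under the tilted distribution of Eq. (11), alpha = 1.

    The (unnormalized) density over w is N(m | 0, sigma2 + w) * exp(-w / (2 nu2));
    Gauss-Laguerre quadrature evaluates the 1-D integrals after w = 2 * nu2 * x.
    Constant factors (including the Jacobian) cancel in the moment ratio.
    """
    x, wts = np.polynomial.laguerre.laggauss(n_nodes)
    w = 2.0 * nu2 * x
    lik = np.exp(-0.5 * m**2 / (sigma2 + w)) / np.sqrt(2 * np.pi * (sigma2 + w))
    z0 = np.sum(wts * lik)       # normalization integral
    z1 = np.sum(wts * w * lik)   # first-moment integral
    return z1 / z0

# Cross-check against a brute-force Riemann-sum ratio on a fine uniform grid.
m, sigma2, nu2 = 0.5, 1.0, 2.0
wg = np.linspace(1e-6, 400.0, 400_000)
fg = np.exp(-wg / (2 * nu2)) * np.exp(-0.5 * m**2 / (sigma2 + wg)) / np.sqrt(2 * np.pi * (sigma2 + wg))
ref = np.sum(wg * fg) / np.sum(fg)
assert abs(scale_moment(m, sigma2, nu2) - ref) < 1e-3
```

Sixty-four quadrature nodes already match the brute-force reference closely here, which is why the per-term EP updates remain cheap.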
This precision matrix has the block-diagonal form\n\nK = blockdiag(XᵀX/σ² + Ks, λ²J + Ku, λ²J + Kv)  (12)\n\nwhere J is a sparse precision matrix which determines the coupling, and Ks, Ku, and Kv = Ku are diagonal matrices that contain the contributions of the term approximations. We can exploit the low-rank representation of XᵀX/σ² + Ks to compute its inverse using the Woodbury formula [7]. The diagonal elements of the inverse of λ²J + Ku can be computed efficiently via sparse Cholesky decomposition and the Takahashi equation [17]. By updating the term approximations in parallel, we only need to perform these operations once per parallel update.\n\n4 Experiments\n\nReturning to the source localization problem, we will show that the MVL prior can be used to induce constraints on the source estimates. To this end, we use a dataset obtained for a mismatch negativity experiment (MMN) [6]. The MMN is the negative component of the difference between responses to normal and deviant stimuli within an oddball paradigm that peaks around 150 ms after stimulus onset. In our experiment, the subject had to listen to normal (500 Hz) and deviant (550 Hz) tones, presented for 70 ms. Normal tones occurred 80% of the time, whereas deviants occurred 20% of the time. A total of 600 trials was acquired.\nData was acquired with a CTF MEG System (VSM MedTech Ltd., Coquitlam, British Columbia, Canada), which provides whole-head coverage using 275 DC SQUID axial gradiometers. A realistically shaped volume conduction model was constructed based on the individual's structural MRI [14]. The brain volume was discretized to a grid with a 0.75 cm resolution and the lead field matrix was calculated for each of the 3863 grid points according to the head position in the system and the forward model. 
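The block structure of Eq. (12) can be sketched on a toy scale: below, a small chain-coupled precision J stands in for the sparse coupling matrix, and the marginal variances of u are read off the diagonal of the inverse of the λ²J + Ku block. Dense inversion is used here only for clarity; at realistic sizes one would compute just these diagonal entries from a sparse Cholesky factor via the Takahashi equation, as described above. All sizes and values are illustrative.

```python
import numpy as np

p, lam = 30, 1.5

# Toy normalized precision J: chain coupling of neighboring sources.
c = 0.4
J = np.eye(p)
for i in range(p - 1):
    J[i, i + 1] = J[i + 1, i] = -c

Ku = np.diag(np.full(p, 0.7))   # diagonal term-approximation contributions
Ku_block = lam**2 * J + Ku      # the (u, u) block of K in Eq. (12)

# Marginal variances of u: diagonal of the block inverse.
var_u = np.diag(np.linalg.inv(Ku_block))
assert np.all(var_u > 0)        # valid variances: the block is positive definite
```

Because K is block-diagonal in (s, u, v), each block is inverted independently, which is what keeps a parallel EP sweep to one sparse factorization per update.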
The lead field matrix is defined for the three x, y, and z orientations in each of the source locations and was normalized to correct for depth bias. Consequently, the lead field matrix X is of size 275 × 11589. The 275 × 1 observation vector y was rescaled to prevent issues with numerical precision.\nIn the next section, we compare source estimates for the MMN difference wave that have been obtained when using either a decoupled or a coupled MVL prior. For ease of exposition, we focus on a spatial prior induced by the coupling of neighboring sources. In order to demonstrate the effect of the spatial prior, we assume a fixed regularization parameter λ and fixed noise variance σ², as estimated by means of the L curve criterion [8]. Differences in the source estimates will therefore arise only from the form of the 11589 × 11589 sparse precision matrix J. The first estimate is obtained by assuming that there is no coupling between elements of the lead field matrix, such that J = I. This gives a Bayesian formulation of the minimum current estimate [18]. The second estimate is obtained by assuming a coupling between neighboring sources i and j within the brain volume with fixed strength c. This coupling is specified through the unnormalized precision matrix Ĵ by assuming Ĵix,jx = Ĵiy,jy = Ĵiz,jz = −c while diagonal elements Ĵii are set to 1 − ∑{j≠i} Ĵij.² This prior dictates that the magnitudes of the variances of the source currents are coupled between sources.\nFor the coupling strength c, we use correlation as a guiding principle. Recall that the unnormalized precision matrix Ĵ in the end determines the correlations (of the variances) between sources. Specifically, the correlation between sources si and sj is given by\n\nrij = (Ĵ⁻¹)ij / ((Ĵ⁻¹)ii (Ĵ⁻¹)jj)^(1/2).  (13)\n\nFor example, using c = 10, we would obtain a correlation coefficient of ri,i+1 = 0.78. Note that this also leads to more distant sources having non-zero correlations.\n\n²The normalized precision matrix is obtained through J = diag(Ĵ⁻¹)^(1/2) Ĵ diag(Ĵ⁻¹)^(1/2).\n\nFigure 2: Spatial coupling leads to the normalized precision matrix J with coupling of neighboring source orientations in the x, y, and z directions. The (reordered) matrix L is obtained from the Cholesky decomposition of J. The correlation matrix C shows the correlations between the source orientations. For the purpose of demonstration, we show matrices using a very coarse discretization of the brain volume.\n\nThe positive correlation between neighboring sources is motivated by the notion that we expect neighboring sources to be similarly though not equivalently involved for a given task. Evidently, the desired correlation coefficient also depends on the resolution of the discretized brain volume.\nFigure 2 demonstrates how a chosen coupling leads to a particular structure of J, where irregularities in J are caused by the structure of the imaged brain volume. The figure also shows the computational bottleneck of our algorithm, which is to compute diagonal elements of J⁻¹. This can be solved by means of the Takahashi equation which operates on the matrix L that results from a sparse Cholesky decomposition. 
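The correlation of Eq. (13) is easy to compute for a toy chain-coupled Ĵ. Note that the ri,i+1 = 0.78 figure quoted in the text refers to the full 3-D source grid, so this small chain yields different values; the sketch only illustrates the computation and the induced nonzero longer-range correlations. The chain size and coupling strength are illustrative.

```python
import numpy as np

p, c = 20, 0.45
# Toy unnormalized precision: neighbor coupling -c, and diagonal entries set to
# 1 minus the sum of the off-diagonal entries in each row, as in the text.
Jhat = np.zeros((p, p))
for i in range(p - 1):
    Jhat[i, i + 1] = Jhat[i + 1, i] = -c
np.fill_diagonal(Jhat, 1.0 - Jhat.sum(axis=1))

Sigma = np.linalg.inv(Jhat)          # Jhat^{-1}
d = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(d, d)           # r_ij of Eq. (13)

assert np.allclose(np.diag(R), 1.0)
assert 0.0 < R[0, 1] < 1.0           # neighbors positively correlated
assert R[0, 5] > 0.0                 # distant sources: nonzero correlation
```

Since Ĵ is diagonally dominant with non-positive off-diagonal entries, its inverse has non-negative entries, which is why even non-neighboring sources pick up positive correlations.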
The block-diagonal structure of L arises from a reordering of rows and columns using, for instance, the AMD algorithm [1]. The correlation matrix C shows the correlations between the sources induced by the structure of J. Zeros in the correlation matrix arise from the independence between source orientations x, y, and z.\n\n5 Results\n\nFigure 3 depicts the difference wave that was obtained by subtracting the trial average for standard tones from the trial average for deviant tones. A negative deflection after 100 ms is clearly visible. The event-related field indicates patterns of activity at central channels in both hemispheres. These findings are consistent with the mismatch negativity literature [6]. We now proceed to localizing the sources of the activation induced by mismatch negativity.\n\nFigure 3: Evolution of the difference wave at right central sensors and event-related field of the difference wave 125 ms after cue onset.\n\nFigure 4: Source estimates using a decoupled prior (top) or a coupled prior (bottom). Plots are centered on the left temporal source.\n\nFigure 5: Relative variance using a decoupled prior (top) or a coupled prior (bottom). Plots are centered on the right temporal source.\n\nFigure 4 depicts the localized sources when using either a decoupled MVL prior or a coupled MVL prior. The coupled spatial prior leads to stronger source currents that are spread over a larger brain volume. MVL source localization has correctly identified the source over left temporal cortex but does not capture the source over right temporal cortex that is also hypothesized to be present (cf. Fig. 3). Note however that the source estimates in Fig. 
4 represent estimated mean power and thus do not capture the full posterior over the sources.\nDifferences between the decoupled and the coupled prior become more salient when we look at the relative variance of the auxiliary variables as shown in Fig. 5. Relative variance is defined here as posterior variance minus prior variance of the auxiliary variables, normalized to be between zero and one. This measure indicates the change in magnitude of the variance of the auxiliary variables, and thus indirectly that of the sources via Eq. (6). Since only sources with non-zero contributions should have high variance, this measure can be used to indicate the relevance of a source. Figure 5 shows that temporal sources in both left and right hemispheres are relevant. The relevance of the temporal source in the right hemisphere becomes more pronounced when using the coupled prior.\n\n6 Discussion\n\nIn this paper, we introduced a multivariate Laplace prior as the basis for Bayesian source localization. By formulating this prior as a scale mixture we were able to approximate posteriors of interest using expectation propagation in an efficient manner. Computation time is mainly influenced by the sparsity structure of the precision matrix J which is used to specify interactions between sources by coupling their variances. We have demonstrated the feasibility of our approach using a mismatch negativity dataset. It was shown that coupling of neighboring sources leads to source estimates that are somewhat more spatially smeared as compared with a decoupled prior. Furthermore, visualization of the relative variance of the auxiliary variables gave additional insight into the relevance of sources.\nContrary to the MAP estimate (i.e., the minimum current estimate), our Bayesian estimate does not exactly lead to sparse posteriors given a finite amount of data. 
However, posterior marginals can still be used to exclude irrelevant sources since these will typically have a mean activation close to zero with small variance. In principle, we could force our posteriors to become more MAP-like by replacing the likelihood term with N(y | Xs, σ²I)^(1/T) in the limit T → 0. From the Bayesian point of view, one may argue whether taking this limit is fair. In any case, given the inherent uncertainty in our estimates we favor the representation in terms of (non-sparse) posterior marginals.\nNote that it is straightforward to impose other constraints since this only requires the specification of suitable interactions between sources through J. For instance, the spatial prior could be made more realistic by taking anatomical constraints into account or by the inclusion of coupling between sources over time. Other constraints that can be implemented with our approach are the coupling of individual orientations within a source, or even the coupling of source estimates between different subjects. Coupling of source orientations has been realized before in [9] through an l1/l2 norm, although not using a fully Bayesian approach. In future work, we aim to examine the effect of the proposed priors and optimize the regularization and coupling parameters via empirical Bayes [4]. Other directions for further research are inclusion of the noise variance in the optimization procedure and dealing with the depth bias that often arises in distributed source models in a more principled way.\nIn [11], fields of Gaussian scale mixtures were used for modeling the statistics of wavelet coefficients of photographic images. Our approach differs in two important aspects. To obtain a generalization of the univariate Laplace distribution, we used a multivariate exponential distribution of the scales, to be compared with the multivariate log-normal distribution in [11]. 
The Laplace distribution has the advantage that it is the most sparsifying prior that, in combination with a linear model, still leads to a unimodal posterior [16]. Furthermore, we described an efficient method for approximating marginals of interest whereas in [11] an iterative coordinate-ascent method was used to compute the MAP solution. Since (the efficiency of) our method for approximate inference only depends on the sparsity of the multivariate scale distribution, and not on its precise form, it should be feasible to compute approximate marginals for the model presented in [11] as well.\nConcluding, we believe the scale mixture representation of the multivariate Laplace distribution to be a promising approach to Bayesian distributed source localization. It allows a wide range of constraints to be included and, due to the characteristics of the scale mixture, posteriors can be approximated efficiently even for very large models.\n\nAcknowledgments\n\nThe authors gratefully acknowledge the support of the Dutch technology foundation STW (project number 07050) and the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and Science. Tom Heskes is supported by Vici grant 639.023.604.\n\nReferences\n\n[1] P. R. Amestoy, T. A. Davis, and I. S. Duff. Algorithm 837: AMD, an approximate minimum degree ordering algorithm. ACM Transactions on Mathematical Software, 30(3):381–388, 2004.\n\n[2] D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, 36(1):99–102, 1974.\n\n[3] S. Baillet and L. Garnero. A Bayesian approach to introducing anatomo-functional priors in the EEG/MEG inverse problem. IEEE Transactions on Biomedical Engineering, 44(5):374–385, 1997.\n\n[4] J. M. Bernardo and J. F. M. Smith. Bayesian Theory. Wiley, 1994.\n\n[5] T. Eltoft, T. Kim, and T. Lee. 
On the multivariate Laplace distribution. IEEE Signal Processing Letters, 13(5):300–303, 2006.\n\n[6] M. I. Garrido, J. M. Kilner, K. E. Stephan, and K. J. Friston. The mismatch negativity: A review of underlying mechanisms. Clinical Neurophysiology, 120:453–463, 2009.\n\n[7] G. Golub and C. van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.\n\n[8] P. C. Hansen. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. Monographs on Mathematical Modeling and Computation. Society for Industrial Mathematics, 1987.\n\n[9] S. Haufe, V. V. Nikulin, A. Ziehe, K.-R. Müller, and G. Nolte. Combining sparsity and rotational invariance in EEG/MEG source reconstruction. NeuroImage, 42(2):726–738, 2008.\n\n[10] N. T. Longford. Classes of multivariate exponential and multivariate geometric distributions derived from Markov processes. In H. W. Block, A. R. Sampson, and T. H. Savits, editors, Topics in Statistical Dependence, volume 16 of IMS Lecture Notes Monograph Series, pages 359–369. IMS Business Office, Hayward, CA, 1990.\n\n[11] S. Lyu and E. P. Simoncelli. Statistical modeling of images with fields of Gaussian scale mixtures. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 945–952. MIT Press, Cambridge, MA, 2007.\n\n[12] T. Minka. Expectation propagation for approximate Bayesian inference. In J. Breese and D. Koller, editors, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 362–369. Morgan Kaufmann, 2001.\n\n[13] T. Minka. Power EP. Technical report, Microsoft Research, Cambridge, 2004.\n\n[14] G. Nolte. The magnetic lead field theorem in the quasi-static approximation and its use for magnetoencephalography forward calculation in realistic volume conductors. Physics in 
Physics in\nMedicine & Biology, 48(22):3637\u20133652, 2003.\n\n[15] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C.\n\nCambridge University Press, 3rd edition, 2007.\n\n[16] M. W. Seeger. Bayesian inference and optimal design for the sparse linear model. Journal of\n\nMachine Learning Research, 9:759\u2013813, 2008.\n\n[17] K. Takahashi, J. Fagan, and M. S. Chen. Formation of a sparse bus-impedance matrix and its\napplication to short circuit study. In 8th IEEE PICA Conference, pages 63\u201369, Minneapolis,\nMN, 1973.\n\n[18] K. Uutela, M. H\u00a8am\u00a8al\u00a8ainen, and E. Somersalo. Visualization of magnetoencephalographic data\n\nusing minimum current estimates. NeuroImage, 10:173\u2013180, 1999.\n\n[19] D. Wipf and S. Nagarajan. A uni\ufb01ed Bayesian framework for MEG/EEG source imaging.\n\nNeuroImage, 44(3):947\u2013966, 2009.\n\n9\n\n\f", "award": [], "sourceid": 360, "authors": [{"given_name": "Marcel", "family_name": "Gerven", "institution": null}, {"given_name": "Botond", "family_name": "Cseke", "institution": null}, {"given_name": "Robert", "family_name": "Oostenveld", "institution": null}, {"given_name": "Tom", "family_name": "Heskes", "institution": null}]}