本文共 3095 字,大约阅读时间需要 10 分钟。
MDS aims to embed data in a lower dimensional space in such a way that pair-wise distances between data points are preserved.
Say we have N points xi∈Rn for i∈[1,N] , let X=[x1,x2,⋯,xN] , we don’t know the postion of xi . We are only supplied with the pair-wise Euclidean distances among these points. Now the objection is to find out N points yi∈Rk,k<n , let Y=[y1,y2,⋯,yN] , such that the distance in pairs of X is the same as these of Y.
Given the distance matrix DX , each element of DX can be written as:
(DXij)2=(xi−xj)T(xi−xj)=∥xi∥2−2xTixj+∥xj∥2 we can easily see that
DX=Z−2XTX+ZT Here, Z=zeT and z=[∥x1∥2∥x2∥2⋯∥xN∥2]T . Therefore Z takes the form
Z=⎡⎣⎢⎢⎢⎢⎢⎢∥x1∥2∥x1∥2⋮∥x1∥2∥x1∥2∥x1∥2⋮∥x1∥2⋯⋯⋱⋯∥x1∥2∥x1∥2⋮∥x1∥2⎤⎦⎥⎥⎥⎥⎥⎥ Now, let’s translate the mean of the set of hypothetical point set X to the origin. Note that this operation does not change the Euclidean distance between any pairs of points.
For better understanding, we introduce 1NAeeT and 1NeeTA . Here, A is a N-by-N matrix which taks the form:
A=⎡⎣⎢⎢⎢⎢⎢A11A21⋮AN1A12A22⋮AN2⋯⋯⋱⋯A1NA2N⋮ANN⎤⎦⎥⎥⎥⎥⎥ Hence,
1NAeeT=1N⎡⎣⎢⎢⎢⎢⎢A11A21⋮AN1A12A22⋮AN2⋯⋯⋱⋯A1NA2N⋮ANN⎤⎦⎥⎥⎥⎥⎥⎡⎣⎢⎢⎢⎢⎢11⋮111⋮1⋯⋯⋱⋯11⋮1⎤⎦⎥⎥⎥⎥⎥=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢1N∑j=1NA1j1N∑j=1NA2j⋮1N∑j=1NANj1N∑j=1NA1j1N∑j=1NA2j⋮1N∑j=1NANj⋯⋯⋱⋯1N∑j=1NA1j1N∑j=1NA2j⋮1N∑j=1NANj⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥=⎡⎣⎢⎢⎢⎢⎢mean of first row of Amean of second row of A⋮mean of Nth row of Amean of first row of Amean of second row of A⋮mean of Nth row of A⋯⋯⋱⋯mean of first row of Amean of second row of A⋮mean of Nth row of A⎤⎦⎥⎥⎥⎥⎥ similiarly,
1NeeTA=1N⎡⎣⎢⎢⎢⎢⎢11⋮111⋮1⋯⋯⋱⋯11⋮1⎤⎦⎥⎥⎥⎥⎥⎡⎣⎢⎢⎢⎢⎢A11A21⋮AN1A12A22⋮AN2⋯⋯⋱⋯A1NA2N⋮ANN⎤⎦⎥⎥⎥⎥⎥=⎡⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢1N∑i=1NAi11N∑i=1NAi1⋮1N∑i=1NAi11N∑i=1NAi21N∑i=1NAi2⋮1N∑i=1NAi2⋯⋯⋱⋯1N∑i=1NAiN1N∑i=1NAiN⋮1N∑i=1NAiN⎤⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥=⎡⎣⎢⎢⎢⎢⎢mean of first column of Amean of first column of A⋮mean of first column of Amean of second column of Amean of second column of A⋮mean of second column of A⋯⋯⋱⋯mean of Nth column of Amean of Nth column of A⋮mean of Nth column of A⎤⎦⎥⎥⎥⎥⎥ The centering matrix is defined as:
Let’s now apply double centering to
DX to get
AX=HDXH=(IN−1NeeT)(Z−2XTX+ZT)(IN−1NeeT)=(IN−1NeeT)Z(IN−1NeeT)−2(IN−1NeeT)XTX(IN−1NeeT)+(IN−1NeeT)ZT(IN−1NeeT)=−2(IN−1NeeT)XTX(IN−1NeeT)=−2(X(IN−1NeeT))TX(IN−1NeeT)=−2X~TX~ where
X~=X(IN−1NeeT)
Remember, the task was to find a concrete set of N points Y in k dimensions so that the pairwise Euclidean distances betwwen all the pairs in the concrete set Y is a close approximation to the pair-wise distances given to us in the matrix DX i.e. we want to find DY such that
Note that after applying the “double centering” operation to both
X and
Y , equation above yields
BY=argmin∥BX−BY∥2F=DY=∥X~TX~−Y~TY~∥2F The above equation is a well known optimization problem that can be solved via Singular Value Decomposition(SVD) of BX .
BX≈UDUT=(UD12)(D12UT)=YTY~ Here, U is N by k matrix and D is k by k diagonal matrix with k largest singular values on the diagonal and Y~=D12UT is k by N matrix. Finally, we get N embedding points in k dimension as the column vectors of Y~
Reference:
转载地址:http://mnxdi.baihongyu.com/