6.5. Online Sequential Extreme Learning Machine

6.5.1. Introduction

6.5.2. The OS-ELM Principle

This section introduces the principle of the Online Sequential Extreme Learning Machine (OS-ELM).

Consider the following ELM problem:

$$\bm{H} \boldsymbol{\beta} = \bm{T}$$

Its least-squares solution is

$$\hat{\boldsymbol{\beta}} = \left(\bm{H}^{T}\bm{H}\right)^{-1}\bm{H}^{T}\bm{T}$$

If $\bm{H}^{T}\bm{H}$ is close to singular, one can either choose a smaller number of hidden-layer nodes $L$ or increase the number of training samples.
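Before turning to the online variant, a minimal batch-ELM sketch may help fix notation. NumPy, the sigmoid hidden nodes, and all sizes and names below are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_output(X, W, b):
    """Hidden-layer output matrix H for inputs X (N x d), sigmoid nodes."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # shape: N x L

d, L, m, N = 5, 20, 1, 200           # input dim, hidden nodes, output dim, samples
W = rng.standard_normal((d, L))      # random input weights (fixed, never trained)
b = rng.standard_normal(L)           # random hidden biases
X = rng.standard_normal((N, d))      # toy inputs
T = rng.standard_normal((N, m))      # toy targets

H = hidden_output(X, W, b)
# beta_hat = (H^T H)^{-1} H^T T; lstsq is the numerically safer equivalent
beta, *_ = np.linalg.lstsq(H, T, rcond=None)
```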

Given an initial training set $\mathbb{S}_0 = \{(\bm{x}_i, \bm{t}_i)\}_{i=1}^{N_0}$, where $N_0$ is the number of samples and $N_0 > L$, the initial solution is

$$\bm{\beta}_{0} = \bm{K}_{0}^{-1}\bm{H}_{0}^{T}\bm{T}_{0}, \quad \bm{K}_{0} = \bm{H}_{0}^{T}\bm{H}_{0}$$

Suppose a further training set $\mathbb{S}_1 = \{(\bm{x}_i, \bm{t}_i)\}_{i=N_0+1}^{N_0+N_1}$ becomes available, where $N_1$ is the number of new samples. The problem then becomes minimizing

$$\left\| \left[ \begin{array}{l} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right] \bm{\beta} - \left[ \begin{array}{l} \bm{T}_{0} \\ \bm{T}_{1} \end{array} \right] \right\|$$

whose solution is

$$\bm{\beta}_{1} = \bm{K}_{1}^{-1} \left[ \begin{array}{c} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right]^{T} \left[ \begin{array}{c} \bm{T}_{0} \\ \bm{T}_{1} \end{array} \right], \quad \bm{K}_{1} = \left[ \begin{array}{c} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right]^{T} \left[ \begin{array}{c} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right]$$

Furthermore, since

$$\begin{aligned} \bm{K}_{1} &= \left[ \begin{array}{cc} \bm{H}_{0}^{T} & \bm{H}_{1}^{T} \end{array} \right] \left[ \begin{array}{c} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right] \\ &= \bm{K}_{0} + \bm{H}_{1}^{T}\bm{H}_{1} \end{aligned}$$

$$\begin{aligned} \left[ \begin{array}{c} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right]^{T} \left[ \begin{array}{c} \bm{T}_{0} \\ \bm{T}_{1} \end{array} \right] &= \bm{H}_{0}^{T}\bm{T}_{0} + \bm{H}_{1}^{T}\bm{T}_{1} \\ &= \bm{K}_{0}\bm{K}_{0}^{-1}\bm{H}_{0}^{T}\bm{T}_{0} + \bm{H}_{1}^{T}\bm{T}_{1} \\ &= \bm{K}_{0}\bm{\beta}_{0} + \bm{H}_{1}^{T}\bm{T}_{1} \\ &= \left(\bm{K}_{1} - \bm{H}_{1}^{T}\bm{H}_{1}\right)\bm{\beta}_{0} + \bm{H}_{1}^{T}\bm{T}_{1} \\ &= \bm{K}_{1}\bm{\beta}_{0} - \bm{H}_{1}^{T}\bm{H}_{1}\bm{\beta}_{0} + \bm{H}_{1}^{T}\bm{T}_{1} \end{aligned}$$

it follows that $\bm{\beta}_1$ can be expressed as a function of $\bm{\beta}_0$:

$$\begin{aligned} \bm{\beta}_{1} &= \bm{K}_{1}^{-1} \left[ \begin{array}{c} \bm{H}_{0} \\ \bm{H}_{1} \end{array} \right]^{T} \left[ \begin{array}{c} \bm{T}_{0} \\ \bm{T}_{1} \end{array} \right] \\ &= \bm{K}_{1}^{-1}\left(\bm{K}_{1}\bm{\beta}_{0} - \bm{H}_{1}^{T}\bm{H}_{1}\bm{\beta}_{0} + \bm{H}_{1}^{T}\bm{T}_{1}\right) \\ &= \bm{\beta}_{0} + \bm{K}_{1}^{-1}\bm{H}_{1}^{T}\left(\bm{T}_{1} - \bm{H}_{1}\bm{\beta}_{0}\right) \end{aligned}$$

where $\bm{K}_{1} = \bm{K}_{0} + \bm{H}_{1}^{T}\bm{H}_{1}$. Proceeding by induction, we obtain the iterative update formulas for the weights $\bm{\beta}$:

$$\begin{aligned} \bm{K}_{k+1} &= \bm{K}_{k} + \bm{H}_{k+1}^{T}\bm{H}_{k+1} \\ \bm{\beta}_{k+1} &= \bm{\beta}_{k} + \bm{K}_{k+1}^{-1}\bm{H}_{k+1}^{T}\left(\bm{T}_{k+1} - \bm{H}_{k+1}\bm{\beta}_{k}\right) \end{aligned} \tag{6.6}$$

where $\bm{K}_{k}^{-1} \in \mathbb{R}^{L \times L}$, $\bm{H}_{k} \in \mathbb{R}^{N_k \times L}$, $\bm{\beta}_k \in \mathbb{R}^{L \times m}$, $\bm{T}_k \in \mathbb{R}^{N_k \times m}$, $k = 1, 2, \cdots$. As Eq. (6.6) shows, every weight update requires inverting an $L \times L$ matrix to obtain $\bm{K}_{k+1}^{-1} \in \mathbb{R}^{L \times L}$.

By the Sherman-Morrison-Woodbury identity, Eq. (7.2) (see the section summarizing commonly used matrix-theory results), we have

$$\begin{aligned} \bm{K}_{k+1}^{-1} &= \left(\bm{K}_{k} + \bm{H}_{k+1}^{T}\bm{H}_{k+1}\right)^{-1} \\ &= \bm{K}_{k}^{-1} - \bm{K}_{k}^{-1}\bm{H}_{k+1}^{T}\left(\bm{I} + \bm{H}_{k+1}\bm{K}_{k}^{-1}\bm{H}_{k+1}^{T}\right)^{-1}\bm{H}_{k+1}\bm{K}_{k}^{-1}. \end{aligned}$$
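For reference, the general form of the Sherman-Morrison-Woodbury identity reads

$$\left(\bm{A} + \bm{U}\bm{C}\bm{V}\right)^{-1} = \bm{A}^{-1} - \bm{A}^{-1}\bm{U}\left(\bm{C}^{-1} + \bm{V}\bm{A}^{-1}\bm{U}\right)^{-1}\bm{V}\bm{A}^{-1},$$

and the expression above is the special case $\bm{A} = \bm{K}_{k}$, $\bm{U} = \bm{H}_{k+1}^{T}$, $\bm{C} = \bm{I}$, $\bm{V} = \bm{H}_{k+1}$.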

Writing $\bm{P}_{k+1} = \bm{K}_{k+1}^{-1}$, we obtain the following iterative formulas:

$$\begin{aligned} \bm{P}_{k+1} &= \bm{P}_{k} - \bm{P}_{k}\bm{H}_{k+1}^{T}\left(\bm{I} + \bm{H}_{k+1}\bm{P}_{k}\bm{H}_{k+1}^{T}\right)^{-1}\bm{H}_{k+1}\bm{P}_{k} \\ \bm{\beta}_{k+1} &= \bm{\beta}_{k} + \bm{P}_{k+1}\bm{H}_{k+1}^{T}\left(\bm{T}_{k+1} - \bm{H}_{k+1}\bm{\beta}_{k}\right) \end{aligned} \tag{6.7}$$

where $\bm{I} \in \mathbb{R}^{N_{k+1} \times N_{k+1}}$ is the identity matrix. When $N_{k+1} < L$, Eq. (6.7) reduces the computational cost, since only an $N_{k+1} \times N_{k+1}$ matrix needs to be inverted rather than an $L \times L$ one.
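A minimal sketch of a single chunk update per Eq. (6.7), assuming NumPy; the function name and shapes are illustrative:

```python
import numpy as np

def oselm_update_chunk(P, beta, H1, T1):
    """One OS-ELM chunk update per Eq. (6.7).

    P    -- current P_k = K_k^{-1}, shape (L, L)
    beta -- current output weights beta_k, shape (L, m)
    H1   -- hidden-layer output of the new chunk, shape (N1, L)
    T1   -- targets of the new chunk, shape (N1, m)
    """
    N1 = H1.shape[0]
    # Only an N1 x N1 matrix is inverted -- cheaper than L x L when N1 < L
    S = np.linalg.inv(np.eye(N1) + H1 @ P @ H1.T)
    P = P - P @ H1.T @ S @ H1 @ P
    beta = beta + P @ H1.T @ (T1 - H1 @ beta)
    return P, beta
```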

If samples are fed to OS-ELM one by one, the update in Eq. (6.7) simplifies to

$$\begin{aligned} \bm{P}_{k+1} &= \bm{P}_{k} - \frac{\bm{P}_{k}\bm{h}_{k+1}\bm{h}_{k+1}^{T}\bm{P}_{k}}{1 + \bm{h}_{k+1}^{T}\bm{P}_{k}\bm{h}_{k+1}} \\ \bm{\beta}_{k+1} &= \bm{\beta}_{k} + \bm{P}_{k+1}\bm{h}_{k+1}\left(\bm{t}_{k+1}^{T} - \bm{h}_{k+1}^{T}\bm{\beta}_{k}\right) \end{aligned} \tag{6.8}$$
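The one-by-one case, Eq. (6.8), needs no matrix inversion at all, only a rank-one correction; a minimal sketch under the same assumptions:

```python
import numpy as np

def oselm_update_one(P, beta, h, t):
    """One-by-one OS-ELM update per Eq. (6.8).

    P    -- current P_k = K_k^{-1}, shape (L, L)
    beta -- current output weights beta_k, shape (L, m)
    h    -- hidden-layer output of one sample, shape (L,)
    t    -- target of that sample, shape (m,)
    """
    Ph = P @ h                                    # P_k h_{k+1}, an L-vector
    P = P - np.outer(Ph, Ph) / (1.0 + h @ Ph)     # rank-one downdate of P
    beta = beta + np.outer(P @ h, t - h @ beta)   # scalar-denominator update
    return P, beta
```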

The online training algorithm described above is summarized below; a runnable sketch follows the summary.

Tip

Input: sample sequences $\mathbb{S}_{0}=\left\{\left(\bm{x}_{i}, \bm{t}_{i}\right)\right\}_{i=1}^{N_{0}}$, $\mathbb{S}_{1}=\left\{\left(\bm{x}_{i}, \bm{t}_{i}\right)\right\}_{i=N_{0}+1}^{N_{0}+N_{1}}$, $\cdots$, $\mathbb{S}_{k+1}=\left\{\left(\bm{x}_{i}, \bm{t}_{i}\right)\right\}_{i=\sum_{j=0}^{k} N_{j}+1}^{\sum_{j=0}^{k+1} N_{j}}$, $k=1, 2, \cdots, K-1$.

Output: the updated output-layer weights $\bm{\beta}_{K}$.

  1. Initial training ($k=0$, $N_0 > L$):

    Compute the hidden-layer output $\bm{H}_0$ of the initial sample set $\mathbb{S}_0$ and estimate the initial weights $\bm{\beta}_0 = \bm{P}_{0}\bm{H}_{0}^{T}\bm{T}_{0}$, where $\bm{P}_{0} = \left(\bm{H}_{0}^{T}\bm{H}_{0}\right)^{-1}$ and $\bm{T}_{0} = \left[\bm{t}_{1}, \dots, \bm{t}_{N_{0}}\right]^{T}$.

  2. Sequential learning ($k=0, 1, \cdots, K-1$; the chunk $\mathbb{S}_{k+1}$ contains $N_{k+1}$ samples):

    • If $N_{k+1} > 1$, compute the hidden-layer output $\bm{H}_{k+1}$ of the $(k+1)$-th sample set $\mathbb{S}_{k+1}$ and update the weights according to Eq. (6.7);

    • If $N_{k+1} = 1$, compute the hidden-layer output $\bm{h}_{k+1}$ of the $(k+1)$-th sample set $\mathbb{S}_{k+1}$ and update the weights according to Eq. (6.8);

    Stop training after the update for $k = K-1$ and output the weights $\bm{\beta}_{K}$.
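As an end-to-end illustration of the two steps above, here is a minimal NumPy sketch with randomly generated data; the network sizes, the sigmoid activation, the chunk sizes, and all variable names are assumptions for illustration, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L, m = 5, 20, 1                          # input dim, hidden nodes, output dim
W = rng.standard_normal((d, L))             # fixed random input weights
b = rng.standard_normal(L)                  # fixed random biases

def H_of(X):
    """Hidden-layer output for inputs X (sigmoid nodes)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Step 1: initial training with N0 > L samples
N0 = 50
X0 = rng.standard_normal((N0, d))
T0 = rng.standard_normal((N0, m))
H0 = H_of(X0)
P = np.linalg.inv(H0.T @ H0)                # P_0 = (H0^T H0)^{-1}
beta = P @ H0.T @ T0                        # beta_0

# Step 2: sequential learning over K chunks, Eq. (6.7)
K, Nk = 10, 5                               # chunk size could also be 1 (Eq. 6.8)
for k in range(K):
    Xk = rng.standard_normal((Nk, d))
    Tk = rng.standard_normal((Nk, m))
    Hk = H_of(Xk)
    S = np.linalg.inv(np.eye(Nk) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ S @ Hk @ P
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
```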