\[
L(x_1, x_2, ... x_n, \lambda_1, ... \lambda_k) = f(x_1, x_2, ... x_n) - \sum_{i=1}^{k} \lambda_i g_i(x_1, x_2, ... x_n)
\]
其中 f 為想要最佳化的函數,g 為 k 個約束的函數,\(\lambda\) 為 multipliers(乘數)。Lagrange Multipliers 的證明比較複雜,因此以下用一個例子來解釋:
\[
Minimize\ f(x) = x_1^2 + x_2^2
\\
subject\ to\ g(x) = a_1x_1 + a_2x_2 = b
\]
先寫成 Lagrangian 的形式:
\[
L(x, \lambda) = x_1^2 + x_2^2 + \lambda(a_1x_1 + a_2x_2 - b)
\]
對 \(x_1, x_2, \lambda\) 偏微分:
\[
\frac{\partial L}{\partial x_1} = 2x_1 + \lambda a_1 = 0
\\
\frac{\partial L}{\partial x_2} = 2x_2 + \lambda a_2 = 0
\\
\frac{\partial L}{\partial \lambda} = a_1x_1 + a_2x_2 - b = 0
\]
由以上三個方程式求出三個變數:
\[
x_1^* = \frac{a_1b}{a_1^2 + a_2^2}
\\
x_2^* = \frac{a_2b}{a_1^2 + a_2^2}
\\
\lambda = \frac{-2b}{a_1^2 + a_2^2}
\]
將 \(x_1^*, x_2^*\) 代入 f(x) 可得到極值:
\[
(x_1^*)^2 + (x_2^*)^2 = \frac{b^2}{a_1^2 + a_2^2}
\]
以上很簡單地介紹 Lagrange Multipliers。(註:上面的例子是 Linear Algebra and Learning from Data [2] 書中 VI.2 的例子。)
參考資料
[1] https://en.wikipedia.org/wiki/Lagrange_multiplier[2] https://math.mit.edu/~gs/learningfromdata/
沒有留言:
張貼留言