
A Guide to Training Machine Learning Models


Basic Concepts

  • Linear model: $y = b + wx$
  • Sigmoid: $y = b + \sum_i c_i \,\text{sigmoid}\left(b_i + \sum_j w_{ij} x_j\right)$
  • ReLU: $y = b + \sum_{2i} c_i \max\left(0,\, b_i + \sum_j w_{ij} x_j\right)$ (two ReLUs can compose one hard sigmoid, hence the sum over $2i$ terms)
  • Multiple layers --> Deep
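
The three model families above differ only in the activation wrapped around $b_i + \sum_j w_{ij} x_j$. A minimal NumPy sketch of each (all shapes and variable names here are illustrative, not from the original notes):

```python
import numpy as np

def linear_model(x, w, b):
    # y = b + wx
    return b + np.dot(w, x)

def sigmoid_model(x, W, b_vec, c, b):
    # y = b + sum_i c_i * sigmoid(b_i + sum_j W_ij x_j)
    z = b_vec + W @ x
    return b + c @ (1.0 / (1.0 + np.exp(-z)))

def relu_model(x, W, b_vec, c, b):
    # y = b + sum_i c_i * max(0, b_i + sum_j W_ij x_j)
    z = b_vec + W @ x
    return b + c @ np.maximum(0.0, z)
```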

Gradient Descent
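
The parameters of these models are found by gradient descent: compute the gradient of the loss and step against it, $\theta \leftarrow \theta - \eta \nabla L(\theta)$. A minimal sketch for the linear model with squared-error loss (the toy data, learning rate, and step count are illustrative):

```python
import numpy as np

# Toy data for y_hat = 2x + 1 plus noise (purely illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y_hat = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(100)

w, b, eta = 0.0, 0.0, 0.1                   # initial parameters, learning rate
for step in range(1000):
    y = b + w * x                           # linear model prediction
    grad_w = np.mean(2.0 * (y - y_hat) * x) # dL/dw for L = mean((y - y_hat)^2)
    grad_b = np.mean(2.0 * (y - y_hat))     # dL/db
    w -= eta * grad_w                       # theta <- theta - eta * grad L
    b -= eta * grad_b
```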

Neuron --> Neural Network

Many layers means Deep --> Deep Learning
Deep = many hidden layers

Question: Deep or "Fat"?

Issue: Overfitting

Data

  • Training data: $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}$

  • Testing data: $\{x^{N+1}, x^{N+2}, \dots, x^{N+M}\}$

  • Speech Recognition

  • Image Recognition

  • Speaker Recognition

  • Machine Translation

Training Steps

General Guide

Model bias

  • The model is too simple: Redesign your model to make it more flexible.
    • Add more input information (more features)
    • Deep learning (more neurons, more layers); see the sketch below
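
What "more flexible" looks like in code, as a minimal PyTorch sketch (the input size and layer widths are illustrative):

```python
import torch.nn as nn

# A too-simple model (prone to model bias)...
simple = nn.Linear(10, 1)

# ...redesigned to be more flexible: more neurons and more layers.
flexible = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
```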

Optimization

  • A large loss does not always imply model bias; there are other possibilities, such as an optimization issue.

Model Bias vs Optimization Issue

  • Gain insight by comparing models.

  • Start from shallower networks (or other models), which are easier to optimize.
  • If a deeper network does not obtain smaller loss on the training data, then there is an optimization issue (see the sketch below).
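
A minimal sketch of this diagnostic in PyTorch (the architectures, toy data, and hyperparameters are all illustrative):

```python
import torch
import torch.nn as nn

def train_loss(model, x, y_hat, steps=2000, lr=1e-2):
    """Fit the model and report its final *training* loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y_hat)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# Toy regression data (illustrative only).
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y_hat = torch.sin(3 * x)

shallow = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
deep = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                     nn.Linear(16, 16), nn.ReLU(),
                     nn.Linear(16, 1))

# If the deeper model's TRAINING loss is not smaller than the shallow
# one's, the problem is optimization, not model bias.
print("shallow:", train_loss(shallow, x, y_hat))
print("deep:   ", train_loss(deep, x, y_hat))
```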

Overfitting

  • Small loss on training data, large loss on testing data.

  • Solution

    • Data
      • More training data
      • Data augmentation
    • Constrained model (like $y = a + bx + cx^2$)
      • Fewer parameters, sharing parameters
      • Fewer features
      • Early stopping
      • Regularization
      • Dropout
    • But do not constrain the model too much (like $y = a + bx$); over-constraining becomes a model-bias issue.
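
Some of these constraints in a minimal PyTorch sketch (the model shape, dropout rate, and weight-decay strength are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

# Constrain the model with dropout...
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(64, 1),
)

# ...and with L2 regularization, via the optimizer's weight decay.
opt = optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
# Early stopping would wrap the training loop: track validation loss
# each epoch and stop once it starts rising.
```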

Mismatch

  • Your training and testing data have different distributions.
  • Be aware of how data is generated.

Only by understanding how your training and testing data are generated can you tell whether you have a mismatch problem.

Cross Validation

  • Training Set
    • Training Set 90%
    • Validation Set 10%
  • Testing Set
    • public (e.g., a public leaderboard)
    • private (e.g., a private leaderboard)
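
The 90/10 hold-out split in a minimal NumPy sketch (the toy dataset and the exact split ratio are illustrative):

```python
import numpy as np

# Toy dataset (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))
y_hat = rng.standard_normal(1000)

# Shuffle, then hold out 10% of the training set for validation.
idx = rng.permutation(len(x))
n_val = len(x) // 10
val_idx, train_idx = idx[:n_val], idx[n_val:]
x_train, y_train = x[train_idx], y_hat[train_idx]
x_val, y_val = x[val_idx], y_hat[val_idx]
```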

N-Fold Cross Validation
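
In N-fold cross validation, the training set is split into N folds; each fold serves once as the validation set while the other N-1 folds are used for training, and the N validation scores are averaged. A minimal sketch using scikit-learn's KFold (N = 3, the toy data, and the ridge model are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Toy linear-regression data (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((300, 10))
y_hat = x @ rng.standard_normal(10) + 0.1 * rng.standard_normal(300)

scores = []
for train_idx, val_idx in KFold(n_splits=3, shuffle=True,
                                random_state=0).split(x):
    model = Ridge().fit(x[train_idx], y_hat[train_idx])
    # Score each model on the fold it did NOT see during training.
    scores.append(model.score(x[val_idx], y_hat[val_idx]))

print("mean validation R^2:", np.mean(scores))  # average over the N folds
```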