
A Guide to Training Machine Learning Models


Basic Concepts

  • Linear model: $y = b + wx$
  • Sigmoid: $y = b + \sum_i c_i \,\text{sigmoid}\left(b_i + \sum_j w_{ij} x_j\right)$
  • ReLU: $y = b + \sum_{2i} c_i \max\left(0,\, b_i + \sum_j w_{ij} x_j\right)$ (two ReLUs can compose one hard sigmoid, hence the sum over $2i$ terms)
  • Multiple layers --> Deep
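
The three model families above differ only in the activation wrapped around $b_i + \sum_j w_{ij} x_j$. A minimal NumPy sketch of each (all shapes and variable names here are illustrative, not from the original notes):

```python
import numpy as np

def linear_model(x, w, b):
    # y = b + wx
    return b + np.dot(w, x)

def sigmoid_model(x, W, b_vec, c, b):
    # y = b + sum_i c_i * sigmoid(b_i + sum_j W_ij x_j)
    z = b_vec + W @ x
    return b + c @ (1.0 / (1.0 + np.exp(-z)))

def relu_model(x, W, b_vec, c, b):
    # y = b + sum_i c_i * max(0, b_i + sum_j W_ij x_j)
    z = b_vec + W @ x
    return b + c @ np.maximum(0.0, z)
```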

Gradient Descent
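
The parameters of these models are found by gradient descent: compute the gradient of the loss and step against it, $\theta \leftarrow \theta - \eta \nabla L(\theta)$. A minimal sketch for the linear model with squared-error loss (the toy data, learning rate, and step count are illustrative):

```python
import numpy as np

# Toy data for y_hat = 2x + 1 plus noise (purely illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y_hat = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(100)

w, b, eta = 0.0, 0.0, 0.1                   # initial parameters, learning rate
for step in range(1000):
    y = b + w * x                           # linear model prediction
    grad_w = np.mean(2.0 * (y - y_hat) * x) # dL/dw for L = mean((y - y_hat)^2)
    grad_b = np.mean(2.0 * (y - y_hat))     # dL/db
    w -= eta * grad_w                       # theta <- theta - eta * grad L
    b -= eta * grad_b
```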

Neuron --> Neural Network

Many layers means Deep --> Deep Learning
Deep = many hidden layers

Question: Deep or "Fat"?

Issue: Overfitting

Data

  • Training data: $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}$

  • Testing data: $\{x^{N+1}, x^{N+2}, \dots, x^{N+M}\}$

  • Speech Recognition

  • Image Recognition

  • Speaker Recognition

  • Machine Translation

Training Steps

General Guide

Model bias

  • The model is too simple: Redesign your model to make it more flexible.
    • Add more input information (more features)
    • Deep learning (more neurons, more layers); see the sketch below
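
What "more flexible" looks like in code, as a minimal PyTorch sketch (the input size and layer widths are illustrative):

```python
import torch.nn as nn

# A too-simple model (prone to model bias)...
simple = nn.Linear(10, 1)

# ...redesigned to be more flexible: more neurons and more layers.
flexible = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
```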

Optimization

  • A large loss does not always imply model bias; there are other possibilities, such as an optimization issue.

Model Bias vs Optimization Issue

  • Gain insight by comparing models.

  • Start from shallower networks (or other models), which are easier to optimize.
  • If a deeper network does not obtain smaller loss on the training data, then there is an optimization issue (see the sketch below).
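
A minimal sketch of this diagnostic in PyTorch (the architectures, toy data, and hyperparameters are all illustrative):

```python
import torch
import torch.nn as nn

def train_loss(model, x, y_hat, steps=2000, lr=1e-2):
    """Fit the model and report its final *training* loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y_hat)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# Toy regression data (illustrative only).
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y_hat = torch.sin(3 * x)

shallow = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
deep = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                     nn.Linear(16, 16), nn.ReLU(),
                     nn.Linear(16, 1))

# If the deeper model's TRAINING loss is not smaller than the shallow
# one's, the problem is optimization, not model bias.
print("shallow:", train_loss(shallow, x, y_hat))
print("deep:   ", train_loss(deep, x, y_hat))
```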

Overfitting

  • Small loss on training data, large loss on testing data.

  • Solution

    • Data
      • More training data
      • Data augmentation
    • Constrained model (like $y = a + bx + cx^2$)
      • Fewer parameters, sharing parameters
      • Fewer features
      • Early stopping
      • Regularization
      • Dropout
    • But do not constrain the model too much (like $y = a + bx$); over-constraining becomes a model-bias issue.
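
Some of these constraints in a minimal PyTorch sketch (the model shape, dropout rate, and weight-decay strength are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

# Constrain the model with dropout...
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(64, 1),
)

# ...and with L2 regularization, via the optimizer's weight decay.
opt = optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
# Early stopping would wrap the training loop: track validation loss
# each epoch and stop once it starts rising.
```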

Mismatch

  • Your training and testing data have different distributions.
  • Be aware of how data is generated.

Only by understanding how your training and testing data are generated can you tell whether you have a mismatch problem.

Cross Validation

  • Training Set
    • Training Set 90%
    • Validation Set 10%
  • Testing Set
    • public (e.g., a public leaderboard)
    • private (e.g., a private leaderboard)
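
The 90/10 hold-out split in a minimal NumPy sketch (the toy dataset and the exact split ratio are illustrative):

```python
import numpy as np

# Toy dataset (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))
y_hat = rng.standard_normal(1000)

# Shuffle, then hold out 10% of the training set for validation.
idx = rng.permutation(len(x))
n_val = len(x) // 10
val_idx, train_idx = idx[:n_val], idx[n_val:]
x_train, y_train = x[train_idx], y_hat[train_idx]
x_val, y_val = x[val_idx], y_hat[val_idx]
```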

N-Fold Cross Validation
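
In N-fold cross validation, the training set is split into N folds; each fold serves once as the validation set while the other N-1 folds are used for training, and the N validation scores are averaged. A minimal sketch using scikit-learn's KFold (N = 3, the toy data, and the ridge model are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Toy linear-regression data (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((300, 10))
y_hat = x @ rng.standard_normal(10) + 0.1 * rng.standard_normal(300)

scores = []
for train_idx, val_idx in KFold(n_splits=3, shuffle=True,
                                random_state=0).split(x):
    model = Ridge().fit(x[train_idx], y_hat[train_idx])
    # Score each model on the fold it did NOT see during training.
    scores.append(model.score(x[val_idx], y_hat[val_idx]))

print("mean validation R^2:", np.mean(scores))  # average over the N folds
```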