Unsupervised Learning
Principal Components Analysis (PCA) of colored faces
- The mean value of all 415 faces

- Use `numpy.linalg.svd` and `np.linalg.eig` to implement PCA
- The first four eigenfaces, corresponding to the four largest eigenvalues
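
The PCA steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the flattened `(n_images, n_pixels)` input layout, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np


def pca_eigenfaces(faces, k=4):
    """Return the mean face, the top-k eigenfaces, and their eigenvalues.

    faces: array of shape (n_images, n_pixels), one flattened image per row
    (assumed layout; a color image would be flattened across all channels).
    """
    X = np.asarray(faces, dtype=np.float64)
    mean_face = X.mean(axis=0)
    centered = X - mean_face
    # Thin SVD: the rows of vt are orthonormal principal directions,
    # already sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:k]
    # Squared singular values are the eigenvalues of centered.T @ centered.
    eigenvalues = s[:k] ** 2
    return mean_face, eigenfaces, eigenvalues


def top_eigenvalues_via_eig(faces, k=4):
    """Cross-check with np.linalg.eig on the small (n_images x n_images) Gram matrix."""
    X = np.asarray(faces, dtype=np.float64)
    centered = X - X.mean(axis=0)
    gram = centered @ centered.T  # shares its nonzero eigenvalues with centered.T @ centered
    values = np.real(np.linalg.eig(gram)[0])
    return np.sort(values)[::-1][:k]


# Random stand-in data (hypothetical): 20 "images" of 48 values each.
rng = np.random.default_rng(0)
demo_faces = rng.random((20, 48))
mean_face, eigenfaces, eigenvalues = pca_eigenfaces(demo_faces, k=4)
print(mean_face.shape, eigenfaces.shape)
```

Running `np.linalg.eig` on the Gram matrix rather than on the pixel covariance keeps the eigenproblem at `n_images × n_images`, which is usually far smaller than `n_pixels × n_pixels` for image data.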

Visualization of Chinese word embedding
- Use `gensim` to implement word embedding
- Use `TSNE` to implement dimension reduction
- Plot the resulting 2D figure

Image clustering
- Determine if two images come from the same dataset
- The result of different feature extraction methods
| Feature extraction method | Public set score | Private set score |
|---|---|---|
| t-SNE on all vectors, reduced to 2 dimensions | 0.02954 | 0.02910 |
| Autoencoder+DNN 128 64 epochs = 10, k-means | 0.52631 | 0.52531 |
| Autoencoder+DNN 128 64 epochs = 200, k-means | 0.96402 | 0.96237 |

The scores are computed from the precision $p$ and the recall $r$, where $p=\frac{tp}{tp+fp}$ and $r=\frac{tp}{tp+fn}$, with the counts defined as follows:

| | Predicted positive | Predicted negative |
|---|---|---|
| Ground-truth positive | true positive (tp) | false negative (fn) |
| Ground-truth negative | false positive (fp) | true negative (tn) |

- Visualize all features with their labels
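
To make the pair decision and the $p$, $r$ definitions concrete, here is a small sketch. It assumes each image has already been assigned a cluster label (for example by k-means) and that a pair is predicted positive when both images share a label; the function names and the toy data are hypothetical.

```python
def predict_same_dataset(cluster_labels, pairs):
    """Predict 1 if both images in a pair fall in the same cluster, else 0."""
    return [int(cluster_labels[i] == cluster_labels[j]) for i, j in pairs]


def precision_recall(y_true, y_pred):
    """Compute p = tp / (tp + fp) and r = tp / (tp + fn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: cluster label per image, and image-index pairs to compare.
cluster_labels = [0, 0, 1, 1, 0]
pairs = [(0, 1), (0, 2), (2, 3), (1, 4), (3, 4)]
y_true = [1, 0, 1, 0, 1]  # hypothetical ground truth for each pair

y_pred = predict_same_dataset(cluster_labels, pairs)
precision, recall = precision_recall(y_true, y_pred)
print(y_pred, precision, recall)
```

In this toy case two pairs are true positives, one is a false positive, and one is a false negative, so both precision and recall come out to 2/3.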

Result
- Ranked 225th out of 333 (top $68\%$) in the Kaggle competition