Unsupervised Learning
Principal Components Analysis (PCA) of colored faces
- The mean value of all 415 faces
- Use
numpy.linalg.svd
andnp.linalg.eig
to implement PCA - The first four eigenfaces of the first four largest eigenvalues
Visualization of Chinese word embedding
- Use
gensim
to implement word embedding - Use
TSNE
to implement dimension reduction - Plot the result 2D figure
Image clustering
- Determine if two images come from the same dataset
- The result of different feature extraction methods
Public set score | Private set score | |
---|---|---|
TSNE to all vectors to 2 dimension | 0.02954 | 0.02910 |
Autoencoder+DNN 128 64 epochs = 10, k-means | 0.52631 | 0.52531 |
Autoencoder+DNN 128 64 epochs = 200, k-means | 0.96402 | 0.96237 |
where $p=\frac{tp}{tp+fp}$, $r=\frac{tp}{tp+fn}$
prediction positive | prediction negative | |
---|---|---|
ground true positive | true positive (tp) | false negative (fn) |
ground true negative | false positive (fp) | true negative (tn) |
- Visualize all features with their labels
Result
- Achieved 225/333 (Top $68\%$) rank in the Kaggle competition