This repository shows how to fine-tune a Vision Transformer model from TensorFlow Hub on the Image Scene Detection dataset. The dataset is part of the Mobile AI Workshop @ CVPR 2021 competition.
Sayan Nath
The dataset is a newly collected Camera Scene Classification dataset consisting of images belonging to 30 different classes. It is part of the Mobile AI Workshop @ CVPR 2021 competition. You can find the dataset details here.
The Vision Transformer models used here are available on TensorFlow Hub.
Note: Because we want to fine-tune the models, we used the feature-extractor variants and built an image classifier on top of them.
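The idea of stacking a classifier on a feature extractor can be sketched as follows. This is a minimal sketch, not the exact code from the notebooks: the function names, the single dense head, the learning rate, the 224x224 input size, and the Hub handle in the commented usage are all assumptions. Input preprocessing (resizing and pixel scaling) must match whatever the chosen Hub model expects, and `hub.KerasLayer` needs a TensorFlow/Keras version that `tensorflow_hub` supports.

```python
import tensorflow as tf

NUM_CLASSES = 30  # number of scene classes in the dataset


def build_classifier(feature_extractor, image_size=224, num_classes=NUM_CLASSES):
    """Put a softmax classification head on top of a feature extractor.

    `feature_extractor` is any Keras-callable layer that maps a batch of
    images to one feature vector per image.
    """
    inputs = tf.keras.Input(shape=(image_size, image_size, 3))
    features = feature_extractor(inputs)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(features)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # illustrative value
        loss="sparse_categorical_crossentropy",  # assumes integer labels 0..29
        metrics=["accuracy"],
    )
    return model


def load_hub_extractor(handle):
    """Load a TF Hub feature extractor and leave its weights trainable."""
    import tensorflow_hub as hub  # imported lazily; only needed for Hub models

    return hub.KerasLayer(handle, trainable=True)


# Usage (downloads the model, so it is left commented out). The handle is an
# assumption; take the real one from the TF Hub model collection.
# vit = load_hub_extractor("https://tfhub.dev/sayakpaul/vit_s16_fe/1")
# model = build_classifier(vit)
```

Passing the extractor in as an argument keeps the head-building code independent of which ViT variant is being fine-tuned.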
Sl No | Models | No of Parameters | Training Accuracy | Validation Accuracy |
---|---|---|---|---|
1 | ViT-S/16 | 21,677,214 | 99.73% | 96.87% |
2 | ViT R26-S/32 (light aug) | 36,058,462 | 99.70% | 96.67% |
3 | ViT R26-S/32 (medium aug) | 36,058,462 | 99.80% | 97.17% |
4 | ViT B/32 | 87,478,302 | 99.43% | 96.87% |
5 | MobileNetV3Small | 2,070,158 | 95.20% | 92.73% |
6 | MobileNetV2 | 2,929,246 | 95.06% | 88.89% |
7 | BigTransfer (BiT) | - | 99.53% | 96.97% |
Note: The last three results were benchmarked during the CVPR competition. You can find the repository here.
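As a rough sanity check on the parameter counts above, the share contributed by the classification head can be worked out by hand. The sketch below assumes the head is a single dense layer over the feature vector (the README does not state the head design) and uses the standard ViT feature widths of 384 for the "S" variants and 768 for "B". The helper name is hypothetical.

```python
def dense_head_params(feature_dim, num_classes=30):
    """Weights plus biases of one fully connected layer."""
    return feature_dim * num_classes + num_classes


# (feature width, total parameters reported in the table)
reported = {
    "ViT-S/16": (384, 21_677_214),
    "ViT B/32": (768, 87_478_302),
}

for name, (feature_dim, total) in reported.items():
    head = dense_head_params(feature_dim)
    print(f"{name}: head={head:,} remaining={total - head:,}")
# ViT-S/16: head=11,550 remaining=21,665,664
# ViT B/32: head=23,070 remaining=87,455,232
```

Under that assumption, the 30-class head accounts for well under 0.1% of each model's parameters, so almost everything being fine-tuned sits in the backbone.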
✅ ViT S/16
✅ ViT R26-S/32 (Light Augmentation)
✅ ViT R26-S/32 (Medium Augmentation)
✅ ViT B/32
⬜ ViT R50-L/32
⬜ ViT B/16
⬜ ViT L/16
⬜ ViT B/8
Sl No | Models | Colab Notebook | TensorBoard |
---|---|---|---|
1 | ViT-S/16 | Link | Link |
2 | ViT R26-S/32 (light aug) | Link | Link |
3 | ViT R26-S/32 (medium aug) | Link | Link |
4 | ViT B/32 | Link | Link |
Each model directory contains the corresponding notebook, Python script, metric graphs, training logs (in .csv), and TensorBoard callback logs.
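The .csv training logs can be inspected without rerunning a notebook. The sketch below assumes the logs follow the layout written by Keras's `CSVLogger` callback, with an `epoch` column and one column per metric such as `val_accuracy`. The column names, the helper name, and the sample rows are illustrative assumptions, not values taken from this repository.

```python
import csv
import io


def best_epoch(csv_file, metric="val_accuracy"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    rows = list(csv.DictReader(csv_file))
    best = max(rows, key=lambda row: float(row[metric]))
    return int(best["epoch"]), float(best[metric])


# Tiny made-up log, only to illustrate the assumed column layout.
sample_log = io.StringIO(
    "epoch,accuracy,loss,val_accuracy,val_loss\n"
    "0,0.81,0.62,0.90,0.35\n"
    "1,0.93,0.24,0.95,0.19\n"
    "2,0.97,0.11,0.94,0.21\n"
)
print(best_epoch(sample_log))  # (1, 0.95)
```

For a real log, open the file with `open(path, newline="")` and pass the file object to `best_epoch`.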