ViT-TF-Hub-Application

Sayan Nath

Added on September 14, 2024

This repository shows how to fine-tune a Vision Transformer model from TensorFlow Hub on the Image Scene Detection dataset. The dataset is part of the Mobile AI Workshop @ CVPR 2021 competition.

Vision Transformer TF-Hub Application

Description

This repository shows how to fine-tune a Vision Transformer model from TensorFlow Hub on the Image Scene Detection dataset.

Dataset Used

The dataset is a newly collected Camera Scene Classification dataset consisting of images belonging to 30 different classes. It is part of the Mobile AI Workshop @ CVPR 2021 competition. You can find the dataset details here.

Models

The following Vision Transformer models are available on TensorFlow Hub:

  • Image Classifiers
  • Feature Extractors

Note: Because we want to fine-tune the model, we used the feature-extractor model and built the image classifier on top of it.
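The approach in the note above can be sketched in Keras as follows. This is a minimal sketch rather than the notebooks' exact code: the TF Hub handle, the 224x224 input size, the optimizer, and the learning rate are assumptions, and `build_classifier` and `head_param_count` are hypothetical helper names introduced for illustration.

```python
NUM_CLASSES = 30   # the dataset has 30 scene classes
IMAGE_SIZE = 224   # assumed input resolution of the feature extractor
# Assumed TF Hub handle; substitute the feature extractor you actually use.
VIT_S16_FE_HANDLE = "https://tfhub.dev/sayakpaul/vit_s16_fe/1"


def head_param_count(feature_dim, num_classes=NUM_CLASSES):
    """Weights plus biases of one dense layer mapping features to classes."""
    return feature_dim * num_classes + num_classes


def build_classifier(handle=VIT_S16_FE_HANDLE, learning_rate=1e-4):
    """Stack a trainable feature extractor and a softmax head (sketch)."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow as tf
    import tensorflow_hub as hub

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3)),
        # trainable=True fine-tunes the backbone instead of freezing it.
        hub.KerasLayer(handle, trainable=True),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        # Assumes integer class labels rather than one-hot vectors.
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier()` downloads the feature extractor, so it needs network access and both `tensorflow` and `tensorflow_hub` installed. Assuming a 384-dimensional feature vector (the hidden size of ViT-S), `head_param_count(384)` gives 11,550 parameters for the 30-class head.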

Benchmark Results

| Sl No | Models | No of Parameters | Accuracy | Validation Accuracy |
|---|---|---|---|---|
| 1 | ViT-S/16 | 21,677,214 | 99.73% | 96.87% |
| 2 | ViT R26-S/32 (light aug) | 36,058,462 | 99.70% | 96.67% |
| 3 | ViT R26-S/32 (medium aug) | 36,058,462 | 99.80% | 97.17% |
| 4 | ViT B/32 | 87,478,302 | 99.43% | 96.87% |
| 5 | MobileNetV3Small | 2,070,158 | 95.20% | 92.73% |
| 6 | MobileNetV2 | 2,929,246 | 95.06% | 88.89% |
| 7 | BigTransfer (BiT) | | 99.53% | 96.97% |

Note: The last three results were benchmarked during the CVPR competition. You can find the repository here.
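To compare the benchmark rows more easily, the short sketch below ranks the models by validation accuracy and computes the gap between the two accuracy columns. The numbers are copied from the table; the helper names are hypothetical, and the sketch assumes the first accuracy column reports training accuracy.

```python
# (model, parameters, accuracy %, validation accuracy %) copied from the table.
RESULTS = [
    ("ViT-S/16", 21_677_214, 99.73, 96.87),
    ("ViT R26-S/32 (light aug)", 36_058_462, 99.70, 96.67),
    ("ViT R26-S/32 (medium aug)", 36_058_462, 99.80, 97.17),
    ("ViT B/32", 87_478_302, 99.43, 96.87),
    ("MobileNetV3Small", 2_070_158, 95.20, 92.73),
    ("MobileNetV2", 2_929_246, 95.06, 88.89),
    ("BigTransfer (BiT)", None, 99.53, 96.97),  # parameter count not listed
]


def accuracy_gap(accuracy, val_accuracy):
    """Difference between the two accuracy columns, in percentage points."""
    return round(accuracy - val_accuracy, 2)


def rank_by_validation(results):
    """Rows sorted from highest to lowest validation accuracy."""
    return sorted(results, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for name, params, acc, val_acc in rank_by_validation(RESULTS):
        size = f"{params:,}" if params is not None else "n/a"
        gap = accuracy_gap(acc, val_acc)
        print(f"{name:<28} {size:>12}  val={val_acc:.2f}%  gap={gap:.2f}")
```

On these numbers, ViT R26-S/32 with medium augmentation has the highest validation accuracy, and MobileNetV2 shows the largest gap between the two columns.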

Notebooks

ViT S/16
ViT R26-S/32 (Light Augmentation)
ViT R26-S/32 (Medium Augmentation)
ViT B/32
ViT R50-L/32
ViT B/16
ViT L/16
ViT B/8

Links

| Sl No | Models | Colab Notebook | TensorBoard |
|---|---|---|---|
| 1 | ViT-S/16 | Link | Link |
| 2 | ViT R26-S/32 (light aug) | Link | Link |
| 3 | ViT R26-S/32 (medium aug) | Link | Link |
| 4 | ViT B/32 | Link | Link |

Each model directory contains the corresponding notebook, Python script, metric graphs, training logs (in .csv), and TensorBoard callback logs.
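The training logs can be inspected with the standard library alone. The sketch below finds the epoch with the best validation accuracy. It assumes the .csv files follow the layout written by Keras' `CSVLogger` callback (an `epoch` column plus one column per metric); the column names, the helper `best_epoch`, and the sample rows are illustrative assumptions, not values taken from the repository.

```python
import csv
import io


def best_epoch(csv_text, metric="val_accuracy"):
    """Return (epoch, value) for the row with the highest `metric`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda row: float(row[metric]))
    return int(best["epoch"]), float(best[metric])


# Made-up sample in the assumed CSVLogger layout; not real training output.
SAMPLE_LOG = """epoch,accuracy,loss,val_accuracy,val_loss
0,0.81,0.62,0.90,0.35
1,0.95,0.18,0.96,0.14
2,0.99,0.05,0.95,0.16
"""

if __name__ == "__main__":
    print(best_epoch(SAMPLE_LOG))
```

To use it on a real log, read the file contents first, for example with `open(path).read()`, and pass the resulting text to `best_epoch`.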

Acknowledgements

  • Thanks to Sayak Paul for building the ViT models so that we can use Vision Transformers in a straightforward way.
  • Thanks to the authors of Vision Transformers for the effort they put into open-sourcing the models.
