On-Edge Deployment of Vision Transformers for Medical Diagnostics Using the Kvasir-Capsule Dataset

This paper explores the feasibility of deploying vision transformers (ViTs) for on-edge medical diagnostics by experimenting with the Kvasir-Capsule image classification dataset, a large-scale image dataset of gastrointestinal diseases. Quantization techniques made available through TensorFlow Lite (TFLite), including post-training float-16 (F16) quantization and quantization-aware training (QAT), are applied to reduce model size without compromising performance. The seven ViT models selected for this study are EfficientFormerV2S2, EfficientViT_B0, EfficientViT_M4, MobileViT_V2_050, MobileViT_V2_100, MobileViT_V2_175, and RepViT_M11.
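Both compression paths can be reproduced with standard TFLite tooling. The sketch below is illustrative only, not the exact configuration used in this work: it assumes an already-trained Keras ViT classifier named `model` and a placeholder training dataset `train_ds`, and shows post-training F16 quantization via the TFLiteConverter alongside QAT via the tensorflow_model_optimization package.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# --- Post-training float-16 quantization (minimal sketch) ---
# `model` is assumed to be an already-trained Keras ViT classifier.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights in float16
f16_tflite_model = converter.convert()

with open("vit_f16.tflite", "wb") as f:
    f.write(f16_tflite_model)

# --- Quantization-aware training (QAT) ---
# Wrap the model with fake-quantization ops, then fine-tune before conversion.
# Note: some custom ViT layers may need per-layer quantization annotations.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
qat_model.fit(train_ds, epochs=3)  # `train_ds` is a placeholder tf.data.Dataset

# Convert the fine-tuned QAT model to a quantized TFLite flatbuffer.
qat_converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
qat_converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tflite_model = qat_converter.convert()
```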

Three metrics are considered when analyzing a model: (i) F1-score, (ii) model size, and (iii) performance-to-size ratio, where performance is the F1-score and size is the model size in megabytes (MB). In terms of F1-score, we show that MobileViT_V2_175 with F16 quantization outperforms all other models, achieving an F1-score of 0.9534.
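For reference, the third metric is a simple quotient of the first two. Whether the F1-score enters as a fraction or a percentage is not stated explicitly above; the reported ratio values greater than one suggest the percentage reading, but that is an assumption on our part.

$$\text{performance-to-size ratio} = \frac{\text{F1-score}}{\text{model size (MB)}}$$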

On the other hand, MobileViT_V2_050 trained using QAT was scaled down to a model size of 1.70 MB, making it the smallest model amongst the variations this paper examined. MobileViT_V2_050 also achieved the highest performance-to-size ratio of 41.25.

Although smaller models are preferable for latency and memory reasons, medical diagnostics cannot afford poor-performing models. We conclude that MobileViT_V2_175 with F16 quantization is our best-performing model, with a small size of 27.47 MB, providing a benchmark for lightweight models on the Kvasir-Capsule dataset.
