Comprehensive overview of 12 machine learning models and their performance
ResNet (vision): Deep residual network whose skip connections enable the training of very deep networks.
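To make the skip connection concrete, here is a minimal PyTorch sketch of a basic residual block; the layer sizes and names are illustrative assumptions, not a configuration taken from this overview.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection: add the identity back in
```

Because the shortcut carries the identity, gradients flow through the addition unchanged, which is what makes very deep stacks of these blocks trainable.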
DenseNet (vision): Dense connectivity pattern in which each layer receives the feature maps of all preceding layers.
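A minimal PyTorch sketch of that connectivity pattern, assuming a simple BN-ReLU-Conv layer per step; the growth-rate and layer-count values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer consumes the concatenation of all previous feature maps."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, 3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # concatenate every preceding feature map along the channel axis
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```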
ConvNeXt (vision): Modern ConvNet design inspired by Vision Transformers, paired with improved training strategies.
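A sketch of a ConvNeXt-style block under simplifying assumptions (layer scale and stochastic depth omitted): a large depthwise convolution followed by a transformer-like MLP, with a residual connection.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Depthwise 7x7 conv + LayerNorm + pointwise MLP, with a residual add."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # applied channels-last
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise expansion
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                        # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        return residual + x.permute(0, 3, 1, 2)
```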
ViT (vision): Pure transformer architecture applied to image classification via patch-based processing.
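A minimal sketch of the patch-based processing step: a strided convolution is equivalent to cutting the image into non-overlapping patches and linearly projecting each one into a token. The sizes follow the common ViT-Base convention and are assumptions here.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and linearly project each one."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided conv == patch slicing + linear projection in one op.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # (N, 3, 224, 224)
        x = self.proj(x)                       # (N, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)    # (N, 196, 768): one token per patch

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

The resulting token sequence is then fed to a standard transformer encoder, exactly as words would be in NLP.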
Swin Transformer (vision): Hierarchical vision transformer with shifted windowing for efficient computation.
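A simplified sketch of window partitioning plus the cyclic shift; attention then runs only within each window, which is what keeps the cost linear in image size. The full model also masks attention across shifted-window boundaries, which is omitted here, and the window and channel sizes are illustrative assumptions.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (N, H, W, C) feature map into non-overlapping windows."""
    n, h, w, c = x.shape
    x = x.view(n, h // window_size, window_size, w // window_size, window_size, c)
    # -> (num_windows * N, window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)

x = torch.randn(1, 56, 56, 96)
# The shift lets information cross window boundaries in alternating blocks.
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))
print(window_partition(shifted, 7).shape)  # torch.Size([64, 7, 7, 96])
```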
BERT (nlp): Bidirectional encoder representations from transformers for language understanding.
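A minimal usage sketch with the Hugging Face transformers library; the bert-base-uncased checkpoint is the standard public one and the example sentence is arbitrary.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
outputs = model(**inputs)
# Every token attends to both its left and right context (bidirectional encoding),
# so the representation of "bank" is disambiguated by "interest rates".
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```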
DistilBERT (nlp): Distilled BERT with fewer parameters while largely maintaining performance.
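A sketch of the soft-target distillation loss that drives this kind of compression; the full DistilBERT recipe also combines it with masked-language-modeling and embedding-alignment losses, which are omitted here, and the temperature value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KL loss: the student mimics the teacher's output distribution."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

loss = distillation_loss(torch.randn(4, 30522), torch.randn(4, 30522))
```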
RoBERTa (nlp): Robustly optimized BERT with an improved training methodology.
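One of those training changes, dynamic masking, can be sketched as re-sampling the masked positions on every pass instead of fixing them once at preprocessing time. This is a simplified illustration; the standard masked-LM recipe also sometimes substitutes random tokens or leaves tokens unchanged, and the token IDs here are arbitrary.

```python
import torch

def dynamic_mask(input_ids: torch.Tensor, mask_token_id: int, prob: float = 0.15):
    """Re-sample masked positions each call, so every epoch sees a new mask."""
    mask = torch.rand(input_ids.shape) < prob
    masked = input_ids.clone()
    masked[mask] = mask_token_id
    return masked, mask

ids = torch.randint(5, 1000, (2, 16))
masked_ids, positions = dynamic_mask(ids, mask_token_id=4)
```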
Early fusion (multimodal): Concatenates image and text features early in the processing pipeline.
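A minimal sketch of early fusion, assuming precomputed image and text feature vectors; the feature dimensions and head architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate image and text features, then classify the joint vector."""
    def __init__(self, img_dim=512, txt_dim=768, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, img_feat, txt_feat):
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))

logits = EarlyFusion()(torch.randn(8, 512), torch.randn(8, 768))
```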
Late fusion (multimodal): Combines the predictions of separately trained image and text models.
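A minimal sketch of late fusion as a weighted average of per-model class probabilities; the equal weights are an assumption and would typically be tuned on validation data.

```python
import torch

def late_fusion(img_logits, txt_logits, w_img=0.5, w_txt=0.5):
    """Average the per-model class probabilities; no joint training required."""
    img_p = torch.softmax(img_logits, dim=-1)
    txt_p = torch.softmax(txt_logits, dim=-1)
    return w_img * img_p + w_txt * txt_p

probs = late_fusion(torch.randn(8, 10), torch.randn(8, 10))
```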
Attention fusion (multimodal): Uses learned attention weights to combine multimodal features.
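A minimal sketch of attention-weighted fusion, assuming both modalities are already projected to a shared dimension; the scalar scoring layer is a simple illustrative choice rather than a specific published design.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Learn a relevance score per modality and take a convex combination."""
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one score per modality vector

    def forward(self, img_feat, txt_feat):                 # both (N, dim)
        feats = torch.stack([img_feat, txt_feat], dim=1)   # (N, 2, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (N, 2, 1), sums to 1
        return (weights * feats).sum(dim=1)                # (N, dim) fused vector

fused = AttentionFusion()(torch.randn(8, 256), torch.randn(8, 256))
```

Unlike fixed concatenation or averaging, the weights are input-dependent, so the model can lean on whichever modality is more informative for a given example.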
Random Forest (classical): Ensemble of decision trees with random feature selection at each split.
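A minimal scikit-learn example; the synthetic dataset and hyperparameters are illustrative. The max_features parameter is the knob that implements random feature selection at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample; each split considers sqrt(n_features)
# randomly chosen features, which decorrelates the trees in the ensemble.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```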