Model Optimizations
In addition to tuning ONNX Runtime configuration options, you can improve performance by applying techniques that reduce a model's size, its complexity, or both.
Table of contents
- Quantize ONNX models
- Float16 and mixed precision models
- Graph optimizations
- ORT model format
- ORT model format runtime optimization
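
As a rough illustration of how lowering numerical precision reduces model size, the sketch below applies 8-bit affine quantization to a small list of weights in plain Python. It is not ONNX Runtime's quantization tooling: the function names `quantize_uint8` and `dequantize_uint8` and the sample weights are hypothetical, chosen only to show the idea of storing one byte per value plus a scale and zero point.

```python
import struct


def quantize_uint8(values):
    """Map floats to uint8 using an affine (scale, zero_point) scheme."""
    lo, hi = min(values), max(values)
    # Extend the range to include 0.0 so zero stays representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # Fall back to a scale of 1.0 when every value is zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale)
    # Quantize each value and clamp it to the uint8 range [0, 255].
    q = bytes(max(0, min(255, round(v / scale) + zero_point)) for v in values)
    return q, scale, zero_point


def dequantize_uint8(q, scale, zero_point):
    """Recover approximate float values from the quantized bytes."""
    return [(b - zero_point) * scale for b in q]


# Hypothetical weights, for illustration only.
weights = [-1.0, -0.5, 0.0, 0.25, 0.75, 1.0]

fp32_bytes = struct.pack(f"{len(weights)}f", *weights)  # 4 bytes per value
q, scale, zero_point = quantize_uint8(weights)          # 1 byte per value
restored = dequantize_uint8(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("float32 size:", len(fp32_bytes), "bytes")
print("uint8 size:  ", len(q), "bytes")
print("max error within one quantization step:", max_error <= scale)
```

The quantized weights take a quarter of the float32 storage, at the cost of a rounding error bounded by the quantization step. The pages listed above cover the approaches that ONNX Runtime itself supports.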