Hugging Face has added support in Sentence Transformers for training and fine-tuning multimodal embedding and reranker models. Practitioners can now work with models that process both text and images, extending the library beyond traditional text-only embeddings. The release responds to growing demand for models that can handle the mixed-format data found in real-world applications, and its training methods let developers customize embedding and reranking models for domain-specific use cases. Because the new capabilities build on Sentence Transformers' established infrastructure, developers can build systems that capture semantic relationships across modalities. The release includes documentation and examples to help the open-source machine learning community adopt it.
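
To make the embedding-plus-reranker idea concrete, here is a minimal toy sketch in plain Python. It does not use the Sentence Transformers API. The vectors are hand-written stand-ins for what a multimodal embedding model might produce, and the names `retrieve`, `rerank`, and `toy_pair_score` are hypothetical, chosen only for illustration. It shows text and image items sharing one vector space, a first-stage similarity search, and a second-stage reordering of the shortlist.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Assumption: made-up 3-d vectors standing in for real model output.
# Image files and text passages live in the same embedding space.
corpus = {
    "photo_of_cat.jpg": [0.9, 0.1, 0.0],
    "a cat sleeping on a sofa": [0.8, 0.2, 0.1],
    "stock_chart.png": [0.0, 0.1, 0.9],
    "quarterly revenue report": [0.1, 0.0, 0.8],
}

def retrieve(query_vec, corpus, top_k=2):
    """First stage: rank every item by cosine similarity to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), key) for key, vec in corpus.items()),
        reverse=True,
    )
    return [key for _, key in scored[:top_k]]

def toy_pair_score(query_vec, item_vec):
    """Hypothetical stand-in for a reranker score (negative Euclidean distance).

    A real reranker would score the raw query and item together,
    not their precomputed vectors.
    """
    return -math.dist(query_vec, item_vec)

def rerank(query_vec, candidates, corpus, score_fn=toy_pair_score):
    """Second stage: reorder the shortlist with a separate scoring function."""
    return sorted(
        candidates,
        key=lambda key: score_fn(query_vec, corpus[key]),
        reverse=True,
    )

query = [1.0, 0.0, 0.0]  # hypothetical embedding of a text query about cats
candidates = retrieve(query, corpus)
reranked = rerank(query, candidates, corpus)
print(reranked)
```

In a real system the hand-written vectors would come from a trained multimodal embedding model and the second stage from a trained reranker. The split into a cheap first pass and a more careful second pass is the part this sketch is meant to convey.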