Visual Foundation Models

Image encoders are often trained for project-specific tasks and therefore fail to capture an optimal representation of the medical scan. We aim to develop powerful encoders, tuned to the medical domain, that generate high-quality embeddings encoding the complex relationships between pixels. We then apply these embeddings to downstream tasks for clinical use cases, such as:

  • Automatic Radiology Report Generation with Image Grounding
  • Promptable Segmentation via Textual Queries