Dataset Access

MedAlign is a clinician-generated dataset for instruction following with electronic medical records.

Release Timeline

The MedAlign dataset contains:

  • 1,314 clinician-generated instructions (983 after removing duplicates);
  • 276 longitudinal electronic health records (EHRs);
  • 303 clinician-generated responses to instruction-EHR pairs.

All of these assets will be shared in the coming months under a standard research data use agreement (DUA).

Additional Details

For more information, please read the main MedAlign paper.

Questions?

For questions and feedback, please post on the discussion board.