Data Modeling Practices in Healthcare: A Review of Literature and Recommended Direction

Author: Rachel Masake

Primary Advisor: Jiajie Zhang, PhD

Committee Members: Craig W. Johnson, PhD

Master's thesis, The University of Texas School of Health Information Sciences at Houston.


Purpose: How easily medical data can be leveraged for operations, patient care, and research depends heavily on how the data is structured. Data modeling in healthcare is challenging because of the vastness of the domain, the variety of participants, the complexity of the knowledge, and the properties of medical data. We study data modeling practices in healthcare by assessing modeling purposes, methodologies, and the challenges presented by medical data properties. We identify gaps and explore the application of ontological concepts in addressing these challenges.

Methods: We conducted a qualitative literature review of 38 data modeling projects indexed in PubMed and published between 1992 and October 2007.

Results: Sixty-six percent (25/38) of the data modeling projects were dual- or tri-purpose. Generic data models were used in sixty percent (15/25) of the multipurpose modeling projects. Seventy-eight percent (30/38) of the projects reused or were linked to previously developed models. Disparate data sources and diverse formats were the most frequently addressed challenges. None of the projects reviewed addressed how unstructured data was modeled.

Conclusion: Data modeling projects in healthcare would benefit from multipurpose approaches that address the challenges presented by disparate data sources, diverse formats, and unstructured data. Healthcare data models can become more flexible, more extensible, and richer through the application of ontological concepts.