Maria Ma


Hi, I’m Maria Ma. I got my MPH from the UC Berkeley School of Public Health, with a focus in Infectious Disease and Vaccinology. I took a bunch of classes in the Epidemiology department. My undergraduate degree was in Microbiology.

In my past life, I was a synthetic biologist working on filamentous fungi and yeast, making enzymes and yeast.

I am based in the San Francisco Bay Area and currently work as a data analyst for the value-based care company Aledade. Previously, I was an Epidemiologist for the LA County Department of Public Health and a Data Scientist/Impact Analyst at a global health nonprofit. Email me at i.am.maria.ma at gmail.

What I Do

I move numbers around, deduplicate records, and set up automation. I write reproducible statistical analysis code (R), turn data into figures, then write about those figures in the most accessible way I can. Sometimes, I turn that code into packages for internal use. Most of what I do is data cleaning and setting up ad hoc “databases”.

I like creating tools for other people to use, to help the organization use its data better. This includes both dashboards and other automation tools, using whatever interface is most accessible to my teammates. In my current role, much of that work is done with Google Sheets and Microsoft Excel.

  • PS: I am very good at Google Sheets

What I’m Good At

I love designing elegant studies, whether that means working with observational data or setting up data collection. This mostly means I’m great at thinking about causal inference. I’m also pretty good at R (tidyverse and ggplot) and data visualization. I approach all of my work with failure mode analysis in the back of my head. Healthcare is my domain; I see it as my duty to make sure we don’t harm people.

I really like Sankey diagrams.

I speak conversational Mandarin Chinese, I understand Shanghainese, and I’m slowly learning French.

Interests

I’m interested in improving quality of healthcare, and I believe in the promise of technology to achieve this.

Projects

  • COVID-19 Outcomes by Vaccination Status
    • A dashboard I built for LA County.
  • Synthetic Health Data with Agent Based Models
    • Agent-based models (“state machines”) seem to me to be the most reasonable and reliable way to generate synthetic health data. Existing machine learning approaches can generate flawed data that doesn’t make sense, because health records have a time- and state-dependent component that those approaches can miss, e.g., a woman giving birth shortly after having a miscarriage. Existing simulation models work off of robust data sources, which are also generally tailored for a Global North context. The data we often see in rural settings in the Global South is much less rich and often has significant gaps. This project is in progress; my goal is to generate a synthetic dataset that also mimics the missingness of the data. A synthetic dataset like this could help people build and test tools (and maybe even algorithms) without potentially exposing real people’s health data.
  • My comprehensive Master’s paper
    • In this project, I took a novel approach to modeling antibiotic resistance by relating it to how much antibiotic is used, as well as to mutation rates. Ultimately, I wanted to model antibiotic-resistant diseases and estimate the economic impact that they might have. This is primarily theoretical work, as there weren’t accessible databases that I could use to estimate parameters.
  • Hospital closures and rural access project - Write up here
    • This project uses Python and R to look at access to emergency care across the US. Rural hospital closures reduce health access for an already disadvantaged population, and I was interested in seeing by how much. A few months after I finished this project, Pew published a similar analysis that used survey data.
  • An effort to better understand the 2017 Cholera epidemic in Yemen - Write up here
    • A project using R to clean up and visualize data related to the ongoing cholera epidemic in Yemen. This crisis was not getting quite the attention it deserved.
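
As a rough illustration of the state-machine idea described under Synthetic Health Data above, here is a minimal Python sketch. It is not the project's actual code: the states, the transition probabilities, the `missing_rate` parameter, and the `simulate_patient` function are all hypothetical and chosen only to show the shape of the approach. Each simulated month can only move to a state that is allowed from the current one, so an impossible sequence such as a birth right after a miscarriage cannot be generated, and a second pass blanks out some months to mimic gaps in real records.

```python
import random

# Hypothetical states and monthly transition probabilities (illustration only).
# A state can only move to the states listed for it, so "miscarriage" can
# never be followed directly by "birth".
TRANSITIONS = {
    "not_pregnant": [("not_pregnant", 0.90), ("pregnant", 0.10)],
    "pregnant": [("pregnant", 0.80), ("miscarriage", 0.05), ("birth", 0.15)],
    "miscarriage": [("not_pregnant", 1.0)],
    "birth": [("not_pregnant", 1.0)],
}


def simulate_patient(n_months, rng, missing_rate=0.2):
    """Return (true_states, observed) for one synthetic patient.

    true_states: the state in each month, generated by the state machine.
    observed:    the same sequence with some months replaced by None,
                 a crude stand-in for missing records.
    """
    state = "not_pregnant"
    true_states = []
    for _ in range(n_months):
        true_states.append(state)
        next_states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(next_states, weights=weights, k=1)[0]

    # Drop each month independently with probability missing_rate.
    observed = [s if rng.random() >= missing_rate else None for s in true_states]
    return true_states, observed


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    truth, seen = simulate_patient(12, rng)
    for month, (t, o) in enumerate(zip(truth, seen), start=1):
        print(month, t, o)
```

Dropping months completely at random is the simplest possible missingness model. Mimicking the gaps in a real dataset would likely need missingness that depends on the state or on the patient, which this sketch does not attempt.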

I’m compiling a list of robust, publicly available data sources. You can find that here.

Other projects (for fun)

  • sea_forager
    • A function that scrapes tide-forecast.com to look for the best upcoming times for tidal foraging.
  • bakeR
    • A function to help scale up or scale down recipes

Where else to find me

My LinkedIn is here.

My Medium account is here, where I try to write things about public health and R.

My Tableau Public account is here, where you can browse some of the other things I’ve worked on.