Research Grade Data

scribbles made on



Prompted by work that I did for a fantastic mission-driven client & partner, I consolidated my personal writings from the last five years of working with RWE or RWE data infrastructure. The goal of this piece is to delineate clearly what Research Grade data entails, with concrete examples of how to get there.

First of all, what’s Real World Evidence (RWE) anyway? It’s high-quality data collected in the real world (outside of clinical trials).

RWE data + (stats, simulations, ML) = RWE research

Through RWE research we unleash the power of our network & learn from each other’s journeys. No one suffers in vain.

We can study continuously and more efficiently, and cut R&D costs. Therapies can get to market faster and more cheaply.

Some examples of RWE:

  • Label extensions
  • Synthetic control arm (ethics, yo)
  • Drug evaluation in the wild!

Check out some examples here of published RWE from my previous alma mater, or scroll through my portfolio for a taste of recent publications.

Tenets of RWE

  • True
  • Interoperable
  • Comprehensive
  • Generalizable
  • Timely
  • Scalable


Did Lisa actually receive a Ketamine infusion on 1/1/2021?

  • Data that is true is clean, traceable & auditable.

What we can do:

  • Obsessive validations on the UI
  • Implementing strict historicity of the dataset
  • Cross workflow validation
  • Use clinically validated frameworks for asking questions
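
To make "validations" and "strict historicity" a bit more concrete, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical `InfusionRecord` shape and a hypothetical in-memory `AuditedStore`; the field names, the validation rules and the dose values are all made up for illustration. The idea is that a record is checked before it is saved, and an edit appends a new version instead of overwriting the old one, so you can always answer "who said what, and when?".

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class InfusionRecord:
    # Hypothetical record shape, for illustration only.
    patient_id: str
    medication: str
    dose_mg: float
    administered_on: date


def validate(record: InfusionRecord, today: date) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.patient_id.strip():
        problems.append("patient_id is required")
    if record.dose_mg <= 0:
        problems.append("dose_mg must be positive")
    if record.administered_on > today:
        problems.append("administered_on cannot be in the future")
    return problems


@dataclass
class AuditedStore:
    """Append-only store: an edit adds a new version, nothing is overwritten."""
    _versions: dict = field(default_factory=dict)

    def save(self, key: str, record: InfusionRecord, author: str, today: date) -> None:
        problems = validate(record, today)
        if problems:
            raise ValueError("; ".join(problems))
        stamp = datetime.now(timezone.utc)
        self._versions.setdefault(key, []).append((stamp, author, record))

    def current(self, key: str) -> InfusionRecord:
        # The latest version is the last one appended.
        return self._versions[key][-1][2]

    def history(self, key: str) -> list:
        # Full trail of (timestamp, author, record), oldest first.
        return list(self._versions[key])


# Example with made-up values: a nurse enters a dose, a doctor corrects it.
store = AuditedStore()
today = date(2021, 6, 1)
store.save("inf-1", InfusionRecord("lisa", "ketamine", 35.0, date(2021, 1, 1)), "nurse_a", today)
store.save("inf-1", InfusionRecord("lisa", "ketamine", 40.0, date(2021, 1, 1)), "dr_b", today)
```

After the two saves, `store.current("inf-1")` holds the corrected dose, while `store.history("inf-1")` still shows the original entry and who made it. In a real system the same idea would more likely live in the database layer (versioned rows or an event log) than in application memory.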


How many times did Lisa get infused with Ketamine last year? How many people in CA in the network started SSRIs at some point in 2020?

Datasets are harmonized to clinical data ontologies, and can be aggregated and explored as time series.

What we can do:

  • Standardize medications to RxNorm
  • Standardize labs and vitals to LOINC, and procedures and surgeries to a procedure terminology such as CPT or SNOMED CT
  • Standardize therapy modalities & protocols
  • Standardize race/ethnicity/gender/sexuality against census categories
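
Here is a rough sketch of why harmonization pays off, under some loud assumptions: the `LOCAL_TO_CONCEPT` lookup is hypothetical, the concept IDs are placeholders and NOT real RxNorm codes, and the events are invented. Once every free-text label points at one shared concept, "how many times did Lisa get infused last year?" becomes a simple count.

```python
from collections import Counter
from datetime import date

# Hypothetical lookup from free-text site labels to one shared concept.
# The concept IDs are placeholders, NOT real RxNorm codes.
LOCAL_TO_CONCEPT = {
    "ketamine iv": "CONCEPT_KETAMINE",
    "ketamine infusion": "CONCEPT_KETAMINE",
    "ketamine hcl inj": "CONCEPT_KETAMINE",
    "sertraline 50mg": "CONCEPT_SERTRALINE",
}


def harmonize(events):
    """Attach a shared concept to each event; collect labels we could not map."""
    mapped, unmapped = [], []
    for patient_id, label, when in events:
        concept = LOCAL_TO_CONCEPT.get(label.strip().lower())
        if concept is None:
            unmapped.append(label)
        else:
            mapped.append((patient_id, concept, when))
    return mapped, unmapped


def count_by_year(mapped, patient_id, concept):
    """Count one patient's events for one concept, grouped by calendar year."""
    return Counter(
        when.year for pid, c, when in mapped if pid == patient_id and c == concept
    )


# Invented events: three spellings of the same therapy, plus one unknown label.
events = [
    ("lisa", "Ketamine IV", date(2020, 3, 2)),
    ("lisa", "ketamine infusion", date(2020, 9, 14)),
    ("lisa", "Ketamine HCl inj", date(2021, 1, 1)),
    ("lisa", "mystery tonic", date(2020, 5, 5)),
    ("sam", "Sertraline 50mg", date(2020, 7, 1)),
]
mapped, unmapped = harmonize(events)
per_year = count_by_year(mapped, "lisa", "CONCEPT_KETAMINE")
# per_year -> Counter({2020: 2, 2021: 1}); unmapped -> ["mystery tonic"]
```

Keeping the `unmapped` list around matters as much as the mapping itself: those are the labels somebody still needs to review, instead of silently dropping them from the counts.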


We know that Lisa got an infusion a few months ago, but what therapy was she on last year? What was her first line of therapy for depression? Has she taken hormones before?

We will strive for patient medical records that are as close to complete as possible.

What we can do:

  • Think of the full picture when converting. Was there previous data that you could import?
  • CCDA Referral interface: how can we get more data interoperability between providers?
  • Code defensively, with an eye for the fact that production breaks could impair patient data continuity
  • Build in additional data streams: ADT, HIE, wearables, etc.
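
As a small sketch of what "the full picture" could look like in code: merge several date-sorted data streams into one patient timeline, then ask for the earliest therapy that matches. Everything here is an assumption for illustration: the stream names (`ehr`, `imported`, `referral`), the tuple layout and the events themselves are invented, and the keyword match stands in for a proper lookup against coded data.

```python
from datetime import date
from heapq import merge

# Invented streams of (date, source, description), each already sorted by date.
ehr = [
    (date(2020, 2, 1), "ehr", "started sertraline"),
    (date(2021, 1, 1), "ehr", "ketamine infusion"),
]
imported = [(date(2018, 6, 10), "legacy_import", "started fluoxetine")]
referral = [
    (date(2019, 11, 3), "referral_doc", "estradiol prescribed"),
    (date(2020, 2, 1), "referral_doc", "started sertraline"),  # same event as in ehr
]


def build_timeline(*streams):
    """Merge sorted streams chronologically, keeping one copy of repeated events."""
    seen = set()
    timeline = []
    for event in merge(*streams):
        key = (event[0], event[2])  # same date + same description = same event
        if key not in seen:
            seen.add(key)
            timeline.append(event)
    return timeline


def first_matching(timeline, keywords):
    """Return the earliest event whose description mentions any keyword, or None."""
    for event in timeline:
        if any(word in event[2] for word in keywords):
            return event
    return None


timeline = build_timeline(ehr, imported, referral)
first_line = first_matching(timeline, {"fluoxetine", "sertraline"})
hormones = first_matching(timeline, {"estradiol"})
```

With only the `ehr` stream, Lisa's first antidepressant would look like sertraline in 2020. Adding the imported legacy data moves the answer back to fluoxetine in 2018, which is exactly the kind of gap that a more complete record closes.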


Will research findings be generalizable to the wider population that suffers from the targeted disease / symptoms? Who will this intervention work for?

With the rise of precision medicine, a therapy doesn’t have to work for everyone, but ideally we should understand who it’s working for. And we shouldn’t just do research on rich white people, as has historically been the case.

What we can do:

  • Welcome as many geographies, genders, races, ethnicities, sexualities and backgrounds as possible into our patient population 🤗
  • Make it easy for patients to report on demographics (audio assistance, voice assistance)
  • Think about access and socioeconomic diversity: how can software make care more affordable and thus your data more generalizable?
  • Partner with care coordinating platforms that could help more patients get access / coverage
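
One way to keep yourself honest here is to compare who is in your cohort against who is in the population you want to generalize to. The sketch below is only that, a sketch: the group labels and the reference shares are made-up numbers, and `representation_gaps` is a hypothetical helper, not a standard statistic.

```python
from collections import Counter


def representation_gaps(cohort_labels, reference_shares):
    """Cohort share minus reference share per group (negative = under-represented)."""
    counts = Counter(cohort_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cohort is empty")
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Made-up example: 100 patients across three placeholder groups,
# compared with invented reference shares for the target population.
cohort = ["group_a"] * 70 + ["group_b"] * 20 + ["group_c"] * 10
reference = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}
gaps = representation_gaps(cohort, reference)
under_represented = sorted(g for g, gap in gaps.items() if gap < 0)
```

In this invented cohort, `group_b` and `group_c` come out under-represented by roughly ten percentage points each, which is a prompt to look at access and outreach, not a verdict on the research.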


What’s the last data point that you have on Jane Doe? Can we reach out to her for a research intervention?

What we can do:

  • Augment datasets with mortality data and drop-off endpoints
  • Make patient/provider experience smooth to avoid drop-off
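
A minimal sketch of the "last data point" question, assuming events arrive as `(patient_id, date)` pairs and that mortality data has already been linked into a set of patient IDs. The 180-day cutoff is an arbitrary assumption for illustration, and all dates are invented.

```python
from datetime import date, timedelta


def last_contact(events):
    """Map each patient to the date of their most recent data point."""
    latest = {}
    for patient_id, when in events:
        if patient_id not in latest or when > latest[patient_id]:
            latest[patient_id] = when
    return latest


def follow_up_status(latest, deceased, as_of, max_gap=timedelta(days=180)):
    """Label each patient as deceased, lost_to_follow_up, or active."""
    status = {}
    for patient_id, when in latest.items():
        if patient_id in deceased:
            status[patient_id] = "deceased"
        elif as_of - when > max_gap:  # max_gap is an assumed cutoff
            status[patient_id] = "lost_to_follow_up"
        else:
            status[patient_id] = "active"
    return status


# Invented events and an invented mortality link.
events = [
    ("jane", date(2021, 1, 15)),
    ("jane", date(2021, 5, 20)),
    ("lisa", date(2020, 9, 14)),
    ("sam", date(2021, 4, 2)),
]
latest = last_contact(events)
status = follow_up_status(latest, deceased={"sam"}, as_of=date(2021, 6, 1))
# status -> {"jane": "active", "lisa": "lost_to_follow_up", "sam": "deceased"}
```

Separating "deceased" from "lost to follow-up" is the point of augmenting with mortality data: both look like silence in the dataset, but only one of them is a patient you could still reach out to.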


Do you have a sizable population (adjusted for occurrence of the disease in the general population) that is undergoing the treatment therapy or a control therapy?

The power of insights is in heterogeneous, large datasets. We cannot be abstracting patient charts and curating clinical trial staffing by hand.

What we can do:

  • Make all data inputs as structured as they can be (avoids manual curation)
  • Enable quick and foolproof data entry for providers and patients (so we have higher adherence)
  • Write models to abstract unstructured notes into structured notes
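
To give a feel for what abstracting a note into structure means, here is a deliberately simple rule-based baseline. It is a regex, not a trained model, and it is only a sketch: the pattern, the output fields and the example notes (including the dose) are all assumptions made up for illustration. A real pipeline would need clinical NLP and human review on top.

```python
import re
from typing import Optional

# Assumed pattern: a drug name followed by a number and "mg", e.g. "Ketamine 35 mg".
DOSE_PATTERN = re.compile(
    r"(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE
)


def abstract_note(note: str) -> Optional[dict]:
    """Pull the first 'drug + dose in mg' mention out of a free-text note."""
    match = DOSE_PATTERN.search(note)
    if match is None:
        return None  # nothing structured found; leave for manual review
    return {
        "drug": match.group("drug").lower(),
        "dose_mg": float(match.group("dose")),
    }


# Invented notes for illustration.
structured = abstract_note("Pt tolerated Ketamine 35 mg IV over 40 min.")
nothing = abstract_note("No medication changes today.")
# structured -> {"drug": "ketamine", "dose_mg": 35.0}; nothing -> None
```

Even this toy version shows the trade-off: rules are cheap and auditable but brittle, which is why structured inputs at the point of entry beat any amount of after-the-fact abstraction.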

questions? want to chat? you can find me at