About OHDSI Viewer

OHDSI Analysis Viewer

Table of contents

  1. Introduction
  2. How to use the viewer
  3. Analysis types
    1. Characterization
    2. Population-level effect estimation
    3. Patient-level prediction

Introduction

This is an interactive shiny app for exploring standardized outputs for OHDSI analyses including:

  • characterization (descriptive studies)
  • population-level effect estimation (causal inference)
  • patient-level prediction (predictive modeling)

Full details of all the analysis tools can be found on the HADES website.

How to use the viewer

Please use the left-hand menu to select the type of analysis to explore (click on a button). The viewer then shows the results, which can be explored interactively.

Analysis types

Characterization

The OHDSI community have developed a suite of tools for conducting characterization studies including:

  • incidence rate calculation
  • baseline characterization
  • treatment pathways
  • and more

Population-level effect estimation

The OHDSI community have developed several packages that enable users with data in the OMOP common data model to perform causal inference studies.

Patient-level prediction

The OHDSI community have developed several packages that enable users with data in the OMOP common data model to develop and validate patient-level prediction models.

Cohort Level Diagnostics

Cohort Definition


Cohort Counts

Description

A table showing the number of cohort entries and unique subjects per cohort per data source. Because one person can have more than one cohort entry, the number of entries can be higher than the number of persons.

Options

You may select multiple data sources in the side bar to see counts from different data sources side-by-side.

What to look for

  • Are there cohorts that are empty in some data sources?
  • Are the relative counts (relative to the other cohorts in the same data source) comparable across data sources? Note that the color bars show the relative counts.
  • Are the cohorts of expected and sufficient size? For example, if we want to study the effect of an exposure, a rule-of-thumb is that we require at least 2,500 in the exposure cohort.

Inclusion Rule Statistics


Index Events

Description

A table showing the concepts belonging to the concept sets in the entry event definition that are observed on the index date. In other words, the table lists the concepts that likely triggered the cohort entry. The counts indicate the number of cohort entries where the concept was observed on the index date. Note that multiple concepts can be present on the index date, so the sum of counts might be greater than the cohort entry count.

Options

You can select multiple databases in the side bar to see counts from different databases side-by-side.

Select the cohort to explore in the side bar.

What to look for

  • Is one concept unexpectedly dominating? For example, if our cohort identifies exposure to drugs in a class, but we notice almost everyone enters the cohort based on a single drug, we may wonder whether our results will generalize to the class.
  • Are the highest ranking concepts different across databases? For example, is everyone in one database initiating high-dose prescriptions, and everyone in another database low-dose prescriptions?

Cohort Characterization

Description

A table showing cohort characteristics (covariates). These characteristics are captured on or before the cohort start date. There is a Pretty and a Raw version of this table.

The Pretty table shows the standard OHDSI characteristics table, which includes only covariates that were manually selected to provide a general overview of the comorbidities and medications of the cohort. These are all binary covariates, and the table shows the proportion (%) of the cohort entries having the covariate.

The Raw table shows all captured covariates. These include binary and continuous covariates (e.g. the Charlson comorbidity index). For each covariate the table lists the mean, which for binary covariates is equal to the proportion, and the standard deviation (SD).

Options

You can select multiple databases in the side bar to see cohort characteristics from different databases side-by-side in the same table.

Select the cohort to explore in the side bar.

Select either the Pretty or the Raw table at the top of the table.

What to look for

  • Are the characteristics of the cohort as expected? For example, do people have the expected comorbidities?
  • Do the characteristics of the cohort differ much per database?


Compare Cohort Characterization

Description

A table or plot showing cohort characteristics (covariates) for two cohorts side-by-side. These characteristics are captured at different time windows that can be selected.

The Raw table shows all captured covariates. These include binary and continuous covariates (e.g. the Charlson comorbidity index). For each covariate the table lists the mean, which for binary covariates is equal to the proportion, the standard deviation (SD), and the standardized difference of the mean (StdDiff).
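The standardized difference of the mean (StdDiff) puts the difference between the two cohorts on a scale that accounts for the covariate's variability. As a minimal sketch, using the standard pooled-SD formula (the helper function and example proportions below are illustrative, not the viewer's actual implementation):

```python
import math

def std_diff(mean1, sd1, mean2, sd2):
    """Standardized difference of the mean between two cohorts.
    For a binary covariate, the mean is the proportion and sd = sqrt(p * (1 - p))."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd if pooled_sd > 0 else 0.0

# Hypothetical binary covariate: 30% in the target cohort vs 20% in the comparator
p1, p2 = 0.30, 0.20
d = std_diff(p1, math.sqrt(p1 * (1 - p1)), p2, math.sqrt(p2 * (1 - p2)))
```

A commonly used rule of thumb is that an absolute StdDiff above 0.1 indicates a meaningful imbalance between the cohorts.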

The plot shows all covariates, including binary and continuous covariates. The x-axis represents the mean value in the target cohort, and the y-axis the mean value in the comparator cohort. Each dot represents a covariate, and the color indicates the domain of the covariate. In the plot, domains are fixed (even though additional domains may exist in the data) to ensure the domain colors are applied consistently.

Filters may be used to limit the number of covariates being visualized or tabulated. Filters are available for analysis names and domain names.

You can select different cohorts in the same database, the same cohort in different databases, or different cohorts in different databases.

What to look for

  • Are there major differences between the two cohorts? For example, if we wish to compute a propensity score between two cohorts, concepts that have a very high proportion in one cohort and a very low proportion in the other may lead to a perfectly predictive model.
  • In general, how comparable are two cohorts? If we wish to compare two exposures, but the cohorts differ over many characteristics, we may be able to fit a propensity model and compute an estimate, but we may have concerns over the generalizability of the results.


Cohort Overlap (subjects)

Description

Stacked bar graph showing the overlap between two cohorts, and a table listing several overlap statistics.

The stacked bar shows the overlap in terms of subjects. It shows the number of subjects that belong to each cohort and to both. The diagram does not consider whether the subjects were in the different cohorts at the same time.

The table shows the same information and more:

  • Subject in either cohort: The number of subjects that enter one or both cohorts. (The union)
  • Subject in both cohorts: The number of subjects that enter both cohorts, although not necessarily at the same time. (The intersection)
  • Subject in target not in comparator: The number of subjects that enter the target cohort, but not the comparator cohort. (Subtracting the comparator from the target)
  • Subject in comparator not in target: The number of subjects that enter the comparator cohort, but not the target cohort. (Subtracting the target from the comparator)
  • Subject in target before comparator: The number of subjects that enter both cohorts, but enter the target cohort before entering the comparator cohort. This number considers only the first entry per cohort per person.
  • Subject in comparator before target: The number of subjects that enter both cohorts, but enter the comparator cohort before entering the target cohort. This number considers only the first entry per cohort per person.
  • Subject in target and comparator on same day: The number of subjects that enter both cohorts on the same date. This number considers only the first entry per cohort per person.
  • Subject having target start during comparator: The number of subjects that enter the target cohort during the comparator cohort, meaning comparator cohort start date <= target cohort start date <= comparator cohort end date. This number considers only the first entry per cohort per person.
  • Subject having comparator start during target: The number of subjects that enter the comparator cohort during the target cohort, meaning target cohort start date <= comparator cohort start date <= target cohort end date. This number considers only the first entry per cohort per person.
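The statistics above reduce to set operations on subjects, with the temporal statistics comparing first entry dates. A toy sketch (the subject ids and dates below are hypothetical, not the app's implementation):

```python
# Hypothetical first-entry date per subject (subject_id -> first cohort start day)
target = {"A": 1, "B": 5, "C": 3}
comparator = {"B": 2, "C": 3, "D": 7}

t, c = set(target), set(comparator)
either = t | c                    # Subject in either cohort (the union)
both = t & c                      # Subject in both cohorts (the intersection)
target_only = t - c               # Subject in target not in comparator
comparator_only = c - t           # Subject in comparator not in target

# Temporal statistics consider only the first entry per cohort per person:
target_first = {s for s in both if target[s] < comparator[s]}
comparator_first = {s for s in both if comparator[s] < target[s]}
same_day = {s for s in both if target[s] == comparator[s]}
```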

Options

You can select one or more databases in the side bar.

You can select the (target) cohort(s) and comparator cohort(s) in the side bar.

What to look for

  • Are there many people in both cohorts? For example, if we want to compare two exposures, are there many people that receive both?
  • Is the overlap of sufficient size for a specific research question? For example, if we wish to study the effect of an exposure on an outcome, we may require a minimum number of outcomes during exposure.

Orphan Concepts

Description

A table showing concepts observed in the data source that are not included in a concept set of a cohort, but that may be considered for inclusion. The following logic is used to identify concepts that might be relevant:

  1. Given a concept set expression, find all included concepts.
  2. Find all names of those concepts, including synonyms, and the names of source concepts that map to them.
  3. Search for concepts (standard and source) that contain any of those names as substring.
  4. Filter those concepts to those that are not in the original set of concepts (i.e. orphans).
  5. Restrict the set of orphan concepts to those that appear in the CDM data source as either source concept or standard concept.
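Steps 3 and 4 of this logic can be illustrated with a toy vocabulary (all concept ids and names below are made up; the real tool also searches synonyms and the names of mapped source concepts, which this sketch omits):

```python
# Toy vocabulary: concept_id -> concept_name; the concept set contains only id 1
vocabulary = {
    1: "hypertensive disorder",
    2: "essential hypertension",            # name does not contain an included name
    3: "hypertensive disorder, systemic",   # name contains "hypertensive disorder"
    4: "hypotension",
}
included = {1}
included_names = {vocabulary[cid].lower() for cid in included}

# Step 3: find concepts whose name contains any included name as a substring
matches = {
    cid for cid, name in vocabulary.items()
    if any(inc in name.lower() for inc in included_names)
}

# Step 4: drop concepts already in the original set -> candidate orphans
orphans = matches - included
```

In step 5 these candidates would then be restricted to concepts actually appearing in the CDM data source.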

The Subjects column contains the number of subjects in the entire data source that have the specific concept, i.e. it is not restricted to people in the cohort. This is a data source level characterization. Source concepts are identified in the _source_concept_id fields of the Common Data Model (e.g. drug_source_concept_id) and are used to identify the specific source codes used in a data source. Standard concepts are found using the _concept_id fields (e.g. drug_concept_id), and use the same coding system across all databases.

Options

You can select a data source in the side bar to see the concepts and counts observed in that data source.

Select the cohort and the specific concept set within that cohort to explore in the side bar.

What to look for

  • Are there concepts that are not included in the concept set but should be? Note that the provided list likely contains many false positives.

Execution meta-data

Each entry relates to an execution on a given CDM. Results from successive executions are merged incrementally.


Concepts in Data Source

Description

A table showing the concept ids observed in the database that are included in the concept set(s) of the selected cohort. The Subjects column contains the number of subjects in the entire database that have the specific concept. This count is not restricted to people in the cohort, but represents a database-level characterization. Source concepts are identified in the _source_concept_id fields of the Common Data Model (e.g. drug_source_concept_id) and are used to identify the specific source codes used in a database. Standard concepts are found using the _concept_id fields (e.g. drug_concept_id), and use the same coding system across all databases. Note: per CDM conventions, standard concept ids may be used to populate _source_concept_id fields in domain tables, but non-standard concept ids may not be used to populate the standard fields in those domain tables.

Options

You can select a database in the side bar to see the concepts and counts observed in that database.

Select the cohort and the specific concept set within that cohort to explore in the side bar.

You can switch between Source Concepts and Standard Concepts at the top of the table.

What to look for

  • Are there source codes included that should not be? For example, in a concept set for hypertensive disorder, are hypotension codes included by accident?
  • Are all expected codes present? For example, if we have a list of ICD-10 codes that have been used in literature to identify a cohort, are all those codes present?

Time Distributions

Description

Boxplot and a table showing the distribution of time (in days) before and after the cohort index date (cohort start date), and the time between cohort start and end date. The information is shown for all cohort entries, not limited to the first entry per person.

The boxplot shows:

  • Whiskers: The minimum and maximum observed number of days.
  • Box: The 25th to 75th percentile.
  • Line: The median.

The table shows the same information and more:

  • Average: the mean of the distribution
  • SD: Standard Deviation
  • Min: The minimum
  • P10: The 10th percentile
  • P25: The 25th percentile
  • Median: The median (50th percentile)
  • P75: The 75th percentile
  • P90: The 90th percentile
  • Max: The maximum
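These are standard descriptive statistics; a minimal sketch with hypothetical cohort durations (the day values below are made up for illustration):

```python
import statistics

# Hypothetical time between cohort start and end date, in days, per cohort entry
days = [0, 5, 10, 30, 30, 60, 120, 365]

summary = {
    "Average": statistics.mean(days),    # mean of the distribution
    "SD": statistics.stdev(days),        # sample standard deviation
    "Min": min(days),
    "Median": statistics.median(days),   # the 50th percentile
    "Max": max(days),
}
```

Note that entries with length 0 (like the first value here) pull the median down; the "What to look for" items below flag such entries when they are unexpected.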

Options

You can select multiple data sources in the side bar to see time distributions from different data sources in the same plot and table.

Select the cohort to explore in the side bar.

What to look for

  • For exposure cohorts: is there sufficient time after index (either within the cohort for on-treatment analyses, or until the end of observation for intent-to-treat type analyses) to observe the outcome of interest?
  • Are there many cohort entries with length = 0 when this is not expected?
  • Are the distributions comparable across data sources?


Visit Context

Description

A table showing the relationship between the cohort start date and visits recorded in the database. For each database, the table shows:

  • Visits Before: the number of visits recorded before the cohort start date. Note that if a person is in the same cohort twice, visits may be counted twice.
  • Visits Ongoing: the number of visits that were ongoing (excluding the visit start date) when the cohort started. Note that if a person is in the same cohort twice, visits may be counted twice.
  • Starting Simultaneous: the number of visits that started on the same day the cohort started.
  • Visits After: the number of visits recorded after the cohort start date. Note that if a person is in the same cohort twice, visits may be counted twice.

Options

You can select multiple databases in the side bar to see counts from different databases side-by-side.

Select the cohort to explore in the side bar.

What to look for

  • Are cohorts starting in the right context? E.g. some cohorts may be expected to start predominantly in an inpatient setting.

Incidence Rates

Description

A graph showing the incidence rate, optionally stratified by age (in 10-year bins), gender, and calendar year.

The incidence rate is computed as 1000 * the number of people first entering the cohort / the number of years people were eligible to enter the cohort for the first time. The eligible person time is defined as the time when:

  • the person was observed in the data source (based on the observation_period table);
  • the person had the required amount of prior observation time, as specified in the cohort entry event criteria. For example, if the cohort definition requires 365 days of observation prior to cohort entry, patients are not eligible to enter the cohort in the first 365 days of their observation period, and this time is not counted in the eligible time;
  • the person had not yet entered the cohort. Because we only consider the first cohort entry, persons are no longer eligible to enter the cohort after their first entry.

Note: If your cohort definition has an inclusion rule that restricts persons based on prior observation time, this might lead to underestimation of the incidence rate, because the same prior observation time restriction is not applied to the denominator. We recommend that you revise the cohort definition to make the prior observation time rule part of the entry event criteria.
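Once the eligible person time has been derived as described above, the rate computation itself is a single expression. A sketch with hypothetical numbers (150 first entries and 12,000 eligible person-years are made-up values):

```python
# Hypothetical counts for one data source
first_entries = 150          # persons entering the cohort for the first time
eligible_years = 12_000.0    # person-years eligible for a *first* entry, after
                             # removing required prior-observation time and
                             # censoring each person at their first cohort entry

incidence_rate = 1000 * first_entries / eligible_years  # per 1,000 person-years
```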

Options

You can select multiple data sources in the side bar to see graphs from different data sources in the same plot.

Select the cohort to explore in the side bar.

At the top left of the plot, you can choose whether to stratify the data by age, gender, or calendar year.

At the top right of the plot, you can choose whether to use the same y-axis for all data sources.

If you move the mouse over the plot, you can see the precise value.

What to look for

  • Are the observed incidence rates in line with expectations? For example, if we have an estimate of the population incidence based on an external source, is the incidence rate comparable to that estimate?
  • Are the age and gender distributions in line with expectations? For example, are contraceptives only prescribed in women?
  • Is the incidence rate stable over time? If there are sudden peaks or drops, this may indicate coding issues.


Inclusion Rules

Description

A table showing the number of subjects that match specific inclusion rules in the cohort definition. Note that this table will be empty if no inclusion rules have been specified.

The table contains the following columns:

  • Sequence: The order in which the inclusion rules are applied to the cohort.
  • Name: The name of the inclusion rule.
  • Meet: The number of cohort entries (records) that meet the entry event definition and the specific inclusion rule indicated in the row.
  • Gain: The number of cohort entries (records) that would be gained if this inclusion rule was dropped.
  • Total: The number of cohort entries (records) meeting the entry event definition. In other words, the number of cohort entries before applying any of the inclusion rules.
  • Remain: The number of cohort entries (records) remaining after applying the specific inclusion rule, and all preceding rules.
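The relationships between these columns can be illustrated with toy data (the rule names and record ids below are hypothetical, and this sketch is not the tool's actual implementation):

```python
# Hypothetical entry events (record ids) and which records pass each rule
total = {1, 2, 3, 4, 5, 6}           # Total: entries meeting the entry event definition
rules = [
    ("Age >= 18", {1, 2, 3, 4, 5}),
    ("Prior observation", {1, 2, 3, 6}),
]

remain = set(total)
rows = []
for i, (name, passing) in enumerate(rules):
    meet = total & passing            # Meet: entries satisfying this rule alone
    remain &= passing                 # Remain: after this rule and all preceding rules
    # Gain: entries that pass every *other* rule but fail this one,
    # i.e. entries added to the final cohort if this rule were dropped
    others = set(total)
    for j, (_, other_passing) in enumerate(rules):
        if j != i:
            others &= other_passing
    gain = len(others - passing)
    rows.append({"Name": name, "Meet": len(meet), "Gain": gain, "Remain": len(remain)})
```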

Options

You can select a database in the side bar to see the inclusion rule statistics observed in that database.

Select the cohort to explore in the side bar.

What to look for

  • Are there inclusion rules that nobody meets in a database? For example, requiring a specialist visit that is not recorded in a specific database.
  • Are there inclusion rules that have no effect in a database? For example, requiring no occurrence of a prior disease code that is not recorded in a database.
  • Are there inclusion rules that drastically reduce the population? In this case we might worry about generalizability. For example, if we require a diagnostic procedure, and only a small fraction meets this criterion, we may wonder whether this identifies a special population that differs from the overall population in significant ways.

Characterization Viewer

Target Viewer

Options

Table


Outcome Stratified

Options

Incidence Rates

Options

Table


Time-to-events

Options

Results


Dechallenge Rechallenge

Options

Results


Prediction Viewer

Model Designs Summary

This shows the different model designs that have had models developed or validated.

Each model design has a model type (e.g., logistic regression or decision tree), a target cohort (the patients the model is developed to be applied to), an outcome (what is being predicted), and a TAR (the time-at-risk during which the outcome is predicted). The AUROC summaries give the min/mean/max performance for the model design across multiple databases. The number of databases tells you how many different databases there are where the model design was diagnosed, developed, or validated.

  • Click 'View Diagnostics' to see the diagnostic results for the model design across databases (this highlights any potential issues in the design that may cause bias).
  • Click 'View Results' to see the summary results for all the development/validations across databases.
  • Click 'View Report' to view a summary report containing all the results for the given model design.

All Database Results For Selected Model Design

This shows the summary model development or validation performance results for the selected model design across databases.

Click 'View Result' to explore the performance for a specific database in more detail.

Full Result Explorer

This shows the settings and results for a specific model design when model development is done in the selected database or validation of a model with the model design is done in the selected database.

  • Design Settings: shows the settings used for the model design; it is possible to reproduce the model development with these settings and the database.
  • Model: shows the covariates in the model, the variable importance (value), and how often each covariate occurs for patients with and without the outcome.
  • Threshold dependent: a plot showing the sensitivity, specificity, and precision at different cut-offs. There is also an interactive explorer.
  • Discrimination: view the test/train/cross-validation (or any subset) discrimination performance here. Select to see the plots.
  • Calibration: view the test/train/cross-validation (or any subset) calibration performance here. Select to see the plots.
  • Net Benefit: view the net benefit performance for any subset.
  • [Not always shown] Validation: if exploring a model development, you will see a validation tab that lets you see any external validation performances for the model.

Settings Dashboard

Binary


Measurements


Covariates

Model Table

Probability threshold plot:

Cutoff Slider:

Dashboard

Cutoff Performance

Summary

Click view to see the corresponding plots:


Summary

Click view to see corresponding plots:

Select net benefit type to view:

Net Benefit Plot


Summary

Select one or more rows to generate comparison ROC and calibration plots

ROC Plot


Calibration Plot


Cohort Method

Cohort Method Evidence Explorer

Table 3. Fitted propensity model, listing all covariates with non-zero coefficients. Positive coefficients indicate the covariate is predictive of the target exposure.
Figure 2. Preference score distribution. The preference score is a transformation of the propensity score that adjusts for differences in the sizes of the two treatment groups. A higher overlap indicates subjects in the two groups were more similar in terms of their predicted probability of receiving one treatment over the other.
Download
Figure 4. Systematic error. Effect size estimates for the negative controls (true hazard ratio = 1) and positive controls (true hazard ratio > 1), before and after calibration. Estimates below the diagonal dashed lines are significantly different (alpha = 0.05) from the true effect size. A well-calibrated estimator should have the true effect size within the 95 percent confidence interval 95 percent of the time.
Figure 8. Fitted null distributions per data source.

Self Controlled Case Series

Self Controlled Case Series Evidence

Table 1. For each variable of interest: the number of cases (people with at least one outcome), the number of years those people were observed, the number of outcomes, the number of subjects with at least one exposure, the number of patient-years exposed, the number of outcomes while exposed, and the minimum detectable relative risk (MDRR).
Figure 1. Attrition, showing the number of cases (number of subjects with at least one outcome), and the number of outcomes (number of occurrences of the outcome) after each step in the study.
Table 2. The fitted non-zero coefficients (incidence rate ratios) and 95 percent confidence intervals for all variables in the model.
Figure 2a. Spline fitted for age.
Figure 2b. Spline fitted for season.
Figure 2c. Spline fitted for calendar time.
Figure 3. Number of subjects observed for 3 consecutive months, centered on the indicated month.
Figure 4. Per calendar month the number of people observed, the unadjusted rate of the outcome, and the rate of the outcome after adjusting for age, season, and calendar time, if specified in the model. Red indicates months where the adjusted rate was significantly different from the mean adjusted rate.
Figure 5. The number of events and subjects observed per week relative to the start of the first exposure (indicated by the thick vertical line).
Figure 6. Histograms for the number of months between the first occurrence of the outcome and the end of observation, stratified by whether the end of observation was censored (inferred as not being equal to the end of database time), or uncensored (inferred as having the subject still be observed at the end of database time).
Figure 7. Systematic error. Effect size estimates for the negative controls (true incidence rate ratio = 1) and positive controls (true incidence rate ratio > 1), before and after calibration. Estimates below the diagonal dashed lines are significantly different (alpha = 0.05) from the true effect size. A well-calibrated estimator should have the true effect size within the 95 percent confidence interval 95 percent of the time.