Cohort Counts
Description
A table showing the number of cohort entries and unique subjects per cohort per data source. Because one person can have more than one cohort entry, the number of entries can be higher than the number of persons.
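As a minimal sketch of how the two counts relate: entries count every row, and subjects count distinct IDs. The field names mirror the CDM cohort table (subject_id, cohort_start_date), but the rows are invented for illustration.

```python
from collections import namedtuple

# Toy cohort rows; field names follow the CDM cohort table, the data is invented.
CohortRow = namedtuple("CohortRow", ["subject_id", "cohort_start_date"])

def cohort_counts(rows):
    """Return (number of cohort entries, number of unique subjects)."""
    entries = len(rows)
    subjects = len({r.subject_id for r in rows})
    return entries, subjects

rows = [
    CohortRow(1, "2020-01-05"),
    CohortRow(1, "2021-03-10"),  # a second entry for subject 1
    CohortRow(2, "2020-06-01"),
]
print(cohort_counts(rows))  # (3, 2)
```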
Options
You may select multiple data sources in the side bar to see counts from different data sources side-by-side.
What to look for
- Are there cohorts that are empty in some data sources?
- Are the relative counts (relative to the other cohorts in the same data source) comparable across data sources? Note that the color bars show the relative counts.
- Are the cohorts of expected and sufficient size? For example, if we want to study the effect of an exposure, a rule of thumb is that we require at least 2,500 subjects in the exposure cohort.
Incidence Rates
Description
A graph showing the incidence rate, optionally stratified by age (in 10-year bins), gender, and calendar year.
The incidence rate is computed as 1000 × (number of people first entering the cohort) / (number of person-years during which people were eligible to enter the cohort for the first time), i.e. it is expressed per 1,000 person-years. Eligible person time is the time during which a person:
- Was observed in the data source (based on the observation_period table).
- Had the required amount of prior observation time as specified in the cohort entry event criteria. For example, if the cohort definition requires 365 days of observation prior to cohort entry, patients are not eligible to enter the cohort in the first 365 days of their observation period, and this time is not counted in the eligible time.
- Had not yet entered the cohort. If the person enters the cohort, only the time up to cohort entry is counted: because we only consider the first cohort entry, persons are no longer eligible to enter the cohort after their first entry.
Note: If your cohort definition has an inclusion rule that restricts persons based on prior observation time, the incidence rate may be underestimated, because the same prior observation time restriction is not applied to the denominator. We recommend that you revise the cohort definition to make the prior observation time rule part of the entry event criteria.
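The computation can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes a hypothetical Person record with a single observation period per person, counts days as end date minus start date, and converts days to years using 365.25 days per year.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class Person:
    # Hypothetical record: one observation period plus the first cohort entry (None if never entered).
    obs_start: date
    obs_end: date
    first_entry: Optional[date] = None

def eligible_days(p: Person, prior_obs_days: int) -> int:
    """Days during which the person could enter the cohort for the first time."""
    # Time before the required prior observation has accrued is not eligible.
    start = p.obs_start + timedelta(days=prior_obs_days)
    # Eligible time stops at the first cohort entry, or at the end of observation.
    end = p.obs_end if p.first_entry is None else min(p.obs_end, p.first_entry)
    return max(0, (end - start).days)

def incidence_rate_per_1000_py(people: List[Person], prior_obs_days: int) -> float:
    """1000 * (people first entering the cohort) / (eligible person-years)."""
    entrants = sum(1 for p in people if p.first_entry is not None)
    person_years = sum(eligible_days(p, prior_obs_days) for p in people) / 365.25
    return 1000 * entrants / person_years if person_years > 0 else float("nan")

people = [
    Person(date(2020, 1, 1), date(2021, 1, 1)),                    # never enters: 366 eligible days
    Person(date(2020, 1, 1), date(2022, 1, 1), date(2020, 7, 1)),  # enters 2020-07-01: 182 eligible days
]
print(round(incidence_rate_per_1000_py(people, prior_obs_days=0), 1))  # 666.5
```

Raising prior_obs_days shrinks the denominator, which is why a prior-observation rule belongs in the entry event criteria rather than in an inclusion rule.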
Options
You can select multiple data sources in the side bar to see graphs from different data sources in the same plot.
Select the cohort to explore in the side bar.
At the top left of the plot, you can choose whether to stratify the data by age, gender, or calendar year.
At the top right of the plot, you can choose whether to use the same y-axis for all data sources.
If you move the mouse over the plot, you can see the precise value.
What to look for
- Are the observed incidence rates in line with expectations? For example, if we have an estimate of the population incidence based on an external source, is the incidence rate comparable to that estimate?
- Are the age and gender distributions in line with expectations? For example, are contraceptives only prescribed in women?
- Is the incidence rate stable over time? If there are sudden peaks or drops, this may indicate coding issues.
Time Distributions
Description
Boxplot and table showing the distribution of time (in days) before and after the cohort index date (cohort start date), and of the time between the cohort start and end dates. The information is shown for all cohort entries, not only the first entry per person.
The boxplot shows:
- Whiskers: The minimum and maximum observed number of days.
- Box: The 25th to 75th percentile.
- Line: The median.
The table shows the same information and more:
- Average: the mean of the distribution
- SD: Standard Deviation
- Min: The minimum
- P10: The 10th percentile
- P25: The 25th percentile
- Median: The median (50th percentile)
- P75: The 75th percentile
- P90: The 90th percentile
- Max: The maximum
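The table statistics can be reproduced from a list of per-entry durations roughly as below. This is a sketch: it assumes Python's statistics module with the "inclusive" percentile method and the sample standard deviation, either of which may differ from what the tool uses, so small differences at the percentiles are possible.

```python
import statistics

def time_distribution(days):
    """Summary statistics like those in the Time Distributions table (needs at least 2 values)."""
    data = sorted(days)
    # 99 cut points, so pct[k - 1] is the k-th percentile.
    # The 'inclusive' interpolation method is an assumption.
    pct = statistics.quantiles(data, n=100, method="inclusive")
    return {
        "Average": statistics.mean(data),
        "SD": statistics.stdev(data),  # sample SD (an assumption)
        "Min": data[0],
        "P10": pct[9],
        "P25": pct[24],
        "Median": statistics.median(data),
        "P75": pct[74],
        "P90": pct[89],
        "Max": data[-1],
    }

summary = time_distribution(range(0, 101))
print(summary["P10"], summary["Median"], summary["P90"])  # 10.0 50 90.0
```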
Options
You can select multiple data sources in the side bar to see time distributions from different data sources in the same plot and table.
Select the cohort to explore in the side bar.
What to look for
- For exposure cohorts: is there sufficient time after index (either within the cohort for on-treatment analyses, or until the end of observation for intent-to-treat type analyses) to observe the outcome of interest?
- Are there many cohort entries with a length of 0 days when this is not expected?
- Are the distributions comparable across data sources?
Concepts in Data Source
Description
A table showing the concept IDs observed in the database that are included in one or more concept sets of the selected cohort. The Subjects column contains the number of subjects in the entire database that have the specific concept. This count is not restricted to people in the cohort; it represents a database-level characterization. Source concepts are identified in the _source_concept_id fields of the Common Data Model (e.g. drug_source_concept_id) and are used to identify the specific source codes used in a database. Standard concepts are found using the _concept_id fields (e.g. drug_concept_id) and use the same coding system across all databases. Note: Per CDM conventions, standard concept IDs may be used to populate _source_concept_id fields in domain tables, but non-standard concept IDs may not be used to populate the standard fields in those domain tables.
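To make the Subjects column concrete, here is a sketch of a database-level count against a toy in-memory drug_exposure table. The column names person_id, drug_concept_id and drug_source_concept_id follow the CDM, but the rows, the concept IDs and the concept_set tuple are invented, and resolving the concept set itself (descendants, mapped source codes) is assumed to have happened already.

```python
import sqlite3

# Toy drug_exposure table using CDM column names; all rows and IDs are invented.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE drug_exposure (person_id INT, drug_concept_id INT, drug_source_concept_id INT)"
)
con.executemany(
    "INSERT INTO drug_exposure VALUES (?, ?, ?)",
    [(1, 100, 900), (1, 100, 900), (2, 100, 901), (3, 200, 902)],
)

concept_set = (100, 901)  # hypothetical, already-resolved concept IDs to look up

def subject_counts(field):
    """Distinct persons per concept in `field`, database-wide (not restricted to the cohort)."""
    # `field` is one of two trusted column names, so interpolating it is safe here.
    placeholders = ",".join("?" * len(concept_set))
    sql = (f"SELECT {field}, COUNT(DISTINCT person_id) FROM drug_exposure "
           f"WHERE {field} IN ({placeholders}) GROUP BY {field} ORDER BY {field}")
    return con.execute(sql, concept_set).fetchall()

print(subject_counts("drug_concept_id"))         # standard concepts: [(100, 2)]
print(subject_counts("drug_source_concept_id"))  # source concepts: [(901, 1)]
```

Subject 1 has two records with concept 100 but is counted once, which is why the column reports subjects rather than records.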
Options
You can select a database in the side bar to see the concepts and counts observed in that database.
Select the cohort and the specific concept set within that cohort to explore in the side bar.
You can switch between Source Concepts and Standard Concepts at the top of the table.
What to look for
- Are there source codes included that should not be? For example, in a concept set for hypertensive disorder, are hypotension codes included by accident?
- Are all expected codes present? For example, if we have a list of ICD-10 codes that have been used in literature to identify a cohort, are all those codes present?
Orphan Concepts
Description
A table showing the concepts observed in the data source that are not included in a concept set of the cohort but may be worth considering. The following logic is used to identify concepts that might be relevant:
- Given a concept set expression, find all included concepts.
- Find all names of those concepts, including synonyms, and the names of source concepts that map to them.
- Search for concepts (standard and source) that contain any of those names as a substring.
- Filter those concepts to those that are not in the original set of concepts (i.e. orphans).
- Restrict the set of orphan concepts to those that appear in the CDM data source as either source concept or standard concept.
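The steps above can be sketched in Python with plain dictionaries standing in for the vocabulary tables. The function and argument names are hypothetical, the concept IDs are invented, and the case-insensitive substring match is an assumption about how names are compared.

```python
from typing import Dict, Set

def find_orphans(
    included: Set[int],
    names: Dict[int, Set[str]],
    vocabulary: Dict[int, str],
    observed: Set[int],
) -> Set[int]:
    """Hypothetical helper mirroring the orphan-concept steps."""
    # Step 1 (resolving the concept set expression) is assumed done: `included`.
    # Step 2: names and synonyms (incl. names of mapped source concepts) of included concepts.
    terms = {n.lower() for cid in included for n in names.get(cid, set())}
    # Step 3: concepts whose name contains any of those names as a substring.
    candidates = {cid for cid, name in vocabulary.items() if any(t in name.lower() for t in terms)}
    # Step 4: keep only concepts not already in the concept set (the orphans).
    orphans = candidates - included
    # Step 5: keep only orphans that occur in the data source.
    return orphans & observed

included = {1}
names = {1: {"Hypertension", "High blood pressure"}}
vocabulary = {1: "Hypertension", 2: "Secondary hypertension", 3: "Hypotension",
              4: "Pulmonary hypertension", 5: "Ocular hypertension"}
observed = {1, 2, 3, 5}  # concept 4 never occurs in this toy data source
print(sorted(find_orphans(included, names, vocabulary, observed)))  # [2, 5]
```

Substring matching is deliberately loose, which is one reason the resulting list tends to contain false positives.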
The Subjects column contains the number of subjects in the entire data source that have the specific concept, i.e. it is not restricted to people in the cohort. This is a data source level characterization. Source concepts are identified in the _source_concept_id fields of the Common Data Model (e.g. drug_source_concept_id) and are used to identify the specific source codes used in a data source. Standard concepts are found using the _concept_id fields (e.g. drug_concept_id) and use the same coding system across all databases.
Options
You can select a data source in the side bar to see the concepts and counts observed in that data source.
Select the cohort and the specific concept set within that cohort to explore in the side bar.
What to look for
- Are there concepts that are not included in the concept set but should be? Note that the provided list likely contains many false positives.
Index Events
Description
A table showing the concepts belonging to the concept sets in the entry event definition that are observed on the index date. In other words, the table lists the concepts that likely triggered the cohort entry. The counts indicate the number of cohort entries in which the concept was observed on the index date. Because multiple concepts can be present on the index date, the sum of the counts may be greater than the number of cohort entries.
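A sketch of the counting rule, assuming each cohort entry is represented by the set of entry-event concept IDs found on its index date (the IDs are invented). It also shows why the counts can sum to more than the number of entries.

```python
from collections import Counter

def index_event_counts(entries):
    """entries: one set per cohort entry, holding the entry-event concept IDs seen on its index date."""
    counts = Counter()
    for concepts_on_index in entries:
        counts.update(set(concepts_on_index))  # each concept counted at most once per entry
    return counts

entries = [{10}, {10, 20}, {20}]  # 3 cohort entries; the middle one has two concepts on index
counts = index_event_counts(entries)
print(counts[10], counts[20], sum(counts.values()))  # 2 2 4
```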
Options
You can select multiple databases in the side bar to see counts from different databases side-by-side.
Select the cohort to explore in the side bar.
What to look for
- Is one concept unexpectedly dominating? For example, if our cohort identifies exposure to drugs in a class, but we notice almost everyone enters the cohort based on a single drug, we may wonder whether our results will generalize to the class.
- Are the highest ranking concepts different across databases? For example, is everyone in one database initiating high-dose prescriptions, and everyone in another database low-dose prescriptions?
Visit Context
Description
A table showing the relationship between the cohort start date and visits recorded in the database. For each database, the table shows:
- Visits Before: the number of visits recorded before the cohort start date. Note that if a person is in the same cohort twice, visits may be counted twice.
- Visits Ongoing: the number of visits that were ongoing (excluding the visit start date) when the cohort started. Note that if a person is in the same cohort twice, visits may be counted twice.
- Starting Simultaneous: the number of visits that started on the same day the cohort started.
- Visits After: the number of visits recorded after the cohort start date. Note that if a person is in the same cohort twice, visits may be counted twice.
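One way to read the four categories is as mutually exclusive rules applied to each visit of each cohort entry. The sketch below encodes that reading; the exact boundary rules (for example whether a visit ending on the cohort start date counts as ongoing, or whether any lookback window applies) are assumptions and may differ from the tool.

```python
from collections import Counter
from datetime import date

def visit_context(cohort_start: date, visit_start: date, visit_end: date) -> str:
    """Assign one visit to a single visit-context category (assumed boundary rules)."""
    if visit_start == cohort_start:
        return "Starting Simultaneous"
    if visit_start > cohort_start:
        return "Visits After"
    # The visit started before the cohort start date: ongoing if it had not ended yet.
    if visit_end >= cohort_start:
        return "Visits Ongoing"
    return "Visits Before"

cohort_start = date(2021, 3, 10)
visits = [  # invented (visit_start_date, visit_end_date) pairs
    (date(2021, 1, 1), date(2021, 1, 3)),
    (date(2021, 3, 8), date(2021, 3, 12)),
    (date(2021, 3, 10), date(2021, 3, 10)),
    (date(2021, 4, 1), date(2021, 4, 1)),
]
print(Counter(visit_context(cohort_start, s, e) for s, e in visits))
```

If a person enters the same cohort twice, running this once per entry counts the same visit twice, matching the caveat in the list.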
Options
You can select multiple databases in the side bar to see counts from different databases side-by-side.
Select the cohort to explore in the side bar.
What to look for
- Are cohorts starting in the right context? E.g. some cohorts may be expected to start predominantly in an inpatient setting.
Cohort Overlap (subjects)
Description
Stacked bar graph showing the overlap between two cohorts, and a table listing several overlap statistics.
The stacked bar shows the overlap in terms of subjects. It shows the number of subjects that belong to each cohort and to both. The diagram does not consider whether the subjects were in the different cohorts at the same time.
The table shows the same information and more:
- Subject in either cohort: The number of subjects that enter one or both cohorts. (The union)
- Subject in both cohort: The number of subjects that enter both cohorts, although not necessarily at the same time. (The intersection)
- Subject in target not in comparator: The number of subjects that enter the target cohort, but not the comparator cohort. (Subtracting the comparator from the target)
- Subject in comparator not in target: The number of subjects that enter the comparator cohort, but not the target cohort. (Subtracting the target from the comparator)
- Subject in target before comparator: The number of subjects that enter both cohorts, but enter the target cohort before entering the comparator cohort. This number considers only the first entry per cohort per person.
- Subject in comparator before target: The number of subjects that enter both cohorts, but enter the comparator cohort before entering the target cohort. This number considers only the first entry per cohort per person.
- Subject in target and comparator on same day: The number of subjects that enter both cohorts on the same date. This number considers only the first entry per cohort per person.
- Subject having target start during comparator: The number of subjects that enter the target cohort during the comparator cohort, meaning comparator cohort start date <= target cohort start date <= comparator cohort end date. This number considers only the first entry per cohort per person.
- Subject having comparator start during target: The number of subjects that enter the comparator cohort during the target cohort, meaning target cohort start date <= comparator cohort start date <= target cohort end date. This number considers only the first entry per cohort per person.
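The statistics above can be sketched with Python sets, assuming each cohort is given as a mapping from subject ID to the (start date, end date) of that subject's first entry. The dictionary keys are hypothetical short names for the table rows, and the dates are invented.

```python
from datetime import date
from typing import Dict, Tuple

Period = Tuple[date, date]  # (cohort_start_date, cohort_end_date) of a subject's FIRST entry

def overlap_stats(target: Dict[int, Period], comparator: Dict[int, Period]) -> Dict[str, int]:
    t, c = set(target), set(comparator)
    both = t & c
    return {
        "either": len(t | c),              # union
        "both": len(both),                 # intersection
        "target_only": len(t - c),
        "comparator_only": len(c - t),
        "target_before_comparator": sum(target[s][0] < comparator[s][0] for s in both),
        "comparator_before_target": sum(comparator[s][0] < target[s][0] for s in both),
        "same_day": sum(target[s][0] == comparator[s][0] for s in both),
        "target_start_during_comparator": sum(
            comparator[s][0] <= target[s][0] <= comparator[s][1] for s in both),
        "comparator_start_during_target": sum(
            target[s][0] <= comparator[s][0] <= target[s][1] for s in both),
    }

target = {1: (date(2020, 1, 1), date(2020, 6, 30)),
          2: (date(2020, 3, 1), date(2020, 3, 31)),
          3: (date(2020, 5, 1), date(2020, 5, 31))}
comparator = {1: (date(2020, 2, 1), date(2020, 2, 28)),
              2: (date(2020, 3, 1), date(2020, 4, 30)),
              4: (date(2020, 1, 1), date(2020, 1, 31))}
stats = overlap_stats(target, comparator)
print(stats["either"], stats["both"], stats["same_day"])  # 4 2 1
```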
Options
You can select one or more databases in the side bar.
You can select the (target) cohort(s) and comparator cohort(s) in the side bar.
What to look for
- Are there many people in both cohorts? For example, if we want to compare two exposures, are there many people that receive both?
- Is the overlap of sufficient size for a specific research question? For example, if we wish to study the effect of an exposure on an outcome, we may require a minimum number of outcomes during exposure.
Cohort Characterization
Description
A table showing cohort characteristics (covariates). These characteristics are captured on or before the cohort start date. There is a Pretty and a Raw version of this table.
The Pretty table shows the standard OHDSI characteristics table, which includes only covariates that were manually selected to provide a general overview of the comorbidities and medications of the cohort. These are all binary covariates, and the table shows the proportion (%) of the cohort entries having the covariate.
The Raw table shows all captured covariates. These include binary and continuous covariates (e.g. the Charlson comorbidity index). For each covariate the table lists the mean, which for binary covariates is equal to the proportion, and the standard deviation (SD).
Options
You can select multiple databases in the side bar to see cohort characteristics from different databases side-by-side in the same table.
Select the cohort to explore in the side bar.
Select either the Pretty or the Raw table at the top of the table.
What to look for
- Are the characteristics of the cohort as expected? For example, do people have the expected comorbidities?
- Do the characteristics of the cohort differ much per database?
Compare Cohort Characterization
Description
A table or plot showing cohort characteristics (covariates) for two cohorts side-by-side. These characteristics are captured in time windows that can be selected.
The Raw table shows all captured covariates. These include binary and continuous covariates (e.g. the Charlson comorbidity index). For each covariate the table lists the mean, which for binary covariates is equal to the proportion, the standard deviation (SD), and the standardized difference of the mean (StdDiff).
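The text does not define StdDiff. A commonly used definition is the difference in means divided by the pooled standard deviation; the sketch below assumes that form (target minus comparator) and, for binary covariates, an SD of sqrt(p(1 - p)). The tool's exact formula may differ.

```python
import math

def binary_sd(p: float) -> float:
    """SD of a binary covariate whose mean (proportion) is p."""
    return math.sqrt(p * (1 - p))

def std_diff(mean_t: float, sd_t: float, mean_c: float, sd_c: float) -> float:
    """Standardized difference of the mean: (target - comparator) / pooled SD (assumed form)."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return (mean_t - mean_c) / pooled_sd if pooled_sd > 0 else float("nan")

# Invented example: a binary covariate present in 30% of target and 10% of comparator entries.
d = std_diff(0.30, binary_sd(0.30), 0.10, binary_sd(0.10))
print(round(d, 2))  # 0.52
```

Dividing by the pooled SD makes differences comparable across covariates measured on different scales, which is why it is often preferred over the raw difference in means.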
The plot shows all covariates, including binary and continuous covariates. The x-axis represents the mean value in the target cohort and the y-axis the mean value in the comparator cohort. Each dot represents a covariate, and its color indicates the covariate's domain. The set of domains in the plot is fixed (even though additional domains may exist in the data) so that each domain is always shown in the same color.
Options
Filters may be used to limit the number of covariates being visualized or tabulated. Filters are available for analysis names and domain names.
You can select different cohorts in the same database, the same cohort in different databases, or different cohorts in different databases.
What to look for
- Are there major differences between the two cohorts? For example, if we wish to compute a propensity score between two cohorts, concepts that have a very high proportion in one cohort and a very low proportion in the other may lead to a perfectly predictive model.
- In general, how comparable are the two cohorts? If we wish to compare two exposures but the cohorts differ in many characteristics, we may be able to fit a propensity model and compute an estimate, but we may have concerns about the generalizability of the results.
Execution meta-data
Each entry relates to an execution on a given CDM. Results are merged incrementally across executions.