Tuesday, December 19, 2017

[Book Exercise] Regression Methods in Biostatistics

Chapter 2 exercise:

https://github.com/xinyutan/book_regression/blob/master/Ch2_EDM.ipynb

Chapter 3 exercise:

https://github.com/xinyutan/book_regression/blob/master/Ch3_exercise.ipynb

Thoughts: Reading on my own feels inefficient; I learn a lot better when I am in a class. Why is that? 1. I think I absorb information better by listening. 2. It's better to have constant feedback and reassurance. However, I must learn to learn by reading on my own.

Chapter 4:

Notes: https://github.com/xinyutan/book_regression/blob/master/Notes_Chapter4LinearRegression.ipynb

Notes about different relationships between predictors.

Exercise: https://github.com/xinyutan/book_regression/blob/master/Ch4_exercise.ipynb

Wednesday, December 6, 2017

Questions regarding risk stratification

Having finished reading the book Risk Stratification - A practical guide for clinicians, I am confused about the "goal" of risk stratification. I thought it would be easy for me and that I more or less understood it, but the clinical concepts are so confusing!

I will just take notes of some questions here. Hopefully, I will understand them later.

  1. On page 112, it says that we should consider risk stratification only once the epidemiology problem has been worked out. Then what is population-based epidemiology? What does it try to solve?

Okay, I found a dataset that I might be able to use for practice: http://archive.ics.uci.edu/ml/datasets/Thoracic+Surgery+Data

Friday, December 1, 2017

First view of risk stratification

Risk stratification is widely used in medicine.

At first sight, risk stratification is nothing more than learning the coefficients of a linear/logistic regression. But if it's this simple, why do medical professionals spend so much effort elaborating on it? What's the difference? What do we overlook in machine learning that needs special attention in a medical context?

I am reading Risk Stratification - A practical guide for clinicians to find out.

==

After reading a few paragraphs, I see that the goals of the two fields are very different. In machine learning classification, our goal is to predict as accurately as possible, so we use whatever method works best, and logistic regression is often not the top performer. Here, however, the focus is on the "variables": we want to know exactly how much a particular variable contributes to the outcome, and a linear model has an advantage in its clarity.
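
To make the "contribution of a variable" concrete for myself: in a logistic regression, each coefficient is a change in log-odds, so exp(coefficient) is an odds ratio. The sketch below uses made-up coefficients and variable names (they are my assumptions, not numbers from the book) just to show the conversion and how a single patient's predicted risk is computed.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale.
# The names and values are invented for illustration only.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_per_10_years": 0.40,
    "smoker": 0.85,
    "diabetes": 0.30,
}


def odds_ratios(coefficients):
    """Convert log-odds coefficients to odds ratios: OR = exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def predicted_risk(intercept, coefficients, patient):
    """Predicted probability of the outcome for one patient."""
    log_odds = intercept + sum(
        beta * patient[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    for name, ratio in odds_ratios(COEFFICIENTS).items():
        print(f"{name}: odds ratio = {ratio:.2f}")

    # A hypothetical 65-year-old smoker without diabetes.
    patient = {"age_per_10_years": 6.5, "smoker": 1, "diabetes": 0}
    risk = predicted_risk(INTERCEPT, COEFFICIENTS, patient)
    print(f"predicted risk = {risk:.3f}")
```

Each odds ratio can be read on its own ("smoking multiplies the odds by exp(0.85)"), which is the kind of clarity a black-box classifier does not give.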

==

Regarding data collection, be careful of harvesting effects, where subtle effects remove people from risk before they come to attention. We should always ask whether the data collection is complete: does it exclude a certain population that we overlook (e.g., people who die before or during the intervention)?

==

About the difference between risk stratification and a clinical research study: both the goal and the data are different. For risk stratification, we want to know which other risk factors cause the treatment results to differ. Therefore, the data will contain only the people who received the treatment (??? can't we make treatment a variable as well???). For a clinical research study, however, we specifically want to know the effect of the treatment. Therefore, we want to keep all other factors the same, or at least similar.

It seems that risk stratification is more about "understanding", while clinical research studies are more about "testing a well-defined hypothesis".

Several types of study design:
  1. randomized trial (strict control)
  2. cohort study (no manual control, only observe)
  3. case-control (focuses on positive cases; because the whole population with a given exposure/treatment is not available, it is not suitable for risk calculation)
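
A small sketch of why a case-control sample cannot give a risk directly, using an invented 2x2 table (all counts are hypothetical). The same cohort is looked at twice: once in full, and once as a case-control sample that keeps every case but only 10% of the non-cases. The "risk" computed from the sample is distorted by the sampling, while the odds ratio is unchanged.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def apparent_risk(cases, controls):
    """Proportion of cases in a group; a true risk only if the whole group was observed."""
    return cases / (cases + controls)


# Hypothetical full cohort: 1000 exposed and 1000 unexposed people.
cohort = dict(
    exposed_cases=100, exposed_controls=900,
    unexposed_cases=50, unexposed_controls=950,
)

# Hypothetical case-control sample: all cases, 10% of the non-cases.
case_control = dict(
    exposed_cases=100, exposed_controls=90,
    unexposed_cases=50, unexposed_controls=95,
)

if __name__ == "__main__":
    for label, table in (("cohort", cohort), ("case-control", case_control)):
        risk_exposed = apparent_risk(table["exposed_cases"], table["exposed_controls"])
        print(f"{label}: 'risk' in exposed = {risk_exposed:.3f}, "
              f"odds ratio = {odds_ratio(**table):.3f}")
```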
==

Rate standardization:
One good example is the cancer mortality rate. If we do not account for demographic change, we would conclude that the mortality rate increased 50% from 1940 to 1980. However, we also know that the population in 1980 included many more seniors. If we take the age difference into account, we find that the mortality rate increased only 10%. This example shows the kind of careful thinking that medical problems require.
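
Here is a minimal sketch of direct age standardization with two age groups. The counts are invented (they are not the 1940/1980 cancer figures above); the point is only that a crude rate can rise sharply because the population aged, while the rate standardized to a fixed reference population barely moves.

```python
def crude_rate(deaths, population):
    """Overall deaths divided by overall population."""
    return sum(deaths) / sum(population)


def directly_standardized_rate(deaths, population, standard_population):
    """Apply each age group's own rate to a fixed standard population."""
    expected_deaths = sum(
        (d / n) * s for d, n, s in zip(deaths, population, standard_population)
    )
    return expected_deaths / sum(standard_population)


# Hypothetical data, age groups in order: [young, old].
early_population = [900, 100]
early_deaths = [9, 10]       # age-specific rates: 1% and 10%

late_population = [700, 300]  # many more seniors
late_deaths = [7, 33]         # age-specific rates: 1% and 11%

if __name__ == "__main__":
    crude_early = crude_rate(early_deaths, early_population)
    crude_late = crude_rate(late_deaths, late_population)
    # Use the early population as the standard (an arbitrary choice for this sketch).
    std_late = directly_standardized_rate(late_deaths, late_population, early_population)

    print(f"crude rate, early period:        {crude_early:.4f}")
    print(f"crude rate, late period:         {crude_late:.4f}")
    print(f"standardized rate, late period:  {std_late:.4f}")
```

With these made-up numbers the crude rate roughly doubles, but the standardized rate rises only slightly, because most of the crude increase comes from the shift toward the older group.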

Steps after calculating the Observed/Expected ratio (need to read more on statistics):
  1. statistical test (chi-square for discrete data)
  2. confidence interval
  3. If the statistical test shows a non-significant result, check the statistical power.
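
A rough sketch of the first two steps for an observed-to-expected (O/E) ratio. I am assuming a simple one-degree-of-freedom chi-square statistic, (O - E)^2 / E, and a log-scale approximate confidence interval with standard error 1/sqrt(O); both are simplifications I chose for illustration, and the counts are made up. The power check in step 3 is not covered here.

```python
import math


def observed_expected_ratio(observed, expected):
    """O/E ratio: values above 1 mean more events than predicted."""
    return observed / expected


def chi_square_1df(observed, expected):
    """Chi-square statistic (O - E)^2 / E and its p-value with 1 degree of freedom."""
    statistic = (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def approximate_ci(observed, expected, z=1.96):
    """Approximate 95% CI for O/E on the log scale, assuming SE(ln O/E) ~ 1/sqrt(O)."""
    ratio = observed / expected
    factor = math.exp(z / math.sqrt(observed))
    return ratio / factor, ratio * factor


if __name__ == "__main__":
    observed, expected = 30, 20.0  # hypothetical event counts
    statistic, p_value = chi_square_1df(observed, expected)
    lower, upper = approximate_ci(observed, expected)
    print(f"O/E = {observed_expected_ratio(observed, expected):.2f}")
    print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
    print(f"approximate 95% CI: ({lower:.2f}, {upper:.2f})")
```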
==

Some concerns that are common to machine learning tasks as well:
1. We need to consider whether the training data and the testing data follow the same distribution (features and labels).
2. Selection bias - is the treatment population selected non-randomly?
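
One simple way to look at both concerns is to compare a feature between two groups (training vs. testing, or treated vs. untreated) with a standardized mean difference. This is only a sketch: the ages are invented, and the flagging threshold is a parameter I am assuming, not something taken from the book.

```python
import math
import statistics


def standardized_mean_difference(sample_a, sample_b):
    """Difference in means divided by the pooled standard deviation.

    Assumes each sample has at least two values and some variation.
    """
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    pooled_sd = math.sqrt(
        (statistics.variance(sample_a) + statistics.variance(sample_b)) / 2.0
    )
    return (mean_a - mean_b) / pooled_sd


def looks_imbalanced(sample_a, sample_b, threshold=0.1):
    """Flag a feature whose absolute SMD exceeds an assumed threshold."""
    return abs(standardized_mean_difference(sample_a, sample_b)) > threshold


if __name__ == "__main__":
    # Hypothetical patient ages in the two groups.
    train_ages = [52, 60, 61, 67, 70, 58]
    test_ages = [66, 72, 75, 69, 80, 71]
    smd = standardized_mean_difference(train_ages, test_ages)
    print(f"SMD for age = {smd:.2f}, imbalanced = {looks_imbalanced(train_ages, test_ages)}")
```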