Indian Journal of Dermatology
  Publication of IADVL, WB
  Official organ of AADV
Indexed with Science Citation Index (E), Web of Science and PubMed
 
IJD® MODULE ON BIOSTATISTICS AND RESEARCH METHODOLOGY FOR THE DERMATOLOGIST - MODULE EDITOR: SAUMYA PANDA
Year : 2016  |  Volume : 61  |  Issue : 6  |  Page : 593-601

Biostatistics series module 6: Correlation and linear regression


1 Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal, India
2 Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, Maharashtra, India

Correspondence Address:
Avijit Hazra
Department of Pharmacology, Institute of Postgraduate Medical Education and Research, 244B Acharya J. C. Bose Road, Kolkata - 700 020, West Bengal
India

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/0019-5154.193662


Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing it as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated instead. A hypothesis test of correlation assesses whether the linear relationship observed in the sample also holds in the underlying population; a P < 0.05 is conventionally taken as evidence that it does. A 95% confidence interval of the correlation coefficient can also be calculated to indicate the plausible range of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x, and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx, where a is the intercept and b the slope), such that, given the value of one variable, the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the value of the correlation coefficient. Finally, it is vital to remember that although strong correlation can be a pointer toward causation, the two are not synonymous.
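
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the function names and the sample data are hypothetical, and the confidence interval uses the Fisher z-transformation with a normal approximation, which is one common approach and an assumption here, since the abstract does not say which method is intended.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z-transformation (assumed method)."""
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

r = pearson_r(x, y)
a, b = least_squares_line(x, y)
print("Pearson r:", r)
print("Spearman rho:", spearman_rho(x, y))
print("r squared (coefficient of determination):", r ** 2)
print("95% CI for r:", fisher_ci(r, len(x)))
print("Regression line: y =", a, "+", b, "* x")
```

As the abstract stresses, such numbers are only meaningful once a scatter plot has confirmed that the relationship is roughly linear and free of distorting outliers or clusters; the sketch does not perform that check.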

