What are Descriptive Statistics and Use Cases
How do we get descriptive statistics from a given set of data?
1: Univariate selection:
Univariate selection is a statistical method used for feature selection. It helps determine which features have the strongest relationship with the target variable, i.e. it selects the features in our data that contribute most to the prediction variable or output we are interested in.
In Python, we can use the SelectKBest class from sklearn.feature_selection to select the 10 features in a dataset that are most related to the output variable. To do that, we separate the independent variables (features) from the dependent variable (target) and initialize SelectKBest with the chi2 score function and k=10, which means it will select the top 10 features.
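The steps above can be sketched roughly as follows. The breast cancer dataset bundled with scikit-learn is an assumption made only for illustration (the original does not name a dataset), and the variable names are hypothetical. Note that chi2 expects non-negative feature values, which this dataset satisfies.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Assumed example dataset: 30 non-negative numeric features, binary target.
data = load_breast_cancer(as_frame=True)
X = data.data      # independent variables (features)
y = data.target    # dependent variable (target)

# Score every feature against the target with chi2 and keep the best 10.
selector = SelectKBest(score_func=chi2, k=10)
selector.fit(X, y)

# Pair each score with its column name so the ranking is readable.
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
top_features = list(X.columns[selector.get_support()])

print(scores.head(10))
print(top_features)
```

Calling selector.transform(X) afterwards would return only the 10 selected columns, which can then be passed to a model.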
2: Correlation:
Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). We can compute the correlation between features in a dataset with the corr() method of a pandas DataFrame. A heatmap then represents the data graphically, with individual values shown as colors. It is a way to visualize the correlation matrix, i.e. to see quickly which features are positively or negatively correlated with each other, to identify patterns and trends in large datasets by summarizing the data in a visually intuitive manner, and to help spot outliers or unusual values.
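As a minimal sketch of this idea, the example below builds a small synthetic DataFrame, computes its correlation matrix with corr(), and draws it as a heatmap. The column names, the random data, and the output file name are all assumptions for illustration. Plain matplotlib is used here; a plotting library such as seaborn would work just as well.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): "pos" rises with x, "neg" falls with x,
# and "noise" is unrelated to x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "pos": 2 * x + rng.normal(scale=0.5, size=200),
    "neg": -x + rng.normal(scale=0.5, size=200),
    "noise": rng.normal(size=200),
})

# Pairwise Pearson correlation (the default for DataFrame.corr).
corr = df.corr()
print(corr.round(2))

# Draw the matrix as a heatmap: each cell's color encodes one correlation value.
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.savefig("correlation_heatmap.png")
```

Fixing the color scale to the range -1 to 1 keeps colors comparable across datasets: strongly positive and strongly negative pairs sit at opposite ends of the color map, and unrelated pairs fall near the middle.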
Descriptive statistics offer numerous benefits for data analysis, providing a foundation for understanding
Simplifies Complex Data: Descriptive statistics condense large datasets into manageable and interpretable metrics like mean, median, mode, and standard deviation.
Highlights Central Tendencies: Measures such as the mean or median give a quick snapshot of the data's central value.
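To make these metrics concrete, here is a small sketch that computes them with pandas. The test scores are made-up values used only for illustration.

```python
import pandas as pd

# Hypothetical test scores, invented for this example.
scores = pd.Series([72, 85, 85, 90, 64, 78, 85, 91, 70, 88], name="score")

summary = {
    "mean": scores.mean(),          # arithmetic average
    "median": scores.median(),      # middle value of the sorted data
    "mode": scores.mode().tolist(), # most frequent value(s); can be more than one
    "std": scores.std(),            # sample standard deviation (ddof=1)
}
print(summary)

# describe() reports count, mean, std, min, quartiles, and max in one call.
print(scores.describe())
```

The mode is returned as a list because a dataset can have several equally frequent values.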
Real-world use cases:
Business Intelligence: Identifying sales trends, customer behavior patterns, and operational metrics.
Healthcare: Summarizing patient demographics and treatment outcomes.
Education: Analyzing test score distributions and student performance.