Good article. I will say a couple of things. First, mathematical statistics can't fix bad data (this should be obvious). It can only look for patterns in the data. The linear regression example is actually a much worse problem when you look at high-dimensional data. When the data is 2D, it's very easy to look at it and see if things make sense. If I have 100D data, I can't do that easily. In fact, in high dimensions some very bad things can happen. There are tools to deal with this, though. There is a concept called leverage in linear regression. That outlier point has massive leverage and would call the regression into question instantly. Also keep in mind that the equation for linear regression is pretty meaningless unless you make some assumptions about the error terms. If those assumptions are wrong, you can be in trouble. How much trouble kinda depends
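
For anyone who wants to see the leverage idea in numbers, here is a rough sketch rather than a definitive recipe. It assumes a simple regression with an intercept and computes the diagonal of the hat matrix H = X (X^T X)^{-1} X^T. The function name `leverage_scores`, the simulated data, and the 2p/n cutoff (a common rule of thumb) are my own illustrative choices, not something taken from the article.

```python
import numpy as np

def leverage_scores(x):
    """Return the hat-matrix diagonal h_ii for a regression of y on x with an intercept."""
    # Design matrix: a column of ones (intercept) plus the predictor(s)
    X = np.column_stack([np.ones(len(x)), x])
    # With a reduced QR decomposition X = QR, the hat matrix is H = Q Q^T,
    # so each h_ii is the squared norm of the corresponding row of Q
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

# Hypothetical data: 20 ordinary x values plus one point far from the rest
rng = np.random.default_rng(0)
x = np.append(rng.normal(0.0, 1.0, 20), 15.0)

h = leverage_scores(x)
n, p = len(x), 2                      # p counts the intercept and the slope
flagged = np.where(h > 2 * p / n)[0]  # rule-of-thumb cutoff, an assumption

print("largest leverage at index", int(np.argmax(h)))
print("indices above 2p/n:", flagged)
```

Leverage depends only on the x values, not on y, so the same calculation works unchanged on 100D data where a scatter plot is not an option.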
You're absolutely right about higher dimensional data being difficult to visualize, but that's why the problem is hard! You must still always do some sort of exploratory data analysis, and visualize it in some meaningful way (in my opinion).
I agree. High-dimensional data is tough in more ways than one. Look up the "curse of dimensionality" if you're interested.
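
One symptom of the curse of dimensionality is that distances start to look alike: the nearest and farthest points from a query end up almost equally far away. Below is a small sketch of that effect, assuming points drawn uniformly from the unit hypercube. The function name `relative_contrast`, the sample size, and the chosen dimensions are hypothetical picks for illustration only.

```python
import numpy as np

def relative_contrast(dim, n_points=500, seed=0):
    """(max distance - min distance) / min distance from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))  # random points in the unit hypercube
    query = rng.uniform(size=dim)               # one random query point
    d = np.linalg.norm(points - query, axis=1)  # Euclidean distance to every point
    return (d.max() - d.min()) / d.min()

# Compare a low-dimensional case against two high-dimensional ones
contrasts = {dim: relative_contrast(dim) for dim in (2, 100, 1000)}
for dim, c in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={c:.3f}")
```

The contrast shrinks as the dimension grows, which is part of why "look for the point that is far from the others" gets harder in 100D than in 2D.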