Solving the tree optimization problem: should each rectangle contain exactly one training data...
I was reading Trevor Hastie and Robert Tibshirani's book "An Introduction to Statistical Learning with Applications in R". On page 306, when discussing the objective function of the tree model, the book says: "The goal is to find boxes $R_1,\ldots,R_J$ that minimize the RSS, given by $$\sum_{j=1}^{J}\sum_{i\in R_j}(y_i-\hat{y}_{R_j})^2,$$ where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$th box. Unfortunately, it is computationally infeasible to consider every possible partition of the feature space into $J$ boxes."

My question is: isn't the optimal solution to this RSS obvious? We just partition the whole feature space into $N$ rectangles such that each rectangle contains exactly one data point, and then we achieve zero RSS. Let...
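To make the arithmetic behind the question concrete, here is a minimal sketch of the RSS formula quoted above. The response values and the groupings are made up for illustration, and each "box" is represented simply as a list of observation indices rather than an actual rectangle in feature space (that simplification is my assumption, not something from the book).

```python
def rss(y, regions):
    """Residual sum of squares for a partition of the observations.

    `regions` is a list of index lists; each inner list plays the role
    of one box R_j (hypothetical representation, not actual rectangles).
    """
    total = 0.0
    for idx in regions:
        # Mean response of the training observations in this box.
        mean = sum(y[i] for i in idx) / len(idx)
        # Squared deviations from that mean, summed over the box.
        total += sum((y[i] - mean) ** 2 for i in idx)
    return total


# Made-up responses for N = 4 training observations.
y = [1.0, 3.0, 2.0, 5.0]

one_box = [[0, 1, 2, 3]]            # J = 1
two_boxes = [[0, 1], [2, 3]]        # J = 2
singletons = [[0], [1], [2], [3]]   # J = N, one observation per box

rss_one = rss(y, one_box)
rss_two = rss(y, two_boxes)
rss_n = rss(y, singletons)

print(rss_one, rss_two, rss_n)
```

With one observation per box, each box mean equals its single response, so every term in the sum is zero. Note that the quoted formula is stated for a given number of boxes $J$, while the zero-RSS case corresponds to the specific choice $J = N$.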