Data Mining and Machine Learning
- Custom Research Papers
- Mar 22, 2021
#This HW uses the same datafile as previous HWs.
# Please refer to the CSV file “titanic_data” that contains data about each
# passenger aboard the RMS Titanic when it sank in 1912.
# The file has several columns given as:
# survival: Survival 0 = No, 1 = Yes
# pclass: Ticket class 1 = 1st, 2 = 2nd, 3 = 3rd
# sex: Sex
# Age: Age in years
# sibsp: Number of siblings / spouses aboard the Titanic
# parch: Number of parents / children aboard the Titanic
# ticket: Ticket number
# fare: Passenger fare
# cabin: Cabin number
# embarked: Port of Embarkation C = Cherbourg, Q = Queenstown, S = Southampton
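# A minimal sketch of how the columns above could be turned into model inputs.
# The helper name prepare_titanic, the file name "titanic_data.csv", the exact
# column spellings, and the choices to drop ticket/cabin and fill missing values
# with the median or mode are all assumptions for illustration, not part of the
# assignment.

```python
import pandas as pd


def prepare_titanic(df):
    """Return (X, y) from a raw Titanic DataFrame (hypothetical helper)."""
    df = df.copy()
    # Normalise header spelling so "Survived", "survived", "Pclass" etc. all work.
    df.columns = [c.strip().lower() for c in df.columns]

    # Assumption: the target column is named either "survived" or "survival".
    target = "survived" if "survived" in df.columns else "survival"
    y = df[target].astype(int)

    # Numeric columns are used as-is; ticket and cabin are left out here
    # because they are free-text identifiers (a simplifying assumption).
    X = df[["pclass", "age", "sibsp", "parch", "fare"]].copy()
    X["age"] = X["age"].fillna(X["age"].median())
    X["fare"] = X["fare"].fillna(X["fare"].median())

    # Encode sex as 1 = male, 0 = female.
    X["sex"] = (df["sex"].str.lower() == "male").astype(int)

    # One-hot encode the port of embarkation (C, Q, S), filling gaps with the mode.
    embarked = df["embarked"].fillna(df["embarked"].mode()[0])
    X = X.join(pd.get_dummies(embarked, prefix="embarked", dtype=int))
    return X, y


# Usage, assuming the file sits next to the script:
# X, y = prepare_titanic(pd.read_csv("titanic_data.csv"))
```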
#Treat Survived as your y variable, and the other variables as your x variables.
#The goal is to build a decision tree and a random forest model to predict whether a person survives or not.
#Please include the following in your work:
#1. Classification report showing precision, recall, F-score etc.
#2. Which model works better? Decision tree or random forest?
#3. Tune the hyper-parameters of the decision tree and random forest models. How does the performance of the tuned models compare with the untuned models?
#4. How do your models compare to the logistic regression / SVM models from the previous HW?
#5. What is overfitting? How can the overfitting problem be resolved in decision trees?
#6. Create an ensemble using the classification models learned so far. See if the ensemble works better than the individual models.
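# One possible way to organise items 1, 2, 3 and 6 with scikit-learn is sketched
# below. The function name compare_models, the 75/25 split, the parameter grids,
# and the choice of a soft-voting ensemble over a tree, a forest and a logistic
# regression are assumptions for illustration, not the required solution. The
# max_depth and min_samples_leaf values in the grid limit how far a tree can
# grow, which is one common way to reduce overfitting (item 5).

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit several classifiers and return a classification report for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Hypothetical search grids; widen or narrow them as needed.
    tree_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}
    forest_grid = {"n_estimators": [50, 100], "max_depth": [4, 8, None]}

    models = {
        "tree": DecisionTreeClassifier(random_state=seed),
        "forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "tuned_tree": GridSearchCV(
            DecisionTreeClassifier(random_state=seed), tree_grid, cv=3
        ),
        "tuned_forest": GridSearchCV(
            RandomForestClassifier(random_state=seed), forest_grid, cv=3
        ),
        # Soft voting averages the predicted class probabilities of its members.
        "ensemble": VotingClassifier(
            [
                ("tree", DecisionTreeClassifier(max_depth=4, random_state=seed)),
                ("forest", RandomForestClassifier(random_state=seed)),
                ("logit", LogisticRegression(max_iter=1000)),
            ],
            voting="soft",
        ),
    }

    reports = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # The report lists precision, recall, F1-score and support per class.
        reports[name] = classification_report(
            y_test, model.predict(X_test), zero_division=0
        )
    return reports


# Usage, with X and y being the feature matrix and survival labels built from the CSV:
# for name, report in compare_models(X, y).items():
#     print(name)
#     print(report)
```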