Titanic - Kaggle

This project was done as an assignment for my data mining class. The Titanic Kaggle dataset is popularly known as the go-to dataset for getting started with data science. Working through the tasks taught me how to perform certain operations and predictions on data, and I was able to improve the score using my own techniques.

  1. Process of execution (a minimal sketch follows these bullets):
    • Importing Libraries
    • Finding null values
    • Performing EDA on dataset
    • Feature Engineering
    • Training Model
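
A minimal sketch of the first two steps, assuming the standard Kaggle train.csv/test.csv files; the imputation choices here are illustrative, not necessarily the exact ones used:

```python
import pandas as pd

# Load the standard Kaggle Titanic competition files.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Finding null values: count missing entries per column.
print(train.isnull().sum())

# One common imputation: median Age, most frequent Embarked port.
train["Age"] = train["Age"].fillna(train["Age"].median())
train["Embarked"] = train["Embarked"].fillna(train["Embarked"].mode()[0])
```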
  2. Benchmarked various classification models from scikit-learn (plus XGBoost); a sample loop follows the list:
    • LogReg
    • Gaussian NB
    • SVC
    • AdaBoost
    • Decision Tree
    • XGBoost
    • Random Forest
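
A benchmarking loop along these lines, with a placeholder feature set and default hyperparameters (not the exact setup behind the final score):

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

train = pd.read_csv("train.csv")

# Placeholder features: dummy-encode Sex, zero-fill missing numeric values.
X = pd.get_dummies(train[["Pclass", "Sex", "Age", "Fare"]]).fillna(0)
y = train["Survived"]

models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "Gaussian NB": GaussianNB(),
    "SVC": SVC(),
    "AdaBoost": AdaBoostClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "XGBoost": XGBClassifier(),
    "Random Forest": RandomForestClassifier(),
}

# 5-fold cross-validated accuracy as the benchmark for each model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f}")
```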
  3. After visualising the data, it became clear which features matter and which don't (see the example below).
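
For example, grouping the survival rate by a candidate feature makes the strong ones obvious (using the columns from the standard dataset):

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Survival rate by sex and by passenger class, two of the strongest signals.
print(train.groupby("Sex")["Survived"].mean())
print(train.groupby("Pclass")["Survived"].mean())
```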
  4. Why I chose XGBoost?
    • Execution Speed.
    • Model Performance.
    • XGBoost performs strongly on structured/tabular data in classification and regression problems.
    • After reading many Kaggle write-ups, XGBoost is often described as the go-to algorithm for competition winners on the platform.
  5. Submitted the predictions and got a Kaggle score of 0.78947 (sketch below).
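
Roughly how the final model and submission file can be produced; the feature set and default hyperparameters below are placeholders, not the exact configuration behind the 0.78947 score:

```python
import pandas as pd
from xgboost import XGBClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Same placeholder feature set as in the benchmarking sketch above.
features = ["Pclass", "Sex", "Age", "Fare"]
X = pd.get_dummies(train[features]).fillna(0)
X_test = pd.get_dummies(test[features]).fillna(0)
y = train["Survived"]

# Fit the chosen model on the full training set and predict on the test set.
model = XGBClassifier()
model.fit(X, y)

# Kaggle expects a two-column CSV: PassengerId and the predicted Survived flag.
pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": model.predict(X_test).astype(int),
}).to_csv("submission.csv", index=False)
```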
CONCLUSION
  • A basic pipeline is a must for understanding the problem statement and getting started.
  • Performing EDA can be very powerful, as it can surface hidden insights.
  • Deciding on the right model is essential; it is always advisable to fix a baseline model to benchmark scores against.
  • Feature engineering can boost the score significantly when features are used properly (see the sketch below).
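
As an illustration of that last point, two features commonly engineered for this dataset (hypothetical examples, not necessarily the ones used here):

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Title extracted from Name, e.g. "Braund, Mr. Owen Harris" -> "Mr".
train["Title"] = train["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Family size: siblings/spouses + parents/children + the passenger themselves.
train["FamilySize"] = train["SibSp"] + train["Parch"] + 1
```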