This project develops a machine learning model that predicts whether a patient has heart disease from 13 clinical features, such as age, resting blood pressure, and serum cholesterol. The model is trained with classification algorithms and evaluated with accuracy, precision, recall, and F1-score.
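As a rough illustration of how the four evaluation metrics relate to each other, the sketch below derives them from the counts of a binary confusion matrix. The function names and the sample labels are made up for this example and are not results from this project; the notebook may well compute the same values with a library instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = heart disease."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a medical screening setting, recall is often watched closely, since a false negative means a patient with heart disease goes undetected.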
The dataset used in this project is heart.csv. Ensure it is placed in the same directory as the Jupyter Notebook before running the code.
If you don’t have the dataset, you can download it from Kaggle.
- Source: Kaggle
- Number of Samples: 303
- Target Variable: target (1 = Heart disease, 0 = No heart disease)
| Feature | Description |
|---------|-------------|
| age | Age of the patient |
| sex | Gender (1 = Male, 0 = Female) |
| cp | Chest pain type (categorical code, 0-3) |
| trestbps | Resting blood pressure (in mm Hg) |
| chol | Serum cholesterol (mg/dL) |
| fbs | Fasting blood sugar > 120 mg/dL (1 = True, 0 = False) |
| restecg | Resting electrocardiographic results (0-2) |
| thalach | Maximum heart rate achieved |
| exang | Exercise-induced angina (1 = Yes, 0 = No) |
| oldpeak | ST depression induced by exercise relative to rest |
| slope | Slope of the peak exercise ST segment (0-2) |
| ca | Number of major vessels (0-3) colored by fluoroscopy |
| thal | Thalassemia (0-3, where 3 is abnormal) |
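
Before training, it can help to confirm that the file actually matches the schema in the table above. The following is a minimal, dependency-free sketch using Python's standard `csv` module; the helper name `check_heart_csv` and the two sample rows are hypothetical, and the notebook itself may load the data differently (for example with pandas).

```python
import csv
import io

# The 13 features from the table above, plus the target column.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def check_heart_csv(handle):
    """Validate the header and target values; return (row_count, positives)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = 0
    positives = 0
    for row in reader:
        target = row["target"].strip()
        if target not in ("0", "1"):
            raise ValueError(f"unexpected target value: {target!r}")
        rows += 1
        if target == "1":
            positives += 1
    return rows, positives


# Two made-up rows, used here only so the sketch runs without the real file.
sample = io.StringIO(
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "60,1,2,130,240,0,1,150,0,1.2,1,0,2,1\n"
    "52,0,0,120,210,0,0,165,1,0.4,2,0,2,0\n"
)
summary = check_heart_csv(sample)
print(summary)
```

With the real dataset, the same function could be called as `check_heart_csv(open("heart.csv", newline="", encoding="utf-8-sig"))`; the `utf-8-sig` encoding is a precaution in case the file starts with a byte-order mark. The row count should then match the number of samples stated above.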
The evaluation metrics for the best-performing model are as follows:
The confusion matrix and ROC curve below visualize the model’s performance:
Confusion Matrix:

ROC Curve:
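
For readers unfamiliar with how a ROC curve is built, here is a small sketch that computes its points and the area under the curve from predicted scores, using only plain Python. It assumes the model outputs a probability-like score for class 1; the function names and the sample data are illustrative and do not come from this project.

```python
def roc_points(y_true, scores):
    """Return [(fpr, tpr), ...] from (0, 0) to (1, 1), one point per distinct score."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative label")

    # Lower the decision threshold step by step, from the highest score down.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.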

To run this project locally, follow these steps: