DATA SCIENCE MASTER TUTORIAL
Data Science combines programming, mathematics, and domain expertise to draw meaningful insights from data. It includes:
- Data collection
- Cleaning & preprocessing
- Analysis & visualization
- Model building
- Deployment
Use Anaconda to manage Python and Jupyter.
# Download Anaconda: https://www.anaconda.com
# Launch Jupyter Notebook:
jupyter notebook
import pandas as pd
df = pd.read_csv('data.csv')
Other data sources include APIs, web scraping, and SQL databases.
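As a minimal sketch of the SQL route, the snippet below builds a throwaway in-memory SQLite database and reads it back with `pd.read_sql_query`. The table name `passengers` and its rows are made up for illustration; a real project would point the connection at an existing database instead.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT, age REAL, survived INTEGER)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?, ?)",
    [("Alice", 29.0, 1), ("Bob", 41.0, 0), ("Carol", None, 1)],
)
conn.commit()

# Load the query result straight into a DataFrame.
sql_df = pd.read_sql_query("SELECT * FROM passengers", conn)
conn.close()

print(sql_df.shape)  # (3, 3)
```

The SQL `NULL` age arrives as `NaN`, so the usual pandas cleaning steps apply to it unchanged.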
df.drop_duplicates(inplace=True)
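# Assign the result back: inplace=True on a selected column may not update df in recent pandas versions
df['Age'] = df['Age'].fillna(df['Age'].median())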
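To see what these two cleaning steps do, here is a small sketch on a made-up table with one duplicated row and one missing age. The `toy` DataFrame and its values are assumptions for illustration only. It assigns the result of `fillna` back to the column, since recent pandas releases warn when `inplace=True` is used on a selected column.

```python
import pandas as pd

# Toy data (made up): one exact duplicate row and one missing age.
toy = pd.DataFrame(
    {
        "Name": ["Ann", "Ben", "Ben", "Cara", "Dan"],
        "Age": [20.0, 30.0, 30.0, None, 40.0],
    }
)

# Drop exact duplicate rows, then fill the missing age with the median.
toy = toy.drop_duplicates().reset_index(drop=True)
median_age = toy["Age"].median()
toy["Age"] = toy["Age"].fillna(median_age)

print(len(toy))     # 4
print(median_age)   # 30.0
```

The median is computed after duplicates are removed, so repeated rows do not skew the fill value.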
print(df.describe())
print(df['Sex'].value_counts())
Try It Yourself:
import matplotlib.pyplot as plt
df['Age'].hist()
plt.show()
import seaborn as sns
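# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()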
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
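The two feature-engineering lines above can be tried on a few sample rows. This is a sketch that assumes names follow the "Surname, Title. Given names" pattern of the Kaggle Titanic data; the `sample` DataFrame is illustrative only.

```python
import pandas as pd

# Sample rows in the "Surname, Title. Given names" format.
sample = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Heikkinen, Miss. Laina",
            "Cumings, Mrs. John Bradley",
        ],
        "Sex": ["male", "female", "female"],
    }
)

# Capture the word that sits between a space and a period (the title).
sample["Title"] = sample["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Encode the text category as a number so a model can use it.
sample["Sex"] = sample["Sex"].map({"male": 0, "female": 1})

print(sample["Title"].tolist())  # ['Mr', 'Miss', 'Mrs']
print(sample["Sex"].tolist())    # [0, 1, 1]
```

Any value missing from the mapping dictionary becomes `NaN`, so it is worth checking `value_counts()` before and after mapping.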
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
X = df[['Age', 'Sex']]
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # fixed seed for a reproducible split
model = LogisticRegression()
model.fit(X_train, y_train)
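If `data.csv` is not at hand yet, the same split-and-fit steps can be rehearsed on synthetic data. The sketch below generates random `Age` and `Sex` columns and a noisy `Survived` label; none of it is real Titanic data. The `stratify=y` and `random_state=0` arguments are optional additions that keep class proportions similar across the two sets and make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (values are random, not Titanic records).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(
    {
        "Age": rng.integers(1, 80, size=n),
        "Sex": rng.integers(0, 2, size=n),
    }
)
# Label mostly follows Sex, with about 20% of rows flipped as noise.
noise = rng.random(n) < 0.2
demo["Survived"] = np.where(noise, 1 - demo["Sex"], demo["Sex"])

X = demo[["Age", "Sex"]]
y = demo["Survived"]

# Hold out 20% for testing; stratify keeps the class balance in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)  # (160, 2) (40, 2)
```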
from sklearn.metrics import accuracy_score, confusion_matrix
predictions = model.predict(X_test)
print(confusion_matrix(y_test, predictions))
print("Accuracy:", accuracy_score(y_test, predictions))
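Reading a confusion matrix is easier with numbers small enough to count by hand. The labels below are made up for illustration. For binary labels 0 and 1, scikit-learn puts true classes in rows and predicted classes in columns, so `ravel()` yields true negatives, false positives, false negatives, and true positives in that order.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels (made up) so the counts are easy to verify.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
# [[3 1]
#  [2 2]]
print("Accuracy:", accuracy_score(y_true, y_pred))  # Accuracy: 0.625
```

Accuracy is `(tn + tp) / total`, here `(3 + 2) / 8`. The matrix also shows that the two missed positives (`fn`) account for most of the errors, which accuracy alone would hide.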
Use Flask or Streamlit to deploy your model.
# Example with Streamlit
import streamlit as st
st.title("Survival Prediction App")
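One way the Streamlit example might grow into a working form is sketched below. The helper names `build_features` and `run_app`, the widget labels, and the output strings are all assumptions for illustration. The sketch expects a fitted model trained on the `Age` and `Sex` columns used earlier.

```python
import pandas as pd


def build_features(age: float, sex: str) -> pd.DataFrame:
    """Turn raw form inputs into the Age/Sex columns the model was trained on."""
    sex_code = {"male": 0, "female": 1}[sex]
    return pd.DataFrame({"Age": [age], "Sex": [sex_code]})


def run_app(model) -> None:
    """Render a small input form and show the model's prediction."""
    import streamlit as st  # imported here so build_features works without Streamlit

    st.title("Survival Prediction App")
    age = st.number_input("Age", min_value=0.0, max_value=100.0, value=30.0)
    sex = st.selectbox("Sex", ["male", "female"])
    if st.button("Predict"):
        prediction = model.predict(build_features(age, sex))[0]
        st.write("Survived" if prediction == 1 else "Did not survive")
```

To try it, call `run_app(model)` at the bottom of a script that trains or loads the model, then start it with `streamlit run app.py` (the filename is hypothetical). Keeping the feature construction in its own function ensures the app encodes `Sex` the same way the training code did.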
Apply all of these steps to the Titanic dataset from Kaggle (Titanic - Machine Learning from Disaster).
End of Master Tutorial