Data Science Master Tutorial (Part 2)

📘 DATA SCIENCE MASTER TUTORIAL

Data Science combines programming, mathematics, and domain expertise to draw meaningful insights from data. It includes:

  • Data collection
  • Cleaning & preprocessing
  • Analysis & visualization
  • Model building
  • Deployment

Use Anaconda to manage Python and Jupyter.

# Download Anaconda: https://www.anaconda.com
# Launch Jupyter Notebook from a terminal:
jupyter notebook

# In a notebook cell, load a CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')

Other data sources include APIs, web scraping, and SQL databases.
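
As a minimal sketch of the SQL route, the snippet below builds a throwaway in-memory SQLite database and reads it back with pd.read_sql_query. The table name passengers and its rows are made up for illustration; in practice you would connect to your own database instead.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database standing in for a real SQL source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO passengers VALUES (?, ?)",
    [("Alice", 29.0), ("Bob", 41.0), ("Carol", None)],
)
conn.commit()

# Run a query and load the result straight into a DataFrame
sql_df = pd.read_sql_query("SELECT name, age FROM passengers", conn)
conn.close()

print(sql_df)
```

The result is an ordinary DataFrame, so the cleaning and analysis steps work the same way as they do for data loaded from a CSV file.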

# Remove duplicate rows and fill missing ages with the median age
df = df.drop_duplicates()
df['Age'] = df['Age'].fillna(df['Age'].median())

# Summary statistics and category counts
print(df.describe())
print(df['Sex'].value_counts())
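
To see what the duplicate removal and median fill actually do, here is a small sketch on a made-up DataFrame (the names and ages are invented). It uses plain assignment rather than inplace=True, which gives the same result here and avoids chained-assignment warnings in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up sample with one duplicated row and one missing age
sample = pd.DataFrame({
    "Name": ["Ann", "Ben", "Ben", "Cy", "Di"],
    "Age": [22.0, 35.0, 35.0, np.nan, 58.0],
})

# Missing values per column, before cleaning
print(sample.isna().sum())

# Drop the repeated "Ben" row, then fill the gap with the median age
sample = sample.drop_duplicates()
median_age = sample["Age"].median()
sample["Age"] = sample["Age"].fillna(median_age)

print(sample)
```

Checking isna().sum() before and after cleaning is a quick way to confirm that no missing values remain in the columns you plan to model.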

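import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of the Age column
df['Age'].hist()
plt.show()

# Correlation heatmap of the numeric columns only
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()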

# Extract the title (Mr, Mrs, Miss, ...) from each name and encode Sex as 0/1
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
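
The regular expression is easier to follow on a few concrete values. This sketch applies the same pattern to three names written in the Titanic "Surname, Title. Given names" format, then shows the 0/1 mapping on a short made-up series.

```python
import pandas as pd

# Three names in the "Surname, Title. Given names" format
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
])

# A space, then letters, then a literal dot; only the letters are captured
titles = names.str.extract(r" ([A-Za-z]+)\.", expand=False)
print(titles.tolist())

# Map a text column to integers; values missing from the dict become NaN
sex = pd.Series(["male", "female", "female"]).map({"male": 0, "female": 1})
print(sex.tolist())
```

Because map() turns any value not listed in the dictionary into NaN, it is worth checking for missing values again after encoding.
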

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Features and target
X = df[['Age', 'Sex']]
y = df['Survived']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate the model on the held-out test set
from sklearn.metrics import accuracy_score, confusion_matrix
predictions = model.predict(X_test)
print(confusion_matrix(y_test, predictions))
print("Accuracy:", accuracy_score(y_test, predictions))
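
If you do not have the Kaggle file at hand, the same split, fit, and evaluate sequence can be tried on synthetic data. The sketch below is only an illustration: the toy DataFrame is randomly generated and is not Titanic data, and the stratify and max_iter arguments are optional choices rather than part of the steps above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic columns (NOT real data)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Age": rng.uniform(1, 80, size=n),
    "Sex": rng.integers(0, 2, size=n),
})
# Make the label follow Sex, with roughly 20% of labels flipped as noise
flip = rng.random(n) < 0.2
toy["Survived"] = np.where(flip, 1 - toy["Sex"], toy["Sex"])

X = toy[["Age", "Sex"]]
y = toy["Survived"]

# stratify=y keeps the class balance similar in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

cm = confusion_matrix(y_test, predictions)
acc = accuracy_score(y_test, predictions)
print(cm)  # rows are true classes, columns are predicted classes
print("Accuracy:", acc)
```

The diagonal of the confusion matrix counts correct predictions, so accuracy equals the sum of the diagonal divided by the number of test rows.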

Use Flask or Streamlit to deploy your model.

# Example with Streamlit
import streamlit as st
st.title("Survival Prediction App")
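
A deployed app normally loads a model that was trained and saved beforehand instead of retraining it on every request. One common way to do this with scikit-learn models is joblib. The sketch below is an assumption about how you might wire this up: the training rows are made up, and the temporary model.joblib path stands in for wherever you choose to store the file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set: columns are [Age, Sex]
X = np.array([[22, 0], [38, 1], [26, 1], [35, 0], [54, 0], [27, 1]])
y = np.array([0, 1, 1, 0, 0, 1])
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the fitted model, then load it back as an app would at startup
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
loaded = joblib.load(model_path)

# Predict for a new, made-up passenger using the loaded model
new_passenger = np.array([[30, 1]])
print(loaded.predict(new_passenger))
```

In a Streamlit app you could call joblib.load once at startup and pass values collected from widgets such as st.number_input to loaded.predict.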

Apply all of the steps above to the Titanic dataset from Kaggle: Titanic - Machine Learning from Disaster.

End of Master Tutorial
