Top 10 Machine Learning Algorithms: When to Use Each One (With Code)

Choosing the wrong algorithm wastes days of tuning. This guide cuts straight to: what problem each algorithm solves, a minimal working Python example, and when you should reach for something else instead.

All examples use scikit-learn unless noted. Install dependencies:

pip install scikit-learn xgboost

1. Linear Regression

Problem it solves: Predict a continuous value when the relationship between features and target is approximately linear.

from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, preds)):.3f}")
# Coefficient signs show the direction of each feature's effect.
# Magnitudes are only comparable across features if the features are standardized.
print(dict(zip(fetch_california_housing().feature_names, model.coef_)))

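When NOT to use it: When features interact non-linearly, or when you have many irrelevant features (use Ridge/Lasso instead). Non-normal or heteroscedastic residuals do not stop the model from predicting, but they make the standard confidence intervals and p-values unreliable.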
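To make the Ridge/Lasso suggestion concrete, here is a minimal sketch on synthetic data. The dataset (100 features, only 5 informative) and the `alpha` values are assumptions chosen for illustration, not tuned settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative assumption): 100 features, only 5 carry signal
X, y = make_regression(n_samples=200, n_features=100, n_informative=5,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # L2 penalty: shrinks coefficients toward zero
    "lasso": Lasso(alpha=2.0),   # L1 penalty: sets many coefficients exactly to zero
}

results = {}
for name, m in models.items():
    m.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, m.predict(X_test)))
    n_zero = int(np.sum(m.coef_ == 0))
    results[name] = (rmse, n_zero)
    print(f"{name:>5}  RMSE={rmse:.2f}  zeroed coefficients={n_zero}/100")
```

In practice, pick `alpha` by cross-validation (for example with `LassoCV` or `RidgeCV`) rather than hard-coding it as this sketch does.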


2. Logistic Regression

Problem it solves: Binary or multi-class classification with interpretable probability outputs. Your baseline for any classification problem.

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
# Predicted probability of the positive class (class 1)
probs = model.predict_proba(X_test)[:, 1]

When NOT to use it: When decision boundaries are highly non-linear. Try it first anyway — it's fast and gives you a baseline to beat.
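A quick way to see the non-linear failure mode is a toy dataset with one class wrapped around the other. The sketch below uses `make_circles` as an illustrative assumption and compares the logistic baseline against an RBF-kernel SVM.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

baseline_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
rbf_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()

print(f"Logistic regression accuracy: {baseline_acc:.3f}")
print(f"RBF SVM accuracy:             {rbf_acc:.3f}")
```

A baseline that sits near chance while a non-linear model does well is a strong hint that the boundary is not linear in the raw features.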


3. Decision Tree

Problem it solves: Classification or regression with non-linear boundaries. Fully interpretable: you can print the exact rules it learned.

from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# Print the actual learned rules
print(export_text(model, feature_names=load_iris().feature_names))

When NOT to use it: Standalone decision trees overfit badly on noisy data. Use Random Forest or Gradient Boosting in production. Keep decision trees for explanations, not predictions.
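The overfitting claim is easy to check yourself. This sketch assumes a synthetic dataset with deliberately noisy labels (`flip_y=0.2`) and compares an unlimited-depth tree with a depth-limited one.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scores = {}
for label, depth in {"unlimited": None, "max_depth=3": 3}.items():
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each setting
    scores[label] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"{label:>12}  train={scores[label][0]:.3f}  test={scores[label][1]:.3f}")
```

The unlimited tree memorizes the training set, noise included, so a large gap between its train and test accuracy is the signature to look for.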


4. Random Forest

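Problem it solves: Robust classification and regression by averaging many decorrelated trees. Needs no feature scaling, copes well with irrelevant features, and gives feature importances out of the box. In scikit-learn, categorical features still need encoding, and native NaN support in random forests exists only in recent releases, so impute missing values if you are unsure of your version.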

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
import numpy as np

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42, n_jobs=-1)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"ROC-AUC: {scores.mean():.3f} ± {scores.std():.3f}")

# Feature importances
model.fit(X, y)
importances = sorted(zip(load_breast_cancer().feature_names, model.feature_importances_),
                     key=lambda x: -x[1])
for name, imp in importances[:5]:
    print(f"  {name}: {imp:.3f}")

When NOT to use it: When you need a model you can explain to a non-technical stakeholder rule-by-rule. Also slow to predict on very large forests — consider XGBoost instead.


5. Support Vector Machine (SVM)

Problem it solves: High-accuracy classification, especially effective on high-dimensional data (text, images) and small-to-medium datasets where the margin between classes matters.

from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVM requires feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = SVC(kernel="rbf", C=10, gamma="scale")
model.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")

When NOT to use it: Datasets over ~100k samples — training time scales poorly. For large datasets, use SGDClassifier with hinge loss (linear SVM approximation) or switch to gradient boosting.
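The SGDClassifier alternative can be sketched as follows. The digits dataset is only a small stand-in so the example runs quickly, and the hyperparameters are illustrative assumptions rather than tuned values.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hinge loss + SGD trains a linear SVM one sample at a time,
# so training cost grows roughly linearly with the number of samples.
linear_svm = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, tol=1e-3, random_state=42),
)
linear_svm.fit(X_train, y_train)
sgd_acc = linear_svm.score(X_test, y_test)
print(f"Accuracy: {sgd_acc:.3f}")
```

The trade-off is that this model is linear: you give up the RBF kernel in exchange for scalability. For data that does not fit in memory, `SGDClassifier` also supports incremental training through `partial_fit`.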


6. K-Nearest Neighbors (KNN)

Problem it solves: Instance-based classification/regression with no training phase. Useful for recommendation-style problems where "similar inputs have similar outputs" is a safe assumption.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()  # KNN is distance-based — scaling is mandatory
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")

When NOT to use it: High-dimensional data (curse of dimensionality), large datasets (a brute-force query scans every training sample, so prediction costs O(n·d) per query), or when you need well-calibrated probabilities. It also has no built-in feature selection.
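To see the curse of dimensionality in action, the sketch below appends pure-noise columns to the wine data and re-scores the same KNN pipeline. The 2,000 noise features are an artificial assumption used to exaggerate the effect.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Append 2000 columns of random noise (illustrative, not real wine features)
rng = np.random.default_rng(42)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 2000))])

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
acc_clean = cross_val_score(knn, X, y, cv=5).mean()
acc_noisy = cross_val_score(knn, X_noisy, y, cv=5).mean()

print(f"13 real features:          {acc_clean:.3f}")
print(f"13 real + 2000 noise cols: {acc_noisy:.3f}")
```

Because KNN weighs every feature equally in the distance, irrelevant dimensions drown out the informative ones. Feature selection or dimensionality reduction before KNN is the usual remedy.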


7. K-Means Clustering

Problem it solves: Unsupervised grouping of data into k clusters. Common uses: customer segmentation, document grouping, anomaly detection as a preprocessing step.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
import numpy as np

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# Score each candidate k with the silhouette coefficient (higher is better)
scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}  silhouette={scores[k]:.3f}")

# Refit with the best-scoring k instead of hard-coding it
best_k = max(scores, key=scores.get)
best_model = KMeans(n_clusters=best_k, random_state=42, n_init=10)
cluster_labels = best_model.fit_predict(X)

When NOT to use it: Non-spherical clusters (try DBSCAN), when you don't know k in advance (use silhouette or elbow method to tune), or when cluster sizes vary wildly.
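The non-spherical case is worth seeing once. This sketch assumes the two-moons toy dataset and hand-picked DBSCAN parameters (`eps=0.2`, `min_samples=5`), and scores both algorithms against the true labels with the adjusted Rand index.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: clusters that are clearly not spherical
X, y_true = make_moons(n_samples=500, noise=0.05, random_state=42)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping
kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
dbscan_ari = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-Means ARI: {kmeans_ari:.3f}")
print(f"DBSCAN ARI:  {dbscan_ari:.3f}")
```

DBSCAN follows density rather than distance to a centroid, which is why it can trace curved shapes. Its `eps` is scale-sensitive, so expect to retune it for your own data.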


8. Naive Bayes

Problem it solves: Fast probabilistic text classification. Despite the "naive" independence assumption, it is often competitive with far more complex models on spam detection, sentiment analysis, and short-text categorization, and it trains in a fraction of the time.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Example: 20 newsgroups text classification
from sklearn.datasets import fetch_20newsgroups
cats = ["sci.space", "rec.sport.hockey", "talk.politics.guns"]
data = fetch_20newsgroups(categories=cats, remove=("headers", "footers", "quotes"))

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
vec = TfidfVectorizer(max_features=10000)
X_train_vec = vec.fit_transform(X_train)
X_test_vec = vec.transform(X_test)

model = MultinomialNB(alpha=0.1)
model.fit(X_train_vec, y_train)
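# Use data.target_names: label indices follow sorted category order, not the order of `cats`
print(classification_report(y_test, model.predict(X_test_vec), target_names=data.target_names))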

When NOT to use it: When features are strongly correlated (the independence assumption breaks down badly) or when you need well-calibrated probabilities beyond simple classification.


9. Gradient Boosting (XGBoost)

Problem it solves: Tabular data classification and regression. Gradient-boosted trees such as XGBoost are a consistent top performer in Kaggle competitions on structured data. The model builds trees sequentially, each one correcting the errors of those before it.

import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

X, y = fetch_california_housing(return_X_y=True)
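X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Hold out a validation set for early stopping so the test set stays untouched
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    early_stopping_rounds=20,  # constructor argument in xgboost >= 1.6
    eval_metric="rmse",
    random_state=42,
)
# Early stopping monitors the validation set, not the test set
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)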
preds = model.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, preds)):.3f}")

When NOT to use it: Unstructured data (images, text, audio) — use neural networks. Also slower to train than Random Forest; if speed matters more than a few accuracy points, try LightGBM.
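The "each tree corrects the errors of the previous ones" idea can be sketched from scratch in a few lines. This is a simplified illustration for squared-error loss on made-up data, not how XGBoost is implemented; real libraries add regularization, subsampling, and other refinements.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (illustrative assumption): noisy sine wave
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())       # round 0: predict the mean everywhere
train_mse = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residuals = y - prediction               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)   # take a small corrective step
    train_mse.append(np.mean((y - prediction) ** 2))

print(f"Training MSE: start={train_mse[0]:.3f}  after 50 rounds={train_mse[-1]:.3f}")
```

The small `learning_rate` is the reason boosting needs many trees: each one fixes only a fraction of the remaining error, which is also what keeps the ensemble from overfitting too quickly.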


10. Neural Networks (MLP)

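Problem it solves: Learning arbitrarily complex mappings from input to output, especially with large datasets. Deep networks dominate image recognition and NLP and are widely used for time-series forecasting. The multi-layer perceptron (MLP) below is the simplest member of the family.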

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = MLPClassifier(
    hidden_layer_sizes=(256, 128),
    activation="relu",
    max_iter=500,
    early_stopping=True,
    random_state=42,
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

When NOT to use it: Small datasets (under roughly 5k samples), where an MLP tends to overfit and a Random Forest will often do better. The digits example above is that small and only demonstrates the API. Neural networks need data volume. For tabular data under 100k rows, try XGBoost first.


Quick Reference: Which Algorithm for Which Problem?

| Task | Start with | If that's not enough |
| --- | --- | --- |
| Regression | Linear Regression | Random Forest, XGBoost |
| Binary classification | Logistic Regression | XGBoost, SVM |
| Multi-class classification | Logistic Regression | Random Forest, Neural Net |
| Text classification | Naive Bayes + TF-IDF | Fine-tuned BERT |
| Clustering | K-Means | DBSCAN, Gaussian Mixture |
| Image / audio | CNN (PyTorch/TF) | Fine-tune pretrained model |
