Classification Report

Module Interface

ClassificationReport

class torchmetrics.ClassificationReport(**kwargs)[source]

Compute a classification report with precision, recall, F-measure and support for each class.

This is a wrapper that automatically selects the appropriate task-specific metric based on the task argument. It uses a collection of existing TorchMetrics classification metrics internally, allowing you to customize which metrics are included in the report.

\[\text{Precision}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c}\]
\[\text{Recall}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FN}_c}\]
\[\text{F1}_c = 2 \cdot \frac{\text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}\]
Where:
  • \(c\) is the class/label index

  • \(\text{TP}_c, \text{FP}_c, \text{FN}_c\) are true positives, false positives, and false negatives for class \(c\)

Parameters:
  • task – The classification task type. One of 'binary', 'multiclass', or 'multilabel'.

  • threshold – Threshold for transforming probability to binary (0,1) predictions (for binary/multilabel)

  • num_classes – Number of classes (required for multiclass)

  • num_labels – Number of labels (required for multilabel)

  • target_names – Optional list of names for each class/label

  • digits – Number of decimal places to display in the report

  • output_dict – If True, return a dict instead of a string report

  • zero_division – Value to use when dividing by zero. Can be 0, 1, or “warn”

  • ignore_index – Specifies a target value that is ignored and does not contribute to the metric calculation

  • top_k – Number of highest probability predictions considered (for multiclass)

  • metrics – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”. You can use aliases like “f1” or “f-measure” for “f1-score”.

Example (Binary Classification):
>>> from torch import tensor
>>> from torchmetrics.classification import ClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> preds = tensor([0, 1, 1, 1])
>>> report = ClassificationReport(task="binary")
>>> report.update(preds, target)
>>> print(report.compute())  
               precision     recall   f1-score    support

0                   1.00       0.50       0.67          2
1                   0.67       1.00       0.80          2

accuracy                                  0.75          4
macro avg           0.83       0.75       0.73          4
weighted avg        0.83       0.75       0.73          4
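
For instance, the class 1 row above follows directly from the definitions: class 1 is predicted three times but is correct only twice (TP = 2, FP = 1), and no true class-1 sample is missed (FN = 0):

\[\text{Precision}_1 = \frac{2}{2 + 1} \approx 0.67\]
\[\text{Recall}_1 = \frac{2}{2 + 0} = 1.00\]
\[\text{F1}_1 = 2 \cdot \frac{2/3 \cdot 1.00}{2/3 + 1.00} = 0.80\]
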
Example (Custom Metrics):
>>> report = ClassificationReport(
...     task="multiclass",
...     num_classes=3,
...     metrics=["precision", "recall", "specificity"]
... )
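
Example (Dict Output):

A minimal sketch of programmatic access via output_dict=True. It relies only on the documented behaviour that compute() then returns a dictionary rather than a formatted string; no particular key layout is assumed.

>>> from torch import tensor
>>> from torchmetrics.classification import ClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> preds = tensor([0, 1, 1, 1])
>>> report = ClassificationReport(task="binary", output_dict=True)
>>> report.update(preds, target)
>>> result = report.compute()
>>> isinstance(result, dict)
True
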
static __new__(cls, task, threshold=0.5, num_classes=None, num_labels=None, target_names=None, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, top_k=1, metrics=None, **kwargs)[source]

Initialize task metric.

Return type:

Metric

BinaryClassificationReport

class torchmetrics.classification.BinaryClassificationReport(threshold=0.5, target_names=None, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, metrics=None, **kwargs)[source]

Compute a classification report with precision, recall, F-measure and support for binary tasks.

This metric wraps a configurable set of classification metrics (precision, recall, F1-score, etc.) into a single report similar to sklearn’s classification_report.

Internally, binary classification is treated as a 2-class multiclass problem to provide per-class metrics for both class 0 and class 1.

\[\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}\]
\[\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]
\[\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\]
\[\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}\]

Where \(\text{TP}\), \(\text{FP}\), \(\text{TN}\) and \(\text{FN}\) represent the number of true positives, false positives, true negatives and false negatives respectively.

As input to forward and update the metric accepts the following input:

  • preds (Tensor): A tensor of predictions of shape (N, ...). If preds is a floating point tensor with values outside the [0, 1] range, we consider the input to be logits and will auto-apply sigmoid. Additionally, we convert to an int tensor by thresholding at the value of threshold (see the probability-input example below).

  • target (Tensor): A tensor of targets of shape (N, ...)

As output to forward and compute the metric returns either:

  • A formatted string report if output_dict=False

  • A dictionary of metrics if output_dict=True

Parameters:
  • threshold (float) – Threshold for transforming probability to binary (0,1) predictions

  • target_names (Optional[Sequence[str]]) – Optional list of names for each class. Defaults to [“0”, “1”].

  • digits (int) – Number of decimal places to display in the report

  • output_dict (bool) – If True, return a dict instead of a string report

  • zero_division (Union[str, float]) – Value to use when dividing by zero. Can be 0, 1, or “warn”

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • metrics (Optional[List[str]]) – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”.

Example

>>> from torch import tensor
>>> from torchmetrics.classification import BinaryClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> preds = tensor([0, 1, 1, 1])
>>> report = BinaryClassificationReport()
>>> report.update(preds, target)
>>> print(report.compute())  
               precision     recall   f1-score    support

0                   1.00       0.50       0.67          2
1                   0.67       1.00       0.80          2

accuracy                                  0.75          4
macro avg           0.83       0.75       0.73          4
weighted avg        0.83       0.75       0.73          4
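
Example (Probability Inputs):

A minimal sketch relying only on the thresholding behaviour documented above: floating point predictions are binarized at threshold, so with threshold=0.5 the probabilities below reduce to the same hard predictions tensor([0, 1, 1, 1]) as in the example above and yield the same report.

>>> from torch import tensor
>>> from torchmetrics.classification import BinaryClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> probs = tensor([0.1, 0.9, 0.8, 0.6])  # probabilities instead of hard 0/1 predictions
>>> report = BinaryClassificationReport(threshold=0.5)
>>> report.update(probs, target)
>>> result = report.compute()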

MulticlassClassificationReport

class torchmetrics.classification.MulticlassClassificationReport(num_classes, target_names=None, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, top_k=1, metrics=None, **kwargs)[source]

Compute a classification report with precision, recall, F-measure and support for multiclass tasks.

This metric wraps a configurable set of classification metrics (precision, recall, F1-score, etc.) into a single report similar to sklearn’s classification_report.

\[\text{Precision}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c}\]
\[\text{Recall}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FN}_c}\]
\[\text{F1}_c = 2 \cdot \frac{\text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}\]
\[\text{Support}_c = \text{TP}_c + \text{FN}_c\]

For average metrics:

\[\text{Macro F1} = \frac{1}{C} \sum_{c=1}^{C} \text{F1}_c\]
\[\text{Weighted F1} = \sum_{c=1}^{C} \frac{\text{Support}_c}{N} \cdot \text{F1}_c\]
Where:
  • \(C\) is the number of classes

  • \(N\) is the total number of samples

  • \(c\) is the class index

  • \(\text{TP}_c, \text{FP}_c, \text{FN}_c\) are true positives, false positives, and false negatives for class \(c\)

As input to forward and update the metric accepts the following input:

  • preds (Tensor): A tensor of predictions of shape (N, ...) or (N, C, ...) where C is the number of classes. Can be either probabilities/logits or class indices (see the logit-input example below).

  • target (Tensor): A tensor of targets of shape (N, ...)

As output to forward and compute the metric returns either:

  • A formatted string report if output_dict=False

  • A dictionary of metrics if output_dict=True

Parameters:
  • num_classes (int) – Number of classes in the dataset

  • target_names (Optional[Sequence[str]]) – Optional list of names for each class. If None, classes will be 0, 1, …, num_classes-1.

  • digits (int) – Number of decimal places to display in the report

  • output_dict (bool) – If True, return a dict instead of a string report

  • zero_division (Union[str, float]) – Value to use when dividing by zero. Can be 0, 1, or “warn”

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • top_k (int) – Number of highest probability predictions considered for finding the correct label

  • metrics (Optional[List[str]]) – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”. You can use aliases like “f1” or “f-measure” for “f1-score”.

Example

>>> from torch import tensor
>>> from torchmetrics.classification import MulticlassClassificationReport
>>> target = tensor([0, 1, 2, 2, 2])
>>> preds = tensor([0, 0, 2, 2, 1])
>>> report = MulticlassClassificationReport(num_classes=3)
>>> report.update(preds, target)
>>> print(report.compute())  
               precision     recall   f1-score    support

0                   0.50       1.00       0.67          1
1                   0.00       0.00       0.00          1
2                   1.00       0.67       0.80          3

accuracy                                  0.60          5
macro avg           0.50       0.56       0.49          5
weighted avg        0.70       0.60       0.61          5
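
The averaged rows above follow from the per-class F1 scores and their supports:

\[\text{Macro F1} = \frac{0.67 + 0.00 + 0.80}{3} \approx 0.49\]
\[\text{Weighted F1} = \frac{1 \cdot 0.67 + 1 \cdot 0.00 + 3 \cdot 0.80}{5} \approx 0.61\]
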
Example (custom metrics):
>>> report = MulticlassClassificationReport(
...     num_classes=3,
...     metrics=["precision", "specificity"]
... )
>>> report.update(preds, target)
>>> result = report.compute()
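
Example (Logit Inputs):

A minimal sketch relying only on the documented (N, C) input shape: the row-wise argmax of the logits below reproduces the class-index predictions tensor([0, 0, 2, 2, 1]) used above, so the resulting report is identical to the first example.

>>> from torch import tensor
>>> from torchmetrics.classification import MulticlassClassificationReport
>>> target = tensor([0, 1, 2, 2, 2])
>>> logits = tensor([[2.0, 0.1, 0.3],
...                  [1.5, 0.2, 0.1],
...                  [0.1, 0.2, 2.5],
...                  [0.0, 0.3, 1.9],
...                  [0.2, 1.8, 0.4]])
>>> report = MulticlassClassificationReport(num_classes=3)
>>> report.update(logits, target)
>>> result = report.compute()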

MultilabelClassificationReport

class torchmetrics.classification.MultilabelClassificationReport(num_labels, target_names=None, threshold=0.5, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, metrics=None, **kwargs)[source]

Compute a classification report with precision, recall, F-measure and support for multilabel tasks.

This metric wraps a configurable set of classification metrics (precision, recall, F1-score, etc.) into a single report similar to sklearn’s classification_report.

\[\text{Precision}_l = \frac{\text{TP}_l}{\text{TP}_l + \text{FP}_l}\]
\[\text{Recall}_l = \frac{\text{TP}_l}{\text{TP}_l + \text{FN}_l}\]
\[\text{F1}_l = 2 \cdot \frac{\text{Precision}_l \cdot \text{Recall}_l}{\text{Precision}_l + \text{Recall}_l}\]

For micro-averaged metrics:

\[\text{Micro Precision} = \frac{\sum_{l=1}^{L} \text{TP}_l}{\sum_{l=1}^{L} (\text{TP}_l + \text{FP}_l)}\]
\[\text{Micro Recall} = \frac{\sum_{l=1}^{L} \text{TP}_l}{\sum_{l=1}^{L} (\text{TP}_l + \text{FN}_l)}\]
\[\text{Micro F1} = 2 \cdot \frac{\text{Micro Precision} \cdot \text{Micro Recall}}{\text{Micro Precision} + \text{Micro Recall}}\]
Where:
  • \(L\) is the number of labels

  • \(l\) is the label index

  • \(\text{TP}_l, \text{FP}_l, \text{FN}_l\) are true positives, false positives, and false negatives for label \(l\)

As input to forward and update the metric accepts the following input:

  • preds (Tensor): A tensor of predictions of shape (N, L) where L is the number of labels. Can be either probabilities/logits or binary predictions.

  • target (Tensor): A tensor of targets of shape (N, L) containing 0s and 1s

As output to forward and compute the metric returns either:

  • A formatted string report if output_dict=False

  • A dictionary of metrics if output_dict=True

Parameters:
  • num_labels (int) – Number of labels in the dataset

  • target_names (Optional[Sequence[str]]) – Optional list of names for each label. If None, labels will be 0, 1, …, num_labels-1.

  • threshold (float) – Threshold for transforming probability to binary (0,1) predictions

  • digits (int) – Number of decimal places to display in the report

  • output_dict (bool) – If True, return a dict instead of a string report

  • zero_division (Union[str, float]) – Value to use when dividing by zero. Can be 0, 1, or “warn”

  • ignore_index (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation

  • metrics (Optional[List[str]]) – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”.

Example

>>> from torch import tensor
>>> from torchmetrics.classification import MultilabelClassificationReport
>>> target = tensor([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
>>> preds = tensor([[1, 0, 1], [0, 1, 1], [1, 0, 0]])
>>> report = MultilabelClassificationReport(num_labels=3)
>>> report.update(preds, target)
>>> print(report.compute())  
               precision     recall   f1-score    support

0                   1.00       1.00       1.00          2
1                   1.00       0.50       0.67          2
2                   0.50       1.00       0.67          1

micro avg           0.80       0.80       0.80          5
macro avg           0.83       0.83       0.78          5
weighted avg        0.90       0.80       0.80          5
samples avg         0.83       0.83       0.78          5
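
The micro avg row above follows from the counts pooled over all three labels (TP = 4, FP = 1, FN = 1):

\[\text{Micro Precision} = \frac{4}{4 + 1} = 0.80\]
\[\text{Micro Recall} = \frac{4}{4 + 1} = 0.80\]
\[\text{Micro F1} = 2 \cdot \frac{0.80 \cdot 0.80}{0.80 + 0.80} = 0.80\]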