Classification Report¶
Module Interface¶
ClassificationReport¶
- class torchmetrics.ClassificationReport(**kwargs)[source]¶
Compute a classification report with precision, recall, F-measure and support for each class.
This is a wrapper that automatically selects the appropriate task-specific metric based on the task argument. It uses a collection of existing TorchMetrics classification metrics internally, allowing you to customize which metrics are included in the report.

\[\text{Precision}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c}\]
\[\text{Recall}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FN}_c}\]
\[\text{F1}_c = 2 \cdot \frac{\text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}\]

- Where:
\(c\) is the class/label index
\(\text{TP}_c, \text{FP}_c, \text{FN}_c\) are true positives, false positives, and false negatives for class \(c\)
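As a quick check of these formulas against the binary example further below (target = [0, 1, 0, 1], preds = [0, 1, 1, 1]), class 1 has \(\text{TP}_1 = 2\), \(\text{FP}_1 = 1\) and \(\text{FN}_1 = 0\), so

\[\text{Precision}_1 = \frac{2}{2 + 1} \approx 0.67, \qquad \text{Recall}_1 = \frac{2}{2 + 0} = 1.00, \qquad \text{F1}_1 = 2 \cdot \frac{0.67 \cdot 1.00}{0.67 + 1.00} \approx 0.80,\]

which matches the class-1 row of that report.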
- Parameters:
task¶ – The classification task type. One of 'binary', 'multiclass', or 'multilabel'.
threshold¶ – Threshold for transforming probability to binary (0,1) predictions (for binary/multilabel)
num_classes¶ – Number of classes (required for multiclass)
num_labels¶ – Number of labels (required for multilabel)
target_names¶ – Optional list of names for each class/label
digits¶ – Number of decimal places to display in the report
output_dict¶ – If True, return a dict instead of a string report
zero_division¶ – Value to use when dividing by zero. Can be 0, 1, or “warn”
ignore_index¶ – Specifies a target value that is ignored and does not contribute to the metric calculation
top_k¶ – Number of highest probability predictions considered (for multiclass)
metrics¶ – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”. You can use aliases like “f1” or “f-measure” for “f1-score”.
- Example (Binary Classification):
>>> from torch import tensor
>>> from torchmetrics.classification import ClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> preds = tensor([0, 1, 1, 1])
>>> report = ClassificationReport(task="binary")
>>> report.update(preds, target)
>>> print(report.compute())
              precision    recall  f1-score   support
           0       1.00      0.50      0.67         2
           1       0.67      1.00      0.80         2
    accuracy                           0.75         4
   macro avg       0.83      0.75      0.73         4
weighted avg       0.83      0.75      0.73         4
- Example (Custom Metrics):
>>> report = ClassificationReport(
...     task="multiclass",
...     num_classes=3,
...     metrics=["precision", "recall", "specificity"]
... )
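- Example (Dictionary Output):

When the report is consumed programmatically, output_dict=True returns the metrics as a nested dictionary rather than a formatted string. A minimal sketch of that usage; the exact key layout of the returned dictionary is an assumption here (mirroring the rows of the string report) rather than something this page specifies:

>>> from torch import tensor
>>> from torchmetrics.classification import ClassificationReport
>>> report = ClassificationReport(task="binary", output_dict=True)
>>> report.update(tensor([0, 1, 1, 1]), tensor([0, 1, 0, 1]))
>>> result = report.compute()
>>> # Assumption: keys mirror the string report, e.g. result["macro avg"]["f1-score"];
>>> # inspect the returned dictionary for the actual schema.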
BinaryClassificationReport¶
- class torchmetrics.classification.BinaryClassificationReport(threshold=0.5, target_names=None, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, metrics=None, **kwargs)[source]¶
Compute a classification report with precision, recall, F-measure and support for binary tasks.
This metric wraps a configurable set of classification metrics (precision, recall, F1-score, etc.) into a single report similar to sklearn’s classification_report.
Internally, binary classification is treated as a 2-class multiclass problem to provide per-class metrics for both class 0 and class 1.
\[\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}\]
\[\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]
\[\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\]
\[\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}\]

Where \(\text{TP}\), \(\text{FP}\), \(\text{TN}\) and \(\text{FN}\) represent the number of true positives, false positives, true negatives and false negatives respectively.
As input to forward and update the metric accepts the following input:

- preds (Tensor): A tensor of predictions of shape (N, ...). If preds is a floating point tensor with values outside the [0, 1] range, we consider the input to be logits and will auto-apply sigmoid. Additionally, we convert to an int tensor with thresholding.
- target (Tensor): A tensor of targets of shape (N, ...)

As output to forward and compute the metric returns either:

- A formatted string report if output_dict=False
- A dictionary of metrics if output_dict=True
- Parameters:
threshold¶ (float) – Threshold for transforming probability to binary (0,1) predictions
target_names¶ (Optional[Sequence[str]]) – Optional list of names for each class. Defaults to [“0”, “1”].
digits¶ (int) – Number of decimal places to display in the report
output_dict¶ (bool) – If True, return a dict instead of a string report
zero_division¶ (Union[str, float]) – Value to use when dividing by zero. Can be 0, 1, or “warn”
ignore_index¶ (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation
metrics¶ (Optional[List[str]]) – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”.
Example
>>> from torch import tensor
>>> from torchmetrics.classification import BinaryClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> preds = tensor([0, 1, 1, 1])
>>> report = BinaryClassificationReport()
>>> report.update(preds, target)
>>> print(report.compute())
              precision    recall  f1-score   support
           0       1.00      0.50      0.67         2
           1       0.67      1.00      0.80         2
    accuracy                           0.75         4
   macro avg       0.83      0.75      0.73         4
weighted avg       0.83      0.75      0.73         4
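Since floating-point predictions with values outside the [0, 1] range are interpreted as logits (sigmoid and thresholding are applied automatically), raw model scores can be passed directly. A small sketch of that behaviour:

>>> from torch import tensor
>>> from torchmetrics.classification import BinaryClassificationReport
>>> target = tensor([0, 1, 0, 1])
>>> logits = tensor([-2.0, 1.5, 0.3, 2.2])  # raw scores, not probabilities
>>> report = BinaryClassificationReport(threshold=0.5)
>>> report.update(logits, target)  # auto-sigmoid + thresholding yields preds [0, 1, 1, 1]
>>> summary = report.compute()     # same report as in the example above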
MulticlassClassificationReport¶
- class torchmetrics.classification.MulticlassClassificationReport(num_classes, target_names=None, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, top_k=1, metrics=None, **kwargs)[source]¶
Compute a classification report with precision, recall, F-measure and support for multiclass tasks.
This metric wraps a configurable set of classification metrics (precision, recall, F1-score, etc.) into a single report similar to sklearn’s classification_report.
\[\text{Precision}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c}\]
\[\text{Recall}_c = \frac{\text{TP}_c}{\text{TP}_c + \text{FN}_c}\]
\[\text{F1}_c = 2 \cdot \frac{\text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}\]
\[\text{Support}_c = \text{TP}_c + \text{FN}_c\]

For average metrics:

\[\text{Macro F1} = \frac{1}{C} \sum_{c=1}^{C} \text{F1}_c\]
\[\text{Weighted F1} = \sum_{c=1}^{C} \frac{\text{Support}_c}{N} \cdot \text{F1}_c\]

- Where:
\(C\) is the number of classes
\(N\) is the total number of samples
\(c\) is the class index
\(\text{TP}_c, \text{FP}_c, \text{FN}_c\) are true positives, false positives, and false negatives for class \(c\)
As input to forward and update the metric accepts preds (Tensor) and target (Tensor).

As output to forward and compute the metric returns either:

- A formatted string report if output_dict=False
- A dictionary of metrics if output_dict=True
- Parameters:
num_classes¶ (int) – Number of classes
target_names¶ (Optional[Sequence[str]]) – Optional list of names for each class. If None, classes will be 0, 1, …, num_classes-1.
digits¶ (int) – Number of decimal places to display in the report
output_dict¶ (bool) – If True, return a dict instead of a string report
zero_division¶ (Union[str, float]) – Value to use when dividing by zero. Can be 0, 1, or “warn”
ignore_index¶ (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation
top_k¶ (int) – Number of highest probability predictions considered for finding the correct label
metrics¶ (Optional[List[str]]) – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”. You can use aliases like “f1” or “f-measure” for “f1-score”.
Example
>>> from torch import tensor
>>> from torchmetrics.classification import MulticlassClassificationReport
>>> target = tensor([0, 1, 2, 2, 2])
>>> preds = tensor([0, 0, 2, 2, 1])
>>> report = MulticlassClassificationReport(num_classes=3)
>>> report.update(preds, target)
>>> print(report.compute())
              precision    recall  f1-score   support
           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.67      0.80         3
    accuracy                           0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
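The macro and weighted averages in the report above follow directly from the per-class F1 scores and supports using the formulas given earlier:

>>> f1_per_class = [0.6667, 0.0, 0.8]  # F1 for classes 0, 1, 2 from the report above
>>> support = [1, 1, 3]                # per-class support, N = 5 samples in total
>>> round(sum(f1_per_class) / 3, 2)    # macro avg: unweighted mean over classes
0.49
>>> round(sum(f * s for f, s in zip(f1_per_class, support)) / 5, 2)  # weighted avg
0.61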
- Example (custom metrics):
>>> report = MulticlassClassificationReport(
...     num_classes=3,
...     metrics=["precision", "specificity"]
... )
>>> report.update(preds, target)
>>> result = report.compute()
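The top_k option is relevant when predictions are given as per-class scores of shape (N, C) rather than hard labels: a sample counts as correctly classified if the true class is among the k highest-scoring classes. A hedged sketch, assuming the report accepts probability inputs the same way the underlying multiclass metrics do:

>>> from torch import tensor
>>> probs = tensor([[0.6, 0.3, 0.1],
...                 [0.4, 0.5, 0.1],
...                 [0.1, 0.2, 0.7]])
>>> target = tensor([0, 0, 2])
>>> report = MulticlassClassificationReport(num_classes=3, top_k=2)
>>> report.update(probs, target)  # sample 1 counts as correct: class 0 is in its top 2
>>> summary = report.compute()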
MultilabelClassificationReport¶
- class torchmetrics.classification.MultilabelClassificationReport(num_labels, target_names=None, threshold=0.5, digits=2, output_dict=False, zero_division=0.0, ignore_index=None, metrics=None, **kwargs)[source]¶
Compute a classification report with precision, recall, F-measure and support for multilabel tasks.
This metric wraps a configurable set of classification metrics (precision, recall, F1-score, etc.) into a single report similar to sklearn’s classification_report.
\[\text{Precision}_l = \frac{\text{TP}_l}{\text{TP}_l + \text{FP}_l}\]
\[\text{Recall}_l = \frac{\text{TP}_l}{\text{TP}_l + \text{FN}_l}\]
\[\text{F1}_l = 2 \cdot \frac{\text{Precision}_l \cdot \text{Recall}_l}{\text{Precision}_l + \text{Recall}_l}\]

For micro-averaged metrics:

\[\text{Micro Precision} = \frac{\sum_l \text{TP}_l}{\sum_l (\text{TP}_l + \text{FP}_l)}\]
\[\text{Micro Recall} = \frac{\sum_l \text{TP}_l}{\sum_l (\text{TP}_l + \text{FN}_l)}\]
\[\text{Micro F1} = \frac{2 \cdot P_{micro} \cdot R_{micro}}{P_{micro} + R_{micro}}\]

- Where:
\(L\) is the number of labels
\(l\) is the label index
\(\text{TP}_l, \text{FP}_l, \text{FN}_l\) are true positives, false positives, and false negatives for label \(l\)
As input to forward and update the metric accepts preds (Tensor) and target (Tensor).

As output to forward and compute the metric returns either:

- A formatted string report if output_dict=False
- A dictionary of metrics if output_dict=True
- Parameters:
num_labels¶ (int) – Number of labels
target_names¶ (Optional[Sequence[str]]) – Optional list of names for each label. If None, labels will be 0, 1, …, num_labels-1.
threshold¶ (float) – Threshold for transforming probability to binary (0,1) predictions
digits¶ (int) – Number of decimal places to display in the report
output_dict¶ (bool) – If True, return a dict instead of a string report
zero_division¶ (Union[str, float]) – Value to use when dividing by zero. Can be 0, 1, or “warn”
ignore_index¶ (Optional[int]) – Specifies a target value that is ignored and does not contribute to the metric calculation
metrics¶ (Optional[List[str]]) – List of metrics to include in the report. Defaults to [“precision”, “recall”, “f1-score”]. Supported metrics: “precision”, “recall”, “f1-score”, “accuracy”, “specificity”.
Example
>>> from torch import tensor
>>> from torchmetrics.classification import MultilabelClassificationReport
>>> target = tensor([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
>>> preds = tensor([[1, 0, 1], [0, 1, 1], [1, 0, 0]])
>>> report = MultilabelClassificationReport(num_labels=3)
>>> report.update(preds, target)
>>> print(report.compute())
              precision    recall  f1-score   support
           0       1.00      1.00      1.00         2
           1       1.00      0.50      0.67         2
           2       0.50      1.00      0.67         1
   micro avg       0.80      0.80      0.80         5
   macro avg       0.83      0.83      0.78         5
weighted avg       0.90      0.80      0.80         5
 samples avg       0.83      0.83      0.78         5
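The micro-averaged row in the report above can be reproduced by pooling true positives, false positives and false negatives across all labels, following the micro-average formulas:

>>> # Pooled counts from the example above:
>>> # label 0: TP=2, FP=0, FN=0; label 1: TP=1, FP=0, FN=1; label 2: TP=1, FP=1, FN=0
>>> tp, fp, fn = 4, 1, 1
>>> micro_p = tp / (tp + fp)   # 0.8
>>> micro_r = tp / (tp + fn)   # 0.8
>>> round(2 * micro_p * micro_r / (micro_p + micro_r), 2)  # micro F1
0.8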