Achieving high model performance is an iterative process of training, analyzing, and refining. It’s not just about training a model once, but about creating a continuous loop of improvement. This involves two key strategies:
  • A deep dive into your model’s results to understand where and why it makes mistakes, including both model weaknesses and errors in your ground truth labels.
  • A strategic choice of which data to label next. By focusing your labeling efforts on the data that will provide the most value, you can improve your model’s performance more efficiently.

Find and fix your model’s errors

Model error analysis is the process of investigating your model’s incorrect predictions to understand the root cause of its failures. Models rarely fail randomly; they often struggle with specific, predictable patterns. By identifying these patterns, you can move beyond simple accuracy scores and take targeted actions. This might involve sourcing more data that captures these edge cases, giving your model the specific examples it needs to learn and improve.

How to find model errors

  1. Select a model run: Go to the Models tab and select the model and specific model run you want to analyze.
  2. Filter for prediction disagreements:
    • In the filter panel on the left, click + Add filter.
    • Select the iou (Intersection over Union) metric.
    • Set the condition to < (less than) a low value, such as 0.5. This will isolate data rows where the model’s predicted annotation has poor spatial overlap with the ground truth label.
  3. Sort by lowest confidence:
    • At the top of the data row gallery, find the Sort by dropdown menu.
    • Select Confidence and choose the Ascending order (arrow pointing up). This brings the predictions your model is least certain about to the front.
  4. Inspect and identify patterns:
    • Click on the data rows at the top of the sorted list to open the detailed view.
    • Visually inspect the image and the annotations. Look for common characteristics among the errors. For example, you may notice the model consistently fails on blurry images, in low-light conditions, or with objects at unusual angles.
  5. Use the metrics and projector views for high-level insights:
    • Click on the Metrics tab within the model run to see a class-by-class breakdown of precision and recall. This can quickly point you to which object classes are causing the most problems.
    • Click on the Projector tab to see a 2D or 3D visualization of your data’s embeddings. Look for areas where the color-coded clusters for different classes overlap, as this indicates the model is “confusing” them.
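The IoU filter and confidence sort above can also be reproduced offline if you export your predictions and labels. Below is a minimal sketch in plain Python; the row dictionaries, their field names, and the box coordinates are hypothetical stand-ins for your exported data, not a specific export format.

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Hypothetical rows: each pairs a predicted box with its ground-truth box.
rows = [
    {"id": "row-1", "pred": (0, 0, 10, 10), "gt": (0, 0, 10, 10), "confidence": 0.95},
    {"id": "row-2", "pred": (0, 0, 10, 10), "gt": (20, 20, 30, 30), "confidence": 0.40},
]

# Keep rows with poor spatial overlap, least-confident first — the same view
# the iou < 0.5 filter plus ascending-confidence sort produces in the gallery.
errors = sorted(
    (r for r in rows if iou(r["pred"], r["gt"]) < 0.5),
    key=lambda r: r["confidence"],
)
```

Inspecting `errors` in this order surfaces the same failure patterns as the gallery workflow: low overlap with the ground truth, ranked by how unsure the model was.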

How to fix model errors

  1. Select the relevant data rows:
    • While in the model run’s gallery view, hover over the data rows that represent a failure pattern and click the checkbox in the top-left corner of each thumbnail.
  2. Find visually similar data:
    • Once you have a few examples selected, a menu will appear at the bottom of the screen.
    • Click the Find similar button (which looks like a magic wand). This will take you to Catalog and automatically search for other data in your dataset that is visually similar to your selection.
  3. Create a batch and send for labeling:
    • From the “similar data” search results in Catalog, select the unlabeled data rows you want to add to your training set.
    • Click the + Add to batch button at the bottom of the screen.
    • Give the batch a descriptive name (e.g., “Low-light failure cases”).
    • Navigate to your labeling project, select this batch, and assign it to your labelers.
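Under the hood, “find similar” is typically a nearest-neighbor search over image embeddings. If you have embeddings of your own, you can sketch the same idea in a few lines; the vectors and row IDs below are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical embedding for a selected failure case, plus an unlabeled pool.
query = [0.9, 0.1, 0.0]
pool = {
    "row-a": [0.8, 0.2, 0.1],   # visually close to the failure case
    "row-b": [0.0, 0.1, 0.9],   # visually unrelated
}

# Rank the unlabeled pool by similarity to the failure case and take the top-k
# as the candidate batch to send for labeling.
ranked = sorted(pool, key=lambda rid: cosine_similarity(query, pool[rid]), reverse=True)
batch = ranked[:1]
```

The top-ranked rows play the same role as the “similar data” search results in Catalog: more examples of the pattern your model currently fails on.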

Find and fix your labeling mistakes

The quality of your training data is the single most important factor in your model’s performance. Labeling mistakes can silently degrade performance, but your model itself can be a powerful tool for finding them. When a high-confidence model prediction contradicts a ground truth label, it’s a strong signal that the human labeler may have made a mistake.

How to find labeling mistakes

  1. Filter for prediction disagreements: Just as before, navigate to your model run and add a filter for iou < 0.5 to find where predictions and ground truth labels do not align.
  2. Sort by highest confidence:
    • In the Sort by dropdown menu, select Confidence.
    • This time, choose Descending order (arrow pointing down).
    • This surfaces data rows where the model was very confident (> 0.9) but its prediction still disagreed with the ground truth—a strong indicator of a potential labeling error.
  3. Inspect the data: Click on the top results to inspect them. If the model’s prediction looks correct and the ground truth label is clearly wrong, you’ve found a labeling mistake.
  4. Use the projector view to spot outliers:
    • Navigate to the Projector tab.
    • In the display options, choose to Color by: Feature schema (your ground truth classes).
    • Look for “islands” of a single color within a larger cluster of a different color. For example, a single blue dot (labeled ‘car’) in the middle of a large red cluster (labeled ‘truck’) is very likely a mislabeled data point. Click on the dot to view the data row directly.
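The core signal here — a confident prediction that contradicts the label — is easy to express in code if you have an export of predictions and ground truth. The rows and field names below are hypothetical placeholders for your own data.

```python
# Hypothetical rows pairing the ground-truth class with the model's top prediction.
rows = [
    {"id": "row-1", "gt_class": "truck", "pred_class": "car", "confidence": 0.97},
    {"id": "row-2", "gt_class": "car", "pred_class": "car", "confidence": 0.99},
    {"id": "row-3", "gt_class": "car", "pred_class": "truck", "confidence": 0.55},
]

# A high-confidence prediction that disagrees with the label is a strong
# candidate for a labeling mistake; sort most-confident first for review.
CONFIDENCE_THRESHOLD = 0.9
suspects = sorted(
    (r for r in rows
     if r["pred_class"] != r["gt_class"]
     and r["confidence"] > CONFIDENCE_THRESHOLD),
    key=lambda r: r["confidence"],
    reverse=True,
)
```

Note that `row-3` is excluded: the model disagreed with the label, but at 0.55 confidence that is more likely a model error than a labeling error.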

How to fix labeling mistakes

  1. Select data rows with confirmed mistakes: In the gallery view of your model run, use the checkboxes to select all the data rows that you have confirmed contain labeling errors.
  2. Open the selection in Catalog: In the menu at the bottom of the screen, click Open in Catalog.
  3. Tag for re-labeling:
    • With the items still selected in Catalog, go to the Metadata panel on the right.
    • Click the + icon to add a new tag.
    • Create and apply a tag named re-label or rework.
  4. Filter and send to the rework step in your project:
    • Go to your labeling project and navigate to the Data Rows tab.
    • Create a filter for Metadata, and select the re-label tag you just created.
    • Select all the filtered data rows.
    • Click the Send to Rework button. This will put the data back into the labeling queue for correction.

Prioritize high-value data to label (active learning)

Labeling data is often the most expensive and time-consuming part of any machine learning project. Active learning is a strategy that helps you maximize your return on investment by intelligently selecting which data to label next. Instead of labeling data at random, you focus your efforts on the examples that your model finds most confusing. The intuition is simple: the model learns more from data it is uncertain about than from data it can already predict with high confidence.
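The notion of “most confusing” can be made precise with an uncertainty score computed from the model’s predicted class probabilities. Three common choices are sketched below; these are standard active-learning heuristics, not a prescribed API.

```python
import math

def least_confidence(probs):
    """Higher score = more uncertain: 1 minus the top class probability."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the top two class probabilities; smaller = more uncertain."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def entropy(probs):
    """Shannon entropy of the class distribution; peaks when classes are tied."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For example, `least_confidence([0.4, 0.35, 0.25])` is high because the model barely prefers its top class, while `least_confidence([0.98, 0.01, 0.01])` is near zero: the first example is the one worth labeling.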

How to perform uncertainty sampling

  1. Generate predictions on unlabeled data: Use your currently trained model to generate predictions and confidence scores on a large pool of your unlabeled data.
  2. Upload predictions to a new model run:
    • Go to the Models tab and select your model.
    • Click + New model run.
    • Follow the instructions to upload your unlabeled data rows along with the corresponding model predictions and confidence scores in the specified format.
  3. Sort by lowest confidence:
    • Once the model run is created, open it.
    • Go to the Sort by dropdown menu at the top of the gallery.
    • Select Confidence and choose Ascending order. This will surface the data your model is most uncertain about.
  4. Select and batch the most uncertain data:
    • Select the top N data rows from the sorted list by using the checkboxes. This N is your budget for how much new data you want to label.
    • From the menu at the bottom, click + Add to batch.
    • Give the batch a descriptive name, like “Uncertainty sampling batch 1”.
  5. Assign the batch for labeling:
    • Navigate to your labeling project.
    • Find the new batch you just created, select it, and assign it to your labelers.
  6. Retrain your model: After this high-value data has been labeled, add it to your training dataset and retrain your model. This targeted approach should yield a more significant performance improvement than randomly sampling data.
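The selection step in this loop amounts to sorting the unlabeled pool by confidence and taking the N least-confident rows. A minimal sketch, with a hypothetical pool of per-row confidence scores:

```python
# Hypothetical unlabeled pool: the model's confidence on each data row.
unlabeled = {
    "row-1": 0.98,
    "row-2": 0.51,
    "row-3": 0.62,
    "row-4": 0.95,
}

# N is your labeling budget for this round.
N = 2

# Ascending confidence surfaces the rows the model is least sure about;
# these form the next batch to send for labeling.
batch = sorted(unlabeled, key=unlabeled.get)[:N]
```

After this batch is labeled and added to the training set, retrain and repeat: each round spends the labeling budget where the model expects to learn the most.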