
Novel View Synthesis

Research internship on reducing annotation requirements for visual object classification using generative models.

Python · Generative AI

This project was conducted during a research internship at Silèane.
Due to industrial confidentiality constraints, the code and datasets are not open-source.

The objective was to investigate whether generative novel-view synthesis could reduce the amount of human-annotated data required to train robust object classification models in industrial settings, where annotation is costly and time-consuming.

# Problem Setting

In industrial perception pipelines, object classifiers often rely on large annotated datasets to handle variations in viewpoint, lighting, and appearance. However, collecting such datasets is expensive and does not scale well to new objects or frequent changes in production.

The core question addressed in this internship was:

> Can a model trained on a very small number of annotated images achieve strong generalization by leveraging synthetic visual variations generated by a diffusion-based model?

# System Overview

The proposed pipeline takes as input a single annotated image of an object and generates multiple realistic variations by modifying:

  • viewpoint,
  • lighting conditions,
  • texture appearance.

Diffusion models were used to generate these variations while preserving object identity, with the goal of improving robustness to visual distribution shifts during training.
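Since the internship code is confidential, the generation step can only be sketched. The loop below shows the overall shape of the pipeline: one annotated image in, several synthetic variants out, with the actual diffusion model hidden behind a pluggable `generate` callable. The default stand-in (a random brightness/contrast jitter) is purely illustrative so the sketch stays runnable; the function name and signature are assumptions, not the project's real API.

```python
import numpy as np

def augment_views(image: np.ndarray, n_variants: int = 8,
                  generate=None, seed: int = 0) -> list:
    """Produce n_variants synthetic views of a single annotated image.

    `generate` is expected to wrap the diffusion-based variant generator
    (not shown here); the default is a simple photometric jitter used as
    a runnable placeholder for the loop structure.
    """
    rng = np.random.default_rng(seed)

    def _placeholder(img, rng):
        # Stand-in for the diffusion model: random contrast (gain)
        # and brightness (bias) change on the raw pixel values.
        gain = rng.uniform(0.8, 1.2)
        bias = rng.uniform(-20.0, 20.0)
        out = img.astype(np.float32) * gain + bias
        return np.clip(out, 0, 255).astype(np.uint8)

    generate = generate or _placeholder
    return [generate(image, rng) for _ in range(n_variants)]
```

In the real system, each variant would come from a conditioned diffusion sampling pass (varying viewpoint, lighting, or texture) rather than a pixel-level jitter.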

The augmented dataset was then used to train an object classification network, and its performance was compared against a baseline trained only on human-annotated data.
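The comparison protocol itself is straightforward and can be sketched independently of the confidential model. Below, a nearest-centroid classifier stands in for the actual classification network (an assumption for illustration only): one model is fitted on real annotations alone, another on real plus synthetic samples, and both are scored on the same held-out test set.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Fit one centroid per class; a minimal stand-in for the real network.
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    classes, centroids = model
    # Assign each sample to the class of its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

def compare_baseline_vs_augmented(X_real, y_real, X_syn, y_syn,
                                  X_test, y_test):
    """Return (baseline accuracy, augmented accuracy) on the same test set."""
    base = nearest_centroid_fit(X_real, y_real)
    aug = nearest_centroid_fit(np.vstack([X_real, X_syn]),
                               np.concatenate([y_real, y_syn]))
    def acc(m):
        return float((nearest_centroid_predict(m, X_test) == y_test).mean())
    return acc(base), acc(aug)
```

The key design point is that only the training set changes between the two runs; the architecture, hyperparameters, and test split stay fixed so the accuracy gap isolates the effect of synthetic augmentation.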

# Technical Challenges & Design Choices

  • Controlling generative diversity while maintaining object identity to avoid label noise.
  • Preventing distribution drift between synthetic and real data.
  • Balancing augmentation strength to improve generalization without overfitting to artifacts.
  • Designing an evaluation protocol suitable for low-data regimes.
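One common way to address the first two challenges, identity preservation and distribution drift, is to filter generated samples by their feature-space distance to the reference image. The sketch below is a hypothetical illustration of that idea, not the internship's actual mechanism: variants whose embedding drifts too far from the reference (cosine similarity below a threshold) are discarded before training. The embedding model and threshold value are assumptions.

```python
import numpy as np

def filter_variants(ref_emb: np.ndarray, variant_embs: np.ndarray,
                    min_cos: float = 0.85) -> np.ndarray:
    """Return indices of synthetic variants whose embedding stays close
    to the reference image's embedding (cosine similarity >= min_cos).

    A simple guard against identity drift and label noise; the threshold
    here is illustrative, not a value from the project.
    """
    ref = ref_emb / np.linalg.norm(ref_emb)
    v = variant_embs / np.linalg.norm(variant_embs, axis=1, keepdims=True)
    cos = v @ ref
    return np.flatnonzero(cos >= min_cos)
```

In practice the embeddings would come from a pretrained vision encoder, and the threshold trades off diversity (looser) against label purity (stricter), which is exactly the balancing act described in the bullets above.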

# Outcome

The final system was able to generate up to 46 distinct visual variations from a single annotated image.

Experiments showed that, with only a dozen human-annotated images, the classifier reached up to 98% accuracy on the target object classification task, significantly outperforming models trained without synthetic augmentation.

This work demonstrated the practical potential of generative models as a data-efficient alternative to large-scale manual annotation in industrial vision pipelines.
