
Deep Learning on Point clouds: Implementing PointNet in Google Colab

PointNet is a simple and effective Neural Network for point cloud recognition. In this tutorial we will implement it using PyTorch.

Nikita Karaev

Apr 13, 2020·7 min read

1. Introduction

3D data is crucial for self-driving cars, autonomous robots, and virtual and augmented reality. Unlike 2D images, which are represented as pixel arrays, 3D data can be represented as a polygonal mesh, a volumetric pixel grid, a point cloud, etc.

image from: Create 3D model from a single 2D image in PyTorch

In Computer Vision and Machine Learning today, 90% of the advances deal only with two-dimensional images.

1. 1. Point clouds

A point cloud is a widely used 3D data format that can be produced by depth sensors such as LiDARs and RGB-D cameras.

It is the simplest representation of 3D objects: only points in 3D space, with no connectivity. Point clouds can also contain normals to points.

Nearly all 3D scanning devices produce point clouds.

Devices that can capture point clouds (iPhone 11, Asus Zenfone AR, Sony Xperia XZ1). Image from: this course

Moreover, Apple recently introduced the iPad Pro with a LiDAR Scanner that measures the distance to surrounding objects up to 5 meters away.

1. 2. Deep Learning on Point clouds

So, let’s think about how we can process point clouds. CNNs work great for images. Can we use them for 3D?

Idea: generalize 2D convolutions to regular 3D grids

image from: arxiv paper

This actually works.

The main problem is the inefficient representation: a cubic voxel grid with 100 cells per side has 100³ = 1,000,000 voxels.
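To get a feeling for how quickly this grows, here is a tiny sketch of the arithmetic. The resolutions and the 1,024-point cloud size are just illustrative assumptions, not values taken from the notebook.

```python
def voxel_count(resolution):
    """Number of cells in a cubic voxel grid with `resolution` cells per side."""
    return resolution ** 3

# Hypothetical grid resolutions, chosen only to show the cubic growth.
for resolution in (32, 64, 100):
    print(resolution, voxel_count(resolution))

# For comparison: a cloud of 1,024 points stores just 1,024 * 3 coordinates.
point_cloud_values = 1024 * 3
```

Doubling the resolution multiplies the number of voxels by eight, while the size of a point cloud depends only on how many points we keep.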

1. 3. PointNet

But what if we try to work with point clouds directly instead?

There are three main constraints:

  • Point clouds are unordered. The algorithm has to be invariant to permutations of the input set.
  • If we rotate a chair, it’s still a chair, right? The network must be invariant to rigid transformations.
  • The network should capture interactions among points.

The authors of PointNet introduce a neural network that takes all these properties into account. It manages to solve classification, part segmentation and semantic segmentation tasks. Let’s implement it!

image from: arxiv paper

2. Implementation

In this section we will reimplement the classification model from the original paper in Google Colab using PyTorch.

You can find the full notebook at:

2. 1. Dataset

In the original paper, the authors evaluated PointNet on the ModelNet40 shape classification benchmark. It contains 12,311 models from 40 object categories, split into 9,843 for training and 2,468 for testing.

For the sake of simplicity, let’s use a smaller version of the same dataset: ModelNet10. It consists of objects from 10 categories, with 3,991 models for training and 908 for testing.

3D ShapeNets: A Deep Representation for Volumetric Shapes

Don’t forget to turn on the GPU if you want to start training right away.

Let’s import necessary libraries:

We can download the dataset directly to the Google Colab Runtime:

This dataset consists of .off files that contain meshes represented by vertices and triangular faces. Vertices are just points in a 3D space and each triangle is formed by 3 vertex indices.

We will need a function to read .off files:
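Here is a minimal sketch of such a reader, not necessarily identical to the one in the notebook. It assumes a plain-text OFF file without comments or blank lines. As an extra assumption, it also tolerates the element counts being fused onto the header line (`OFF8 12 0`), which is a quirk you may run into.

```python
import io

def read_off(file):
    """Parse an OFF mesh from an open text file and return (vertices, faces)."""
    header = file.readline().strip()
    if not header.startswith("OFF"):
        raise ValueError("Not a valid OFF header")
    # The counts normally sit on the second line; fall back to the header
    # line itself if they were fused onto it.
    counts = header[3:].strip() or file.readline().strip()
    n_verts, n_faces, _ = (int(s) for s in counts.split())
    verts = [[float(s) for s in file.readline().split()] for _ in range(n_verts)]
    # Each face line starts with its number of vertices, which we drop.
    faces = [[int(s) for s in file.readline().split()][1:] for _ in range(n_faces)]
    return verts, faces

# Tiny hypothetical mesh: a single triangle.
sample = io.StringIO("OFF\n3 1 0\n0 0 0\n1 0 0\n0 1 0\n3 0 1 2\n")
verts, faces = read_off(sample)
```

For a real file you would pass an open file handle instead of the `io.StringIO` object.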

This is what a full mesh looks like:

Mesh in one of .off files. Created using plotly

As you can see, it is a bed 🛏

But if we get rid of the faces and keep only the 3D points, it doesn’t look like a bed anymore!

Mesh vertices

Actually, flat parts of a surface don’t require any points for mesh construction. That’s why points are primarily located at the corners and rounded parts of the bed.

2. 2. Point sampling

So, as points are not uniformly distributed across the object’s surface, it could be difficult for our PointNet to classify them (especially knowing that this point cloud doesn’t even look like a bed).

A solution to this could be very simple: let’s uniformly sample points on the object’s surface.

We shouldn’t forget that faces can have different areas.

So, we can make the probability of choosing a particular face proportional to its area. This is how it can be done:
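One possible sketch computes each triangle's area with Heron's formula and uses the areas as sampling weights. The function names and the toy mesh are hypothetical, chosen only for illustration.

```python
import math

def triangle_area(pt1, pt2, pt3):
    """Area of a 3D triangle via Heron's formula."""
    a = math.dist(pt1, pt2)
    b = math.dist(pt2, pt3)
    c = math.dist(pt3, pt1)
    s = 0.5 * (a + b + c)
    # Clamp at zero: rounding can push the product slightly negative
    # for degenerate (nearly collinear) triangles.
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

def face_areas(verts, faces):
    """One area per triangular face; usable directly as sampling weights."""
    return [triangle_area(verts[i], verts[j], verts[k]) for i, j, k in faces]

# Toy mesh: a small and a large right triangle.
verts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [2, 0, 0], [0, 2, 0]]
faces = [[0, 1, 2], [0, 3, 4]]
areas = face_areas(verts, faces)
probabilities = [area / sum(areas) for area in areas]
```

In this toy mesh the second triangle has four times the area of the first, so it should be picked about four times as often.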

We will have dense layers in our Network architecture. That’s why we want a fixed number of points in a point cloud. Let’s sample faces from the constructed distribution. After that we sample one point per chosen face:
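A minimal sketch of this step could look as follows. It assumes the face areas are already available as a list of weights, and the default of 1,024 points is a hypothetical choice. Each point is drawn uniformly inside its triangle using random barycentric coordinates.

```python
import random

def sample_point(pt1, pt2, pt3):
    """Uniform random point inside a triangle via barycentric coordinates."""
    # The gaps between two sorted uniform numbers on [0, 1] give three
    # non-negative weights that sum to one and are uniformly distributed.
    s, t = sorted([random.random(), random.random()])
    w1, w2, w3 = s, t - s, 1 - t
    return [w1 * pt1[i] + w2 * pt2[i] + w3 * pt3[i] for i in range(3)]

def sample_points(verts, faces, areas, n_points=1024):
    """Pick faces with probability proportional to area, one point per pick."""
    chosen = random.choices(faces, weights=areas, k=n_points)
    return [sample_point(verts[i], verts[j], verts[k]) for i, j, k in chosen]

# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
random.seed(0)
verts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
faces = [[0, 1, 2], [1, 3, 2]]
areas = [0.5, 0.5]
cloud = sample_points(verts, faces, areas, n_points=256)
```

Because `random.choices` samples with replacement, the output always has exactly `n_points` points, no matter how many faces the mesh has.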

Some faces may get more than one sampled point, while others may get none at all.

Point cloud created by sampling points on the mesh surface

This point cloud looks much more like a bed! 🛏

2. 3. Augmentations

Let’s think about other possible problems. We know that objects can have different sizes and can be placed in different parts of our coordinate system.

So, let’s translate the object to the origin by subtracting the mean from all its points, and then scale the points so that they fit into a unit sphere. To augment the data during training, we randomly rotate objects around the Z-axis and add Gaussian noise as described in the paper:
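Here is a minimal NumPy sketch of these three steps, written as plain functions rather than the transform classes you might use in a real data pipeline. The noise standard deviation of 0.02 and the random test cloud are assumptions made for illustration.

```python
import numpy as np

def normalize(points):
    """Center the cloud at the origin and scale it into the unit sphere."""
    centered = points - points.mean(axis=0)
    return centered / np.max(np.linalg.norm(centered, axis=1))

def random_rotation_z(points, rng):
    """Rotate the cloud by a random angle around the Z-axis."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T

def add_noise(points, rng, sigma=0.02):
    """Jitter every coordinate with zero-mean Gaussian noise."""
    return points + rng.normal(0.0, sigma, points.shape)

rng = np.random.default_rng(0)
cloud = rng.uniform(-5, 5, size=(1024, 3)) + 10.0   # hypothetical off-center cloud
normed = normalize(cloud)
rotated = random_rotation_z(normed, rng)
augmented = add_noise(rotated, rng)
```

Normalization is applied to every sample, while the rotation and the noise only make sense at training time.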

This is the same bed normalized, with rotation and noise:

Rotated point cloud with added noise

2. 4. Model

Okay, we’re done with the dataset and pre-processing. Let’s think about the model architecture. The architecture and the key ideas behind it are already explained very well, for example, in this article:

An In-Depth Look at PointNet

Recall that the result should be invariant to permutations of the input points and to geometric transformations, such as rigid transformations.

image from: arxiv paper

Let’s start implementing it in PyTorch:

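First of all, our tensors will have size (batch_size, num_of_points, 3). In this case, an MLP with shared weights is just a 1D convolution with a kernel of size 1. Since PyTorch’s Conv1d expects channels before length, we transpose the input to (batch_size, 3, num_of_points) first.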
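To convince ourselves, here is a small check that a `Conv1d` with kernel size 1 gives the same result as a `Linear` layer applied to every point independently. The tensor sizes and the 64 output channels are arbitrary choices for this demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_points = 2, 5
points = torch.rand(batch_size, num_points, 3)

conv = nn.Conv1d(in_channels=3, out_channels=64, kernel_size=1)
linear = nn.Linear(3, 64)
with torch.no_grad():
    # Copy the convolution weights (64, 3, 1) into the linear layer (64, 3).
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

# Conv1d expects (batch, channels, length), hence the transpose.
conv_out = conv(points.transpose(1, 2))      # (batch, 64, num_points)
linear_out = linear(points)                  # (batch, num_points, 64)
same = torch.allclose(conv_out.transpose(1, 2), linear_out, atol=1e-5)
```

Here `same` should be `True`: every point goes through exactly the same weights, which is what "shared" means.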

To ensure invariance to transformations, we apply the 3×3 transformation matrix predicted by T-Net to the coordinates of the input points. Interestingly, a 3×3 matrix cannot encode translations in 3D space. Anyway, we’ve already translated the point clouds to the origin during pre-processing.

An important point here is initialisation of the output matrix. We want it to be identity by default to start training with no transformations at all. So, we just add an identity matrix to the output:
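A possible sketch of such a T-Net is shown below. The layer widths (64, 128, 1024, 512, 256) follow my reading of the paper's description and should be treated as assumptions, as should the batch and cloud sizes in the usage lines.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tnet(nn.Module):
    """Predicts a k x k alignment matrix for each cloud in the batch."""

    def __init__(self, k=3):
        super().__init__()
        self.k = k
        # Shared per-point MLP, written as 1D convolutions with kernel size 1.
        self.conv1 = nn.Conv1d(k, 64, 1)
        self.conv2 = nn.Conv1d(64, 128, 1)
        self.conv3 = nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

    def forward(self, x):
        # x has shape (batch_size, k, num_points).
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = torch.max(x, dim=2).values       # global feature: (batch_size, 1024)
        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        identity = torch.eye(self.k, device=x.device)
        # With the identity added, a zero network output means "no transformation".
        return self.fc3(x).view(-1, self.k, self.k) + identity

tnet = Tnet(k=3)
points = torch.rand(4, 3, 128)               # hypothetical batch: 4 clouds, 128 points
matrices = tnet(points)                      # (4, 3, 3)
aligned = torch.bmm(points.transpose(1, 2), matrices)   # (4, 128, 3)
```

The same class with `k=64` can be reused for feature alignment, since only the input and output sizes depend on `k`.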

We will use the same T-Net architecture, but with 64 dimensions, to align the point features extracted by the MLP.

To provide permutation invariance, we apply a symmetric function (max pooling) to the extracted and transformed features so the result does not depend on the order of input points anymore.
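A quick way to see this is to shuffle the points and compare the pooled features. The feature tensor below is random and its sizes are hypothetical; only the max over the point dimension matters.

```python
import torch

torch.manual_seed(0)
features = torch.rand(1, 1024, 16)           # (batch, feature_dim, num_points)

# Reorder the points along the last dimension.
perm = torch.randperm(features.size(2))
shuffled = features[:, :, perm]

# Max pooling over the point dimension gives one global feature vector.
global_feat = torch.max(features, dim=2).values
global_feat_shuffled = torch.max(shuffled, dim=2).values
order_independent = torch.equal(global_feat, global_feat_shuffled)
```

Any other symmetric function (sum, mean) would pass the same check; the paper chooses max pooling.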

Let’s combine it all together:

Then, let’s just wrap it all in one class with the last MLP and LogSoftmax at the output:

Finally, we will define the loss function. As we used LogSoftmax for stability, we should apply NLLLoss instead of CrossEntropyLoss. Also, we will add two regularization terms so that the transformation matrices stay close to orthogonal ( AAᵀ = I ):
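Below is one way to write such a loss, as a sketch rather than the exact code from the notebook. The penalty here is the squared Frobenius norm of I − AAᵀ averaged over the batch, and the weight `alpha` is a hypothetical hyperparameter you would need to tune.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def orthogonality_penalty(matrices):
    """Mean squared Frobenius norm of (I - A A^T) over a batch of square matrices."""
    k = matrices.size(1)
    identity = torch.eye(k, device=matrices.device)
    diff = identity - torch.bmm(matrices, matrices.transpose(1, 2))
    return (diff ** 2).sum(dim=(1, 2)).mean()

def pointnet_loss(log_probs, labels, m3x3, m64x64, alpha=0.001):
    # log_probs come from LogSoftmax, so NLLLoss is the matching criterion.
    nll = nn.NLLLoss()(log_probs, labels)
    penalty = orthogonality_penalty(m3x3) + orthogonality_penalty(m64x64)
    return nll + alpha * penalty

torch.manual_seed(0)
log_probs = F.log_softmax(torch.rand(4, 10), dim=1)   # hypothetical model outputs
labels = torch.tensor([0, 1, 2, 3])
eye3 = torch.eye(3).repeat(4, 1, 1)
eye64 = torch.eye(64).repeat(4, 1, 1)
loss_orthogonal = pointnet_loss(log_probs, labels, eye3, eye64)
loss_scaled = pointnet_loss(log_probs, labels, 2 * eye3, 2 * eye64)
```

With perfectly orthogonal matrices the penalty vanishes and only the classification term remains, while the scaled matrices are penalized.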

2. 5. Training

The final step! We can just use a classic PyTorch training loop. This is definitely not the most interesting part so let’s omit it.

Again, the full Google Colab notebook with a training loop can be found following this link.

Let’s just take a look at the result after training for 15 epochs on GPU. The training itself takes around 3 hours but it may vary depending on the type of GPU assigned to the current session by Colab.

With a simple training loop, an overall validation accuracy of 85% can be reached after 13 epochs, compared to 89% for 40 classes in the original work. The point here was to implement the full model, not really to get the best possible score. So, we will leave tweaking the training loop and other experiments as an exercise.

Interestingly, our model sometimes confuses dressers with nightstands, toilets with chairs and desks with tables which is rather understandable (except toilets):
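Confusions like these are easy to read off a confusion matrix. Here is a minimal NumPy sketch of how one could be built; the class names and the labels are made up for illustration and are not the model's actual predictions.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        matrix[t, p] += 1
    return matrix

# Hypothetical labels for three of the ten classes.
classes = ["dresser", "night_stand", "toilet"]
true_labels = [0, 0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 0, 1, 0, 2, 2]
cm = confusion_matrix(true_labels, pred_labels, len(classes))
```

Off-diagonal entries count the mistakes: `cm[0, 1]` is the number of dressers that were predicted as nightstands.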

3. Final words

You’ve done it! 🎉🎊👏

You implemented PointNet, a Deep Learning architecture that can be used for a variety of 3D recognition tasks. Even though we implemented the classification model here, segmentation, normal estimation or other tasks require only minor changes in the model and dataset classes.

The full notebook is available at

Thank you for reading! I hope this tutorial was useful to you. If it’s the case, please let me know in a comment. By the way, this is my first Medium article so I will be grateful to receive feedback from you in comments or via a private message!


[1] Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (2017), CVPR 2017

[2] Adam Conner-Simons, Deep learning with point clouds (2019), MIT Computer Science & Artificial Intelligence Lab

[3] Loic Landrieu, Semantic Segmentation of 3D point Cloud (2019), Université Paris-Est, Machine Learning and Optimization working Group

[4] Charles R. Qi et al., Volumetric and Multi-View CNNs for Object Classification on 3D Data (2016)

Nikita Karaev

Engineering student, École Polytechnique Paris | Passionate about AI


Thanks to Zigfrid Zvezdin.
