# Deep Learning on Point Clouds: Implementing PointNet in Google Colab

# 1. Introduction

3D data is crucial for self-driving cars, autonomous robots, and virtual and augmented reality. Unlike 2D images, which are represented as pixel arrays, 3D data can be represented as a polygonal mesh, a volumetric voxel grid, a point cloud, etc.

Yet in Computer Vision and Machine Learning today, the vast majority of advances deal only with two-dimensional images.

## 1.1. Point clouds

A point cloud is a widely used 3D data form, which can be produced by depth sensors such as LiDARs and RGB-D cameras.

It is the **simplest representation** of 3D objects: **only points** in 3D space, with **no connectivity**. Point clouds can also carry normals for each point.

Nearly all 3D scanning devices produce point clouds.

Moreover, Apple recently introduced the iPad Pro with a LiDAR Scanner that measures the distance to surrounding objects up to 5 meters away.

## 1.2. Deep Learning on Point clouds

So, let’s think about how we can process point clouds. CNNs work great for images. Can we use them for 3D?

Idea: generalize 2D convolutions to regular 3D grids

This actually works.

The main **problem** is the **inefficient representation**: a cubic voxel grid with 100 voxels per side already contains 1,000,000 voxels.

## 1. 3. PointNet

But what if we try to work with point clouds directly instead?

There are three main **constraints**:

- Point clouds are unordered. The algorithm has to be **invariant to permutations** of the input set.
- If we rotate a chair, it’s still a chair, right? The network must be **invariant to rigid transformations**.
- The network should capture interactions among points.

The authors of PointNet introduce a neural network that takes all these properties into account. It manages to solve **classification**, part **segmentation**, and semantic **segmentation** tasks. Let’s implement it!

# 2. Implementation

In this section we will reimplement the **classification model** from the original paper in **Google Colab** using **PyTorch**.

You can find the **full notebook** at: https://github.com/nikitakaraevv/pointnet/blob/master/nbs/PointNetClass.ipynb

## 2.1. Dataset

In the original paper, the authors evaluated PointNet on the ModelNet40 shape classification benchmark. It contains 12,311 models from 40 object categories, split into 9,843 models for training and 2,468 for testing.

For the sake of simplicity, let’s use a smaller version of the same dataset: **ModelNet10**. It consists of objects from **10 categories**: 3,991 models for training and 908 for testing.

The dataset comes from *3D ShapeNets: A Deep Representation for Volumetric Shapes* (3dvision.princeton.edu).

Don’t forget to turn on the GPU runtime if you want to start training directly.

Let’s import necessary libraries:
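The exact list in the original notebook may differ; a minimal set for the steps below could be (plus `matplotlib` or `plotly` for the visualizations):

```python
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
```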

We can download the dataset directly to the Google Colab Runtime:

This dataset consists of **.off** files that contain meshes represented by vertices and triangular faces. **Vertices** are just points in 3D space, and each **triangle** is formed by 3 vertex indices.

We will need a function to read **.off** files:
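A minimal reader could look like this (a sketch that assumes a well-formed header, i.e. the first line is exactly `OFF`):

```python
def read_off(file):
    """Parse an OFF file object into (verts, faces) lists."""
    if file.readline().strip() != 'OFF':
        raise ValueError('Not a valid OFF header')
    # second line: number of vertices, faces, edges
    n_verts, n_faces, _ = map(int, file.readline().strip().split())
    verts = [[float(x) for x in file.readline().strip().split()]
             for _ in range(n_verts)]
    # each face line starts with its vertex count (3 for triangles), which we drop
    faces = [[int(x) for x in file.readline().strip().split()][1:]
             for _ in range(n_faces)]
    return verts, faces
```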

This is what a full mesh looks like:

As you can see, it is a bed 🛏

But if we get rid of faces and keep only 3D-points, it doesn’t look like a bed anymore!

Actually, flat parts of a surface don’t require any interior points to be represented in a mesh. That’s why points are primarily located at corners and rounded parts of the bed.

## 2.2. Point sampling

So, as **points are not uniformly distributed** across the object’s surface, it could be difficult for our PointNet to classify them. (Especially knowing that this point cloud doesn’t even look like a bed.)

A solution to this could be very simple: let’s **uniformly sample points** on the object’s surface.

We shouldn’t forget that faces can have different areas.

So, we may assign probability of choosing a particular face **proportionally to its area**. This is how it can be done:
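A sketch with numpy (helper names are mine; the original notebook’s code may differ). The area of each triangle comes from the cross product of two of its edges:

```python
import numpy as np

def triangle_area(pt1, pt2, pt3):
    # area = 0.5 * |AB x AC|
    return 0.5 * np.linalg.norm(np.cross(pt2 - pt1, pt3 - pt1))

def face_probabilities(verts, faces):
    """Probability of choosing each face, proportional to its area."""
    verts = np.asarray(verts)
    areas = np.array([triangle_area(*verts[f]) for f in faces])
    return areas / areas.sum()
```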

We will have dense layers in our network architecture. That’s why we want **a fixed number of points** in a point cloud. Let’s sample faces from the constructed distribution. After that, we sample one point per chosen face:
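A possible sketch (hypothetical helper names; a uniform point inside a triangle can be drawn with two sorted uniform barycentric coordinates):

```python
import numpy as np

def sample_point(pt1, pt2, pt3):
    # two sorted uniforms give uniform barycentric weights (s, t-s, 1-t)
    s, t = sorted([np.random.random(), np.random.random()])
    return s * pt1 + (t - s) * pt2 + (1 - t) * pt3

def sample_points(verts, faces, probs, k=1024):
    """Sample k points: pick faces by area-weighted probability, then one point per face."""
    verts = np.asarray(verts)
    idx = np.random.choice(len(faces), size=k, p=probs)
    return np.array([sample_point(*verts[faces[i]]) for i in idx])
```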

Some faces can have more than one sampled point, while others may have no points at all.

This point cloud looks much more like a bed! 🛏

## 2.3. Augmentations

Let’s think about other possible problems. We know that **objects can have different sizes** and can be placed **in different parts of our coordinate system**.

So, let’s **translate** the object **to the origin** by subtracting the mean from all its points, and **normalize** its points into a unit sphere. To augment the data during training, we **randomly rotate** objects around the Z-axis and add **Gaussian noise**, as described in the paper:
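These steps might be sketched as simple transform classes (class names and the noise scale are my assumptions, following the paper’s description):

```python
import numpy as np

class Normalize:
    def __call__(self, pointcloud):
        # translate to the origin, then scale into the unit sphere
        pointcloud = pointcloud - pointcloud.mean(axis=0)
        return pointcloud / np.max(np.linalg.norm(pointcloud, axis=1))

class RandRotationZ:
    def __call__(self, pointcloud):
        theta = 2 * np.pi * np.random.random()
        rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                        [np.sin(theta),  np.cos(theta), 0],
                        [0,              0,             1]])
        return pointcloud @ rot.T

class RandomNoise:
    def __call__(self, pointcloud):
        # small Gaussian jitter per point
        return pointcloud + np.random.normal(0, 0.02, pointcloud.shape)
```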

This is the same bed normalized, with **rotation** and **noise**:

## 2.4. Model

Okay, we’re done with the dataset and pre-processing. Let’s think about the model architecture. The architecture and the key ideas behind it are already explained very well, for example, in this article:

*An In-Depth Look at PointNet* (medium.com)

We remember that the result should be **invariant** to input points *permutations* and geometric *transformations*, such as rigid transformations.

Let’s start implementing it in **PyTorch**:

First of all, our tensors will have size `(batch_size, num_of_points, 3)`. In this case, an *MLP* with shared weights is just a 1-dimensional *convolution* with a kernel of size 1.
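To see the equivalence concretely (a small illustration, not code from the notebook; note that PyTorch’s `nn.Conv1d` expects channels first, so we transpose to `(batch_size, 3, num_of_points)`):

```python
import torch
import torch.nn as nn

points = torch.rand(32, 1024, 3)              # (batch, num_points, xyz)
shared_mlp = nn.Conv1d(3, 64, kernel_size=1)  # same dense layer applied to every point
features = shared_mlp(points.transpose(1, 2)) # -> (batch, 64, num_points)
```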

To ensure **invariance to transformations**, we apply the 3×3 transformation matrix predicted by T-Net to the coordinates of input points. Interestingly, translations in 3D space cannot be encoded by a 3×3 matrix, but we’ve already translated the point clouds to the origin during pre-processing anyway.

An important point here is the initialization of the output matrix. We want it to be the identity by default, so that training starts with no transformation at all. So, we just add an identity matrix to the output:
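A T-Net sketch under these assumptions (layer sizes follow the paper: shared MLPs 64, 128, 1024, a max pool, then dense 512 and 256; names are mine and may differ from the notebook):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TNet(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.k = k
        self.conv1 = nn.Conv1d(k, 64, 1)
        self.conv2 = nn.Conv1d(64, 128, 1)
        self.conv3 = nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.bn1, self.bn2, self.bn3 = nn.BatchNorm1d(64), nn.BatchNorm1d(128), nn.BatchNorm1d(1024)
        self.bn4, self.bn5 = nn.BatchNorm1d(512), nn.BatchNorm1d(256)

    def forward(self, x):                    # x: (batch, k, num_points)
        bs = x.size(0)
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = torch.max(x, 2)[0]               # symmetric max pool over points
        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        # add the identity so the predicted transform starts as "do nothing"
        init = torch.eye(self.k, device=x.device).repeat(bs, 1, 1)
        return self.fc3(x).view(-1, self.k, self.k) + init
```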

We will use the same T-Net, but 64-dimensional, to align the extracted point features after applying the *MLP*.

To provide **permutation invariance,** we apply a symmetric function (max pooling) to the extracted and transformed features so the result does not depend on the order of input points anymore.

Let’s combine it all together:

Then, let’s just wrap it all in one class with the last *MLP* and *LogSoftmax* at the output:

Finally, we will define the **loss function**. As we used *LogSoftmax* for stability, we should apply *NLLLoss* instead of *CrossEntropyLoss*. We will also add two regularization terms that push the transformation matrices to be close to orthogonal (*AAᵀ = I*):
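One way to sketch it (hypothetical function name; `alpha` weights the regularization terms):

```python
import torch
import torch.nn as nn

def pointnetloss(outputs, labels, m3x3, m64x64, alpha=0.0001):
    criterion = nn.NLLLoss()
    bs = outputs.size(0)
    id3x3 = torch.eye(3, device=outputs.device).repeat(bs, 1, 1)
    id64x64 = torch.eye(64, device=outputs.device).repeat(bs, 1, 1)
    # penalize deviation of A @ A.T from the identity for both T-Net outputs
    diff3x3 = id3x3 - torch.bmm(m3x3, m3x3.transpose(1, 2))
    diff64x64 = id64x64 - torch.bmm(m64x64, m64x64.transpose(1, 2))
    reg = alpha * (torch.norm(diff3x3) + torch.norm(diff64x64)) / float(bs)
    return criterion(outputs, labels) + reg
```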

## 2.5. Training

**The final step!** We can just use a classic PyTorch training loop. This is definitely not the most interesting part, so let’s omit it.

Again, the full Google Colab notebook with the training loop can be found in the repository linked above.

Let’s just take a look at the result after training for **15 epochs** on GPU. The training itself takes around **3 hours** but it may vary depending on the type of GPU assigned to the current session by Colab.

With a simple training loop, an overall validation **accuracy of 85%** can be reached after 13 epochs, compared to 89% for 40 classes in the original work. The point here was to implement the full model, not to squeeze out the best possible score. So, we will leave tweaking the training loop and other experiments as an exercise.

Interestingly, our model sometimes confuses dressers with nightstands, toilets with chairs and desks with tables which is rather understandable (except toilets):

# 3. Final words

You’ve done it! 🎉🎊👏

You implemented **PointNet**, a Deep Learning architecture that can be used for a variety of 3D recognition tasks. Even though we implemented the **classification** model here, **segmentation**, **normal estimation** or other tasks require only minor changes in the model and dataset classes.

The full notebook is available at https://github.com/nikitakaraevv/pointnet/blob/master/nbs/PointNetClass.ipynb.

*Thank you for reading! I hope this tutorial was useful to you. If it’s the case, please let me know in a comment. By the way, this is my first Medium article so I will be grateful to receive feedback from you in comments or via a private message!*

# References:

[1] Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (2017), CVPR 2017

[2] Adam Conner-Simons, Deep learning with point clouds (2019), MIT Computer Science & Artificial Intelligence Lab

[3] Loic Landrieu, Semantic Segmentation of 3D Point Clouds (2019), Université Paris-Est — Machine Learning and Optimization Working Group

[4] Charles R. Qi et al., Volumetric and Multi-View CNNs for Object Classification on 3D Data (2016), arxiv.org

## Nikita Karaev

Engineering student, École Polytechnique Paris | Passionate about AI | https://www.linkedin.com/in/nikitakaraev/

Thanks to Zigfrid Zvezdin.

Published in *Towards Data Science*, a Medium publication.
