Usage of the Package

First import the package, along with pathlib, which is needed to handle file paths.

[ ]:
import pointcloudset as pcs
print(f"package version: {pcs.__version__}")
from pathlib import Path

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20, 10)

import plotly.express as px

%load_ext autoreload
%autoreload 2
%config IPCompleter.greedy=True

Ignore the INFO messages, which come from the rospy package.

Reading a ROS file into the Dataset

[ ]:
testbag = Path.cwd().parent.joinpath("../../../tests/testdata/test.bag")
[ ]:
testset = pcs.Dataset.from_file(testbag, topic="/os1_cloud_node/points", keep_zeros=False)

This reads the bagfile into a Dataset. The Dataset only reads frames from the bagfile when needed, in order to save memory and to make it possible to work with huge bagfiles.

Note: You can also read ROS 2 files in the same way.
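
For example, reading a ROS 2 recording could look like this (the path below is a hypothetical placeholder; only the file argument changes):

[ ]:
# Hypothetical ROS 2 recording; replace with the path to your own bag.
rosbag2_file = Path("path/to/rosbag2")
testset_ros2 = pcs.Dataset.from_file(rosbag2_file, topic="/os1_cloud_node/points", keep_zeros=False)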

[ ]:
print(testset)
[ ]:
len(testset)

To see what is available, use “tab” completion to list the available properties and methods. Alternatively, use help(), dir(), and the documentation. Shift-tab is also handy inside JupyterLab.
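
For example, a quick way to list the public attributes (plain Python, nothing package specific):

[ ]:
# List the public properties and methods of the Dataset object.
[name for name in dir(testset) if not name.startswith("_")]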

Let's query the start time, end time, and duration of the dataset:

[ ]:
testset.start_time
[ ]:
testset.end_time
[ ]:
testset.duration

Working with the whole Dataset

You can work with the whole dataset, even if it is huge, since the package uses parallel processing with dask in the background. So make sure that your Docker container or computer has access to as many CPU cores as possible.
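
As a quick sanity check (standard library only), you can see how many CPU cores are visible to the process:

[ ]:
import multiprocessing

# Number of CPU cores visible to this process; dask can parallelize across them.
multiprocessing.cpu_count()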

[ ]:
testset.animate(hover_data=True, color="intensity")
[ ]:
testset.min()

The Dataset class supports basic functions like min, max, mean and std. They all work on 3 different levels: dataset, pointcloud and point. Let's investigate the differences. The default is over the whole dataset.

[ ]:
min_pointcloud = testset.min("pointcloud")
min_pointcloud

So now we have a pandas DataFrame which gives us the min values of each column for each pointcloud. This can also be used for plotting.

[ ]:
px.line(min_pointcloud, x="timestamp", y="x min")

Now let's investigate the point level.

[ ]:
min_point = testset.min("point")
min_point

So we got a DataFrame with the min value for each point over the whole Dataset. Note that the points are identified by their original_id. For some lidars this does not make sense, since the point locations change over time, so please consider beforehand whether this is useful for your lidar. For the Ouster lidars, however, it can be used and is very useful.

Also note the “N” column, which gives the count of each point over the dataset.
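
For example, assuming “N” counts the pointclouds in which a point appears, a plain pandas filter keeps only the points present in every single pointcloud:

[ ]:
# Points whose original_id occurs in every pointcloud of the dataset.
min_point[min_point["N"] == len(testset)]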

All these methods are based on the agg method, similar to the one from pandas. It also works on the “dataset”, “pointcloud” and “point” levels.

[ ]:
testset.agg("min","dataset")
[ ]:
testset.agg(["min","max","mean"],"point")
[ ]:
testset.agg({"x":["max","min"]},"point")
[ ]:
testset.agg({"x":"max"},"point")

Working with a PointCloud

PointClouds are based on pandas DataFrames and pyntcloud.

Getting a PointCloud from a Dataset

First grab the first pointcloud in the dataset:

[ ]:
testpointcloud = testset[0]
[ ]:
print(testpointcloud)

Note that the number of points can vary from frame to frame, since all zero elements are deleted on import (see the keep_zeros option of the Dataset).

[ ]:
len(testpointcloud)
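
To see the effect of this option, you could re-read the bagfile with keep_zeros=True and compare sizes (a quick sketch; the zero points are then kept in every frame):

[ ]:
# Same bagfile, but keeping the zero elements this time.
testset_with_zeros = pcs.Dataset.from_file(testbag, topic="/os1_cloud_node/points", keep_zeros=True)
len(testset_with_zeros[0])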

Reading from a pointcloud file

This reads all common formats supported by pyntcloud.

[ ]:
lasfile = Path("../../../../tests/testdata/las_files/diamond.las")
[ ]:
testpointcloud2 = pcs.PointCloud.from_file(lasfile)
[ ]:
print(testpointcloud2)

Plotting

Plotting is based on plotly, which gives interactive plots.

[ ]:
testpointcloud.plot(color="intensity", point_size=0.5)

This plot uses plotly as the backend, which can be rather time consuming. There is currently a limit of 300k points that can be plotted (set in config.py), which is enough for an Ouster lidar with 128 lines.

WARNING: Delete the output cells with plotly plots, as they make the file very big.

Working with pointclouds

The PointCloud consists mainly of the properties “data”, “points” and “timestamp”:

[ ]:
testpointcloud.data
[ ]:
testpointcloud.timestamp
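
The points property listed above can be inspected in the same way:

[ ]:
testpointcloud.points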

So data contains everything as a pandas DataFrame, with all its power.

[ ]:
testpointcloud.describe()

Since PointCloud.data is just a pandas DataFrame, you can do whatever you can do with a DataFrame.

[ ]:
testpointcloud.data.hist();
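
Another plain pandas example (using the intensity column from the Ouster data above):

[ ]:
# The five points with the highest intensity, plain pandas on PointCloud.data.
testpointcloud.data.nlargest(5, "intensity")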

Pointcloud processing with built-in methods

Although you can do a lot with just PointCloud.data and PointCloud.points on their own, the PointCloud object has built-in methods for processing, which in turn return a new PointCloud object. They use the power of pandas DataFrames, pyntcloud and open3d.

[ ]:
newpointcloud = testpointcloud.limit("x", -5, 5).limit("intensity", 400, 1000).filter("quantile", "reflectivity", ">", 0.5)
[ ]:
newpointcloud.describe()

So this is now a smaller PointCloud, with x ranging from -5 to 5, intensities above 400, and reflectivity above its 0.5 quantile. Processing steps can be chained together, since each returns a new PointCloud object.

You can also plot the newpointcloud and investigate it further with tooltips on each point.

[ ]:
newpointcloud.plot("intensity",hover_data=["range"])

Plane Segmentation, Clustering and Overlaying Several Plots

Please note that not all processing methods are demonstrated here. For more information, please refer to the HTML documentation of the PointCloud class.

[ ]:
plane = newpointcloud.plane_segmentation(distance_threshold=0.01, ransac_n=3, num_iterations=50, return_plane_model=True)
print(len(plane))
[ ]:
plane
[ ]:
newpointcloud.bounding_box
[ ]:
clusters = newpointcloud.get_cluster(eps=0.5, min_points=10)
cluster1 = newpointcloud.take_cluster(1, clusters)
cluster2 = newpointcloud.take_cluster(2, clusters)
print(len(cluster1))
print(len(cluster2))
[ ]:
type(cluster1)
[ ]:
newpointcloud.plot(color=None, overlay={"Cluster 1": cluster1, "Cluster 2": cluster2}, hover_data=["intensity"])

Applying Functions to the whole Dataset

Now we can develop a pipeline and put everything together. The .agg method is powerful, but sometimes not flexible enough. With .apply you can apply a function to the whole dataset. This again uses dask in the background for lazy evaluation and parallel processing.

[ ]:
def isolate_target(frame: pcs.PointCloud) -> pcs.PointCloud:
    return frame.limit("x", 0, 1).limit("y", 0, 1)

Note the type hints. They are important, as they are used to determine whether the result can be a new Dataset or not. If the function returns a PointCloud, then the result is another Dataset. This is very useful for chaining operations together. (A contrast sketch with a non-PointCloud return type is shown at the end of this section.)

[ ]:
testset.apply(isolate_target)

So the result is another Dataset. Now we can chain things together:

[ ]:
def diff_to_pointcloud(pointcloud: pcs.PointCloud, to_compare: pcs.PointCloud) -> pcs.PointCloud:
    return pointcloud.diff("pointcloud", to_compare)
[ ]:
result = testset.apply(isolate_target).apply(diff_to_pointcloud, to_compare=testset[0])

Note that this uses lazy evaluation from dask, and therefore the result is only calculated when needed. So you can develop a complex chain and then investigate the results.

[ ]:
result[1]

Now we can inquire into the result even further by using .agg from before:

[ ]:
result.agg({"x difference": "max"}, "pointcloud")
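
Finally, coming back to the type hints: here is a sketch of a function that does not return a PointCloud. By the rule above, its result cannot be a new Dataset; how the per-pointcloud results are collected in that case is left to the apply documentation.

[ ]:
def max_intensity(frame: pcs.PointCloud) -> float:
    # A scalar per pointcloud, so apply cannot build a new Dataset from it.
    return frame.data["intensity"].max()

testset.apply(max_intensity)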