pointcloudset.dataset module¶
- class pointcloudset.dataset.Dataset(data: list[dask.delayed.DelayedLeaf] = [], timestamps: list[datetime.datetime] = [], meta: dict = {'orig_file': '', 'topic': ''})¶
Bases:
DatasetCore
Dataset class which contains multiple pointclouds, timestamps and metadata. For details on how to use the Dataset class, please refer to the usage.ipynb notebook for an interactive tutorial. The notebook can also be found in the tutorial section of the documentation.
- classmethod from_file(file_path: Path, **kwargs)¶
Reads a Dataset from a file. For larger ROS bag files, use the command line tool pointcloudset to convert the ROS file beforehand.
Supported are the native format, which is a directory of fastparquet frames, and ROS bag files (.bag).
- Parameters
file_path (pathlib.Path) –
File path where Dataset should be read from.
If file format is a directory:
pointcloudset.io.dataset.dir.dataset_from_dir()
If file format is a ROS bag file:
pointcloudset.io.dataset.bag.dataset_from_rosbag()
**kwargs – Keyword arguments to pass to func.
- Returns
Dataset object from file.
- Return type
Dataset
- Raises
ValueError – If file format is not supported.
TypeError – If file_path is not a Path object.
Examples
pointcloudset.Dataset.from_file(bag_file, topic="lidar/points", keep_zeros=False)
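The dispatch described above (directory → native format, .bag → ROS bag, anything else → ValueError, non-Path → TypeError) can be sketched with a small stdlib-only stand-in; the function name pick_reader is hypothetical and only illustrates the documented behavior:

```python
import pathlib

def pick_reader(file_path: pathlib.Path) -> str:
    """Hypothetical dispatcher mirroring from_file's documented behavior."""
    if not isinstance(file_path, pathlib.Path):
        # from_file raises TypeError for non-Path arguments
        raise TypeError("file_path must be a pathlib.Path object")
    if file_path.is_dir():
        # native format: directory of fastparquet frames
        return "dataset_from_dir"
    if file_path.suffix == ".bag":
        # ROS bag file
        return "dataset_from_rosbag"
    raise ValueError(f"{file_path.suffix or file_path} is not supported")
```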
- to_file(file_path: Path = PosixPath('.'), **kwargs) None ¶
Writes a Dataset to a file.
Supported is the native format, which is a directory of fastparquet files with metadata.
- Parameters
file_path (pathlib.Path) –
File path where Dataset should be saved.
If file format is a directory:
pointcloudset.io.dataset.dir.dataset_to_dir()
**kwargs – Keyword arguments to pass to func.
- classmethod from_instance(library: str, instance: list[pointcloudset.pointcloud.PointCloud], **kwargs) Dataset ¶
Converts a library instance to a pointcloudset Dataset.
- Parameters
library (str) –
Name of the library.
If “pointclouds”:
pointcloudset.io.dataset.pointclouds.dataset_from_pointclouds()
instance (list[PointCloud]) – Instance from which to convert.
**kwargs – Keyword arguments to pass to func.
- Returns
Dataset object derived from the instance.
- Return type
Dataset
- Raises
ValueError – If instance is not supported.
Examples
pointcloudset.Dataset.from_instance("pointclouds", [pc1, pc2])
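The library-name dispatch and the ValueError for unsupported instances can be sketched as follows; the converter table is a hypothetical stand-in for pointcloudset's io layer, not its actual implementation:

```python
def from_instance_sketch(library: str, instance: list):
    """Hypothetical sketch of dispatching on the library name string."""
    converters = {"pointclouds": lambda clouds: {"data": clouds}}
    key = library.lower()
    if key not in converters:
        # from_instance raises ValueError for unsupported instances
        raise ValueError(f"Conversion from {library} is not supported")
    return converters[key](instance)
```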
- apply(func: collections.abc.Callable[[pointcloudset.pointcloud.PointCloud], pointcloudset.pointcloud.PointCloud] | collections.abc.Callable[[pointcloudset.pointcloud.PointCloud], Any], warn: bool = True, **kwargs) pointcloudset.dataset.Dataset | pointcloudset.pipeline.delayed_result.DelayedResult ¶
Applies a function to the dataset. It is also possible to pass keyword arguments.
- Parameters
func (Union[Callable[[PointCloud], PointCloud], Callable[[PointCloud], Any]]) – Function to apply. If it returns a PointCloud and has a matching return type hint, a new Dataset is generated.
warn (bool) – If True, a warning is issued when the result is not a Dataset; if False, the warning is turned off.
**kwargs – Keyword arguments to pass to func.
- Returns
A Dataset if the function returns a PointCloud, otherwise a DelayedResult object which is a tuple of dask delayed objects.
- Return type
Union[Dataset, DelayedResult]
Examples
def func(pointcloud: pointcloudset.PointCloud) -> pointcloudset.PointCloud:
    return pointcloud.limit("x", 0, 1)

dataset.apply(func)  # This results in a new Dataset

def func(pointcloud: pointcloudset.PointCloud) -> float:
    return pointcloud.data.x.max()

dataset.apply(func)

def func(pointcloud: pointcloudset.PointCloud, test: float) -> float:
    return pointcloud.data.x.max() + test

dataset.apply(func, test=10)
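Since the return type hint decides whether apply builds a new Dataset or a DelayedResult, the check can be sketched with stdlib introspection; the PointCloud class and returns_pointcloud helper below are hypothetical stand-ins:

```python
import inspect

class PointCloud:
    """Hypothetical stand-in for pointcloudset.PointCloud."""

def returns_pointcloud(func) -> bool:
    # apply() can only build a new Dataset when the function's
    # return annotation says it yields a PointCloud
    return inspect.signature(func).return_annotation is PointCloud

def keep(pc: PointCloud) -> PointCloud:
    return pc          # annotated as PointCloud -> would give a Dataset

def measure(pc: PointCloud) -> float:
    return 0.0         # annotated as float -> would give a DelayedResult
```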
- property has_original_id: bool¶
Check if all pointclouds in the Dataset have original_ids.
- Returns
True if all PointClouds in the Dataset have original_id.
- Return type
bool
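The all-pointclouds-share-a-property check behind has_original_id follows a simple pattern, sketched here with a hypothetical stand-in class:

```python
class FakePointCloud:
    """Hypothetical stand-in carrying a has_original_id flag."""
    def __init__(self, has_id: bool):
        self.has_original_id = has_id

# the Dataset-level property is True only when every cloud has the flag
complete = [FakePointCloud(True), FakePointCloud(True)]
mixed = [FakePointCloud(True), FakePointCloud(False)]

dataset_has_original_id = all(pc.has_original_id for pc in complete)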
- agg(agg: str | list | dict, depth: Literal['dataset', 'pointcloud', 'point'] = 'dataset') pandas.core.series.Series | list[pandas.core.frame.DataFrame] | pandas.core.frame.DataFrame ¶
Aggregate using one or more operations over the whole dataset. Similar to pandas.DataFrame.aggregate(). Uses dask.dataframe.DataFrame with parallel processing.
- Parameters
agg (str, list or dict) – Aggregation operation(s), as accepted by pandas.DataFrame.aggregate().
depth (Literal["dataset", "pointcloud", "point"], optional) – Aggregation level: "dataset", "pointcloud" or "point". Defaults to "dataset".
- Returns
Results of the aggregation. This can be a pandas DataFrame or Series, depending on the depth and aggregation.
- Return type
Union[pandas.Series, list[pandas.DataFrame], pandas.DataFrame]
- Raises
ValueError – If depth is not “dataset”, “pointcloud” or “point”.
Examples
dataset.agg("max", "pointcloud")
dataset.agg(["min","max","mean","std"])
dataset.agg({"x" : ["min","max","mean","std"]})
- min(depth: str = 'dataset')¶
Aggregate using the min operation over the whole dataset. Similar to pandas.DataFrame.aggregate(). Uses dask.dataframe.DataFrame with parallel processing.
- Parameters
depth (Literal["dataset", "pointcloud", "point"], optional) – Aggregation level: "dataset", "pointcloud" or "point". Defaults to "dataset".
- Returns
Aggregated Dataset.
- Return type
Examples
dataset.min()
dataset.min("pointcloud")
dataset.min("point")
- max(depth: str = 'dataset')¶
Aggregate using the max operation over the whole dataset. Similar to pandas.DataFrame.aggregate(). Uses dask.dataframe.DataFrame with parallel processing.
- Parameters
depth (Literal["dataset", "pointcloud", "point"], optional) – Aggregation level: "dataset", "pointcloud" or "point". Defaults to "dataset".
- Returns
Aggregated Dataset.
- Return type
Examples
dataset.max()
dataset.max("pointcloud")
dataset.max("point")
- mean(depth: str = 'dataset')¶
Aggregate using the mean operation over the whole dataset. Similar to pandas.DataFrame.aggregate(). Uses dask.dataframe.DataFrame with parallel processing.
- Parameters
depth (Literal["dataset", "pointcloud", "point"], optional) – Aggregation level: "dataset", "pointcloud" or "point". Defaults to "dataset".
- Returns
Aggregated Dataset.
- Return type
Examples
dataset.mean()
dataset.mean("pointcloud")
dataset.mean("point")
- std(depth: str = 'dataset')¶
Aggregate using the std operation over the whole dataset. Similar to pandas.DataFrame.aggregate(). Uses dask.dataframe.DataFrame with parallel processing.
- Parameters
depth (Literal["dataset", "pointcloud", "point"], optional) – Aggregation level: "dataset", "pointcloud" or "point". Defaults to "dataset".
- Returns
Aggregated Dataset.
- Return type
Examples
dataset.std()
dataset.std("pointcloud")
dataset.std("point")
- animate(**kwargs) Figure ¶
Plots and animates the PointClouds in a Dataset as a 3D scatter plot with Plotly. It uses the plot function of a single PointCloud (pointcloudset.pointcloud.plot()) and bundles the plots into an interactive animation. You can also pass arguments to the Plotly Express function plotly.express.scatter_3d().
- Parameters
**kwargs – Keyword arguments to pass to the plot of a single pointcloud and to Plotly Express.
- Returns
The interactive Plotly plot, best used inside a Jupyter notebook.
- Return type
go.Figure
Examples
dataset_bag.animate(hover_data=True, color="intensity")