Clusters-Features’s documentation

Welcome to the official documentation of the Python package Clusters-Features.

Package features

ClustersCharacteristics

class ClustersFeatures.ClustersCharacteristics(pd_df_, **args)

Class Author: BERTRAND Simon - simonbertrand.contact@gmail.com

Made for preparing the summer mission with iCube, Strasbourg (D-IR on FoDoMust). This class has been made in order to facilitate the manipulation of clusters generated by unsupervised techniques. It computes many scores and indices to evaluate the generated clusters. Utility tools such as data visualisation are also implemented.

Parameters
  • pd_df (pd.DataFrame) – Dataframe to analyse, concatenated with the target vector

  • target (str) – The name of the target column of the pd_df dataframe

Returns

ClustersCharacteristics Instance

>>> CC=ClustersCharacteristics(pd_df,label_target="target")

Many features are available as instance variables, here is the list:

InstVar self.num_clusters

Returns the number of clusters

InstVar self.num_observations

Returns the number of observations (pd_df.shape[0])

InstVar self.num_observation_for_specific_cluster

Returns a dict with cluster as key and number of observations as value

InstVar self.data_dimension

Returns the number of features/directions/dimensions (pd_df.shape[1]-1)

InstVar self.labels_clusters

Returns a list of all clusters labels

InstVar self.label_target

Returns the given argument “target” used in the initialisation of the ClustersCharacteristics instance

InstVar self.data_clusters

Returns a dict with cluster label as key and the sub-dataframe of observations with that target label as value

InstVar self.data_centroids

Returns a dict with cluster label as key and the centroid point (pandas Series) as value

InstVar self.data_barycenter

Returns a Series of the dataframe barycenter

InstVar self.data_radiuscentroid

Returns a dict with “max”, “75p”, “median”, “mean”, “min” as keys and, for each key, a dict with clusters as keys and the corresponding centroid radius as value

InstVar self.data_target

Returns the target vector

InstVar self.data_frame

Returns the dataframe without the target vector

InstVar self.data_features

Returns the dataframe with the target vector (pd_df)

InstVar self.data_every_element_distance_to_every_element

Returns the pairwise distances between elements (generated with SciPy)

InstVar self.data_every_element_distance_to_centroids

Returns the distance between each element of the dataset and each centroid

InstVar self.data_every_possible_cluster_pairs

Returns all the possible pairs of clusters

InstVar self.data_every_cluster_element_distance_to_centroids

Returns, for each cluster, the distances between the elements belonging to the cluster and its centroid

For example:

>>> CC.num_clusters
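
Other instance variables can be read in the same way; for instance (illustrative only, assuming CC has been initialised as above):

>>> CC.num_observation_for_specific_cluster
>>> CC.data_centroids[CC.labels_clusters[0]]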

Data

class ClustersFeatures.src._data.__Data

The ClustersCharacteristics object creates attributes that define clusters. We can find them in the Data subclass. To use these methods, you need to initialise a ClustersCharacteristics instance and then call the corresponding methods:

For example:

>>> CC=ClustersCharacteristics(pd_df,"target")
>>> CC.data_intercentroid_distance_matrix()
data_intercentroid_distance(Cluster1, Cluster2)

Computes the distance between the centroid of Cluster1 and the centroid of Cluster2.

Parameters
  • Cluster1 – Cluster1 label name

  • Cluster2 – Cluster2 label name

Returns

float

>>> CC.data_intercentroid_distance(CC.labels_clusters[0], CC.labels_clusters[1])
data_intercentroid_distance_matrix(**args)

Computes the distance between each pair of centroids and returns the matrix of this general term.

Returns a symmetric matrix (x_ij) where x_ij is the distance between the centroids of clusters i and j.

Parameters

target= (bool) – Concatenate the output with the data target

Returns

A symmetric pandas dataframe with the computed distances between each pair of centroids

>>> CC.data_intercentroid_distance_matrix()
data_interelement_distance_between_elements_of_two_clusters(Cluster1, Cluster2)

Returns every pairwise distance between elements belonging to Cluster1 or Cluster2.

If Cluster1 is equal to Cluster2, then these distances are intra-cluster and the output is symmetric. Otherwise, they are inter-cluster distances and the output is not symmetric.

Parameters
  • Cluster1 – Cluster1 label name

  • Cluster2 – Cluster2 label name

Returns

A pandas dataframe with the pairwise element distances for the given clusters

>>> CC.data_interelement_distance_between_elements_of_two_clusters(CC.labels_clusters[0], CC.labels_clusters[1])
data_interelement_distance_for_clusters(**args)

Returns a dataframe with two columns. The first column contains the distance for each pair of elements belonging to the clusters given in the “clusters=” list argument. The second column is a boolean column equal to True when both elements are inside the same cluster. Pandas MultiIndexes are used to allow users to link the Distance column with the dataset points.

Parameters

clusters= – labels of the clusters for which to compute pairwise distances

Returns

A pandas dataframe with two columns: one for the distance and the other, named ‘Same Cluster ?’, equal to True if both elements belong to the same cluster

Computing all the distances between the first 3 clusters of the dataframe:

>>> CC.data_interelement_distance_for_clusters(clusters=CC.labels_clusters[0:3])
data_interelement_distance_for_two_element(ElementId1, ElementId2)

Returns the distance between ElementId1 and ElementId2.

Parameters
  • ElementId1 – First element pandas index

  • ElementId2 – Second element pandas index

Returns

float

>>> CC.data_interelement_distance_for_two_element(CC.data_features.index[0],CC.data_features.index[1])
data_radius_selector_specific_cluster(Query, Cluster)

Returns the radius of a given cluster according to the specified query.

Parameters
  • Query (str) – in the list [‘max’, ‘min’, ‘median’, ‘mean’] or “XXp” for the XXth radius percentile or “XX%” for a percentage of the max radius.

  • Cluster – The cluster label

Returns

a float.
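
For instance, an illustrative call requesting the 90th radius percentile of the first cluster (the query string follows the format described above):

>>> CC.data_radius_selector_specific_cluster("90p", CC.labels_clusters[0])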

data_same_target_for_pairs_elements_matrix()

Returns a boolean matrix whose general term is True when the index element belongs to the same cluster as the column element

Returns

A boolean pandas dataframe with shape (num_observations,num_observations)

>>> CC.data_same_target_for_pairs_elements_matrix()

Score

This section allows users to evaluate their clustering by checking the values of the indices below.

References :

Clustering Indices - Bernard Desgraupes (University Paris Ouest, Lab Modal’X) - 2017

Study on Different Cluster Validity Indices - Shyam Kumar K, Dr. Raju G (NSS College Rajakumari, Idukki & Kannur University, Kannur in Kerala, India) - 2018

Understanding of Internal Clustering Validation Measures - Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu - 2010

Scatter Score

class ClustersFeatures.src._score.__Score
scatter_matrix_T()

Returns the total dispersion matrix: it is self.num_observations times the variance-covariance matrix of the dataset.

Returns

a Pandas dataframe.

scatter_matrix_WG()

Returns the sum of scatter_matrix_specific_cluster_WGk over all k; it is also called the within-group matrix.

Returns

a Pandas dataframe.

scatter_matrix_between_group_BG()

Returns the matrix of the dispersion between the centroids and the barycenter.

Returns

a Pandas dataframe.
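
As a hedged sanity check, the total scatter matrix is expected to decompose into the within-group and between-group matrices (Huygens decomposition), up to floating-point error; this sketch assumes the three dataframes share the same index and columns:

>>> T = CC.scatter_matrix_T()
>>> WG = CC.scatter_matrix_WG()
>>> BG = CC.scatter_matrix_between_group_BG()
>>> (T - (WG + BG)).abs().max().max()  # expected to be close to 0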

scatter_matrix_specific_cluster_WGk(Cluster)

Returns the within-cluster dispersion for a specific cluster (the sum of squared distances between the cluster’s elements and the centroid of the concerned cluster).

Parameters

Cluster – Cluster label name.

Returns

a Pandas dataframe.

score_between_group_dispersion()

Returns the between-group dispersion; it can also be seen as the trace of the between-group matrix.

Returns

float.

score_mean_quadratic_error()

Returns the mean quadratic error; it is the same as score_pooled_within_cluster_dispersion / num_observations.

Returns

float.

score_pooled_within_cluster_dispersion()

Returns the sum of score_within_cluster_dispersion for each cluster.

Returns

float.

score_totalsumsquare()

Returns the trace of scatter_matrix_T; it can also be computed differently by using the variance function.

Returns

float.

score_within_cluster_dispersion(Cluster)

Returns the trace of the WGk matrix for a specific cluster. It is the same as score_totalsumsquare but computed with the WGk matrix’s coefficients.

Parameters

Cluster – Cluster label name.

Returns

float.
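
The scalar scores mirror the matrix traces above; an illustrative check, under the same decomposition assumption:

>>> WGSS = CC.score_pooled_within_cluster_dispersion()
>>> BGSS = CC.score_between_group_dispersion()
>>> abs(CC.score_totalsumsquare() - (WGSS + BGSS))  # expected to be close to 0
>>> abs(CC.score_mean_quadratic_error() - WGSS / CC.num_observations)  # close to 0 by definition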

Index

class ClustersFeatures.src._score_index.__ScoreIndex
score_index_Log_Det_ratio()

Defined in the first reference.

Returns NaN when the WG matrix or the total scatter matrix is not invertible.

Returns

float.

score_index_PBM()

Defined in the first reference.

Returns

float.

score_index_SD()

Defined in the first reference.

Since we do not have different numbers of clusters, we cannot compute the weighting coefficient: the average scattering for clusters and the total separation between clusters are returned as a tuple.

Returns

A tuple of float that are (Scattering, Separation).

score_index_ball_hall()

Returns the Ball Hall index defined in the first reference.

Returns

float.

score_index_banfeld_Raftery()

Defined in the first reference.

Returns

float.

score_index_c()

Defined in the first reference.

Returns

float.

score_index_c_for_each_cluster(Cluster)

A variant of the C Index for each cluster. The main difference is that we do not take the sum over all pairs of points but directly take the number of pairs for the given cluster.

Parameters

Cluster – Cluster label name.

Returns

float.

score_index_calinski_harabasz()

Defined in the first reference.

Returns

float.

score_index_davies_bouldin()

Defined in the first reference.

It is the mean of score_index_davies_bouldin_for_each_cluster.

Returns

float.

score_index_davies_bouldin_for_each_cluster()

Defined in the first reference.

Returns

np.array of the Davies-Bouldin score for each cluster.

score_index_det_ratio()

Defined in the first reference.

Returns NaN when the WG matrix or the total scatter matrix is not invertible.

Returns

float.

score_index_dunn()

Defined in the first reference.

Returns

float.

score_index_generalized_dunn(**args)

Returns one of the 18 generalized Dunn indices.

Parameters
  • wc_distance (int) – within-cluster distance index according to the main reference. Integer included in [1,2,3].

  • bc_distance (int) – between-cluster distance index according to the main reference. Integer included in [1,2,3,4,5,6].

Returns

float.
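
An illustrative call, using the first within-cluster and between-cluster distance codes:

>>> CC.score_index_generalized_dunn(wc_distance=1, bc_distance=1)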

score_index_generalized_dunn_matrix()

Returns the 18 generalized Dunn indices defined in the first reference.

Returns

A pandas dataframe with shape (6,3).

score_index_log_ss_ratio()

Defined in the first reference.

Returns

float.

score_index_mclain_rao()

Defined in the first reference.

Returns

float

score_index_point_biserial()

Defined in the first reference.

Returns

float.

score_index_ratkowsky_lance()

Defined in the first reference.

Returns

float.

score_index_ray_turi()

Defined in the first reference.

Returns

float.

score_index_scott_symons()

Defined in the first reference.

Returns NaN if one of the WGk matrices is not invertible.

Returns

float.

score_index_silhouette()

Uses the scikit-learn library to quickly compute the silhouette score.

Returns

float.

score_index_silhouette_for_every_cluster()

Uses the scikit-learn library to quickly compute the mean silhouette score for each cluster.

Returns

A pandas Series with silhouette score for each cluster.

score_index_trace_WiB()

Defined in the first reference.

Returns NaN if WG matrix is not invertible.

Returns

float.

score_index_wemmert_gancarski()

A special thanks to M. Gançarski, who recruited me for my first traineeship at iCube, Strasbourg; his index has been implemented here:

Defined in the first reference.

Returns

float.

score_index_xie_beni()

Defined in the first reference.

Returns

float.

IndexCore

In this library, there are two ways to calculate these scores: using IndexCore, which automatically caches the already computed indices, or calling the score_index methods directly. The second approach can lead to computing the same index repeatedly, which can be very slow since some of these indices have a very high computational complexity.

Warning

Take special care with the indices.json structure. The whole IndexCore class is based on this JSON structure. Modifying the layout of indices.json implies modifying the structure of many functions in this document. In other words, it is strongly discouraged to modify the global layout of the JSON without having done a thorough analysis of the program. To add an index, it is important to add data to the JSON following its current structure.

indices.json structure dependency: indices.json, _info.py, __init__

class ClustersFeatures.index_core.__IndexCore
IndexCore_compute_every_index()

Computes all the indices and saves them to the cache.

Returns

A dict with all the index values.

IndexCore_generate_output_by_info_type(board_type, indices_type, code)

Returns the queried index. If it has already been computed, then the cached result is returned.

Parameters

board_type: (str) –

A str in the following list [‘general’, ‘radius’, ‘clusters’]. ‘general’ shows indices that are computed for the entire dataset. ‘radius’ shows information about the distribution of radii, and ‘clusters’ allows users to check the indices for each cluster.

Parameters

indices_type: (str) –

A str in the following list [‘max’, ‘min’, ‘max diff’, ‘min diff’]. If ‘max’ (resp. ‘min’), then the higher (resp. lower) the score, the better the clustering. ‘max diff’ and ‘min diff’ are useful when you need to find the best number of clusters: ‘max diff’ corresponds to the maximum difference between clustering 1 with K clusters and clustering 2 with K’ clusters (K != K’). See the Bernard Desgraupes reference for more explanations.

Parameters

code: (str) –

A str corresponding to one of the codes inside the indices.json file. Check these codes with IndexCore_get_all_index().

Returns

list or float or pandas dataframe or pandas series.
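
For example, querying the between-group total dispersion through its code (codes can be listed with IndexCore_get_all_index(); this call is illustrative):

>>> CC.IndexCore_generate_output_by_info_type('general', 'max', 'G-Max-01')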

IndexCore_get_all_index()

Returns a dict with all the indices and their corresponding codes.

Returns

dict

>>> CC.IndexCore_get_all_index()
{'general': {'max': {'Between-group total dispersion': 'G-Max-01', 'Mean quadratic error': 'G-Max-02', 'Silhouette Index': 'G-Max-03', 'Dunn Index': 'G-Max-04', 'Generalized Dunn Indexes': 'G-Max-GDI', 'Wemmert-Gancarski Index': 'G-Max-05', 'Calinski-Harabasz Index': 'G-Max-06', 'Ratkowsky-Lance Index': 'G-Max-07', 'Point Biserial Index': 'G-Max-08', 'PBM Index': 'G-Max-09'}, 'max diff': {'Trace WiB Index': 'G-MaxD-01', 'Trace W Index': 'G-MaxD-02'}, 'min': {'Banfeld-Raftery Index': 'G-Min-01', 'Ball Hall Index': 'G-Min-02', 'C Index': 'G-Min-03', 'Ray-Turi Index': 'G-Min-04', 'Xie-Beni Index': 'G-Min-05', 'Davies Bouldin Index': 'G-Min-06', 'SD Index': 'G-Min-07', 'Mclain-Rao Index': 'G-Min-08', 'Scott-Symons Index': 'G-Min-09'}, 'min diff': {'Det Ratio Index': 'G-MinD-01', 'Log BGSS/WGSS Index': 'G-MinD-02', 'S_Dbw Index': 'G-MinD-03', 'Nlog Det Ratio Index': 'G-MinD-04'}}, 'clusters': {'max': {'Centroid distance to barycenter': 'C-Max-01', 'Between-group Dispersion': 'C-Max-02', 'Average Silhouette': 'C-Max-03', 'KernelDensity mean': 'C-Max-04', 'Ball Hall Index': 'C-Max-05'}, 'min': {'Within-Cluster Dispersion': 'C-Min-01', 'Largest element distance': 'C-Min-02', 'Inter-element mean distance': 'C-Min-03', 'Davies Bouldin Index': 'C-Min-04', 'C Index': 'C-Min-05'}}, 'radius': {'min': {'Radius min': 'R-Min-01', 'Radius mean': 'R-Min-02', 'Radius median': 'R-Min-03', 'Radius 75th Percentile': 'R-Min-04', 'Radius max': 'R-Min-05'}}}
IndexCore_get_number_of_index()

Returns the number of indices inside the indices.json file.

Returns

int

Confusion Hypersphere

The Confusion Hypersphere subclass counts the number of elements contained inside an n-dimensional sphere (hypersphere) of given radius centred on each cluster centroid. The given radius is the same for each hypersphere.

class ClustersFeatures.src._confusion_hypersphere.__ConfusionHypersphere
confusion_hyperphere_around_specific_point_for_two_clusters(point, Cluster1, Cluster2, radius)

This function returns the number of elements belonging to Cluster1 or Cluster2 that are contained in the hypersphere of the specified radius centred on the given point.

Parameters
  • point (list,np.ndarray) – The point on which the hypersphere will be centred.

  • Cluster1 – Cluster1 label name.

  • Cluster2 – Cluster2 label name.

  • radius (float) – The radius of the hypersphere.

Returns

int
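
An illustrative call, centring the hypersphere on the dataset barycenter (used here as an array-like point) with an arbitrary radius of 10:

>>> CC.confusion_hyperphere_around_specific_point_for_two_clusters(CC.data_barycenter, CC.labels_clusters[0], CC.labels_clusters[1], 10)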

confusion_hypersphere_for_linspace_radius_each_element(**args)

This method returns the results of the above method for a linearly spaced range of radii.

Parameters

n_pts (int) – Allows users to set the number of points in the radius range.

Returns

A pandas dataframe
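
An illustrative call with an arbitrary number of 10 points in the radius range:

>>> CC.confusion_hypersphere_for_linspace_radius_each_element(n_pts=10)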

confusion_hypersphere_matrix(**args)

Returns the confusion hypersphere matrix.

Parameters
  • radius_choice (float) – The radius of the hypersphere.

  • counting_type (str) – a str in [‘including’, ‘excluding’]. If ‘including’, then the elements belonging to cluster i and contained inside the hypersphere of centroid i are counted (for i=j). If ‘excluding’, they are not counted.

  • proportion (bool) – If True, returns the proportion.

Returns

A pandas dataframe.

If (x_ij) is the returned matrix, then it can be described as follows:

  • for proportion = False: x_ij is the number of elements belonging to cluster j that are contained inside (Euclidean norm) the hypersphere of cluster i with the specified radius

  • for proportion = True: x_ij is the number of elements belonging to cluster j that are contained inside (Euclidean norm) the hypersphere of cluster i with the specified radius, divided by the number of elements inside cluster j

>>> CC.confusion_hypersphere_matrix(radius=35, counting_type="including", proportion=True)

Info

The Info subclass shows two different informative boards that give many kinds of information about the dataset in general and about the clusters.

class ClustersFeatures.src._info.__Info
clusters_info(**args)

Generates a board that gives information about the different clusters.

Parameters

scaler (str) – Returns the scaled output. Available scalers: ‘min_max’, ‘robust’, ‘standard’.

Returns

A pandas dataframe.

>>> CC.clusters_info()
general_info(**args)

Generates a board that gives general information about the dataset.

Parameters

hide_nan (bool) – Show the NaN indices and their corresponding codes. If True, they are hidden.

Returns

A pandas dataframe.

>>> CC.general_info(hide_nan=False)

Density

class ClustersFeatures.src._density.__Density
density_estimation(method, **args)

Returns an estimation of the density obtained by summing n-dimensional Gaussian distributions. Since creating an n-dimensional meshgrid has a very high computational complexity, we can only estimate the density at the observations of the dataset. We define a density function that outputs a density estimate for a given n-dimensional coordinate and then apply it to the coordinates of the dataframe points.

Parameters
  • method (str) – a str contained in the list [‘intra’, ‘inter’, ‘total’]. “intra” specifies the density of each observation relative to each cluster. “total” estimates the density of each observation relative to all clusters at the same time. “inter” estimates the total density of each cluster relative to the total density of another cluster; for this argument, the returned matrix is symmetric.

  • clusters (list) – List of the clusters for which to estimate the density.

Returns

A pandas dataframe depending on the given “method” argument.
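
An illustrative call estimating the intra-cluster density for every cluster (the method and cluster selection are arbitrary here):

>>> CC.density_estimation("intra", clusters=CC.labels_clusters)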

density_projection_2D(reduction_method, percentile, **args)

The density projection uses a reduction method to estimate the density with a 2D Meshgrid.

We estimate the density by summing num_observations 2D Gaussian distributions, each centred on an element of the dataset. The percentile argument sets the minimum density contour to select. For percentile=99, only the 1% densest regions are selected.

Parameters
  • reduction_method (str) – “UMAP” or “PCA”. Reduces the total dimension of the dataframe to 2.

  • percentile (int) – Sets the minimum density contour to select as a percentile of the current density distribution.

  • cluster= (list) – A list of clusters to estimate density.

  • return_clusters_density= (bool) – Adds a key in the returned dict with a meshgrid of Z values for each cluster.

  • return_data (bool) – Returns the reduction data. It’s the same as self.utils_PCA(2) or self.utils_UMAP() but packed in the returned dict.

Returns

A dict containing all the data.

>>> CC.density_projection_2D("PCA", 99, cluster=CC.labels_clusters, return_data=False, return_clusters_density=True)
density_projection_2D_generate_png(reduction_method, percentile, **args)

This method generates a PNG where each density shape is observable.

We use the PIL library to generate this PNG.

Parameters
  • reduction_method (str) – “UMAP” or “PCA”

  • percentile (int) – Sets the minimum density contour to select as a percentile of the current density distribution.

  • show_image (bool) – Shows the generated image with Plotly. If Plotly is not installed, it is recommended to set this argument to False.

Returns

A dict containing all the data.

>>> CC.density_projection_2D_generate_png("PCA", 99, show_image=False)
density_projection_3D(percentile, **args)

The density projection uses 3D PCA reduction method to estimate the density with a 3D Meshgrid.

We estimate the density by summing num_observations 3D Gaussian distributions, each centred on an element of the dataset. The percentile argument sets the minimum density contour to select. For percentile=99, only the 1% densest regions are selected.

Parameters
  • percentile (int) – Sets the minimum density contour to select as a percentile of the current density distribution.

  • cluster= (list) – A list of clusters to estimate density. It is forbidden to pass more than 2 distinct clusters. Leaving this argument empty results in estimating each cluster as a single density.

  • return_clusters_density= (bool) – Adds a key in the returned dict with the density values for each cluster.

  • return_grid (bool) – Adds a key in the returned dict with the full 3D meshgrid.

Returns

A dict containing all the data.

>>> CC.density_projection_3D(99, cluster=CC.labels_clusters, return_grid=False, return_clusters_density=True)

Utils

class ClustersFeatures.src._utils.__Utils
utils_ClustersRank(**args)

Defines a mean rank for each cluster based on the min/max indices of the clusters board.

The method uses the min-max scaler to put each row of the clusters_info board on the same scale. The min-type and max-type indices are separated in order to output a rank for each index. If a min index is the lowest of all clusters, then its rank is the self.num_observations-th. To generate the final rank, we compute the mean rank of each cluster over the min-type and the max-type indices, then sum the two mean ranks. As we want a rank where the first position is the best, we invert the above sum to get the final rank. The mean rank for each cluster can also be returned by passing cluster_rank=True.

Parameters

cluster_rank= (bool) – Returns the mean rank for each cluster

Returns

The final leaderboard.

>>> CC.utils_ClustersRank(mean_cluster_rank=True)
utils_KernelDensity(**args)

Returns a kernel density estimation computed with the best bandwidth.

Parameters
  • return_KDE (bool) – If return_KDE is True, the KDE model is also returned so that samples can be generated later. It uses the scikit-learn library.

  • clusters (list) – List of clusters on which to evaluate the kernel density; order is not important. If no clusters are specified, the kernel density is computed on the entire data set.

Returns

  • An estimation of the kernel density for each sample if return_KDE is False

  • A tuple with the kernel density estimation for each sample and the KDE model if return_KDE is True
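
An illustrative call restricted to the first two clusters, also retrieving the fitted model:

>>> estimation, kde_model = CC.utils_KernelDensity(clusters=CC.labels_clusters[0:2], return_KDE=True)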

utils_PCA(n_components)

Principal Component Analysis: uses the scikit-learn library.

Parameters

n_components – number of data dimension after reduction

Returns

An n_components-dimensional projection of the whole data set
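
For instance, reducing the whole data set to two dimensions, as used internally by the 2D density projection:

>>> CC.utils_PCA(2)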

utils_UMAP(**args)

Uniform Manifold Approximation and Projection: uses the umap-learn library.

The result is cached to avoid repeating the same calculations.

Parameters

show_target (bool) – Concatenate target to output dataframe

Returns

A pandas dataframe with the 2D projection of the whole data set.
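
For instance, computing the 2D UMAP projection together with the target column:

>>> CC.utils_UMAP(show_target=True)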

utils_ts_filtering(filter, **args)

Filters a time series with different filters from statsmodels.

The col argument can be specified to filter a column of the instance dataframe. Otherwise, you can directly specify a time series with the data argument.
Parameters

filter (str) – Type of filter. Has to be in the list [‘STL’, ‘HP’, ‘BK’, ‘CF’], respectively for the STL (seasonal-trend decomposition), Hodrick-Prescott, Baxter-King and Christiano-Fitzgerald filters from statsmodels.

Parameters
  • periods= (int/float) – Specify the period between each sample.

  • col= (str/int) – Required if data is None: specifies the column of the instance data set to filter.

  • data= (list/np.ndarray) – Required if col is None: specifies the data to filter.

Returns

A pandas dataframe with columns as the decomposed signals.
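
An illustrative call applying the Hodrick-Prescott filter to the first feature column (the column choice is arbitrary here):

>>> CC.utils_ts_filtering('HP', col=CC.data_frame.columns[0])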

Graph

class ClustersFeatures.src._graph.__Graph
graph_PCA_3D()

Shows the 3D PCA reduction graph with Plotly.

Returns

Plotly figure instance

>>> CC.graph_PCA_3D()
graph_boxplots_distances_to_centroid(Cluster)

Shows a box plot of the distances between all elements and the centroid of the given cluster.

Parameters

Cluster – The cluster whose centroid is used to evaluate the distances of the elements.

Returns

Plotly figure instance.

>>> CC.graph_boxplots_distances_to_centroid(CC.labels_clusters[0])
graph_confusion_hypersphere_evolution_for_linspace_radius(n_pts, proportion)

Returns a Plotly animation with dataframes generated by the Confusion Hypersphere for different radii.

This animation allows users to understand which clusters are more confused with each other. You can also interpret compactness as follows: the diagonal term (when proportion is True) that first reaches the value 1 corresponds to the most compact cluster in the dataset.

Parameters
  • n_pts (int) – Number of points for the radius linspace.

  • proportion (bool) – Passes the value of proportion to the Confusion Hypersphere arguments.

Returns

Plotly figure instance.

>>> CC.graph_confusion_hypersphere_evolution_for_linspace_radius(50, True)
graph_projection_2D(feature1, feature2)

A simple 2D projection on two given features with Plotly.

Parameters
  • feature1 – The first dataframe column to project

  • feature2 – The second dataframe column to project

Returns

Plotly figure instance.
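
An illustrative call projecting the data on its first two feature columns:

>>> CC.graph_projection_2D(CC.data_frame.columns[0], CC.data_frame.columns[1])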

graph_reduction_2D(reduction_method)

Shows the 2D reduction graph with Plotly.

Parameters

reduction_method (str) – “UMAP” or “PCA”

Returns

Plotly figure instance

>>> CC.graph_reduction_2D("UMAP")
graph_reduction_density_2D(reduction_method, percentile, graph)

Shows the result of the 2D density estimation with Plotly.

Parameters
  • reduction_method (str) – “UMAP” or “PCA”. Reduces the total dimension of the dataframe to 2.

  • percentile (int) – Sets the minimum density contour to select as a percentile of the current density distribution.

  • graph (str) – “interactive” or “contour”. Shows different ways to visualize the density.

Returns

Plotly figure instance.
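
An illustrative call using the PCA reduction and the contour view:

>>> CC.graph_reduction_density_2D("PCA", 99, "contour")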

graph_reduction_density_3D(percentile, **args)

Shows the result of 3D PCA density estimation with Plotly.

Parameters
  • percentile (int) – Sets the minimum density contour to select as a percentile of the current density distribution.

  • cluster= (list) – A list of clusters to estimate density.

Returns

Plotly figure instance

>>> CC.graph_reduction_density_3D(99,cluster=CC.labels_clusters[:2])
>>> CC.graph_reduction_density_3D(99,cluster=CC.labels_clusters[0])
>>> CC.graph_reduction_density_3D(99)
