clustering python


13 May 2020

# HIERARCHICAL CLUSTERING

# Import the linkage, dendrogram and fcluster functions from SciPy
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
import matplotlib.pyplot as plt
import numpy as np

# X is assumed to be a 2-D NumPy array of shape (n_samples, n_features)
# Build the linkage matrix; it records every merge and the distance at which it happened
# "ward" is the linkage method (alternatives include "single", "average", ...)
Z = linkage(X, "ward")

# Plot the dendrogram
plt.figure(figsize=(25, 30))
# color_threshold is the distance cutoff used for coloring the clusters
dendrogram(Z, leaf_font_size=8, color_threshold=10)
plt.show()

# fcluster returns an array with one cluster label per row of X.
# You can cut the tree using different "criterion" values. Some examples:

# You want at most 4 clusters:
k = 4
clusters = fcluster(Z, k, criterion="maxclust")

# You want the max (cophenetic) distance within a cluster to be 10
# (this overwrites the labels from the previous example):
max_d = 10
clusters = fcluster(Z, max_d, criterion="distance")

# Visualization of the clustering (2-D data)
plt.figure(figsize=(10, 8))
# Use the cluster labels to color the scatter plot;
# cmap is the color palette
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap="brg")
plt.show()

Dario
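The two fcluster criteria above can be tried without a dataset of your own. The following is a minimal sketch, not part of the original answer: the toy data (three synthetic blobs), the seed, and the threshold of 15.0 are all assumptions chosen only so that both criteria recover the same three groups. Plotting is left out so the snippet runs without a display.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Ward linkage matrix, as in the snippet above.
Z = linkage(X, "ward")

# criterion="maxclust": ask for at most 3 flat clusters.
labels_k = fcluster(Z, 3, criterion="maxclust")

# criterion="distance": cut the tree at a distance threshold.
# 15.0 is an assumed value that falls between the within-blob
# and between-blob merge distances for this toy data.
labels_d = fcluster(Z, 15.0, criterion="distance")

print("maxclust labels:", np.unique(labels_k))
print("distance labels:", np.unique(labels_d))
print("cluster sizes (maxclust):", np.bincount(labels_k)[1:])
```

On real data the useful distance threshold is usually read off the dendrogram, by looking for a large vertical gap between successive merges.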

13 Nov 2016

from sklearn.cluster import KMeans

kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=42)
kmeans.fit(x_train)  # Replace x_train with your own training dataset
# The lowest SSE (sum of squared errors) value
print(kmeans.inertia_)
# Final locations of the centroids
print(kmeans.cluster_centers_)
# The number of iterations required to converge
print(kmeans.n_iter_)
# First five predicted labels
print(kmeans.labels_[:5])

# init controls the initialization technique. Setting init to "random" gives the standard version of the k-means algorithm. Setting it to "k-means++" picks initial centroids that are spread apart, which usually speeds up convergence.

# n_clusters sets k for the clustering step. This is the most important parameter for k-means.

# n_init sets the number of initializations to perform. This is important because two runs can converge on different cluster assignments. With n_init=10, scikit-learn performs ten k-means runs and returns the results of the one with the lowest SSE. Ten was the default in older releases; since scikit-learn 1.4 the default is "auto", which still means ten runs for init="random" but a single run for init="k-means++".

# max_iter sets the maximum number of iterations for each initialization of the k-means algorithm.
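To make the "lowest SSE" reported by inertia_ concrete, here is a small sketch that recomputes it by hand. The toy x_train (three synthetic blobs) and the seed are assumptions for illustration only; the KMeans arguments are the same as in the snippet above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for x_train: three 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x_train = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=42)
kmeans.fit(x_train)

# Recompute the SSE by hand: squared distance from each point to the
# centroid of the cluster it was assigned to, summed over all points.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
manual_sse = np.sum((x_train - assigned_centroids) ** 2)

print("manual SSE:", manual_sse)
print("inertia_:  ", kmeans.inertia_)
```

The two printed values should agree up to floating-point rounding, which is a quick sanity check that labels_ and cluster_centers_ describe the same final solution.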
