Latent Dirichlet Allocation (LDA), a topic model designed for text documents

María
12 May 2020
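# Latent Dirichlet Allocation (LDA), a topic model designed for text documents
# Results are shown in comments, as they would appear in an interactive session.

import tempfile

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors, SparseVector
from pyspark.ml.clustering import LDA, DistributedLDAModel, LocalLDAModel

# The snippet relies on a `spark` session and a `temp_path` directory;
# both are created here so that it runs on its own.
spark = SparkSession.builder.getOrCreate()
temp_path = tempfile.mkdtemp()

# Each row is one document: an id and a term-count vector over a 2-term vocabulary.
df = spark.createDataFrame([[1, Vectors.dense([0.0, 1.0])],
                            [2, SparseVector(2, {0: 1.0})],], ["id", "features"])

# With optimizer="em", fit() returns a DistributedLDAModel.
lda = LDA(k=2, seed=1, optimizer="em")
model = lda.fit(df)
model.isDistributed()
# True

# Convert to a LocalLDAModel, which keeps only the inferred topics.
localModel = model.toLocal()
localModel.isDistributed()
# False
model.vocabSize()
# 2
model.describeTopics().show()
# +-----+-----------+--------------------+
# |topic|termIndices|         termWeights|
# +-----+-----------+--------------------+
# |    0|     [1, 0]|[0.50401530077160...|
# |    1|     [0, 1]|[0.50401530077160...|
# +-----+-----------+--------------------+
# ...
model.topicsMatrix()
# DenseMatrix(2, 2, [0.496, 0.504, 0.504, 0.496], 0)

# Save and reload the estimator.
lda_path = temp_path + "/lda"
lda.save(lda_path)
sameLDA = LDA.load(lda_path)

# Save and reload the distributed model.
distributed_model_path = temp_path + "/lda_distributed_model"
model.save(distributed_model_path)
sameModel = DistributedLDAModel.load(distributed_model_path)

# Save and reload the local model.
local_model_path = temp_path + "/lda_local_model"
localModel.save(local_model_path)
sameLocalModel = LocalLDAModel.load(local_model_path)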
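
The `features` vectors in the snippet are term counts, and the snippet does not show where they come from. The following is a minimal pure-Python sketch of how tokenized text could be turned into such vectors. The two-document corpus, the helper names `build_vocabulary` and `to_count_vector`, and the alphabetical term ordering are all assumptions made for illustration. In Spark itself this step is normally done with a feature transformer such as `CountVectorizer`, which orders its vocabulary differently.

```python
from collections import Counter


def build_vocabulary(docs):
    """Assign an index to every distinct token, in alphabetical order (an assumption)."""
    terms = sorted({token for tokens in docs for token in tokens})
    return {term: index for index, term in enumerate(terms)}


def to_count_vector(tokens, vocab):
    """Return a dense list with one term count per vocabulary entry."""
    counts = Counter(tokens)
    # `vocab` iterates in insertion order, which matches the assigned indices.
    return [float(counts[term]) for term in vocab]


# Hypothetical corpus: document 1 contains only "topic", document 2 only "model".
docs = [["topic"], ["model"]]

vocab = build_vocabulary(docs)
vectors = [to_count_vector(tokens, vocab) for tokens in docs]

print(vocab)
print(vectors)
```

With this hypothetical corpus, the two vectors have the same values as the dense and sparse `features` rows used in the snippet.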