...

on the Dask website.

Available versions

Version      Module
2022.11.1    dask/2022.11.1

Usage

To scale out on Isabella through SGE, you need to use the Dask-MPI library, which creates a Dask cluster and through

...

kmeans.py

```python
# https://examples.dask.org/machine-learning/training-on-large-datasets.html

import os
import time

from dask.distributed import Client

import dask_ml.datasets
import dask_ml.cluster

if __name__ == '__main__':

    # connect the client using the scheduler.json file
    client = Client(scheduler_file="scheduler.json")

    # create the data
    n_clusters = 10
    n_samples = 10**4
    # NSLOTS is set by SGE; use two fewer chunks than slots, but at least one
    n_chunks = max(1, int(os.environ['NSLOTS']) - 2)
    X, _ = dask_ml.datasets.make_blobs(
        centers=n_clusters,
        n_samples=n_samples,
        chunks=n_samples // n_chunks,
    )

    # fit the model and time it
    km = dask_ml.cluster.KMeans(n_clusters=n_clusters, oversampling_factor=10)
    now = time.time()
    km.fit(X)
    print('GB: %f' % (int(X.nbytes)/1073741824))
    print('elapsed fit: %f' % (time.time()-now))
```
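
The chunk size in kmeans.py comes from integer division of the sample count by the slot count minus two. The sketch below works through that arithmetic on its own, without Dask, to show what chunk size results for a few slot counts. The function name `plan_chunks` and the `reserved` parameter are hypothetical, introduced only for illustration. The chunk count it reports assumes that an integer chunk size is split into full chunks plus one smaller remainder chunk, which is how Dask arrays normally treat it.

```python
import math


def plan_chunks(n_samples: int, nslots: int, reserved: int = 2) -> tuple[int, int]:
    """Return (chunk_size, chunk_count) for splitting n_samples rows.

    Hypothetical helper: mirrors the `n_samples // n_chunks` arithmetic
    from kmeans.py, with `reserved` slots subtracted from NSLOTS.
    """
    # Target number of chunks: slots minus the reserved ones, never below one.
    n_target = max(1, nslots - reserved)
    # Floor division, as in the script; guard against a zero chunk size.
    chunk_size = max(1, n_samples // n_target)
    # Full chunks plus one remainder chunk when the division is not exact.
    chunk_count = math.ceil(n_samples / chunk_size)
    return chunk_size, chunk_count


if __name__ == "__main__":
    for nslots in (4, 8, 16):
        size, count = plan_chunks(10**4, nslots)
        print(f"NSLOTS={nslots}: chunk size {size}, {count} chunks")
```

When the sample count is not divisible by the target, floor division leaves a small remainder, so the data may end up in one chunk more than the target (for example, 7 chunks instead of 6 with 8 slots and 10**4 samples).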

Performance

Number of cores    Dataframe    K-means
4                  111          352
8                  98           158
16                 84           93
32                 61           37
64                 17           81