Description

TensorFlow is a Python library for developing deep-learning applications, relying on GPU acceleration. One of its main features is Keras, an API for faster development of machine-learning models, which provides modules and functions for every stage of a typical ML pipeline (data preprocessing, model definition, optimization, and validation).
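For illustration, the kind of synthetic dataset used by the benchmarks below can be generated with plain NumPy: random 224×224 RGB images and integer labels in the 1000-class range that ResNet50 expects. This is a minimal sketch; the shapes mirror the arrays built inside the benchmark scripts.

```python
import numpy as np

# synthetic benchmark input: N random RGB images of size 224x224
samples = 32
data = np.random.uniform(size=[samples, 224, 224, 3])

# integer class labels in the range ResNet50 predicts over (0..999)
target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")

print(data.shape)    # (32, 224, 224, 3)
print(target.dtype)  # int64
```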
...
Below are examples of a synthetic benchmark that measures performance on the ResNet50 model.
The examples are, in order:
- singlegpu.* - scripts for running on a single GPU
- multigpu-singlenode.* - scripts for running on multiple GPUs on a single node
- multigpu-multinode.* - scripts for running on multiple GPUs across multiple nodes

In each pair, the .sh file is the PBS job script and the .py file is the Python benchmark script.
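All of the benchmark scripts below scale the batch size by the number of replicas (`strategy.num_replicas_in_sync`), so the global batch grows with the number of GPUs while each GPU keeps the same per-replica load. The arithmetic is simple; a hypothetical helper illustrating it:

```python
def global_batch_size(per_replica_batch: int, num_replicas: int) -> int:
    """Effective batch size when each replica processes its own slice.

    Mirrors dataset.batch(batch_size * strategy.num_replicas_in_sync)
    in the benchmark scripts.
    """
    return per_replica_batch * num_replicas

# one GPU: the global batch equals the per-replica batch
print(global_batch_size(256, 1))  # 256

# two GPUs on one node: twice the samples per optimizer step
print(global_batch_size(256, 2))  # 512
```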
Single GPU
```bash
#!/bin/bash

#PBS -q gpu
#PBS -l select=1:ncpus=8:ngpus=1:mem=10GB
#PBS -o output/
#PBS -e output/

# load the module
module load scientific/tensorflow/2.10.1-ngc

# change to the directory the job was submitted from
cd ${PBS_O_WORKDIR:-""}

# run the script
run-singlenode.sh singlegpu.py
```
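The Python benchmarks switch Keras to the `mixed_float16` policy, which runs most compute in half precision for speed while keeping float32 master weights. Half precision has a much narrower range and coarser resolution than float32, which is why the policy exists; a small NumPy illustration of those limits:

```python
import numpy as np

# float16 tops out at 65504; anything larger is not representable
print(np.finfo(np.float16).max)  # 65504.0

# values beyond the representable range overflow to infinity
print(np.float16(1e5))  # inf

# small increments can be lost entirely to rounding
x = np.float16(1024.0)
print(x + np.float16(0.25) == x)  # True: 0.25 is below the spacing at 1024
```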
```python
#!/usr/bin/env python3

# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17

import sys
import time
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10

    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

    # strategy
    devices = [gpu.name[-5:] for gpu in gpus]
    strategy = tf.distribute.OneDeviceStrategy(device=devices[0])

    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)

    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)

    # fit
    callbacks = []
    model.fit(dataset, callbacks=callbacks, epochs=epochs, verbose=2)

if __name__ == "__main__":
    main()
```
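With the values hard-coded in the script (samples = 256 * 20, batch_size = 256), every epoch runs a fixed number of optimizer steps, which makes per-step timings easy to compare between runs. The arithmetic, as a quick check:

```python
import math

samples = 256 * 20   # 5120 synthetic images, as in the benchmark
batch_size = 256     # per-replica batch size
num_replicas = 1     # OneDeviceStrategy uses a single replica

global_batch = batch_size * num_replicas
steps_per_epoch = math.ceil(samples / global_batch)
print(steps_per_epoch)  # 20 optimizer steps per epoch
```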
Multiple GPUs on a single node
```bash
#!/bin/bash

#PBS -q gpu
#PBS -l select=1:ncpus=16:ngpus=2:mem=10GB
#PBS -o output/
#PBS -e output/

# load the module
module load scientific/tensorflow/2.10.1-ngc

# change to the directory the job was submitted from
cd ${PBS_O_WORKDIR:-""}

# run the script
run-singlenode.sh multigpu-singlenode.py
```
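The job above gives TensorFlow two GPUs on one node; the Python script pairs it with `tf.distribute.MirroredStrategy`, which splits each global batch across the GPUs so every replica still processes `batch_size` samples per step. A NumPy sketch of that split (conceptual only, not the actual strategy internals):

```python
import numpy as np

def split_global_batch(batch, num_replicas):
    """Split a global batch evenly across replicas.

    batch: array whose first axis is the global batch dimension.
    Mirrors how MirroredStrategy distributes one batch over its GPUs.
    """
    return np.split(batch, num_replicas, axis=0)

# a global batch of 512 sample indices, standing in for 256 * 2 images
global_batch = np.arange(512)
shards = split_global_batch(global_batch, 2)
print([len(s) for s in shards])  # [256, 256] -- one shard per GPU
```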
```python
#!/usr/bin/env python3

# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17

import sys
import time
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10

    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

    # strategy
    devices = [gpu.name[-5:] for gpu in gpus]
    strategy = tf.distribute.MirroredStrategy(devices=devices)

    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)

    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)

    # fit
    callbacks = []
    model.fit(dataset, callbacks=callbacks, epochs=epochs, verbose=2)

if __name__ == "__main__":
    main()
```
Multiple GPUs on multiple nodes
Warning: When requesting resources, you must request an equal number of GPUs on each node.
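A PBS select statement such as `select=2:ncpus=8:ngpus=2` requests 2 chunks (here: nodes), each with 8 CPU cores and 2 GPUs; because every chunk in a single select expression carries the same resource counts, the per-node GPU count is automatically equal. A hypothetical helper (it handles only this simple single-expression form, not full PBS syntax) that works out the totals:

```python
def parse_select(spec: str) -> dict:
    """Parse a simple PBS select expression like 'select=2:ncpus=8:ngpus=2'.

    Returns the chunk count and the total numeric resources across chunks.
    Illustrative sketch only, not a full PBS resource parser.
    """
    parts = dict(kv.split("=") for kv in spec.split(":"))
    chunks = int(parts.pop("select"))
    totals = {name: chunks * int(value) for name, value in parts.items()
              if value.isdigit()}
    return {"chunks": chunks, **totals}

# two nodes, each with 8 cores and 2 GPUs -> 4 GPUs in total
print(parse_select("select=2:ncpus=8:ngpus=2"))
# {'chunks': 2, 'ncpus': 16, 'ngpus': 4}
```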
```bash
#!/bin/bash

#PBS -q gpu
#PBS -l select=2:ncpus=8:ngpus=2:mem=10GB
#PBS -l place=free
#PBS -o output/
#PBS -e output/

# load the module
module load scientific/tensorflow/2.10.1-ngc

# change to the directory the job was submitted from
cd ${PBS_O_WORKDIR:-""}

# run the script
run-multinode.sh multigpu-multinode.py
```
```python
#!/usr/bin/env python3

# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17

import os
import sys
import time
import socket
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10

    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

    # strategy
    communication_options = tf.distribute.experimental.CommunicationOptions(
        implementation=tf.distribute.experimental.CommunicationImplementation.NCCL)
    strategy = tf.distribute.MultiWorkerMirroredStrategy(
        communication_options=communication_options)

    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)

    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)

    # fit: only rank 0 prints progress
    callbacks = []
    verbose = 2 if os.environ['PMI_RANK'] == '0' else 0
    model.fit(dataset, callbacks=callbacks, epochs=epochs, verbose=verbose)

if __name__ == "__main__":
    main()
```
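`MultiWorkerMirroredStrategy` with the NCCL implementation all-reduces gradients across every GPU on every node, so each worker finishes the step holding the identical averaged gradient. A minimal NumPy sketch of the all-reduce outcome (the real NCCL ring/tree algorithms only change how the reduction is computed, not its result):

```python
import numpy as np

def allreduce_mean(worker_grads):
    """Return the gradient every worker holds after an averaging all-reduce.

    worker_grads: list of gradient arrays, one per worker (GPU) in the job.
    """
    mean = np.mean(worker_grads, axis=0)
    # every worker receives the same reduced tensor
    return [mean.copy() for _ in worker_grads]

# 4 GPUs (2 nodes x 2 GPUs) computed different local gradients
grads = [np.array([1.0, 2.0]), np.array([3.0, 0.0]),
         np.array([0.0, 4.0]), np.array([0.0, 2.0])]
reduced = allreduce_mean(grads)
print(reduced[0])  # [1. 2.] -- the element-wise mean, identical on all workers
```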
Notes
Note: This library is provided as a container because pip/conda virtual environments put a heavy load on the Lustre shared file systems. To run Python applications correctly, use the run-singlenode.sh or run-multinode.sh wrappers in PBS job scripts:
...