Description
TensorFlow is a Python library for developing deep-learning applications that relies on GPU acceleration. One of its main features is Keras, an API for faster development of machine-learning models, which provides modules and functions for every part of a typical ML pipeline (data preprocessing, model definition, optimization and validation).
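As an illustration of these pipeline stages, below is a minimal Keras sketch; the data, layer sizes and training settings are arbitrary placeholders chosen for the example, not something provided by the cluster modules.
Code Block | language: py
import numpy as np
import tensorflow as tf

# placeholder data: 1000 samples with 20 features and binary labels
x = np.random.uniform(size=[1000, 20]).astype("float32")
y = np.random.randint(0, 2, size=[1000, 1])

# preprocessing: a normalization layer adapted to the data
norm = tf.keras.layers.Normalization()
norm.adapt(x)

# model definition
model = tf.keras.Sequential([
    norm,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# optimization settings
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=["accuracy"])

# training and validation on a 20% hold-out split
model.fit(x, y, batch_size=32, epochs=3, validation_split=0.2) |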
Versions
| version | module | Supek | Padobran |
|---|---|---|---|
| 2.10.1 | scientific/tensorflow/2.10.1-ngc | gpu | |
| 2.12.0 | scientific/tensorflow/2.12.0 | | |
| 2.15.0 | scientific/tensorflow/2.15.0 | | |
Note | title: Using the application on Supek
Python applications and libraries on Supek are delivered as containers and must be run through a wrapper, as described below. More information about Python applications and containers on Supek is available at the following links: |
Documentation
...
Supek
Below are examples of an artificial benchmark application that tests performance on the ResNet50 model. For each example, the *.sh file is the PBS job script and the *.py file is the Python benchmark itself; the tf.distribute strategy each variant uses is sketched right after this list.
The examples are, in order:
- singlegpu.* - scripts for running on a single GPU
- multigpu-singlenode.* - scripts for running on multiple GPUs on a single node
- multigpu-multinode.* - scripts for running on multiple GPUs on multiple nodes
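The three variants differ mainly in which tf.distribute strategy they construct. A condensed sketch of that choice, taken from the full scripts below and assuming at least one GPU is visible to TensorFlow:
Code Block | language: py
import tensorflow as tf

# list the GPUs visible to TensorFlow and derive short device names such as 'GPU:0'
gpus = tf.config.experimental.list_physical_devices('GPU')
devices = [gpu.name[-5:] for gpu in gpus]

# singlegpu.*: everything runs on a single device
strategy = tf.distribute.OneDeviceStrategy(device=devices[0])

# multigpu-singlenode.*: synchronous data parallelism over the local GPUs
strategy = tf.distribute.MirroredStrategy(devices=devices)

# multigpu-multinode.*: workers on several nodes synchronize over NCCL
communication_options = tf.distribute.experimental.CommunicationOptions(
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL)
strategy = tf.distribute.MultiWorkerMirroredStrategy(
    communication_options=communication_options)

# in all three cases the model is then built and compiled inside strategy.scope() |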
Single GPU
Code Block | language: bash | title: singlegpu.sh | linenumbers: true | collapse: true
#!/bin/bash
#PBS -q gpu
#PBS -l select=1:ncpus=8:ngpus=1:mem=10GB
# load the module
module load scientific/tensorflow/2.10.1-ngc
# move to the directory containing the script
cd ${PBS_O_WORKDIR:-""}
# run the script
run-singlenode.sh singlegpu.py |
Code Block | language: py | title: singlegpu.py | linenumbers: true | collapse: true
#!/usr/bin/env python3
# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17
import sys
import time
import argparse
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10
    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')
    # strategy
    devices = [gpu.name[-5:] for gpu in gpus]
    strategy = tf.distribute.OneDeviceStrategy(device=devices[0])
    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)
    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)
    # fit
    callbacks = []
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=2)

if __name__ == "__main__":
    main() |
Multiple GPUs on a single node
Code Block | language: bash | title: multigpu-singlenode.sh | linenumbers: true | collapse: true
#!/bin/bash
#PBS -q gpu
#PBS -l select=1:ncpus=16:ngpus=2:mem=10GB
# load the module
module load scientific/tensorflow/2.10.1-ngc
# move to the directory containing the script
cd ${PBS_O_WORKDIR:-""}
# run the script
run-singlenode.sh multigpu-singlenode.py |
Code Block | language: py | title: multigpu-singlenode.py | linenumbers: true | collapse: true
#!/usr/bin/env python3
# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17
import sys
import time
import argparse
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10
    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')
    # strategy
    devices = [gpu.name[-5:] for gpu in gpus]
    strategy = tf.distribute.MirroredStrategy(devices=devices)
    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)
    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)
    # fit
    callbacks = []
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=2)

if __name__ == "__main__":
    main()
|
Multiple GPUs on multiple nodes
Warning |
When defining the requested resources, make sure each node is allocated the same number of GPUs (e.g. select=2:ncpus=8:ngpus=2 requests two GPUs on each of the two nodes). |
Code Block | language: bash | title: multigpu-multinode.sh | linenumbers: true | collapse: true
#!/bin/bash
#PBS -q gpu
#PBS -l select=2:ncpus=8:ngpus=2:mem=10GB
#PBS -l place=scatter
# load the module
module load scientific/tensorflow/2.10.1-ngc
# move to the directory containing the script
cd ${PBS_O_WORKDIR:-""}
# run the script
run-multinode.sh multigpu-multinode.py
|
Code Block | language: py | title: multigpu-multinode.py | linenumbers: true | collapse: true
#!/usr/bin/env python3
# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17
import os
import sys
import time
import socket
import argparse
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10
    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')
    # strategy
    communication_options = tf.distribute.experimental.CommunicationOptions(
        implementation=tf.distribute.experimental.CommunicationImplementation.NCCL)
    strategy = tf.distribute.MultiWorkerMirroredStrategy(
        communication_options=communication_options)
    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)
    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)
    # fit: log progress only on the first worker
    callbacks = []
    verbose = 2 if os.environ.get('PMI_RANK', '0') == '0' else 0
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=verbose)

if __name__ == "__main__":
    main() |
Padobran
Below are examples of an artificial benchmark application that tests performance on the ResNet50 model.
The examples are, in order:
- singlenode.* - scripts for running on a single node
Single node
Code Block | language: bash | title: singlenode.sh | linenumbers: true | collapse: true
#!/bin/bash
#PBS -q cpu
#PBS -l select=1:ncpus=32
#PBS -l mem=50GB
# load the module
module load scientific/tensorflow/2.12.0
# set the number of CPU cores
export OMP_NUM_THREADS=${NCPUS}
export TF_NUM_INTEROP_THREADS=${NCPUS}
export TF_NUM_INTRAOP_THREADS=${NCPUS}
# move to the directory containing the script and run it
cd ${PBS_O_WORKDIR}
python singlenode.py |
Code Block | language: py | title: singlenode.py | linenumbers: true | collapse: true
import sys
import time
import argparse
import numpy as np
import tensorflow as tf

def main():
    # vars
    batch_size = 16
    samples = 16 * 10
    epochs = 3
    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size)
    # define model
    model = tf.keras.applications.ResNet50(weights=None)
    loss = tf.keras.losses.SparseCategoricalCrossentropy()
    optimizer = tf.optimizers.SGD(0.01)
    model.compile(optimizer=optimizer, loss=loss)
    # fit
    callbacks = []
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=1)

if __name__ == "__main__":
    main() |
Notes
Note | title: Apptainer and run-singlenode.sh
This library is delivered as a container because of the load that pip/conda virtual environments put on Lustre shared file systems. To run Python applications correctly, they must be started through the run-singlenode.sh or run-multinode.sh wrappers in PBS job scripts:
Code Block |
...
run-singlenode.sh moja_python_skripta.py
... |
|
...