Table of Contents
maxLevel2

Description

TensorFlow is a Python library for developing deep-learning applications that relies on GPU acceleration. One of its main features is Keras, an API for faster development of machine-learning models, which contains modules and functions for every part of the pipeline in a typical ML application (data preprocessing, model definition, optimization and validation).
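For illustration, the whole pipeline can be sketched with Keras in a few lines (a minimal example on synthetic data; the layer sizes, optimizer and loss are arbitrary choices for the sketch, not part of the benchmark below):

```python
import numpy as np
import tensorflow as tf

# synthetic data standing in for the preprocessing step (shapes are illustrative)
x = np.random.uniform(size=(32, 8)).astype("float32")
y = np.random.randint(0, 2, size=(32, 1))

# model definition
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# optimization setup
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# training with a held-out validation split
history = model.fit(x, y, epochs=2, validation_split=0.25, verbose=0)
```

The `history` object then holds per-epoch training and validation metrics.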

...

Below are examples of a synthetic benchmark application that tests performance on the ResNet50 model.

The examples are, in order:

  • singlegpu.* - scripts for running on a single GPU
  • multigpu-singlenode.* - scripts for running on multiple GPUs on a single node
  • multigpu-multinode.* - scripts for running on multiple GPUs on multiple nodes

Single GPU

Code Block
languagebash
titlesinglegpu.sh
linenumberstrue
collapsetrue
 #!/bin/bash

#PBS -q gpu
#PBS -l select=1:ncpus=8:ngpus=1:mem=10GB
#PBS -o output/
#PBS -e output/

# load the module
module load scientific/tensorflow/2.10.1-ngc

# move to the directory containing the script
cd ${PBS_O_WORKDIR:-""}

# run the script
run-singlenode.sh singlegpu.py
Code Block
languagepy
titlesinglegpu.py
linenumberstrue
collapsetrue
#!/usr/bin/env python3

# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17

import numpy as np
import tensorflow as tf

def main():

    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10

    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

    # strategy
    gpus = tf.config.experimental.list_physical_devices('GPU')
    devices = [ gpu.name[-5:] for gpu in gpus ]
    strategy = tf.distribute.OneDeviceStrategy(device=devices[0])

    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)

    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)

    # fit
    callbacks = []
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=2)

if __name__ == "__main__":
    main()

Multiple GPUs on a single node

Code Block
languagebash
titlemultigpu-singlenode.sh
linenumberstrue
collapsetrue
 #!/bin/bash

#PBS -q gpu
#PBS -l select=1:ncpus=16:ngpus=2:mem=10GB
#PBS -o output/
#PBS -e output/

# load the module
module load scientific/tensorflow/2.10.1-ngc

# move to the directory containing the script
cd ${PBS_O_WORKDIR:-""}

# run the script
run-singlenode.sh multigpu-singlenode.py
Code Block
languagepy
titlemultigpu-singlenode.py
linenumberstrue
collapsetrue
 #!/usr/bin/env python3

# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17

import sys
import time
import argparse
import numpy as np
import tensorflow as tf

def main():

    # vars
    batch_size = 256
    samples = 256 * 20
    epochs = 10

    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

    # strategy
    gpus = tf.config.experimental.list_physical_devices('GPU')
    devices = [ gpu.name[-5:] for gpu in gpus ]
    strategy = tf.distribute.MirroredStrategy(devices=devices)

    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size * strategy.num_replicas_in_sync)

    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)

    # fit
    callbacks = []
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=2)

if __name__ == "__main__":
    main()
	

Multiple GPUs on multiple nodes

Warning

When defining the requested resources, you must request an equal number of GPUs on each node.
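For example, requesting chunks that all carry the same ngpus value satisfies this constraint (the ncpus and mem values here are illustrative, not a recommendation):

```shell
# 2 chunks, each with 2 GPUs: an equal number of GPUs per node
#PBS -l select=2:ncpus=16:ngpus=2:mem=10GB
```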

Code Block
languagebash
titlemultigpu-multinode.sh
linenumberstrue
collapsetrue
#!/bin/bash

#PBS -q gpu
#PBS -l select=2:ncpus=8:ngpus=2:mem=10GB
#PBS -l place=free
#PBS -o output/
#PBS -e output/

# load the module
module load scientific/tensorflow/2.10.1-ngc

# move to the directory containing the script
cd ${PBS_O_WORKDIR:-""}

# run the script
run-multinode.sh multigpu-multinode.py
Code Block
languagepy
titlemultigpu-multinode.py
linenumberstrue
collapsetrue
 #!/usr/bin/env python3

# source:
# - https://github.com/leondgarse/Keras_insightface/discussions/17

import os
import sys
import time
import socket
import argparse
import numpy as np
import tensorflow as tf

def main():

    # vars
    batch_size = 256
    samples = 256*20
    epochs = 10

    # do not allocate all GPU memory
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # use fp16 for faster inference
    tf.keras.mixed_precision.set_global_policy('mixed_float16')

    # strategy
    communication_options = tf.distribute.experimental.CommunicationOptions(
        implementation=tf.distribute.experimental.CommunicationImplementation.NCCL)
    strategy = tf.distribute.MultiWorkerMirroredStrategy(
        communication_options=communication_options)

    # dataset
    data = np.random.uniform(size=[samples, 224, 224, 3])
    target = np.random.uniform(size=[samples, 1], low=0, high=999).astype("int64")
    dataset = tf.data.Dataset.from_tensor_slices((data, target))
    dataset = dataset.batch(batch_size*strategy.num_replicas_in_sync)

    # define model
    with strategy.scope():
        model = tf.keras.applications.ResNet50(weights=None)
        loss = tf.keras.losses.SparseCategoricalCrossentropy()
        optimizer = tf.optimizers.SGD(0.01)
        model.compile(optimizer=optimizer, loss=loss)

    # fit
    callbacks = []
    # print progress only on the rank-0 worker; default to '0' if PMI_RANK is unset
    verbose = 2 if os.environ.get('PMI_RANK', '0') == '0' else 0
    model.fit(dataset,
              callbacks=callbacks,
              epochs=epochs,
              verbose=verbose)

if __name__ == "__main__":
    main()


Notes

Note
titleApptainer and run-singlenode.sh

This library is provided as a container because of the load that pip/conda virtual environments place on the Lustre shared file systems.

To run Python applications correctly, they must be invoked through the run-singlenode.sh or run-multinode.sh wrappers in PBS scripts:

Code Block
...
run-singlenode.sh moja_python_skripta.py
...


...