Exporting an R Workspace and Training in a Cloud Computing Environment


Training an ML model can be a resource- and time-consuming task. It is therefore often reasonable to perform this task using remote computing power: either a server within the same institution or a cloud computing service (e.g. Amazon Web Services, Microsoft Azure) whose architecture can be optimized for machine learning calculations. In both cases PMOD’s PSEG tool provides a convenient way to transfer the data to the computing unit via an R Workspace. This approach ensures that the time on the computing machine is spent entirely on the training process, since the data stored in the R Workspace have already been preprocessed in PSEG. The software setup is also simplified, as there is no need to install PMOD on the computing unit.

 

Please note that the short guide below assumes that appropriate versions of R (with the required packages) and the TensorFlow libraries are installed and configured on the computing unit (see Installation of R and Python for UNIX platforms).
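
Should the CRAN packages used in step 7 below not yet be available on the computing unit, they can be installed from within R. The following is a minimal sketch; it only covers the packages loaded in step 7 and assumes internet access to a CRAN mirror:

 # install the CRAN packages used by the training commands below (skip if already present)
 install.packages(c("keras", "stringr"))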

 

Remote training following export of an R Workspace (overview):

1.Local machine: Create or load the Learning Set in the PSEG tool according to the documentation above

2.Remember to configure all training parameters, as if preparing for a standard local training session

3.Select “Export R Workspace”

4.After the export is finished, please select a path and a name for the Workspace

5.Transfer the R Workspace and “pm.ai.tar.gz” to the computing unit (“pm.ai.tar.gz” is an R package created by PMOD that is required to run the machine learning-based processing; it can be found in the PMOD installation folder, in the subfolder “Pmod4.4/resources/extlibs/r/lib/”)

6.Start R on the computing unit

7.Run the training process in the R environment:

 # install the PMOD R package from the local source archive
 install.packages("~/.../pm.ai.tar.gz", repos = NULL, type = "source")
 # load the packages required for the training
 library("pm.ai")
 library("keras")
 library("stringr")
 # load the exported R Workspace and start the training
 load.workspace("~/.../Workspace.RData")
 pm.ai.learn()
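
Before starting the training with pm.ai.learn() it can be useful to confirm that TensorFlow actually detects the GPU from within R. This is a sketch which assumes that the tensorflow R package is installed alongside keras:

 # list the GPU devices visible to TensorFlow (an empty list means the training will run on the CPU)
 library("tensorflow")
 tf$config$list_physical_devices("GPU")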

 

To activate the TensorFlow 2.3 environment, use the following terminal command: “source activate tensorflow2_latest_p37”.

To monitor the GPU card usage, use the terminal command “watch -n0.1 nvidia-smi”. It refreshes the nvidia-smi output every 0.1 seconds (the time interval can be changed).

The “htop” command shows the memory and CPU usage.
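
For long trainings it can be convenient to run the commands from step 7 non-interactively, so that the R session does not have to stay open for the whole training. One possible approach, assuming the commands are saved in a script file (hypothetically named “train.R” here), is to activate the TensorFlow environment as described above and then launch the script in the background with a command such as “nohup Rscript train.R > train.log 2>&1 &”, following the progress in the log file and with the monitoring commands above:

 # train.R - hypothetical script; assumes pm.ai has already been installed as in step 7
 library("pm.ai")
 library("keras")
 library("stringr")
 load.workspace("~/.../Workspace.RData")
 pm.ai.learn()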