Cluster Usage

Overview

What you need before reading this tutorial
  • An account on a computer cluster with PBS support 
    • for example, a VPAC username and password
  • An SSH client, such as PuTTY

This document was written by Yongli Ren, Quan, TULIP Lab, based on resources taken from the VPAC Tutorial Website and the README file for Linux published with Matlab.

In this tutorial, we present an overall introduction to cluster usage. Specifically, we introduce the usage of VPAC and Massive. Normally, the host of the VPAC cluster is tango.vpac.org (trifid.vpac.org), and the host for the Massive cluster is m2.massive.org.au. To use these clusters, you first need to apply for an account, which you will use to log in. To log into a cluster, issue the following command in a terminal on your local host:

ssh username@hostname

After logging into the cluster, you can run Linux commands there; some useful ones are discussed in Section 6.

However, running an HPC job on the cluster is a bit different from simply connecting to it, so we will focus on how to run HPC jobs (C, C++, Java and Matlab jobs) on these clusters.

The structure of this tutorial is as follows. Section 2 presents how to set up your laptop to access the cluster for HPC job submission; Section 3 presents how to submit Matlab HPC jobs on VPAC; Section 4 focuses on how to submit Matlab jobs on Massive; Section 5 presents how to run C, C++ and Java jobs; finally, Section 6 gives some practical tips on how to use clusters efficiently.

How to Connect to VPAC/Massive

  • 1). Set up SSH on Linux.

Run the following commands in order:

ssh-keygen

ssh-copy-id user@tango.vpac.org    # replace user with your own VPAC account

ssh-add

This step is shown in Figure 1.


Figure 1. Set up SSH


If you already have a private/public key pair, log into the cluster, go to the folder ~/.ssh, open the file authorized_keys, add your public key as a separate line, and save the file.
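The manual key installation described above can be sketched as a small shell function to run on the cluster. The function name install_pubkey and its arguments are illustrative and not part of the original tutorial; the permissions (700 for the folder, 600 for the file) are the usual OpenSSH expectations.

```shell
#!/bin/bash
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Hypothetical helper: appends the public key stored in PUBKEY_FILE to
# SSH_DIR/authorized_keys (default: ~/.ssh) unless that exact line is
# already present, then tightens the permissions.
install_pubkey() {
    local pubkey_file="$1"
    local ssh_dir="${2:-$HOME/.ssh}"
    local auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"

    # -F fixed string, -x whole-line match, -f read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}
```

For example, after copying your public key file to the cluster, you could call install_pubkey id_rsa.pub. Calling it twice does not duplicate the key.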


After that, you will have passwordless SSH set up, so you can access VPAC without entering a password. This step is vital, because Matlab needs this passwordless SSH configuration to submit jobs to VPAC. You can check it by running the command:

ssh user@tango.vpac.org    # replace user with your own VPAC account

If you can access VPAC without being asked for a password, this step has succeeded.
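The check above can also be scripted, which is handy if you manage several accounts. The sketch below relies on the OpenSSH options BatchMode=yes (never prompt for a password) and ConnectTimeout; the function name check_passwordless and the SSH_CMD override (used only to dry-run the function without a network connection) are assumptions made for illustration.

```shell
#!/bin/bash
# check_passwordless USER@HOST
# Hypothetical helper: succeeds only if SSH can log in without prompting.
# SSH_CMD can be set to a stand-in command for a dry run; it defaults to ssh.
check_passwordless() {
    local target="$1"
    local ssh_cmd="${SSH_CMD:-ssh}"

    # BatchMode=yes makes ssh fail instead of asking for a password.
    if "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "passwordless login OK"
    else
        echo "passwordless login FAILED"
        return 1
    fi
}
```

A typical call would be check_passwordless user@tango.vpac.org, with user replaced by your own VPAC account.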


  • 2). Set up the Matlab working environment on the local host.

This step is easy for Windows users, but a little tricky for Linux users, since Linux is a multi-user system. Some Linux users prefer to install software with the root account, which is a good habit but makes it harder to change that software's environment manually. For instance, if you installed Matlab with the root account, you cannot change files or paths in the Matlab folder from your working account. The steps below work around this problem.


  • a). Among Matlab's toolboxes, one is vital for VPAC job submission. It is the “distcomp” folder:

$MATLABROOT/toolbox/distcomp


Copy this “distcomp” folder to a new folder created with your working account. We can start from there, since you have full permissions on the newly created folder. More importantly, add the new folder and its subfolders to Matlab's path. We will use $distcomp to denote the location of the new “distcomp” folder.


  • b). Go to folder: $distcomp/@distcomp/@abstractscheduler

Rename pGetJobState.m to OLD_pGetJobState.m_OLD and pDestroyJob.m to

OLD_pDestroyJob.m_OLD. Alternatively, you can simply delete these two files.


  • c). Copy

$distcomp/examples/integration/pbs/nonshared/unix/pGetJobState.m

and place it in

$distcomp/@distcomp/@abstractscheduler/


  • d). Copy

$distcomp/examples/integration/pbs/nonshared/unix/pDestroyJob.m

and place it in

$distcomp/@distcomp/@abstractscheduler/


        The main purpose of this step is to set up the PBS environment on the local host so that it can submit jobs to the VPAC cluster.
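Steps a) to d) above can be sketched as one shell function. The function name setup_distcomp and its two arguments (the Matlab root and the location of the new copy) are illustrative choices, and the sketch renames the old files rather than deleting them.

```shell
#!/bin/bash
# setup_distcomp MATLABROOT DISTCOMP
# Hypothetical helper that follows steps a) to d):
#   a) copy MATLABROOT/toolbox/distcomp to DISTCOMP (must not exist yet)
#   b) rename the stock pGetJobState.m and pDestroyJob.m
#   c), d) put the PBS non-shared versions in their place
setup_distcomp() {
    local matlabroot="$1"
    local distcomp="$2"
    local sched_dir="$distcomp/@distcomp/@abstractscheduler"
    local example_dir="$distcomp/examples/integration/pbs/nonshared/unix"
    local f

    if [ -e "$distcomp" ]; then
        echo "destination $distcomp already exists" >&2
        return 1
    fi

    # a) copy the toolbox folder into a location owned by the working account
    mkdir -p "$(dirname "$distcomp")"
    cp -r "$matlabroot/toolbox/distcomp" "$distcomp"

    for f in pGetJobState.m pDestroyJob.m; do
        # b) keep the original file under the OLD_..._OLD name
        mv "$sched_dir/$f" "$sched_dir/OLD_${f}_OLD"
        # c), d) copy the PBS non-shared version into the scheduler folder
        cp "$example_dir/$f" "$sched_dir/$f"
    done
}
```

A call might look like setup_distcomp "$MATLABROOT" "$HOME/matlab/distcomp". The new folder and its subfolders still have to be added to Matlab's path from within Matlab.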

  • 3). Copy new submit function from VPAC to the local host.

The standard Parallel and Simple Submit Functions have been modified by VPAC, so we need to copy two more files from VPAC:

Copy

pbsNonSharedParallelSubmitFcn.m and pbsNonSharedSimpleSubmitFcn.m, on tango.vpac.org from: /common/examples/matlab

to

$distcomp/examples/integration/pbs/nonshared/unix


  • 4). Finally, please NOTE: the account name you use to run Matlab on Linux must be exactly the same as your VPAC account name. You can check this with the “whoami” command:

[user@your laptop] $ whoami

user            %%% the same as your VPAC account.


 * For a Massive connection, replace user@tango.vpac.org with your Massive account and the Massive domain name: user@m2.massive.org.au

How to submit a Matlab job to VPAC

  • 1). Run simple jobs using Matlab Distributed Computing

    The idea is that you write some simple code and send it to the VPAC server, which runs it and returns the result to you. You can copy the following code into an .m file and run it directly in Matlab
    (make sure that everything in the sections above is set up correctly).
    Sistributedcomputingsimple.m

   clusterHost = 'tango.vpac.org';

   % Replace this with the location in your account space where you want to store the job data.
   remoteDataLocation = '/home/xxxxxxxx';

   sched = findResource('scheduler', 'type', 'generic');
   get(sched)

   % You can choose any folder on your computer to act as the shared file system location; Matlab will create some job data there.
   set(sched, 'DataLocation', 'C:\matlab\share');
   % Change the version to the Matlab version you use, such as R2009a or R2008b.
   set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008a');
   set(sched, 'HasSharedFilesystem', true);
   set(sched, 'ClusterOsType', 'unix');
   set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});

   % Create a job to send to VPAC.
   j = createJob(sched);
   get(j)

   % Create two tasks for this job; each generates a random 3x3 matrix.
   createTask(j, @rand, 1, {3,3});
   createTask(j, @rand, 1, {3,3});
   get(j,'Tasks')

   % Submit the job, wait for it to finish, and fetch the results.
   submit(j)
   waitForState(j)
   results = getAllOutputArguments(j);
   results{1:2}


                
  • 2). Run Parallel Job using Matlab Distributed Computing


Paralleljob.m

clusterHost = 'tango.vpac.org';

% Replace this with the location in your account space where you want to store the job data.
remoteDataLocation = '/home/xxxxxxxx';

sched = findResource('scheduler', 'type', 'generic');
get(sched)

% You can choose any folder on your computer to act as the shared file system location; Matlab will create some job data there.
set(sched, 'DataLocation', 'C:\matlab\share');
% Change the version to the Matlab version you use, such as R2009a or R2008b.
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008a');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
% Compared with the example above, pbsNonSharedSimpleSubmitFcn is replaced by pbsNonSharedParallelSubmitFcn,
% and the property set is ParallelSubmitFcn instead of SubmitFcn.
set(sched, 'ParallelSubmitFcn', {@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});

pjob = createParallelJob(sched)

% Create a parallel task: rand(3) returns a 3x3 random matrix on each worker.
createTask(pjob, @rand, 1, {3});

set(pjob,'MinimumNumberOfWorkers',5);
set(pjob,'MaximumNumberOfWorkers',9);

submit(pjob)
waitForState(pjob, 'finished')
results = getAllOutputArguments(pjob)
celldisp(results);


  • 3). Using the FileDependencies property to send code and data to VPAC:
Assume we have a routine saved in a file test.m, and this code loads a data file data.mat to perform some operation. We want to send these files to VPAC and execute them there. We can add the following code to Sistributedcomputingsimple.m, right after the createJob() call:

    % Send our files to VPAC through the FileDependencies property. Here we assume that both files are
    % stored in Matlab's working directory; otherwise, give an absolute path, e.g. 'C:\matlab\share\test.m'.
    set(j, 'FileDependencies', {'test.m', 'data.mat'});
    % This task tells the server to run the function in test.m.
    createTask(j, @test, 1, {});

  • 4). Using the PathDependencies property to run code and data files already stored at VPAC:
This method is useful when the data set is big, which makes it inconvenient to transfer it to VPAC every time through the FileDependencies property. Instead, we can put our data sets on VPAC and tell Matlab where to find them. Assume we have a big data file data.mat and a function test.m already stored at VPAC. We can run these files by adding the code below right after the createJob() call in Sistributedcomputingsimple.m:

  % Add the folder on VPAC that holds data.mat and test.m to the Matlab path of the workers.
  % We assume both files are stored in the folder /home/username/matlab/share on VPAC.
  % PathDependencies takes folders, and a second set() call would overwrite the first, so one call is enough.
  set(j, 'PathDependencies', {'/home/username/matlab/share'});
  % Execute the test function.
  createTask(j, @test, 1, {});

A useful tip: in most cases, experiments take a long time to run, so it is inconvenient to submit a Matlab job to VPAC and wait for it to complete in order to retrieve the output. We would rather submit our jobs to VPAC and let them run without keeping Matlab open on our computer. A simple trick is to remove all the code after the submit() call in Sistributedcomputingsimple.m. We then no longer wait for the job to complete, and we can close Matlab once the job is submitted. To retrieve the output, add a piece of code to the job that saves its results into a mat file, which can be downloaded from VPAC at any time after the job has finished.

How to submit a Matlab job to Massive

Submitting your Matlab job to Massive is relatively easy, as you can run your Matlab code from the command line. The following script shows how to configure the working path and run your Matlab job on Massive.

run.pbs:

#!/bin/bash

###### Select resources #####
# Change myJob to the name of your job.
#PBS -N myJob
#PBS -l nodes=1
#PBS -l walltime=7:00:00:00
#PBS -l pmem=2000MB

#### Output File #####
# Save the output to your working directory.
#PBS -o output.out

#### Error File #####
# Save the error messages to your working directory.
#PBS -e error.err

##### Change to current working directory #####
# Enter the directory the job was submitted from.
cd $PBS_O_WORKDIR

##### Execute Program #####
# Load the matlab module.
module load matlab

# main.m is the Matlab script you would like to run.
matlab -nodisplay -nodesktop -nojvm -nosplash < main.m


A benefit of the above script is that you do not need to configure the working directory every time you run a job.
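When you run a batch of experiments, it can help to generate one such script per experiment. The following is a minimal sketch, assuming a hypothetical helper make_matlab_pbs that takes a job name, the Matlab script to run, and an optional output file name; the resource values are simply copied from the script above.

```shell
#!/bin/bash
# make_matlab_pbs JOB_NAME MATLAB_SCRIPT [OUT_FILE]
# Hypothetical helper: writes a PBS script for one Matlab run.
make_matlab_pbs() {
    local job_name="$1"
    local matlab_script="$2"
    local out_file="${3:-run.pbs}"

    # The heredoc expands $job_name and $matlab_script now, while
    # \$PBS_O_WORKDIR is written literally and expanded when the job runs.
    cat > "$out_file" <<EOF
#!/bin/bash
#PBS -N $job_name
#PBS -l nodes=1
#PBS -l walltime=7:00:00:00
#PBS -l pmem=2000MB
#PBS -o $job_name.out
#PBS -e $job_name.err

cd \$PBS_O_WORKDIR
module load matlab
matlab -nodisplay -nodesktop -nojvm -nosplash < $matlab_script
EOF
}
```

For example, make_matlab_pbs exp1 main.m writes run.pbs for a job named exp1. On the cluster, such a script is handed to the PBS scheduler with qsub run.pbs, which prints the ID of the new job.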

How to run Java/C++/C job

Running a Java/C++/C PBS job on VPAC/Massive is similar to the above process and even easier. Basically, you run the jobs from the command line using a script file:

  • For Java job submission

run.pbs:

#!/bin/bash
#PBS -S /bin/bash
#PBS -N jobName
#PBS -l nodes=1
#PBS -l pmem=2G
#PBS -m ae

# Enter the directory the job was submitted from.
cd $PBS_O_WORKDIR

# Load the java module.
module load java

# Compile your java code.
javac myExample.java

# Run your code as a normal program from the command line.
# Add the packages used in your java code to the classpath.
java -Xms1000m -Xmx2000m -classpath \
:/home/yongli/... \
Benchmark


  • For C++/C job submission

run.pbs:

#!/bin/bash
#PBS -S /bin/bash
#PBS -N jobName
#PBS -l nodes=1
#PBS -l pmem=2G

# Change to the directory the job was submitted from (the default is the home directory).
cd $PBS_O_WORKDIR

./yourProgram

Practical Tips

1). Useful commands:

As cluster jobs are not interactive, you need some commands to check the status of your HPC jobs or to interact with them. The following practical commands can be run on the cluster host, e.g. to check a job's status or to delete an existing job.

showq | grep username: lists the jobs submitted by ‘username’ together with their status, e.g. running, idle, or held.

qstat -f JobID: shows the full details of a specific job, e.g. where the job was submitted from and which PBS script it came from.

qdel JobID: deletes the job you want to terminate.
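These commands can be combined into a small polling helper. The sketch below assumes that qstat -f prints a line of the form "job_state = R", as Torque-style PBS does, and that a finished job either reports state C or is no longer listed; both function names are illustrative, and the exact output format on your cluster may differ.

```shell
#!/bin/bash
# job_state: reads "qstat -f JobID" output on stdin and prints the value
# of the job_state field (e.g. Q for queued, R for running).
job_state() {
    awk -F' = ' '/job_state/ { print $2; exit }'
}

# wait_for_job JOB_ID [INTERVAL_SECONDS]
# Polls qstat until the job is completed (state C) or no longer listed.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-60}"
    local state

    while true; do
        state=$(qstat -f "$job_id" 2>/dev/null | job_state)
        if [ -z "$state" ] || [ "$state" = "C" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, qstat -f JobID | job_state prints only the state letter, and wait_for_job JobID 120 returns once the job has left the queue.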

2). More efficient way to use VPAC:

When you run your own experiments on a cluster, you normally need to download the results to your laptop. The following small Matlab function saves you from doing this by hand: it copies results from VPAC to your local host, which is especially helpful when you are running a batch of experiments.

function MyVersioncopyDataFromCluster(localLoc, remoteLoc, clusterHost)
%COPYDATAFROMCLUSTER Copies files or directories from a location
% on a remote host to the local machine.

% Copyright 2006 The MathWorks, Inc.

% Use scp to copy files.
copyCmd = sprintf('scp -r "%s:%s" %s', clusterHost, remoteLoc, localLoc);

% s is the exit status of scp, r is its output.
[s, r] = system(copyCmd);

% Report a failed copy without stopping the calling script.
if s ~= 0
   fprintf(['distcomp:scheduler:FailedRemoteOperation\n' ...
       'Failed to copy files from "%s" on the host "%s"\n' ...
       'to "%s".\n' ...
       'Command Output:\n' ...
       '"%s"\n' ...
       ], remoteLoc, clusterHost, localLoc, r);
end


3). How to submit VPAC jobs from more than one account on one desktop.


a). To access more than one VPAC/Massive account from your laptop, you must create another system account on your laptop whose name matches the new VPAC account:

Add the system account and set its password:

sudo useradd --system -m -s /bin/bash userName

sudo passwd userName


b). Then, if you have already set up the Matlab environment following instructions 2.2 and 2.3 above, you only need to repeat instruction 2.1 with the new VPAC account. If not, you need to follow all the instructions in Section 2.


Attachments (uploaded by Yongli Ren, Feb 12, 2013, 3:54 AM):
run-c⁄c++.pbs
run-java.pbs
run-matlab-massive.pbs