
Submitting Java/WEKA Projects to a Computer Cluster using PBS


What you will learn:
  • How to submit a Java project to VPAC using PBS
  • How to transfer files from your local computer to a computer cluster 
What you need before reading this tutorial:
  • An account on a computer cluster with PBS support (for example, a VPAC username and password)
  • Secure copy software, like WinSCP, etc.
This document was written by Quan Vu and Gang Li, with some material taken from the VPAC tutorial website.

Transferring files using WinSCP

    WinSCP is most suited for general use and we will concentrate on it here.

    WinSCP is available from: http://winscp.net

    WinSCP (Windows Secure Copy) is a graphical open source SFTP (and FTP) client for MS-Windows. It uses ssh and supports SCP (secure copy). It can also provide basic file management and remote editing.

[Figure: screenshot of WinSCP]
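WinSCP is a graphical client, but the same SCP transfer can also be scripted. The Java sketch below is one way to do it. It builds and runs a command of the form `scp file1 file2 ... user@host:folder/` and assumes that an scp-compatible command-line client is installed and on the PATH, for example OpenSSH `scp` or Putty's `pscp.exe`, which accepts the same basic arguments. The class name `ScpUpload`, the placeholder username `yourname`, and the folder `quan` are illustrative. The host name and the file names are the ones this tutorial uses elsewhere.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: copy files to the cluster by calling a command-line scp client.
// Assumes an scp-compatible client (OpenSSH scp, or Putty's pscp.exe, which
// accepts the same basic arguments) is installed and on the PATH.
class ScpUpload {

    // Builds the argument list: <client> file1 file2 ... user@host:remoteDir/
    static List<String> buildCommand(String client, List<String> files,
                                     String user, String host, String remoteDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(client);
        cmd.addAll(files);
        // A remote path without a leading '/' is relative to your home directory.
        cmd.add(user + "@" + host + ":" + remoteDir + "/");
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "yourname" is a placeholder username; the folder "quan" must already exist.
        List<String> cmd = buildCommand("scp",
                List.of("rf2.class", "weka.jar", "output.arff"),
                "yourname", "tango.vpac.org", "quan");
        System.out.println(String.join(" ", cmd));
        // inheritIO() connects the client to this console, so its
        // password prompt and progress output appear here.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

The command is built as a list of arguments rather than a single string. This means file names are passed to the client unchanged, without any shell quoting.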


       For logging in and running commands, we generally advise Windows users to use Putty. Putty is what is called a "terminal programme": you use it to connect to VPAC systems from your own (desktop or laptop) computer. Putty is free and very easy to install.

      Putty is available from: http://www.chiark.greenend.org.uk/~sgtatham/putty/

      It is likely that all you need is putty.exe. It is quite small and does not need to be installed: just save it to your desktop and double-click it when you need it.

Configuring Putty

[Figure: Putty configuration window]
    In the 'Host Name' box, enter the server you want to connect to (e.g., tango.vpac.org) and select the ssh radio button under 'Connection type'. It is useful to enter a session name ("Tango" in the example above) and save it, so you don't need to remember the details next time.

    Generally, the other Putty settings will be fine as they are. If you are going to use X Windows (to display a graphical interface from VPAC on your desktop), you might need to turn on X11 forwarding. You will also need some sort of "X Windows server" installed on your desktop, perhaps XWin32 or Exceed3D. A possible free option is Xming, http://www.straightrunning.com/XmingNotes/

[Figure: Putty X11 forwarding settings]

    The first time you connect to a particular machine, a box will pop up asking you to verify its host key. Answer 'Yes'. You will only see this prompt on your first connection to that machine.
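Answering 'Yes' tells Putty to trust that server's key from then on. If you would rather check the key than accept it blindly, the usual approach is to compare its fingerprint with one obtained from the cluster administrators through another channel. That step is an assumption here; this tutorial does not say that VPAC publishes fingerprints.

The Java sketch below shows how an OpenSSH-style SHA-256 fingerprint (the "SHA256:..." format printed by `ssh-keygen -l`) is derived from a public key:
1. Decode the base64 key blob.
2. Hash the bytes with SHA-256.
3. Re-encode the hash as base64 without padding.

Older clients may display an MD5 fingerprint instead, and this sketch does not compute that format. The class name `HostKeyFingerprint` is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Sketch: compute an OpenSSH-style SHA-256 fingerprint of a public host key.
// The input is the base64 key blob, i.e. the second field of a public-key line
// such as "ssh-ed25519 AAAA... comment".
class HostKeyFingerprint {

    static String sha256Fingerprint(String base64KeyBlob) throws NoSuchAlgorithmException {
        byte[] blob = Base64.getDecoder().decode(base64KeyBlob);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(blob);
        // OpenSSH shows the digest as base64 with the trailing '=' padding removed.
        return "SHA256:" + Base64.getEncoder().withoutPadding().encodeToString(digest);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        if (args.length != 1) {
            System.err.println("usage: java HostKeyFingerprint <base64-key-blob>");
            System.exit(1);
        }
        System.out.println(sha256Fingerprint(args[0]));
    }
}
```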

How to run Java (using a PBS script to submit the job)

   With the tools above, you have everything you need to log in to your VPAC account with your username and password and run a Java job. In general, WinSCP is used to manage your files, much as you would in Windows Explorer. Putty is used to type the commands that submit the job and carry out other operations.

   Before running Java on the cluster, first make sure the program runs successfully on your local machine.
  Let's say we have a Java file called rf2.java, as below. This code loads the output.arff file (download this data file from the attachment below), trains a random forest model on it, and then writes the model to a file called rdfr.model.


import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

public class rf2 {
    public static void main(String[] args) throws Exception {
        // Load the training data from output.arff in the current folder.
        ArffLoader loader = new ArffLoader();
        loader.setSource(new File("output.arff"));
        Instances dataset = loader.getDataSet();
        // Treat the last attribute as the class label.
        dataset.setClassIndex(dataset.numAttributes() - 1);

        // Build a random forest model from the original dataset.
        RandomForest rdfr = new RandomForest();
        rdfr.buildClassifier(dataset);

        // Serialize the trained model to rdfr.model.
        ObjectOutputStream oos = new ObjectOutputStream(
                new FileOutputStream("rdfr.model"));
        oos.writeObject(rdfr);
        oos.flush();
        oos.close();
    }
}

  Copy the above code and save it in a file called rf2.java, then copy weka.jar into the same folder as this file. (I assume that you have installed the Weka software on your computer, so weka.jar should be available in the Weka installation folder.)
   To run the program on your computer:
open a command prompt and type  javac -classpath weka.jar rf2.java  to compile the Java code.
 Then, to run it, type  java -classpath .;weka.jar rf2  to execute the compiled code. On Windows, classpath entries are separated by ';'; on Linux, as on the cluster, they are separated by ':'. The program uses the dataset output.arff to train a random forest model and saves the model in the same folder. You will see rdfr.model created after the program runs.
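The classpath separator differs between operating systems, and that difference is the most common reason a command that works on Windows fails on the cluster. The Java sketch below builds the `-classpath` value with `File.pathSeparator`, which is ";" on Windows and ":" on Unix-like systems. It prints the right command for whichever machine it runs on. The class name `ClasspathHelper` is hypothetical.

```java
import java.io.File;

// Sketch: build the -classpath value with the separator the current OS expects.
// Java separates classpath entries with ';' on Windows and ':' on Linux/Unix.
class ClasspathHelper {

    // Joins classpath entries with the given separator.
    static String classpath(String separator, String... entries) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        String cp = classpath(File.pathSeparator, ".", "weka.jar");
        System.out.println("java -classpath " + cp + " rf2");
    }
}
```

On the cluster's bash shell, `.;weka.jar` would not work. The `;` ends the command, so bash would run `java -classpath .` and then try to run `weka.jar rf2` as a separate command. This is why the PBS script uses `.:weka.jar`.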

   + Log in to your account using WinSCP and transfer all the files in this package to your account folder on VPAC. For example, transfer rf2.class, weka.jar and output.arff to the server. (You could compile the Java code on VPAC, but to keep things simple we assume that you already have the compiled file and use it directly.)
   + After that, you need to write a PBS script file to run the Java job (see the test.pbs file in the package):


        #PBS -S /bin/bash
        #PBS -N TreeTest
        #PBS -l nodes=1
        #PBS -l pmem=1G
        #! Change the working directory (the default is the home directory).
        #! In this case the files are saved in a folder called quan.
        cd quan
        #! Run the program.
        module load java
        java -classpath .:weka.jar rf2
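If you need several variants of this script, for example different job names or memory limits, it can help to generate it. The Java sketch below writes a script equivalent to test.pbs from a few parameters. The helper `PbsScriptWriter` and its parameter names are hypothetical, while the directives are the ones in the script above. It writes Unix '\n' line endings on purpose. A script edited on Windows and saved with CRLF endings can make bash fail on the cluster, for instance because `cd` sees a directory name ending in an invisible carriage return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: generate a PBS script equivalent to test.pbs from a few parameters.
// The helper and its parameter names are hypothetical; the directives are the
// ones used in test.pbs.
class PbsScriptWriter {

    static String buildScript(String jobName, int nodes, String pmem,
                              String workDir, String classpath, String mainClass) {
        StringBuilder sb = new StringBuilder();
        sb.append("#PBS -S /bin/bash\n");                        // shell that runs the job
        sb.append("#PBS -N ").append(jobName).append('\n');      // job name shown by qstat
        sb.append("#PBS -l nodes=").append(nodes).append('\n');  // number of nodes
        sb.append("#PBS -l pmem=").append(pmem).append('\n');    // memory per process
        sb.append("cd ").append(workDir).append('\n');           // jobs start in the home directory
        sb.append("module load java\n");
        sb.append("java -classpath ").append(classpath)
          .append(' ').append(mainClass).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String script = buildScript("TreeTest", 1, "1G", "quan", ".:weka.jar", "rf2");
        // Always '\n' line endings, even when this runs on Windows.
        Files.write(Paths.get("test.pbs"), script.getBytes(StandardCharsets.UTF_8));
        System.out.print(script);
    }
}
```

You can run it on your own computer and transfer the resulting test.pbs with WinSCP, or run it directly on the cluster.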

For more about PBS scripts, see http://www.vpac.org/tutorials/Submitting_and_Running_HPC_Jobs
Now that test.pbs is in the same folder as the other files, log in with Putty and go to the folder where you saved the files (cd quan). Then type qsub test.pbs to submit your job, and check its status by typing qstat.
When the job runs, it executes the Java program and creates the model, just as on your local computer.
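The submit-and-check steps can also be automated. The Java sketch below runs `qsub test.pbs` and then `qstat <job id>`. It assumes Torque/PBS-style output:
- qsub prints the new job ID on stdout.
- qstat prints a table whose fifth column ("S") holds the job state, typically Q (queued), R (running) or C (completed) on Torque.

The class `SubmitJob` and its parsing helpers are hypothetical. Other PBS variants may format their output differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch: submit test.pbs with qsub, then look up the job's state with qstat.
// Assumes Torque/PBS-style output: qsub prints the new job ID on stdout, and
// qstat prints a table whose fifth column ("S") is the job state.
class SubmitJob {

    // Runs a command, returns its standard output, and fails on a non-zero exit.
    static String run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show errors on the console
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int code = p.waitFor();
        if (code != 0) {
            throw new IOException(String.join(" ", cmd) + " exited with code " + code);
        }
        return out.toString();
    }

    // qsub prints just the job ID, so trimming whitespace is enough.
    static String parseJobId(String qsubOutput) {
        return qsubOutput.trim();
    }

    // Returns the state letter for jobId from qstat output, or null if the job
    // is not listed. Only the numeric part of the ID is compared, because qstat
    // may shorten the server name in its Job id column.
    static String parseState(String qstatOutput, String jobId) {
        String number = jobId.split("\\.")[0];
        for (String line : qstatOutput.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 6 && fields[0].split("\\.")[0].equals(number)) {
                return fields[4];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String jobId = parseJobId(run("qsub", "test.pbs"));
        System.out.println("Submitted job " + jobId);
        System.out.println("State: " + parseState(run("qstat", jobId), jobId));
    }
}
```

Once a finished job has left the queue, qstat typically reports an unknown job ID and exits with a non-zero code. In that case this sketch's run() throws an IOException, so check for rdfr.model in your folder instead.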

For more information and further tips relating to this tutorial, please contact Quan Vu.

Huy Quan Vu,
Oct 1, 2009, 6:21 PM