GridSAM
GridSAM is a Web service for job submission that uses the JSDL (http://en.wikipedia.org/wiki/Job_Submission_Description_Language) specification from OGF. Its evolution is tracking the development of the OGSA-BES specification, also from OGF.
For general information about using GridSAM, please see the GridSAM webpages at http://gridsam.sourceforge.net.
As GridSAM uses JSDL and OGSA-BES, it provides a standards-based mechanism for submitting jobs to NGS resources. Software such as the Application Hosting Environment (AHE) uses GridSAM to submit jobs.
NGS GridSAM Endpoint to Access STFC and Oxford NGS Resources
Oxford NGS (http://www.oerc.ox.ac.uk/resources/ngs/) provides GridSAM instances which allow submission to ngs.rl.ac.uk and ngs.oerc.ox.ac.uk via Globus. The GridSAM instances can be reached at:
- https://gridsam.oerc.ox.ac.uk:18443/gridsam/services/gridsam - Submits jobs to ngs.rl.ac.uk
- https://gridsam-test.oerc.ox.ac.uk:18443/gridsam/services/gridsam - Submits jobs to ngs.oerc.ox.ac.uk
Please note: this service is currently under test. Until it moves into production it should be deemed to be 'at risk' and sensitive jobs should be run directly on the core NGS nodes or from the portal.
Installing GridSAM
- Download the OMII-UK Campus Grid Toolkit Client for UNIX (https://gridsam.oerc.ox.ac.uk:18443/download_client.jsp?osType=linux-unknown)
- Untar the resulting archive and run the installation script CampusGridToolkitClientInstall.sh or CampusGridToolkitClientGuiInstall.sh
- Switch the client to using e-Science certificates (http://www.omii.ac.uk/docs/3.4.0/installation_guide/omii_3_installation_and_setup_guide.htm). Select "Security"->"X.509 Certificates"->"Replacing the Temporary OMII Certificate"->"Using a non-OMII Certificate with a Client". You will need to follow the instructions under "Procedure for importing a non-OMII Certificate" but NOT those under "Disabling Trust of the OMII CA".
You may want to read about the Managed Programme (http://www.omii.ac.uk/docs/3.4.0/installation_guide/omii_3_installation_and_setup_guide.htm): select "Client-side Installation/Uninstallation"->"Installation"->"How to install the SC Client"->"How to install Software Component Clients" in the side bar. Please note that AHE is not supported by the Oxford GridSAM instance.
Using GridSAM
In order to use the GridSAM instance you need to:
- Obtain a UK e-Science certificate (https://ca.grid-support.ac.uk/cgi-bin/pub/pki?cmd=getStaticPage&name=index)
- Create a proxy certificate that GridSAM will use to act on your behalf (i.e. submitting jobs to Globus, staging files in and out)
- Describe your job in JSDL, an extensible XML specification language
- Submit and monitor your job via the GridSAM client
Proxy Certificate
Before using GridSAM with NGS resources you will need to ensure that you have a *proxy certificate* uploaded to the NGS MyProxy server (http://www.ngs.ac.uk/site-level-services/myproxy). The easiest way is with the Certificate Management Wizard (http://www.ngs.ac.uk/tools/certwizard), but if you have access to an installation of Globus you can use the myproxy-init command-line tool:
$ myproxy-init -s myproxy.ngs.ac.uk -l <username>
When prompted, enter the password for your Grid certificate, followed (twice) by the password you wish to use to protect your proxy certificate on the server.
Job Submission
You will need a JSDL (http://en.wikipedia.org/wiki/Job_Submission_Description_Language) document describing the job you wish to submit, for example:
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <JobDescription>
    <JobIdentification>
      <JobProject>gridsam</JobProject>
    </JobIdentification>
    <Application>
      <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
        <Executable>/bin/sleep</Executable>
        <Argument>5</Argument>
      </POSIXApplication>
    </Application>
  </JobDescription>
</JobDefinition>
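If you need job descriptions for several executables, a document like the one above can also be generated programmatically. The following is a minimal Python sketch using only the standard library. The function name build_jsdl and its parameters are hypothetical and not part of GridSAM. It writes explicit namespace prefixes (jsdl:, posix:) instead of default namespaces, which should be equivalent XML.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the JSDL example on this page.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

# Use readable prefixes instead of the auto-generated ns0/ns1.
ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("posix", POSIX_NS)


def build_jsdl(executable, arguments=(), project="gridsam"):
    """Return a JSDL document (as a string) for a simple POSIX job.

    Hypothetical helper: mirrors the structure of the hand-written
    example (JobIdentification, then Application/POSIXApplication).
    """
    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL_NS}}}JobDescription")

    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobProject").text = project

    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in arguments:
        # One <Argument> element per command-line argument.
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = str(arg)

    return ET.tostring(job, encoding="unicode")
```

For instance, writing the result of build_jsdl("/bin/sleep", ["5"]) to myjob.jsdl should give a job equivalent to the hand-written example.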
If you save this as myjob.jsdl and you have created your proxy certificate, you can then submit the job with:
$ cd <omii_client_home>/gridsam/bin
$ ./gridsam-submit -s https://gridsam.oerc.ox.ac.uk:18443/gridsam/services/gridsam \
    -myproxy -myproxyuser <username> -myproxyhost myproxy.grid-support.ac.uk \
    -j myjob.jsdl
This returns a unique job ID such as:
urn:gridsam:b47fde291a7ff0d6edc9151f3af2ce2d
This can then be used to monitor the job:
$ ./gridsam-status -s https://gridsam.oerc.ox.ac.uk:18443/gridsam/services/gridsam -j <unique_job_id_returned>
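When submission and monitoring are scripted, the job ID has to be picked out of the submit output and passed on to gridsam-status. The following is a rough Python sketch of that step. It assumes the ID always looks like the example above (urn:gridsam: followed by hexadecimal digits), which is inferred from that single example and not from the GridSAM documentation. The helper names are hypothetical. Only the -s and -j options shown on this page are used.

```python
import re

# Assumption: job IDs are "urn:gridsam:" followed by hex digits,
# as in the single example shown on this page.
JOB_ID_PATTERN = re.compile(r"urn:gridsam:[0-9a-f]+")


def extract_job_id(submit_output):
    """Return the first GridSAM job ID found in the text, or None."""
    match = JOB_ID_PATTERN.search(submit_output)
    return match.group(0) if match else None


def status_command(endpoint, job_id):
    """Build the gridsam-status argument list for a given job.

    The list can be handed to subprocess.run() from the
    <omii_client_home>/gridsam/bin directory.
    """
    return ["./gridsam-status", "-s", endpoint, "-j", job_id]
```

Returning None rather than raising lets the calling script decide how to report a submission that produced no job ID.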
File Staging
File staging is done via GridFTP. Here is a simple example of a job that stages stderr and stdout into two separate files on the chosen NGS resource (ngs.oerc.ox.ac.uk in this case):
<?xml version="1.0" encoding="UTF-8"?>
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <JobDescription>
    <Application>
      <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
        <Executable>/bin/uname</Executable>
        <Argument>-a</Argument>
        <Output>stdout-raw.txt</Output>
        <Error>stderr-raw.txt</Error>
        <WorkingDirectory>/home/ngsXXXX</WorkingDirectory>
      </POSIXApplication>
    </Application>
    <DataStaging>
      <FileName>stdout-raw.txt</FileName>
      <CreationFlag>overwrite</CreationFlag>
      <Target>
        <URI>gsiftp://ngs.oerc.ox.ac.uk:2811/stdout-staged.txt</URI>
      </Target>
    </DataStaging>
    <DataStaging>
      <FileName>stderr-raw.txt</FileName>
      <CreationFlag>overwrite</CreationFlag>
      <Target>
        <URI>gsiftp://ngs.oerc.ox.ac.uk:2811/stderr-staged.txt</URI>
      </Target>
    </DataStaging>
  </JobDescription>
</JobDefinition>
The contents of the <WorkingDirectory> element should be the value of your $HOME on ngs.rl.ac.uk (where the job is actually executed), NOT the value on ngs.oerc.ox.ac.uk. These paths differ.
In order to avoid GSIFTP failures, you need to log in via GSI-SSHTerm to both ngs.oerc.ox.ac.uk and ngs.rl.ac.uk before submitting your job to GridSAM; otherwise the filesystem isn't accessible to GridSAM.
See more details at: http://www.omii.ac.uk/docs/3.4.0/user_guide/sc_services/gridsam/running_gridsam_with_file_staging.htm
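An easy mistake in a staging job is a <FileName> in a <DataStaging> block that does not match the <Output> or <Error> name used in the POSIX application, so the file is never copied. The following is a small Python sketch of a check for this, using only the standard library. The function name unstaged_outputs is hypothetical. It only compares file names and does not validate the document against the JSDL schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the JSDL examples on this page,
# written in ElementTree's "{uri}tag" form.
JSDL = "{http://schemas.ggf.org/jsdl/2005/11/jsdl}"
POSIX = "{http://schemas.ggf.org/jsdl/2005/11/jsdl-posix}"


def unstaged_outputs(jsdl_text):
    """Return Output/Error file names that have no DataStaging entry."""
    root = ET.fromstring(jsdl_text)

    # Files the job writes: <Output> and <Error> of the POSIX application.
    produced = {
        el.text.strip()
        for tag in ("Output", "Error")
        for el in root.iter(POSIX + tag)
        if el.text
    }

    # Files named in any <DataStaging><FileName> element.
    staged = {
        el.text.strip()
        for el in root.iter(JSDL + "FileName")
        if el.text
    }

    return sorted(produced - staged)
```

An empty list means every stdout/stderr file named in the job has a matching staging entry. It says nothing about whether the target URI is reachable.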
More Information
- http://www.omii.ac.uk/mp/mp_jobsubmission.jsp
- http://gridsam.sourceforge.net/
- http://www.ngs.ac.uk/sites/belfast/gridsam.html
Known Issues
If there are no resources available on the cluster when you submit a job, GridSAM will eventually give up trying to submit it, with a message similar to:
undefined - 2009-03-12 12:23:21.72
cannot advance from 'active-queued' to 'active'
As of May 2007 and Version 2.0, GridSAM is buggy and unreliable. Use with caution.
One serious error is the inability to host multiple GridSAM instances from a single OMII container. The cryptic, user-hostile error message that indicates you have hit this bug is:
GridOx:
Fails on gridox: GridSAM state is: failed Time: 2007-05-14T18:17:07.519+01:00 Description: cannot initialise working directory: Could not connect to FTP server on"gsiftp://grid-compute.oesc.ox.ac.uk/ - User globus credential is required but not specified in the context".
GridMan:
GridSAM state is: failed Time: 2007-05-14T18:18:17.579+01:00 Description: cannot initialise working directory: Could not connect to FTP server on"gsiftp://grid-compute.leeds.ac.uk/ - User globus credential is required but not specified in the context".
GridSAM state is: failed
Time: 2007-05-14T18:20:21.696+01:00 Description: cannot initialise working directory: Could not connect to FTP server on"gsiftp://grid-data.man.ac.uk/ - User globus credential is required but not specified in the context".
The developers of GridSAM say:
" This is a known issue with the Globus DRMConnector. At present we believe that it is due to a "feature" in the Globus COG kit. For some reason it appears to create a number of (different) class loaders which it uses in different parts of the code. This breaks when you try to have more than one instance of the COG kit in memory at the same time. A workaround was published on the GridSAM web page (http://gridsam.sourceforge.net/2.0.0/gridsam-service/gt2.html) though it doesn't seem to work in every case. "
Best of luck
