Condor is a high-throughput, opportunistic distributed computing environment. It is specifically tailored to long-running and/or serial batch jobs, where efficient use of computing resources is preferred to raw speed of computation. For example, one might use Condor instead of existing high-performance computing solutions to run numerous serial jobs simultaneously that are known to take days, weeks, or months to complete.
Condor jobs may be submitted from any HPC head node or the condor login node, condor-login.hpc.fsu.edu. Any HPC user may log in to the condor login node from any of the HPC login nodes by typing "ssh condor-login". condor-login.hpc.fsu.edu is also accessible directly from any machine on the FSU VPN.
The spool directory (/opt/condor/spool) on condor submit nodes holds the job queue and history files for all jobs submitted from a given machine. As a result, disk space requirements limit the number of jobs that can be submitted from our HPC login nodes, especially when users submit jobs with very large executables or image sizes.
All condor utilities should be available immediately upon login (ensure you have not overwritten your PATH). Currently, all compute nodes are provided by clusters and workstations at the Department of Scientific Computing, and many are available for general access.
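As a quick sanity check (a suggested step, not part of the official setup), condor_version will print the installed version when the utilities are on your PATH:

```shell
# Print the condor version if the tools are reachable; otherwise warn.
if command -v condor_version >/dev/null 2>&1; then
    condor_version
else
    echo "condor utilities not found - check that PATH was not overwritten"
fi
```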
Your home directory on condor-login is on Lustre. The Panfs file system is not available on the condor-login node, so copy any files you need from Panfs to Lustre before you leave your HPC login node.
If you experience performance issues when submitting jobs from the condor-login node, try moving your large data files from Lustre to /opt/condor/stage on the local condor-login node. Staging large files on the local condor-login disks reduces network traffic and may help speed up your runs if you are working with very large data files. Please delete or move your files from this directory when you are done.
Job parameters and requirements are described by a condor description file and submitted using the condor_submit command.
Here is a simple example of a condor job description:
universe                = vanilla
executable              = /path/to/my/executable
arguments               = arg1 arg2 arg3
transfer_input_files    = file1, file2, file3
requirements            = Arch == "X86_64" && OpSys == "LINUX"
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
queue
This describes a vanilla job that will run on a 64-bit Intel compute node running Linux. A vanilla job runs on a compute node with no special features. There are other job universes that provide special features, such as checkpointing, but presently only vanilla is supported.
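As a usage sketch (myjob.sub is a filename chosen for illustration), the description is saved to a file and handed to condor_submit; here only a minimal description is written:

```shell
# Write a minimal description file (placeholder paths), then submit it.
cat > myjob.sub <<'EOF'
universe   = vanilla
executable = /path/to/my/executable
queue
EOF
# On condor-login, submit with:
#   condor_submit myjob.sub
```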
You may view all queued jobs with the condor_q command.
Compute node statuses may also be queried with the condor_status command.
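For example (the guard around the commands is only there so the snippet degrades gracefully on machines where the condor tools are absent):

```shell
# Inspect the job queue and the compute-node pool (run on condor-login).
if command -v condor_q >/dev/null 2>&1; then
    condor_q          # all queued and running jobs
    condor_status     # state of each compute node
else
    echo "condor tools not on PATH"
fi
```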
For more documentation on condor job submission and control, please see the condor documentation at DSC:
Remember that you do not have to log in to submit.scs.fsu.edu; you can submit jobs from your HPC login node!
- Standard Universe is not supported
- To take full advantage of the compute nodes, executables need to be compiled for three different architectures (32- and 64-bit Intel, and 64-bit PowerPC).
- Compute nodes do not presently share the same file system as HPC.
- Condor will notify you of job completion by email. If this is not desirable, add the following to your description file:
notification = NEVER
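Tying into the architecture note above: a job can be made to match more than one architecture through its requirements expression. A hedged sketch (the Arch strings INTEL, X86_64, and PPC64 are the values condor conventionally reports; the per-architecture executable naming is an assumption for illustration):

```text
# Match any of the three supported architectures:
requirements = ((Arch == "INTEL" || Arch == "X86_64" || Arch == "PPC64") && OpSys == "LINUX")
# Optionally pick the matching binary per machine via substitution macros:
executable   = /path/to/my/executable.$$(Arch)
```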