Much like workspaces, jobs are also Linux environments, and you can run them in parallel on multiple machines. This allows you to try out different hyperparameters, code, and datasets, then compare the results and metrics for each job and continue iterating on the best results.

While a job is running, you can view live logs, system metrics, and model/training metrics via TensorBoard. You also have full terminal access to each running job.
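For model and training metrics to appear in TensorBoard, your script needs to write TensorBoard event files as it trains. The snippet below is a minimal sketch using PyTorch's torch.utils.tensorboard; it assumes PyTorch is available in the job's environment, and writing the event files under /onepanel/output (described in the note below) is an assumption about where the TensorBoard view looks.

```python
# Minimal sketch: logging a scalar metric so TensorBoard can display it.
# Assumes PyTorch is installed; the log directory under /onepanel/output
# is an assumption about where the TensorBoard view reads event files from.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="/onepanel/output/tensorboard")

for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for a real training loss
    writer.add_scalar("train/loss", loss, step)

writer.close()
```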

Important: The current working directory while a job is executing is /onepanel/code. Datasets are mounted into /onepanel/input/datasets, and any output you want to keep should be saved to /onepanel/output.
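As a concrete illustration, a minimal job script might read from the mounted dataset directory and write anything it needs to keep to /onepanel/output. In the sketch below, the dataset subdirectory "my-dataset" and the output file name are hypothetical.

```python
# Minimal sketch of a job script using the mounted directories.
# "my-dataset" and "metrics.txt" are hypothetical names for illustration.
import os

DATASET_DIR = "/onepanel/input/datasets/my-dataset"
OUTPUT_DIR = "/onepanel/output"

# List the files that were mounted for this job.
for name in sorted(os.listdir(DATASET_DIR)):
    print(os.path.join(DATASET_DIR, name))

# Anything written under /onepanel/output is saved with the job.
os.makedirs(OUTPUT_DIR, exist_ok=True)
with open(os.path.join(OUTPUT_DIR, "metrics.txt"), "w") as f:
    f.write("final_accuracy: 0.0\n")  # placeholder value
```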

Tip: You can chain shell commands and download datasets in jobs. See Chaining Job Commands and Downloading Datasets for more information.
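For instance, a chained job command might look like pip install -r requirements.txt && python train.py (the file names are illustrative), and a dataset can also be fetched by the script itself. The sketch below downloads and unpacks a placeholder archive into the job's working directory; the URL and file names are not real.

```python
# Minimal sketch: downloading and unpacking a dataset inside a job.
# The URL and file names are placeholders, not a real dataset.
import tarfile
import urllib.request

URL = "https://example.com/my-dataset.tar.gz"  # placeholder URL
ARCHIVE = "my-dataset.tar.gz"  # saved into the job's working directory

urllib.request.urlretrieve(URL, ARCHIVE)

with tarfile.open(ARCHIVE) as tar:
    tar.extractall("data")  # extracted files end up under ./data
```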

Tip: You can install additional Python packages in jobs by adding a requirements.txt file to your code. See Installing Packages and Dependencies for more information.

Once a job is created, you will see the following tabs:

Log

The "log" tab displays a realtime log of your job's progress.

Datasets

The "datasets" tab displays the exact datasets that were mounted to the job.

Code

The "code" tab displays the exact version of code that was used for this job.

Output

The "output" tab contains the output files that your job saved. See Saving Job Output for more information on saving and downloading job output.

Cloning Jobs

You can clone a previously executed job by clicking the 'CLONE' button. The previous job's shell command will be shown, and you can easily update this field before running the clone.

Note: The code, machine type and environment of the previous job will be preselected for you.

Stopping Jobs

You can stop a job by clicking the 'STOP' button at the top of the right panel. This will immediately stop the job and save its output and logs for future reference.
