Much like workspaces, jobs are also Linux environments, but you can run them in parallel on multiple machines. This allows you to try out different hyper-parameters, code and datasets, compare the results and metrics for each job, and continue iterating on the best results.
While a job is running, you can view its live logs and system metrics, as well as model/training metrics via TensorBoard. You also have full terminal access to each running job.
Important: The current working directory while executing a job is /onepanel/code. Datasets are mounted into /onepanel/input/datasets and any output that you want to save should be saved to
Tip: You can install additional Python packages in jobs by adding a requirements.txt file to your code. See Installing Packages and Dependencies for more information.
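The directory layout described in the note above can be sketched in a short Python script. This is only an illustration, not part of the platform: the function names and the manifest.txt file are hypothetical, and the output directory is passed in by the caller rather than hard-coded, so substitute the job output path given in the documentation.

```python
from pathlib import Path
from typing import List

# Mount point stated in the note above.
DATASETS_DIR = Path("/onepanel/input/datasets")


def list_dataset_files(datasets_dir: Path = DATASETS_DIR) -> List[str]:
    """Return all files under the datasets mount, relative to it, sorted."""
    if not datasets_dir.is_dir():
        return []  # e.g. when the script is run outside of a job
    return sorted(
        p.relative_to(datasets_dir).as_posix()
        for p in datasets_dir.rglob("*")
        if p.is_file()
    )


def write_manifest(files: List[str], output_dir: Path) -> Path:
    """Write one file name per line to manifest.txt inside output_dir.

    output_dir is supplied by the caller (assumption: it is the job's
    output directory as given in the documentation).
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest = output_dir / "manifest.txt"
    manifest.write_text("\n".join(files) + "\n")
    return manifest


if __name__ == "__main__":
    # Print every dataset file that was mounted into this job.
    for name in list_dataset_files():
        print(name)
```

Inside a job, calling write_manifest(list_dataset_files(), output_dir) with the job's output directory would be one way to record exactly which dataset files the run saw.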
Once a job is created, you will see the following tabs:
The "log" tab displays a realtime log of your job's progress.
The "datasets" tab displays the exact datasets that were mounted to the job.
The "code" tab displays the exact version of code that was used for this job.
The "output" tab contains the output files that were saved. Please see Saving Job Output for more information on saving and downloading job output.
You can clone a previously executed job by clicking on the 'CLONE' button. The previous shell command will be shown, and you can edit this field before starting the new job.
Note: The code, machine type and environment of the previous job will be preselected for you.
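Because the shell command is editable when cloning, it can help to expose hyper-parameters as command-line flags so that each clone only needs a changed command. The sketch below assumes a hypothetical entry script (here called train.py) with illustrative flag names and defaults; none of these are defined by the platform.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hyper-parameters from the job's shell command."""
    parser = argparse.ArgumentParser(
        description="Hypothetical training entry point"
    )
    # Flag names and defaults are illustrative assumptions.
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # Echo the settings so they appear in the job's log.
    print(
        f"learning_rate={args.learning_rate} "
        f"epochs={args.epochs} batch_size={args.batch_size}"
    )
```

With a script like this, the original job might run python train.py --learning-rate 0.001 and a clone might change only that value, which keeps the runs comparable.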
You can stop a job by clicking on the 'STOP' button at the top of the right panel. This will immediately stop the job and save any output and logs for future reference.