
This documentation includes guidance, instructions, and general information about the Research Cluster managed by Research IT Services. Researchers execute computing jobs on the cluster in containerized environments known as Docker “containers”, which are essentially lightweight virtual machines, each assigned dedicated CPU, RAM, and GPU hardware, and each well isolated from other users’ processes. The Research Cluster uses the Kubernetes container management/orchestration system to route users’ containers onto compute nodes, monitor performance, and apply resource limits/quotas as appropriate.

Complex Machine Learning workflows are supported through terminal/SSH logins and a full Linux/Ubuntu CUDA development suite. Users may install additional library packages (e.g. conda/pip, CRAN) as needed, or can opt to replace the default environment entirely by launching their own custom Docker containers. 

High-speed cluster-local storage houses workspaces and common training corpora (e.g. CIFAR, ImageNet).

Note to Users:

When using the Research Cluster, please be considerate and terminate idle containers prior to closing your command line interface or logging out of the datahub. When a user engages a container, the container becomes unusable by others even if completely idle. While containers share system RAM and CPU resources under the standard Linux/Unix model, the cluster’s 80 GPU cards are assigned to users on an exclusive basis.

Getting Started

There are two ways to access the Research Cluster: via SSH or via the datahub.

 Accessing the Research Cluster via SSH

First, log in via SSH to the 'dsmlp-login.ucsd.edu' Linux server using your UC San Diego Active Directory (AD) username (followed by '@dsmlp-login.ucsd.edu') and password. After logging in, you will be on a login node for the Research Cluster; do not perform any computation on the login node.

Login step-by-step guidance:

  1. Open a command line interface, known as 'Terminal' on macOS and 'Command Prompt' on Windows.

  2. Enter the command 'ssh ADusername@dsmlp-login.ucsd.edu'. You may be asked to confirm the connection after entering the command; answer 'yes' to continue connecting.

  3. Enter your password. Note: Your password will not display as you enter it.

  4. A successful login will display your last login information. For example, 'Last login: Thu Aug 3 10:25:19 2023 from 137.110.14.162'. Note that you are now on the login node.

DO NOT RUN JOBS IN THE LOGIN NODE. Jobs must only be run in a launched container. Follow the guidance in the next section (Launching a Container), before running your compute jobs. 
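
As a safeguard against accidentally starting work on the login node, a job script could begin with a check like the one sketched below. The function name `not_on_login_node` is hypothetical, and the check assumes that the login node's hostname begins with "dsmlp-login" (as the shell prompts in this guide suggest); adjust the pattern if your prompt shows something different.

```shell
# Hypothetical guard: succeeds only when the current host is NOT the login node.
# Assumption: the login node's hostname starts with "dsmlp-login".
not_on_login_node() {
  # Use the first argument if given, otherwise ask the system for its hostname.
  host="${1:-$(hostname)}"
  case "$host" in
    dsmlp-login*)
      echo "refusing to run on login node '$host'; launch a container first" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage inside a job script (train.py is a placeholder for your own program):
#   not_on_login_node && python train.py
```

Inside a launched container the hostname is the pod name (for example 'amoxley-5497'), so the check passes there.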

 Accessing the Research Cluster via the datahub
  1. Login page: https://datahub.ucsd.edu/hub/login .

    • Enter UCSD email address and AD password. You will be sent a push to confirm via DUO.

  2. Select your chosen notebook environment.

    • Research Cluster users will have a choice of multiple environments to select.

    • If joining a PI/lab specific environment, you may only see the name of your PI/lab’s environment.

    • The ‘public’ folder will include storage that is shared and where datasets can be stored for all to access.

  3. Next, you’ll be directed to your environment and see the Jupyter Notebook interface.

  4. Click ‘New’ and select the kernel you wish to start.

  5. When done using the datahub, select ‘logout’ to terminate kernels and end session.

Launching a Container

After signing into the login node, you can start a pod/container using either a standard Research Cluster launch script or a customized launch script.

Once started, containers are accessible in either a Bash shell (command line) or a Jupyter/Python Notebook environment. Users may access their Jupyter notebook by copying the link printed by the launch script and pasting it into the browser address bar. This link will work as long as your container is active and will cease to work once you log out. The Docker container image and CPU/GPU/RAM settings are all configurable; see the “Customizing a Container Environment” and “Adjusting Launch Script Environments Command Line Options” sections below for more details.

Containers terminate automatically when users exit the interactive shell.

More details and guidance on launching a container are available on the “How To: Launching Containers From the Command Line - Data Science/Machine Learning Platform (DSMLP)” guidance page.

 Example of a successful launch script

A successfully launched container prints output that includes a link for accessing the Jupyter notebook. An example of such a link is “http://dsmlp-login.ucsd.edu:9572/user/kkt008/?token=8f7e5fbfa093f5a29fe115ca8b24f1fc03295c4d26886a83facbfb8750e02698”.

 Standard launch scripts

The standard launch scripts are predefined meaning they have specific RAM and CPU (and/or GPU) configurations. Other launch scripts are available at /software/common64/dsmlp/bin/.

Launch Script | Description | #GPU | #CPU | RAM (GB) | Container Image(s)
--- | --- | --- | --- | --- | ---
launch-scipy-ml.sh | Python 3, PyTorch, TensorFlow | 0 | 2 | 8 | ucsdets/scipy-ml-notebook:2020.2.9
launch-scipy-ml-gpu.sh | Python 3, PyTorch, TensorFlow | 1 | 4 | 16 | ucsdets/scipy-ml-notebook:2020.2.9
launch-datascience.sh | Python 3, Datascience, R | 0 | 2 | 8 | ucsdets/datascience-notebook:2020.2-stable
launch-rstudio.sh | R-Studio | 1 | 4 | 16 | ucsdets/datascience-rstudio:latest

Standard images with ‘pytorch’ include the GNU Screen utility, which may be used to manage multiple terminal sessions in a window-like manner.  

Web Interface Tool

The Research Cluster offers JupyterHub notebooks as a graphical web interface for users who prefer it to the command line interface.

To access the web interface tool, users are directed to sign-in at https://datahub.ucsd.edu (or via selecting the login button up top).

 Jupyter / Python Notebooks

Web-based Jupyter notebooks allow researchers to combine live code, equations, visualizations and narrative text for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and more.  The Research Cluster's Jupyter notebooks offer straightforward interactive access to popular languages and GPU-enabled frameworks such as Python, R, Pandas, PyTorch, TensorFlow, Keras, NLTK, and AllenNLP.

The default container configuration creates an interactive web-based Jupyter/Python Notebook which may be accessed via a TCP proxy URL output by the launch script. Note that access to the TCP proxy URL requires a UCSD IP address: either on-campus wired/wireless, or VPN. See http://blink.ucsd.edu/go/vpn  for instructions on the campus VPN.

 Launching a Jupyter Notebook

Click on the "Log In" button above, or visit https://datahub.ucsd.edu and sign in with your UC San Diego Google account and password.
(Note: only '@ucsd.edu' addresses are currently accepted, not departmental or divisional addresses such as '@eng.ucsd.edu' or '@physics.ucsd.edu'.)

Click the    button.

Select a software and hardware configuration via the "Spawner options" page:

Open a blank Python 3 notebook:

When your work is complete, please shut down your Notebook via the Control Panel's "Stop my Server" option:

 Explore More: Programming Training/Educational Resources

Monitoring Container Resource Usage

 Monitoring Resource Usage in a Jupyter Notebook

Users can view the container CPU, GPU, and memory (RAM) utilization by selecting the ‘Show Usage’ buttons in the header menu. The usage is displayed in the top right of the notebook.

 Monitoring Resource Usage in the command line terminal

Users can view the container CPU and memory (RAM) utilization in the Bash command line interface by using the ‘htop’ command. To see GPU usage, enter the `/usr/local/nvidia/bin/nvidia-smi` command for a container that uses GPU.
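
For a compact view of GPU memory use, the output of 'nvidia-smi' can be summarized with a small filter. The sketch below is illustrative only: the function name `gpu_mem_percent` is hypothetical, and it assumes the input is the comma-separated "used, total" form (in MiB) that 'nvidia-smi' prints when asked for `memory.used,memory.total` in CSV format without header or units.

```shell
# Hypothetical helper: read "used, total" lines (MiB) on stdin and print
# the percentage of GPU memory in use, one line per GPU.
# Assumption: lines arrive in GPU order, so the line number gives the index.
gpu_mem_percent() {
  awk -F', *' '$2 + 0 > 0 {
    printf "GPU %d: %d%% of %d MiB used\n", NR - 1, ($1 * 100) / $2, $2
  }'
}

# Usage inside a GPU container:
#   /usr/local/nvidia/bin/nvidia-smi \
#     --query-gpu=memory.used,memory.total \
#     --format=csv,noheader,nounits | gpu_mem_percent
```

Lines whose total is not a positive number are skipped rather than reported.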

Modifying Containers

Certain modifications can be made to containers to allow for users to adjust their environment to accommodate specific computing needs.

Container Run Time Limits


By default, containers are limited to a 6-hour run time to minimize the impact of abandoned/runaway jobs on the Research Cluster (and in turn on the availability of cluster resources to other researchers). Users may increase the run time limit for their environment up to 12 hours by setting the "K8S_TIMEOUT_SECONDS" configuration variable before launching a container.

Enter the following commands in the shell command line interface:

$ export K8S_TIMEOUT_SECONDS=$(( 3600 * 12 ))
$ launch-scipy-ml.sh

Users may also send a request for runtime extensions longer than 12 hours by emailing rcd-support@ucsd.edu.
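
The conversion from hours to seconds is easy to get wrong by a factor, so it can help to wrap it in a small function. The following is a minimal sketch, not part of the cluster's tooling: the name `timeout_seconds` is hypothetical, and the 12-hour cap simply mirrors the self-service limit described above.

```shell
# Hypothetical helper: convert a whole number of hours into seconds,
# rejecting values outside the 1 to 12 hour range described in this guide.
timeout_seconds() {
  hours="$1"
  max_hours=12
  case "$hours" in
    # Reject empty input, non-digits, and leading zeros (including "0").
    ''|*[!0-9]*|0*)
      echo "hours must be a whole number from 1 to $max_hours" >&2
      return 1
      ;;
  esac
  if [ "$hours" -gt "$max_hours" ]; then
    echo "hours must be a whole number from 1 to $max_hours" >&2
    return 1
  fi
  echo $(( hours * 3600 ))
}

# Usage on the login node:
#   export K8S_TIMEOUT_SECONDS="$(timeout_seconds 10)"
#   launch-scipy-ml.sh
```

A request above 12 hours fails with a message instead of silently exporting a value the cluster may not honor.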

Container Termination Messages


Containers may occasionally exit (or unexpectedly terminate) with one of the following error messages:

OOMKilled: Container memory (CPU RAM) limit was reached.

DeadlineExceeded: Container time limit (default 6 hours) exceeded - see above.

Error: Unspecified error. Contact ITS/ETS for assistance.

Note: These messages appear in the STATUS column of the 'kubectl get pods' output.
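
To see at a glance what each status means, the pod listing can be piped through a small filter. This is a sketch under assumptions: the function name `explain_pod_status` is hypothetical, and it assumes the default column order of 'kubectl get pods' (NAME, READY, STATUS, RESTARTS, AGE), with the explanations taken from the list above.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and append
# the meaning of each termination status described in this guide.
explain_pod_status() {
  awk 'NR > 1 {
    status = $3
    if (status == "OOMKilled") note = "container memory limit reached"
    else if (status == "DeadlineExceeded") note = "container time limit exceeded"
    else if (status == "Error") note = "unspecified error, contact ITS/ETS"
    else note = "no termination message"
    printf "%s: %s - %s\n", $1, status, note
  }'
}

# Usage on the login node:
#   kubectl get pods | explain_pod_status
```

The header row is skipped, and any status not listed above (such as Running) is reported as having no termination message.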

Data Storage / Datasets

Two types of persistent file storage are available within containers:

  • A private home directory ($HOME) for each user

  • A shared directory, used for group-shared data or to distribute common datasets (e.g. CIFAR-10, Tiny ImageNet) for individual access

Each user's private home directory is limited to a 100GB storage allocation by default. Shared directory capacity can vary because it may be mounted storage. In specific cases, Research IT may make allowances to temporarily increase storage in a user’s private home directory. These requests may be submitted by emailing rcd-support@ucsd.edu.
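
To get a rough idea of how close a directory is to the allocation, disk usage can be compared against the limit. The sketch below is only an approximation: the function name `storage_percent_used` is hypothetical, it treats 1 GB as 1024 x 1024 KB, and it assumes that the figure reported by 'du' is close to what the cluster counts against the 100GB allocation.

```shell
# Hypothetical helper: print the percentage of a storage limit that a
# directory currently uses (integer, rounded down).
#   $1 = directory (defaults to $HOME), $2 = limit in GB (defaults to 100)
storage_percent_used() {
  dir="${1:-$HOME}"
  limit_gb="${2:-100}"
  # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
  used_kb="$(du -sk "$dir" 2>/dev/null | cut -f1)"
  [ -n "$used_kb" ] || return 1
  echo $(( used_kb * 100 / (limit_gb * 1024 * 1024) ))
}

# Usage inside a container:
#   echo "Home directory is at $(storage_percent_used)% of its allocation"
```

Note that 'du' walks the whole directory tree, so it may take a while on a large home directory.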

Standard Datasets


Name | Path | Size | #Files | Notes
--- | --- | --- | --- | ---
MNIST | /datasets/MNIST | 53M | 4 |
ImageNet Fall 2011 | /datasets/imagenet | 1300G | 14M |
ImageNet 32x32 2010 | /datasets/imagenet-ds | 1800M | 2.6M | ILSVRC2012, Downsampled 32x32,64x64
Tiny-ImageNet | /datasets/Tiny-ImageNet | 353M | 120k |
CIFAR-10 | /datasets/CIFAR-10 | 178M | 9 |
Caltech256 | /datasets/Caltech256 | 1300M | 30k |
ShapeNet | /datasets/ShapeNet | 204G | 981k | ShapeNetCore v1/v2
MJSynth | /datasets/MJSynth | 36G | 8.9M | Synthetic Word Dataset

Contact Research IT to request installation of additional datasets.

 Chicago Booth Kilts Center for Marketing: Nielsen Datasets

The Nielsen subscription datasets are available to authorized users at /uss/dsmlp-a/nielsen-dataset/. All of the datasets have been decompressed into this read-only directory, making it easy for users to read directly from the Nielsen directories with software such as Stata or their own code. To conserve server space, please do not duplicate these large datasets to your home directory, and delete unneeded data from your home directory once you've completed your analyses and have your output files.

File Transfer

Users can utilize commands (e.g. 'git', 'scp', 'sftp', and 'curl') in the bash shell (command line interface) to import code or data from external servers that are both on and off-campus. Files can be copied into the cluster from external sources using Globus, SCP/SFTP, or RSYNC.

 Copying Data Into the Cluster: Using Globus

See the page on using Globus to transfer data to and from your computer or another Globus collection.

 Copying Data Into the Cluster: SCP/SFTP from your computer

Data may be copied to/from the cluster using the "SCP" or "SFTP" file transfer protocol from a Mac or Linux terminal window, or on Windows using a freely downloadable utility.  We recommend this option for most users.

Example using the Mac/Linux 'sftp' command line program:

slithy:Downloads agt$ sftp <username>@dsmlp-login.ucsd.edu
pod agt-4049 up and running; starting sftp
Connected to ieng6.ucsd.edu 
sftp> put 2017-11-29-raspbian-stretch-lite.img
Uploading 2017-11-29-raspbian-stretch-lite.img to /datasets/home/08/108/agt/2017-11-29-raspbian-stretch-lite.img
2017-11-29-raspbian-stretch-lite.img             100% 1772MB  76.6MB/s   00:23    
sftp> quit
sftp complete; deleting pod agt-4049
slithy:Downloads agt$

On Windows, we recommend the WinSCP utility.

  • After installing WinSCP, the tool will open and you will be prompted to enter the following information:

 Copying Data Into the Cluster: rsync

On MacOS or Linux, 'rsync' can be used from a terminal window to synchronize data sets.

Example using the Mac/Linux ‘rsync’ command line program:

slithy:ME198 agt$ rsync -avr tub_1_17-11-18 <username>@dsmlp-login.ucsd.edu:
pod agt-9924 up and running; starting rsync
building file list ... done
rsync complete; deleting pod agt-9924
sent 557671 bytes  received 20 bytes  53113.43 bytes/sec
total size is 41144035  speedup is 73.78
slithy:ME198 agt$

Customizing a Container Environment

Each launch script specifies the default Docker image to use and the number of CPU cores, GPU cards, and GB of RAM assigned to a container. When creating a customized container, it is recommended to use CPU-only containers until your code is fully tested and a test training run has been completed successfully. Note that the PyTorch, TensorFlow, and Caffe toolkits can easily switch between CPU and GPU; as such, a successful run in a CPU-only container should also succeed in a container with a GPU.

 An example launch configuration is as follows:
K8S_DOCKER_IMAGE="ucsdets/instructional:cse190fa17-latest"
K8S_ENTRYPOINT="/run_jupyter.sh"

K8S_NUM_GPU=1  # max of 1 (contact ETS to raise limit)
K8S_NUM_CPU=4  # max of 8 ("")
K8S_GB_MEM=32  # max of 64 ("")

# Controls whether an interactive Bash shell is started
SPAWN_INTERACTIVE_SHELL=YES

# Sets up proxy URL for Jupyter notebook inside
PROXY_ENABLED=YES
PROXY_PORT=8888

 Users may copy an existing launch script into their home directory, then modify that private copy, for example:
$ cp -p `which launch-pytorch.sh` $HOME/my-launch-pytorch.sh
$ nano $HOME/my-launch-pytorch.sh    
$ $HOME/my-launch-pytorch.sh

Adjusting Container Environment and CPU/RAM/GPU limits

Across all of a user's running containers, the default maximum is 8 CPU cores, 64GB RAM, and 1 GPU. You may run eight 1-core containers, one 8-core container, or any configuration within these bounds. Requests to increase these default limits, or for other adjustments (including software) to your container environment, may be submitted to rcd-support@ucsd.edu.
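
Before launching several containers, it can be useful to check that their combined request stays within the default limits. The sketch below is not a cluster tool: the function name `fits_default_limits` is hypothetical, and it hard-codes the default limits stated above (8 CPU cores, 64 GB RAM, 1 GPU). It reads one planned container per line as "CPU GB_RAM GPU".

```shell
# Hypothetical helper: sum planned containers (one "cpu mem_gb gpu" line
# each on stdin) and compare the totals with the default limits.
# Returns 0 when the plan fits, 1 when it exceeds any limit.
fits_default_limits() {
  awk -v max_cpu=8 -v max_mem=64 -v max_gpu=1 '
    { cpu += $1; mem += $2; gpu += $3 }
    END {
      ok = (cpu <= max_cpu && mem <= max_mem && gpu <= max_gpu)
      printf "total: %d CPU, %d GB RAM, %d GPU -> %s\n", cpu, mem, gpu, (ok ? "within limits" : "over limits")
      exit (ok ? 0 : 1)
    }'
}

# Usage: two CPU-only containers plus one GPU container
#   printf '2 8 0\n2 8 0\n4 16 1\n' | fits_default_limits
```

If your limits have been raised on request, the three `max_` values would need to be changed to match.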

Alternate Docker Images

In addition to configuration settings, users can import alternate or custom Docker images. The cluster servers will pull container images from Docker Hub or elsewhere if requested. You can create or modify these Docker images as needed.

Adjusting Launch Script Environments Command Line Options

Users can override the default variables of a launch script using specific command line options.

 Command line options to adjust launch script variables:

Option | Description | Example
--- | --- | ---
-c N | Adjust # CPU cores | -c 8
-g N | Adjust # GPU cards | -g 2
-m N | Adjust # GB RAM | -m 64
-i IMG | Docker image name | -i nvidia/cuda:latest
-e ENTRY | Docker image ENTRYPOINT/CMD | -e /run_jupyter.sh
-n N | Request specific cluster node (1-10) | -n 7
-v | Request specific GPU (gtx1080ti,k5200,titan) | -v k5200
-b | Request background pod | (see below)

 An example launch script adjustment to the RAM (-m) and the GPU (-v):
[cs190f @ieng6-201]:~:56$  launch-py3torch-gpu.sh -m 64 -v k5200

Custom Python Packages (Anaconda/PIP)

Users may install additional Python packages within their containers using the PIP tool or standard Anaconda package management system. Users should only install Python packages after launching a container. When Python packages are installed, they are installed in a user’s home directory. As such, these packages will be available for all containers launched thereafter by the user.

  • For less complex installations the PIP tool can be used to install Python packages.  Please see PIP documentation ‘User Installs’ for detailed guidance.

  • Anaconda is recommended for installing scientific packages with complex dependencies. Please see Anaconda's Getting Started for a guided introduction. 

 Example of a user specific (--user) package installation using 'pip':
agt@agt-10859:~$ pip install --user imutils
Collecting imutils
  Downloading imutils-0.4.5.tar.gz
Building wheels for collected packages: imutils
  Running setup.py bdist_wheel for imutils ... done
  Stored in directory: /tmp/xdg-cache/pip/wheels/ec/e4/a7/17684e97bbe215b7047bb9b80c9eb7d6ac461ef0a91b67ae71
Successfully built imutils
Installing collected packages: imutils
Successfully installed imutils-0.4.5

Installing TensorBoard

Our current configuration doesn’t permit easy access to Tensorboard via port 6006, but the following shell commands will install a TensorBoard interface accessible within the Jupyter environment:

pip install -U --user jupyter-tensorboard
jupyter nbextension enable jupyter_tensorboard/tree --user

You’ll need to exit your Pod/container and restart for the change to take effect.

Running Jobs in a Background Container and Long-Running Jobs

To minimize the impact of abandoned/runaway jobs, the cluster allows jobs to run in a background container for up to 12 hours of execution time. Users specify that a job should run in a background container by using the "-b" command line option (see example below). To support longer run times, the default execution time can be extended upon request to rcd-support@ucsd.edu.

Note to users: Please be considerate and terminate any unused background jobs.  GPU cards are limited and assigned to containers on an exclusive basis. When attached to a container, GPUs are unusable by others even if the GPU is idle while attached to your container.

 Reconnecting a background container:

In the event that your background container is disconnected, use the ‘kubesh <pod-name>’ command to connect or reconnect to a background container.

 Terminating a background container:

In the event that you need to terminate a background container, use the ‘kubectl delete pod <pod-name>’ command to terminate the container.

 An example of entering a background container command:
[amoxley@dsmlp-login]:~:504$ launch-scipy-ml.sh -b
Attempting to create job ('pod') with 2 CPU cores, 8 GB RAM, and 0 GPU units.
   (Adjust command line options, or edit "/software/common64/dsmlp/bin/launch-scipy-ml.sh" to change this configuration.)
pod/amoxley-5497 created
Mon Mar 9 14:04:10 PDT 2020 starting up - pod status: Pending ; containers with incomplete status: [init-support]
Mon Mar 9 14:04:15 PDT 2020 pod is running with IP: 10.43.128.17 on node: its-dsmlp-n25.ucsd.edu
ucsdets/scipy-ml-notebook:2019.4-stable is now active.

Connect to your background pod via: "kubesh amoxley-5497"
Please remember to shut down via: "kubectl delete pod amoxley-5497" ; "kubectl get pods" to list running pods.
You may retrieve output from your pod via: "kubectl logs amoxley-5497".
PODNAME=amoxley-5497
[amoxley@dsmlp-login]:~:505$ kubesh amoxley-5497

amoxley@amoxley-5497:~$ hostname
amoxley-5497
amoxley@amoxley-5497:~$ exit
exit

[amoxley@dsmlp-login]:~:506$ kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
amoxley-5497   1/1     Running   0          45s

[amoxley@dsmlp-login]:~:507$ kubectl delete pod amoxley-5497
pod "amoxley-5497" deleted
[amoxley@dsmlp-login]:~:508$
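
When scripting around background containers, the pod name can be captured from the launch output instead of being copied by hand. The following is a sketch, assuming the launch script writes a line of the form "PODNAME=<pod-name>" to standard output as in the example above; the function name `pod_name_from_launch_output` is hypothetical.

```shell
# Hypothetical helper: extract the pod name from launch script output read
# on stdin, using the "PODNAME=<pod-name>" line shown in the example above.
pod_name_from_launch_output() {
  sed -n 's/^PODNAME=//p' | head -n 1
}

# Usage on the login node (assumes PODNAME is printed on standard output):
#   pod="$(launch-scipy-ml.sh -b | pod_name_from_launch_output)"
#   kubectl logs "$pod"          # retrieve output from the pod
#   kubectl delete pod "$pod"    # shut the pod down when finished
```

If no PODNAME line is present, the function prints nothing, so it is worth checking that the variable is non-empty before passing it to 'kubectl delete pod'.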

Run-Time Error Messages

There may be instances where you receive a CUDA run-time error while running a job in a container. Below are a few of the more commonly encountered errors. These errors can typically be resolved by user adjustments. However, if you encounter a run-time error that requires more assistance to resolve, please contact rcd-support@ucsd.edu.

 (59) device-side assert

Indicates a run-time error in the CUDA code executing on the GPU and is commonly due to out-of-bounds array access. Consider running in CPU-only mode (remove .cuda() call) to obtain more specific debugging messages.

cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/generic/THCTensorCopy.c:18

 (2) out of memory

GPU memory has been exhausted. Try reducing your dataset size, or confine your job to 11GB GTX1080Ti cards rather than 6GB Titan or 8GB K5200 (see “Adjusting Launch Script Environments Command Line Options” in this user guide).

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/THC/generic/THCStorage.cu:66
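
Choosing a GPU type with enough memory can be sketched as a simple lookup. The helper below is hypothetical (`gpus_with_at_least` is not a cluster command); its memory figures (6GB Titan, 8GB K5200, 11GB GTX1080Ti) and type names come from this guide and may not reflect the current hardware.

```shell
# Hypothetical helper: list the GPU type names (as accepted by the -v
# option) whose memory, per this guide, is at least the requested GB.
gpus_with_at_least() {
  need_gb="$1"
  # name:memory_in_GB pairs taken from the text of this guide
  for entry in titan:6 k5200:8 gtx1080ti:11; do
    name="${entry%%:*}"
    mem="${entry##*:}"
    if [ "$mem" -ge "$need_gb" ]; then
      echo "$name"
    fi
  done
}

# Usage on the login node: pick a type that offers at least 10 GB
#   gpus_with_at_least 10
#   launch-scipy-ml-gpu.sh -v gtx1080ti
```

An empty result means none of the listed GPU types has enough memory for the job.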

 (30) unknown error

This indicates a hardware error on the assigned GPU, and usually requires a reboot of the cluster node to correct. As a temporary workaround, you may explicitly direct your job to another node (see “Adjusting Launch Script Environments Command Line Options” in this user guide).

RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THC/THCGeneral.c:70

Please report this type of error directly to rcd-support@ucsd.edu for assistance.

Monitoring Cluster Status

Users can enter the ‘cluster-status’ command for insight into the number of jobs currently running and the GPU/CPU/RAM allocated. Alternatively, users can refer to the cluster ‘Node Status’ page for updates on containers (or images).

 An example of a 'cluster-status' command output:

Cluster Hardware Specifications

Cluster architecture diagram

Node | CPU Model | #Cores ea. | RAM ea. | #GPU | GPU Model | Family | CUDA Cores | GPU RAM | GFLOPS
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Nodes 1-4 | 2xE5-2630 v4 | 20 | 384GB | 8 | GTX 1080Ti | Pascal | 3584 ea. | 11GB | 10600
Nodes 5-8 | 2xE5-2630 v4 | 20 | 256GB | 8 | GTX 1080Ti | Pascal | 3584 ea. | 11GB | 10600
Node 9 | 2xE5-2650 v2 | 16 | 128GB | 8 | GTX Titan (2014) | Kepler | 2688 ea. | 6GB | 4500
Node 10 | 2xE5-2670 v3 | 24 | 320GB | 7 | GTX 1070Ti | Pascal | 2432 ea. | 8GB | 7800
Nodes 11-12 | 2xXeon Gold 6130 | 32 | 384GB | 8 | GTX 1080Ti | Pascal | 3584 ea. | 11GB | 10600
Nodes 13-15 | 2xE5-2650v1 | 16 | 320GB | n/a | n/a | n/a | n/a | n/a | n/a
Nodes 16-18 | 2xAMD 6128 | 24 | 256GB | n/a | n/a | n/a | n/a | n/a | n/a


Nodes are connected via an Arista 7150 10Gb Ethernet switch.  

Additional nodes can be added into the cluster at peak times.

Example: PyTorch Session with TensorFlow examples


slithy:~ agt$
slithy:~ agt$ ssh cs190f@ieng6.ucsd.edu
Password:
Last login: Thu Oct 12 12:29:30 2017 from slithy.ucsd.edu
============================ NOTICE =================================
Authorized use of this system is limited to password-authenticated
usernames which are issued to individuals and are for the sole use of
the person to whom they are issued.
 
Privacy notice: be aware that computer files, electronic mail and
accounts are not private in an absolute sense.  For a statement of
"ETS (formerly ACMS) Acceptable Use Policies" please see our webpage
at http://acms.ucsd.edu/info/aup.html.
=====================================================================
 

Disk quotas for user cs190f (uid 59457):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
acsnfs4.ucsd.edu:/vol/home/linux/ieng6
                      11928  5204800 5204800                 272        9000        9000      
=============================================================
Check Account Lookup Tool at http://acms.ucsd.edu
=============================================================

[…]

Thu Oct 12, 2017 12:34pm - Prepping cs190f
[cs190f @ieng6-201]:~:56$ launch-pytorch-gpu.sh
Attempting to create job ('pod') with 2 CPU cores, 8 GB RAM, and 1 GPU units.  (Edit /home/linux/ieng6/cs190f/public/bin/launch-pytorch.sh to change this configuration.)
pod "cs190f -4953" created
Thu Oct 12 12:34:41 PDT 2017 starting up - pod status: Pending ;
Thu Oct 12 12:34:47 PDT 2017 pod is running with IP: 10.128.7.99
tensorflow/tensorflow:latest-gpu is now active.

Please connect to: http://ieng6-201.ucsd.edu:4957/?token=669d678bdb00c89df6ab178285a0e8443e676298a02ad66e2438c9851cb544ce

Connected to cs190f-4953; type 'exit' to terminate processes and close Jupyter notebooks.
cs190f@cs190f-4953:~$ ls
TensorFlow-Examples
cs190f@cs190f-4953:~$
cs190f@cs190f-4953:~$ git clone https://github.com/yunjey/pytorch-tutorial.git
Cloning into 'pytorch-tutorial'...
remote: Counting objects: 658, done.
remote: Total 658 (delta 0), reused 0 (delta 0), pack-reused 658
Receiving objects: 100% (658/658), 12.74 MiB | 24.70 MiB/s, done.
Resolving deltas: 100% (350/350), done.
Checking connectivity... done.
cs190f@cs190f-4953:~$ cd pytorch-tutorial/
cs190f@cs190f-4953:~/pytorch-tutorial$ cd tutorials/02-intermediate/bidirectional_recurrent_neural_network/
cs190f@cs190f-4953:~/pytorch-tutorial/tutorials/02-intermediate/bidirectional_recurrent_neural_network$ python main-gpu.py
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
Epoch [1/2], Step [100/600], Loss: 0.7028
Epoch [1/2], Step [200/600], Loss: 0.2479
Epoch [1/2], Step [300/600], Loss: 0.2467
Epoch [1/2], Step [400/600], Loss: 0.2652
Epoch [1/2], Step [500/600], Loss: 0.1919
Epoch [1/2], Step [600/600], Loss: 0.0822
Epoch [2/2], Step [100/600], Loss: 0.0980
Epoch [2/2], Step [200/600], Loss: 0.1034
Epoch [2/2], Step [300/600], Loss: 0.0927
Epoch [2/2], Step [400/600], Loss: 0.0869
Epoch [2/2], Step [500/600], Loss: 0.0139
Epoch [2/2], Step [600/600], Loss: 0.0299
Test Accuracy of the model on the 10000 test images: 97 %
cs190f@cs190f-4953:~/pytorch-tutorial/tutorials/02-intermediate/bidirectional_recurrent_neural_network$ cd $HOME
cs190f@cs190f-4953:~$ nvidia-smi
Thu Oct 12 13:30:59 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 23%  27C    P0     56W / 250W |      0MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

cs190f@cs190f-4953:~$ exit

Licensed Software

Stata

If you have been provisioned with Stata licensing, a container started by launch-scipy-ml.sh is capable of executing Stata. Stata will be installed in your home directory and can be executed using the command '~/stata-se' from within a container.

Acknowledging Research IT Services

Papers, presentations, and other publications that feature research that benefited from the Research Cluster computing resource, services or support expertise may include in the text the following acknowledgement:

This research was done using the UC San Diego Research Cluster computing resource, supported by Research IT Services and provided by Academic Technology Services / IT Services.


