Table Of Contents

Previous topic

Public Data Commons

Next topic

Tukey - The OSDC Console

This Page

OSDC Quickstart Guide

  1. Apply for and obtain a resource allocation and an account via the OSDC application. Please allow a few weeks for our allocation committee to review your application, depending on when your application is received in the quarterly resource allocation cycle.
  2. Once your request has been approved and you receive a welcome email with details, log into the main OSDC console.
  3. From the console, upload or generate a key pair to use for cloud access (Details). NOTE: If you generate your keypair in the Tukey console, you’ll probably need to run chmod to change the permissions
  4. From the console, launch a virtual machine (VM). You can start with a plain vanilla image, or use a preexisting snapshot or image that has been already set up by your lab or another user with the required tools and software.

Coming Soon - Currently in BETA

Updates to the Tukey Webconsole will allow OSDC users to more easily share descriptions of their image snapshots and the types of analysis the image is intended for.

  1. From a terminal, or a program like PuTTY in Windows, access your virtual machine by first ssh’ing into the appropriate login node with the username provided to you and then ssh’ing to your VM. If you’re having trouble, please see SSH - Managing Keys and Connections. The login nodes and the user to login into your VM are:
Cloud Login Node VM
OSDC Sullivan <username>@sullivan.opensciencedatacloud.org ubuntu@<VM.IP>
OSDC Griffin <username>@griffin.opensciencedatacloud.org ubuntu@<VM.IP>
Bionimbus PDC <username>@bionimbus-pdc.opensciencedatacloud.org ubuntu@<VM.IP>

Warning

Remember to terminate your VMs in the Tukey console when not in use to free up valuable cores for other OSDC users. Make sure if it’s something you built from scratch, to snapshot it first! To learn more visit the VM or FAQ/Best Practices sections.

QUICKSTART TERMINAL SESSION: SSH’ing

Below is a screencapture of a terminal session showing the command line tools necessary to login to the OSDC Sullivan headnode and then a VM. Feel free to copy and paste commands to your own shell, adjusting usernames and VM IPs as needed.

  1. On Sullivan your user data is automatically mounted via GlusterFS. On Griffin and the PDC, persistent data should be pushed to object storage in Ceph or Cleversafe with local scratch in ephemeral storage in VMs.
  2. Install any software packages on the VM using apt-get install <packagename>.
  3. On Sullivan, you can find the datasets in the OSDC Public Data Commons at /glusterfs/osdc_public_data or your home folder at /glusterfs/users/<username>. For other resources, please review the documents available on Public Data Sets
  4. If you are on Sullivan, follow the Tutorial below for a hands-on demonstration of how to use the OSDC. For OSDC Griffin, take a look at this example of how to run a Jupyter Notebook to analyze NEXRAD data. For the Bionimbus PDC, stay tuned...
  5. When you have technical issues, please review this support guide. If you are unable to resolve them, contact us at support@opensciencedatacloud.org. A member of our support team will review and contact you as soon as possible.

OSDC EO-1 Quick Start Tutorial

Now we’ll take you step by step through a demo using NASA’s Earth Observing-1 dataset that works on OSDC Sullivan. In this tutorial, we will show you how to use OSDC to visualize and perform a simple example analysis of NASA satellite imagery data. You will perform many tasks common to using the OSDC during this demo like launching an instance, ssh’ing, in addition to those specific to analysis.

Here we will show you how to use Python to

  • create png false-color images from GeoTiff data,
  • use a machine algorithm to classify each pixel of a scene as desert, water, cloud, or vegetation,
  • view GeoTiffs and save the results of your classification as an image.

Launch the OSDC EO-1 Instance

In the console, under ‘Images and Snapshots’, scroll down to find the section labeled ‘All Snapshots’. Here’s you’ll want to find and launch the snapshot called ‘OSDC_DatasetExplorer_EO1’. We recommend using a medium instance.

When you ssh in to both the login node and the instance, make sure and add both the “A” and the “X” flags. The A is for key forwarding, the X is for X11 forwarding. IE: ssh -AX <username>@sullivan.opensciencedatacloud.org and then ssh -AX ubuntu@<INSTANCE.IP>. If you’re doing a lot of GUI work like looking at plots and images, you’ll want to use this X flag often.

Once you’re in the instance, cd and run all commands from the existing /eo1_demo dir.

Viewing a GeoTiff

We will take a look at an example ALI GeoTiff from band 3, covering 0.45 - 0.515 micron. Our data resides in the /glusterfs/osdc_public_data/eo1 directory. In the terminal, type or copy:

python viewGeoTiff.py /glusterfs/osdc_public_data/eo1/ali_l1g/2014/029/EO1A1930292014029110PZ_ALI_L1G/EO1A1930292014029110PZ_B03_L1T.TIF

Making an RGB Image

Here we will create an RGB image from three bands of an individual ALI scene. We will use the makeRGB.py script to look at a scene observed on the 29th day of 2014 and save it as a png image. To make the image a little brighter, we tell the script to scale each color up by a factor of 2.

In the terminal, type or copy in:

python makeRGB.py 2014 029 EO1A1930292014029110PZ italy.png 2

To download this image to your local machine for viewing is a two-step process. First, move the file to your gluster user directory on Sullivan by typing the following into your VM terminal:

mv italy.png /glusterfs/users/USERNAME/

Then, in the terminal on your local machine, download the file into the preferred directory:

scp USERNAME@sullivan.opensciencedatacloud.org:~/italy.png .

Now take a look at your picture using your favorite image viewer. Looks like a nice spot to run our classifier. This is a section of the Italian coast near Pisa.

Classifying the Image

We will run our classifier see if it can identify which sections of the scene are clouds, water, desert, or vegetation. The classifier uses a support vector machine (SVM) from Python’s scikit-learn module to fit a model to the training set from Hyperion data we have provided in ‘FourClassTrainingSet.txt’. This classifier uses the ratios of ALI bands 3:7 and 4:8. The file trainingSpectra.png shows a plot of the average reflectance spectra from Hyperion for each class in the training set. Shaded grey areas show the wavelength coverage of ALI bands, which are used by the classifier described.

You can run the classifier with the following command:

python classify.py 2014 029 EO1A1930292014029110PZ italyClassified.tif

It will take about 10 minutes to run, so go get a snack or some coffee. You can also look at the classified GeoTiff we have provided using the above procedure.

INTERMISSION - Project Matsu

This demonstration comes from analysis demonstrated by one of our OSDC projects called Project Matsu. Project Matsu is a collaboration between NASA and the Open Commons Consortium to develop open source technology for cloud-based processing of satellite imagery to support the earth sciences and disaster relief.

Viewing the Results

Let’s take a look at the GeoTiff created. Run viewClassifiedTiff.py on the file made by the classification:

python viewClassifiedTiff.py italyClassified.tif italyClassified.png

You can download italyClassified.png to your local machine using the instructions above in ‘Making an RGB image.’ The classified scene has a white pixel where the classifier identified clouds, blue for water, brown for desert, and green for vegetation. Using the USGS EarthExplorer webpage you can retrieve the scene IDs and dates for scenes all over the world and classify them. Have fun!

Cleaning up

Once you have completed this demo, exit out of the VM and the login node, enter the console and be sure to terminate your VM.