SSH, forwarding your SSH key, into occy.opensciencedatacloud.org; your home directory is shared with the login node below.
ssh -A username@occy.opensciencedatacloud.org
Once logged in, you may issue commands as needed.
- Example of listing files in your HDFS directory:
hadoop fs -ls /user/$USER
- Example of launching a job:
hadoop jar /opt/hadoop/hadoop-examples-0.20.203.0.jar wordcount /user/$USER/in/ /user/$USER/out/
Before Hadoop jobs can be run over data, the data needs to be loaded into the Hadoop filesystem, HDFS. Loading data into and out of the Hadoop cloud is done in two stages: first the data must be copied into your home directory, and from there it is loaded into HDFS. All users have a home directory inside HDFS at /user/$USER. Please note that the HDFS filesystem is completely separate from the usual Linux filesystem; the path /user/$USER is only valid inside HDFS. Please use scp, sftp, rsync, or UDR to move data in and out of OCC Y.
Once the files exist on your home directory they can be loaded into your HDFS directory at /user/$USER with the hadoop fs/dfs commands.
- Move files into HDFS (note that -moveFromLocal removes the local copy; use -copyFromLocal to keep it):
hadoop fs -moveFromLocal /home/$USER/source_on_local_disk /user/$USER/target_in_hdfs
- Copy files out of HDFS:
hadoop fs -copyToLocal /user/$USER/source_in_hdfs /home/$USER/target_on_local_disk
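Putting the two stages together, a full round trip for OCC Y might look like the following sketch. The directory names myfiles_in and myfiles_out are placeholders; substitute your own.

```shell
# Stage 1: copy data from your workstation to your login-node home directory.
scp -r ./myfiles_in $USER@occy.opensciencedatacloud.org:

# (on the login node)
# Stage 2: load the data from your home directory into HDFS.
hadoop fs -moveFromLocal ~/myfiles_in /user/$USER/in

# ...run your Hadoop jobs here...

# Reverse the stages to retrieve results: HDFS -> home directory -> workstation.
hadoop fs -copyToLocal /user/$USER/out ~/myfiles_out

# (back on your workstation)
scp -r $USER@occy.opensciencedatacloud.org:myfiles_out ./
```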
SSH, forwarding your SSH key, into skidmore.opensciencedatacloud.org.
ssh -A username@skidmore.opensciencedatacloud.org
Once logged into skidmore.opensciencedatacloud.org, you may issue commands as needed. Make sure /opt/hadoop/bin is in your path.
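If the hadoop command is not found after logging in, you can add its directory to your PATH; a minimal sketch, assuming the binaries live in /opt/hadoop/bin as noted above:

```shell
# Add the Hadoop binaries to the search path for the current shell session.
export PATH="$PATH:/opt/hadoop/bin"

# Optionally persist this for future logins.
echo 'export PATH="$PATH:/opt/hadoop/bin"' >> ~/.bashrc
```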
hadoop fs -ls /user/$USER
hadoop jar /opt/hadoop/hadoop-examples-0.20.203.0.jar wordcount /user/$USER/in/ /user/$USER/out/
To run Hadoop jobs over your files, you need to copy them into Hadoop's special filesystem, HDFS. The filenames used below are just examples; substitute your own. Also, please do not attempt to write or store files outside your designated home directories.
scp -r ./myfiles_in $USER@$CLOUD.opensciencedatacloud.org:
hadoop fs -moveFromLocal ~/myfiles_in /user/$USER/in
To copy files out of the Hadoop cloud, you first need to copy them off of HDFS into your home directory.
hadoop fs -copyToLocal /user/$USER/out ~/myfiles_out
scp -r $USER@$CLOUD.opensciencedatacloud.org:myfiles_out ./
Hadoop jobs are custom Java applications that you have written or acquired. Ideally you would store these in your home directory on the local filesystem. Most jobs take the form hadoop jar YOURJARFILE.jar option1 option2 ... optionN. Since every program has its own syntax, we only show the default wordcount example; please consult the documentation or source code of your Hadoop program for its proper syntax. We assume /user/$USER/in has been prepopulated as described above and that /user/$USER/out does not exist. The wordcount example is part of the hadoop-examples jar provided by Apache; the listed version number may change depending on the cloud and Hadoop version.
hadoop jar /opt/hadoop/hadoop-examples-0.20.203.0.jar wordcount /user/$USER/in/ /user/$USER/out/
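A complete wordcount run, from clearing the output directory to inspecting the results, might look like the sketch below. The jar version and the part-file names can vary with the Hadoop release on your cloud.

```shell
# The output directory must not already exist; remove any leftover from a prior run.
hadoop fs -rmr /user/$USER/out

# Run the example wordcount job over the input directory.
hadoop jar /opt/hadoop/hadoop-examples-0.20.203.0.jar wordcount /user/$USER/in/ /user/$USER/out/

# List and view the results; reducer output lands in part-* files.
hadoop fs -ls /user/$USER/out
hadoop fs -cat /user/$USER/out/part-*
```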