Public-use data, with or without usage restrictions, that contain only non-personal information.
De-identified data with usage restrictions. These data do not contain any direct identifiers but contain sensitive or restricted information.
Data that include any kind of direct personal identifier, such as names, addresses, or SSN.
We have two versions of Python:
Python 2.6.6
Python 2.7.x:
Anaconda 2.2.0 (64-bit) with all packages (pandas, scikit, etc.)
IPython 3.1.0, IPython notebook
Version 2.6.6 is the default, but if you want to use 2.7.x, add the following command to your ~/.bashrc
Current Java version:
java version "1.7.0_71"
To enable Maven, add to your bash profile:
export M2_HOME=/usr/local/maven
Current Matlab version:
Matlab version 8.3.0.532
Current QGIS version:
QGIS version 2.0.1
Current R version:
R-project, R version 3.2.1
The Data Facility offers PostgreSQL and Oracle (Green environment only) as database options for projects. Databases must be requested as part of a new or existing research project using the Research Project Request form.
When requesting a database, the project requestor will be given administrative privileges over the database. However, all other access levels for collaborators must be specified in the request. The levels of access are as follows.
If you are unsure of your needs, the Data Facility team can provide consultation based on the requirements, goals, and objectives of your project. Simply state in the Additional Support section of the form that you are seeking guidance on what best fits your project, and a member of the Data Facility team will help.
PostgreSQL is a Relational Database Management System.
To access the database, log in to compute.cusp.nyu.edu with ssh, then run
'psql <database_name>', for example:
Oracle is a Relational Database Management System. To access the database, log in to compute.cusp.nyu.edu with ssh, then run:
oracle-client
Database connections are commonly implemented through drivers, which are small programs that handle the connection between a given language and a given DBMS.
If you've never connected to a DB using Python or need more info, check this tutorial.
Some packages are already provided on compute for Python (through Anaconda):
Psycopg2 - Postgres
SQLAlchemy - Postgres and Oracle
For example, to connect to the database opendata, use the following code snippet. Replace app_login and app_password with the username and password of your app, and dbname with the database name.
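A minimal sketch of such a connection with Psycopg2 is given here, under stated assumptions: the helper names build_connect_kwargs and fetch_one are hypothetical, the optional host argument is an assumption (leaving it out lets the driver use its default local connection), and the SELECT 1 query is only a placeholder. It is an illustration, not the facility's official snippet.

```python
def build_connect_kwargs(dbname, app_login, app_password, host=None):
    """Collect keyword arguments accepted by psycopg2.connect().

    `host` is optional (assumption): when omitted, the driver falls back
    to its default local connection.
    """
    kwargs = {"dbname": dbname, "user": app_login, "password": app_password}
    if host is not None:
        kwargs["host"] = host
    return kwargs


def fetch_one(dbname, app_login, app_password, query="SELECT 1", host=None):
    """Open a connection, run one query, return the first row, then close."""
    # Imported here so the helper above can be used without the driver installed.
    import psycopg2

    conn = psycopg2.connect(
        **build_connect_kwargs(dbname, app_login, app_password, host)
    )
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchone()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

On compute, calling fetch_one("opendata", "app_login", "app_password") with real credentials would run the placeholder query against the opendata database.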
Oracle using Python: To connect to Oracle database my_db using python on compute run the following code:
import cx_Oracle
To test if this worked run:
cursor = db.cursor()
After accessing the database, close the connection:
db.close()
Security Note: Don't put your credentials in plain text in the source code of your application, because others can see them. If you need to commit credentials, ask CUSP-IT for an application account and use that instead.
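One common way to keep credentials out of source code is to read them from environment variables at run time. The sketch below assumes two hypothetical variable names, APP_DB_USER and APP_DB_PASSWORD, which are not defined anywhere by the Data Facility; choose whatever names suit your project.

```python
import os


def load_db_credentials(environ=None,
                        user_var="APP_DB_USER",
                        password_var="APP_DB_PASSWORD"):
    """Return (username, password) read from environment variables.

    The variable names are hypothetical defaults. `environ` can be any
    mapping, which makes the function easy to test; it defaults to the
    real process environment.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in (user_var, password_var) if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return env[user_var], env[password_var]
```

The variables can be exported in the shell session before starting Python, so nothing sensitive is ever committed to a repository.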
To try it, on compute run python and then the commands above.
Java: To access the Oracle database from Java you need the JDBC driver. See the example code src/db-example-jdbc-oracle in the CUSP git repository.
This is our identity management server, responsible for authenticating user access on all machines and services. The server also provides a single web interface for managing users. This interface is accessible only to IT and HR admin staff inside the CUSP network (that is, from a CUSP workstation).
Hardware Specifications
This is our main service provider, which allows CUSP users to collaborate over the internet. Here is the list of available services:
The gateway virtual host provides access to our gateway cluster (dual nodes). The nodes have the SSH port (22) open so that CUSP users can access internal computing services. No heavy computing should be performed on this cluster, nor should any data be stored on it. (See Frequently Asked Questions (FAQ) / How-to Guides.)
The gateway cluster has two identical nodes gw1 and gw2 for redundancy.
Cluster is the gateway to the Hadoop cluster. Users can ssh to it and create a session where they run jobs on the Hadoop cluster, then store data in their home directory.
Hostname: "cluster". It is accessible only through the gateway server "gw.cusp.nyu.edu".
Large shared-memory machine for user computing and databases. Hostname: "compute". It is accessible only through the gateway server "gw.cusp.nyu.edu".
Hardware Specifications
This is a SAN storage controller with 4 expansions to support high-performance computing; the storage subsystem offers data access and protection. The SAN storage has a capacity of 505.086TB across 360 SAS disks. Each disk unit is a SAS (Serial Attached SCSI) disk with a capacity of 1.863GB.
There are two GPFS administration servers that provide NSD drives (Network Shared Disk). The two servers are identical; both are in one GPFS cluster, acting as NSD servers that securely provide the GPFS file system to other GPFS clusters in the CUSP infrastructure (the servers mentioned above).
Hardware Specifications
The NSD drives are mapped as follows:
Hadoop FAQ
Five FAQ
Hardware and software Specifications
Hardware: IBM System Storage TS3310 tape Library
Hardware Specifications
Software: IBM Enterprise Software
Software Specifications
If you are new to the Data Facility, please follow these instructions to create your home directory. This is required for Jupyter Notebooks and the Windows Remote Desktop to work properly.
Create home directory
Jupyter Notebook from your browser
Jupyter Notebook is now available to use in the Data Facility and can be accessed through your browser. Below are instructions for accessing from a Mac or PC.
Mac
PC
Windows Remote Desktop Instructions
To securely access the Windows Remote Desktop from a Mac or PC, follow the instructions found below.
Mac
PC
SSH Instructions
If you prefer to access the Green Environment from the command line, first ssh to our gateway server, then to compute by running the following. Remember to substitute in your CUSP ID.
Once connected, run this from gw.cusp.nyu.edu:
ssh compute
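The two hops can also be combined into a single command. The sketch below builds that command in Python, assuming your local OpenSSH client supports the -J (ProxyJump) option and that the short hostname compute resolves from the gateway; the function name ssh_to_compute_command and the placeholder your_cusp_id are hypothetical.

```python
import shlex

GATEWAY = "gw.cusp.nyu.edu"


def ssh_to_compute_command(cusp_id, target="compute"):
    """Build an ssh argument list that jumps through the gateway.

    Assumes the local OpenSSH client supports -J (ProxyJump); the target
    hostname is resolved from the gateway side of the connection.
    """
    return ["ssh", "-J", f"{cusp_id}@{GATEWAY}", f"{cusp_id}@{target}"]


# Print the command so it can be copied into a terminal.
print(shlex.join(ssh_to_compute_command("your_cusp_id")))
```

The list form can be passed directly to subprocess.run if you want to launch the session from a script rather than copy the printed command.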
Project Workspaces - Directory Structure
/gws/projects/ - This is where green project workspaces are located and can be accessed by project owners and collaborators.
/gscratch/share - A temporary directory for sharing data with other users. Please do not keep work or data here long term, as it is not backed up and will be deleted periodically.
/homedirs/cuspid - Your home directory for all of your personal files.
Like most Data Facility services, Git access is granted when your CUSP account is created. If you are for any reason unable to use Git, please send an email to CUSP IT. (If you need a Git crash course, please check out http://git.or.cz/course/svn.html)
The Data Facility now offers GitLab, which is a GUI for managing your git repositories. The instructions below will help you learn how to use GitLab with new and existing projects.
GitLab Guides
GitLab is available in the Green and Yellow environments. Use these guides to learn how GitLab can be set up and used to help you manage your projects.
Green GitLab
Yellow GitLab
Redirect An Existing Repository
All existing repos have been migrated over from our previous repo manager (Gitolite).
If you have a clone of an existing repository, you will need to update the remote repository URL. Use the following rules to find out your new repository path.
* Note: Values for <dir> and <repo_name> are groups in GitLab, so repositories whose paths include these live in a group namespace.
Old Repository Path | New Repository Path |
---|---|
www-users/<username> | <username>/www-users |
www-projects/<repo_name> | <repo_name>/www-projects |
www-classes/<repo_name> | <repo_name>/www-classes |
users/<username>/<repo_name> | <username>/<repo_name> |
src/<repo_name> | <owner_username>/<repo_name> |
projects/<repo_name> | <owner_username>/<repo_name> |
projects/<dir>/<repo_name> | <dir>/<repo_name> |
papers/<repo_name> | <owner_username>/<repo_name> |
Once you know your new repository path, execute the following command inside your local repository workspace:
$ git remote set-url origin https://gitlab.cusp.nyu.edu/<new_repo_path>.git
Your local repository copy is now pointing to GitLab.
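The path rules in the table can be expressed as a small function. This is a sketch only: the names new_repo_path and gitlab_remote_url are hypothetical, and for the src/, projects/<repo_name>, and papers/ rules the owner's username must be supplied by the caller because it cannot be derived from the old path.

```python
def new_repo_path(old_path, owner_username=None):
    """Translate an old Gitolite-style path into its GitLab path."""
    parts = old_path.strip("/").split("/")
    top, rest = parts[0], parts[1:]

    # www-users/<username> -> <username>/www-users (same for projects/classes)
    if top in ("www-users", "www-projects", "www-classes") and len(rest) == 1:
        return f"{rest[0]}/{top}"
    # users/<username>/<repo_name> and projects/<dir>/<repo_name>
    # both simply drop the leading component.
    if top in ("users", "projects") and len(rest) == 2:
        return "/".join(rest)
    # src/, projects/ and papers/ with a single repo name go under the owner.
    if top in ("src", "projects", "papers") and len(rest) == 1:
        if not owner_username:
            raise ValueError("owner_username is required for " + old_path)
        return f"{owner_username}/{rest[0]}"
    raise ValueError("unrecognized repository path: " + old_path)


def gitlab_remote_url(new_path):
    """Build the remote URL used with `git remote set-url origin`."""
    return f"https://gitlab.cusp.nyu.edu/{new_path}.git"
```

For instance, a clone of users/hvo/demo would be repointed to the URL returned by gitlab_remote_url(new_repo_path("users/hvo/demo")), where demo is a made-up repository name.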
You can create a website at serv.cusp.nyu.edu/~cuspid by creating a Git repository at gitlab.cusp.nyu.edu:USERNAME/www-users, then pushing your contents to the server. Follow the instructions to push into a GitLab repository. Here is an example of creating a Hello World homepage for user hvo.
1. Log into GitLab.
2. Create a project called 'www-users' in the user namespace (hvo). The repository path should be <cuspid>/www-users (hvo/www-users).
3. Run the example commands shown below, substituting in your own information.
Done - now the page will be available at http://serv.cusp.nyu.edu/~hvo
NOTE: for large and/or temporary data, please use NFS instead.
You can create a project website at serv.cusp.nyu.edu/projects/PROJECT by creating (cloning) a Git repository at gitlab.cusp.nyu.edu:PROJECT/www-projects and pushing your contents to the server. The project should be a group inside GitLab. Here is an example of creating a Hello World project.
1. Log into GitLab.
2. Create a Group called 'helloworld'.
3. Create a project called 'www-projects' inside the helloworld group. The repository path should be GROUPNAME/www-projects. (helloworld/www-projects)
Done - now the page will be available at http://serv.cusp.nyu.edu/projects/helloworld
NOTE: for large and/or temporary data, please use NFS instead.
Please place your files in /scratch/www/files/ (which is mounted on most of the servers, e.g. compute.cusp.nyu.edu). Whatever you put there will be transferred to http://serv.cusp.nyu.edu/files/.
By default, this folder is not browse-able on the web, but you can always create a sub-folder and turn on 'x' mode for it. For data that is not large, you can share it in your personal or project web space as indicated above. Since those spaces are under version control, it might not be wise to use them for large and/or temporary data.
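As an illustration of turning on 'x' mode, the sketch below adds the execute (search) bits to a folder while leaving its other permission bits unchanged. It assumes that 'x' mode means the POSIX execute bit for user, group, and others; the function name enable_x_mode and the sub-folder name in the usage note are hypothetical.

```python
import os
import stat


def enable_x_mode(folder):
    """Add the execute (search) bit for user, group, and others.

    Existing permission bits are preserved. Returns the resulting
    permission bits so the caller can verify the change.
    """
    current = stat.S_IMODE(os.stat(folder).st_mode)
    os.chmod(folder, current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(folder).st_mode)
```

After creating a sub-folder such as /scratch/www/files/my_share (a made-up name), calling enable_x_mode on it has the same effect as running chmod a+x on that folder.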
Please note that data transferred through this URL is not encrypted, so it must not be used to share sensitive data. For secure data sharing, please consider using data marts for sftp access.
Once your personal/project webpage has been created, you can add an .htaccess file in the folder to control user access. For example, putting the following into the .htaccess file will allow only users of the group mart_uods1113 to access it.
AuthType Basic
We could replace Require valid-user with Require user USERNAME1 USERNAME2 to grant access to specific users.
This guide is to be used by teaching faculty and assistants for posting code and data online for student access.
Reminder: These files are publicly accessible. Ensure code and data are non-proprietary and open before posting.
Instructions
Hosting class files at CUSP