Public-use data, with or without usage restrictions, that contain only non-personal information.
De-identified data with usage restrictions. These data do not contain any direct identifiers but do contain sensitive or restricted information.
Data that include any kind of direct personal identifier, such as names, addresses, or Social Security numbers.
Available in:
Green environment
By setting up a Python virtual environment, you can easily install the packages you need for your projects without administrative privileges on compute. To do so, run the following.
conda create -n JupyterVE python=2.7 anaconda
This will create a virtual environment with all Anaconda libraries included. In this example, we are naming the environment JupyterVE and using Python version 2.7. You may, of course, change these to your specifications. You'll then need to activate the environment by running:
source activate JupyterVE
You will notice your command line prompt change to be similar to the following:
(JupyterVE) [abc123@compute ~]$
This indicates that you are working in a virtual environment. You can now install any additional packages you would like to use, and they will be contained within this environment. If you would like to exit the virtual environment, run the following:
source deactivate
You will then be returned to your normal environment. Any packages you installed while in the virtual environment will persist there and be available upon activation. You can return to the virtual environment any time by once again running the source activate JupyterVE command.
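With the environment active, any packages you install are scoped to it rather than to the system-wide Python. As a quick illustration (the package names here are only examples, not requirements for any particular project), you could run:
conda install pandas
pip install requests
Both of the installs above land inside the JupyterVE environment and will be available whenever it is activated.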
Available in:
Green environment
To use your installed packages in Jupyter Hub, you will need to create a new kernel from your virtual environment. To do so, make sure your virtual environment is active and then run the following command.
python -m ipykernel install --user --name JupyterKern --display-name JupyterKern
Note: You may change the name and display-name to be whatever you prefer and they may be different. In this example, we use JupyterKern for both.
Now log in to Jupyter Hub and create or start a notebook. You can select the new kernel under the Kernel -> Change kernel menu option. All of the packages that were installed in the virtual environment will now be available in your notebook.
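If you want to confirm which interpreter the new kernel is actually using, a quick check (just a suggestion, not part of the official setup) is to run the following in a notebook cell:
import sys
# Prints the Python interpreter this kernel runs, and its version
print(sys.executable)
print(sys.version)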
If you would like to add environment variables to your notebook, you will need to edit the ~/.local/share/jupyter/kernels/JupyterKern/kernel.json file. Below is an example with the variable MY_VAR added.
{
  "display_name": "JupyterKern",
  "language": "python",
  "argv": [
    "/opt/rh/anaconda/root/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "MY_VAR": "HELLO WORLD"
  }
}
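After restarting the kernel so the updated kernel.json is picked up, the variable is visible to your notebook like any other environment variable. For example, assuming the MY_VAR entry shown above:
import os
# MY_VAR comes from the "env" block in kernel.json
print(os.environ.get("MY_VAR"))  # HELLO WORLD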
Available in:
Green environment
You may find yourself wanting to bring your own code or data into the Green environment. This can be done by transferring the files via SFTP. There are a number of SFTP clients that make transferring a drag and drop process. However, using the command line is perhaps just as quick and effective.
From your local machine, make sure you are in the directory that contains the file you would like to transfer and run the following. Be sure to replace cuspid and filename.txt with the proper entries for your use case.
$ sftp cuspid@staging.cusp.nyu.edu:/home/cuspid
sftp> put filename.txt
sftp> exit
At this point, the file is now in your home directory, which is mounted on all CUSP machines, including compute. If you would like to move the file into your project workspace, you will need to ssh to compute (through gw) and move the file as shown below.
$ ssh cuspid@gw.cusp.nyu.edu
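From the gateway, the remaining steps are to hop to compute and move the file into your workspace. The sketch below is only illustrative: it assumes compute is reachable by that hostname from gw, and the destination path is a placeholder you should replace with your project's actual workspace directory.
$ ssh compute
$ mv ~/filename.txt /path/to/your/project/workspace/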
Available in:
Yellow environment
While Green data can be made available in the Yellow environment, it is not mounted there by default.
For Yellow projects which require access to Green data, please submit an amendment to your project by using the Research Project Request Form. Once approved and processed, the Green data will be mounted in your project's datamart directory.
Available in:
Green environment
Yellow environment
Scripts that take a considerable amount of time to run, or ones that pull regularly from an API, may be set up as a persistent background process so that you can disconnect from compute without terminating your script. To do this, combine the "no hangup" command (nohup) with the background operator (&); this sends your process to the background and keeps it running even after you disconnect from compute.
You can see how this is done below with an example script called scraper.py. When the command runs, the process id for your script will be printed to the terminal.
$ nohup scraper.py &
You can use the process id in the event you need to interrupt your script. This can be done with the kill command.
$ kill 60692
If you have disconnected from compute and did not save the process id, you can find it by running ps -ef and then grep for the name of your script, as shown below.
$ ps -ef | grep scraper.py
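By default, nohup appends your script's output to a file named nohup.out in the directory you launched it from. If you would rather keep an explicit log, a common variant (assuming the script is invoked with python rather than being directly executable) looks like this:
$ nohup python scraper.py > scraper.log 2>&1 &
Standard output and standard error both go to scraper.log, and the trailing & still places the job in the background.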
Available in:
Green environment
Yellow environment
Cron jobs are useful for running scripts on a regular basis, such as an hourly or daily API call. To do so, you will need to request that CUSP IT add your user to the HBAC Rule "compute_cron". Please provide a brief description as to why you are requesting cron access.
If approved, you can learn more on how to use crontab here:
https://help.ubuntu.com/community/CronHowto
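For reference, a crontab entry consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below would run a script at the top of every hour; the interpreter and file paths are placeholders, so adjust them for your project:
# m h dom mon dow   command
0 * * * * /usr/bin/python /home/cuspid/scraper.py >> /home/cuspid/scraper.log 2>&1
Entries like this are added by running crontab -e and saving the file.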
Available in:
Green environment
Yellow environment
The Data Facility offers PostgreSQL and Oracle (Green environment only) as database options for projects. Databases must be requested as part of a new or existing research project using the Research Project Request form.
When requesting a database, the project requestor will be given administrative privileges over the database. However, access levels for all other collaborators must be specified in the request. The levels of access are as follows.
Please visit the Databases section on the Data Hub to learn more about using databases in your project.
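Once your database has been provisioned, connecting from the command line generally uses the standard client tools. As a rough sketch for PostgreSQL (the host and database names below are placeholders; use the connection details provided for your project):
$ psql -h <database-host> -U cuspid -d <project_database>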
If you are unsure of your needs, the Data Facility team can provide consultation based on the requirements, goals, and objectives of your project. Simply state that you are seeking guidance on what best fits your project in the Additional Support section of the form, and a member of the Data Facility team will help.
If you have forgotten your password, you will need to email CUSP IT and request that it be reset. Once processed, you will receive a new password with instructions on how to change it.
If you know your password and would simply like to change it, please visit https://serv.cusp.nyu.edu/ipa/ui/reset_password.html