Public-use data, with or without usage restrictions, that contain only non-personal information.
De-identified data with usage restrictions. These data do not contain any direct identifiers but do contain sensitive or restricted information.
Data that include any kind of direct personal identifier, such as names, addresses, or Social Security numbers.
Available in:
Green environment
By setting up a Python virtual environment, you can easily install the packages you need for your projects without administrative privileges on compute. To do so, run the following.
conda create -n JupyterVE python=2.7 anaconda
This will create a virtual environment with all Anaconda libraries included. In this example, we are naming the environment JupyterVE and using Python version 2.7. You may, of course, change these to your specifications. You'll then need to activate the environment by running:
source activate JupyterVE
You will notice your command line prompt change to be similar to the following:
(JupyterVE) [abc123@compute ~]$
This indicates that you are working in a virtual environment. You can now install any additional packages you would like to use, and they will be contained within this environment. If you would like to exit the virtual environment, run the following:
source deactivate
You will then be returned to your normal environment. Any packages you installed while in the virtual environment will persist there and be available upon activation. You can return to the virtual environment any time by once again running the source activate JupyterVE command.
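With the environment active, any packages you install are scoped to it rather than to the system-wide Python. As a quick illustration (the package names here are only examples, not requirements for any particular project), you could run:
conda install pandas
pip install requests
Both of the installs above land inside the JupyterVE environment and will be available whenever it is activated.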
Available in:
Green environment
To use your installed packages in Jupyter Hub, you will need to create a new kernel from your virtual environment. To do so, make sure your virtual environment is active and then run the following command.
python -m ipykernel install --user --name JupyterKern --display-name JupyterKern
Note: You may change the name and display-name to be whatever you prefer and they may be different. In this example, we use JupyterKern for both.
Now log in to Jupyter Hub and create or start a notebook. You can select the new kernel under the Kernel -> Change kernel menu option. All of the packages that were installed in the virtual environment will now be available in your notebook.
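If you want to confirm which interpreter the new kernel is actually using, a quick check (just a suggestion, not part of the official setup) is to run the following in a notebook cell:
import sys
# Prints the Python interpreter this kernel runs, and its version
print(sys.executable)
print(sys.version)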
If you would like to add environment variables to your notebook, you will need to edit the ~/.local/share/jupyter/kernels/JupyterKern/kernel.json file. Below is an example with the variable MY_VAR added.
{
  "display_name": "JupyterKern",
  "language": "python",
  "argv": [
    "/opt/rh/anaconda/root/bin/python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "MY_VAR": "HELLO WORLD"
  }
}
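After restarting the kernel so the updated kernel.json is picked up, the variable is visible to your notebook like any other environment variable. For example, assuming the MY_VAR entry shown above:
import os
# MY_VAR comes from the "env" block in kernel.json
print(os.environ.get("MY_VAR"))  # HELLO WORLD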
Available in:
Green environment
You may find yourself wanting to bring your own code or data into the Green environment. This can be done by transferring the files via SFTP. There are a number of SFTP clients that make transferring a drag and drop process. However, using the command line is perhaps just as quick and effective.
From your local machine, make sure you are in the directory that contains the file you would like to transfer and run the following. Be sure to replace cuspid and filename.txt with the proper entries for your use case.
$ sftp cuspid@staging.cusp.nyu.edu:/home/cuspid
sftp> put filename.txt
sftp> exit
At this point, the file is now in your home directory, which is mounted on all CUSP machines, including compute. If you would like to move the file into your project workspace, you will need to ssh to compute (through gw) and move the file as shown below.
$ ssh cuspid@gw.cusp.nyu.edu
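From the gateway, the remaining steps are to hop to compute and move the file into your workspace. The sketch below is only illustrative: it assumes compute is reachable by that hostname from gw, and the destination path is a placeholder you should replace with your project's actual workspace directory.
$ ssh compute
$ mv ~/filename.txt /path/to/your/project/workspace/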
Available in:
Yellow environment
While Green data can be made available in the Yellow environment, it is not mounted there by default.
For Yellow projects which require access to Green data, please submit an amendment to your project by using the Research Project Request Form. Once approved and processed, the Green data will be mounted in your project's datamart directory.
Available in:
Green environment
Yellow environment
Scripts that take a considerable amount of time to run, or ones that pull regularly from an API, may be set up as a persistent background process so that you can disconnect from compute without terminating your script. To do this, combine the "no hangup" command (nohup) with the background operator (&); this sends your process to the background and keeps it running even after you disconnect from compute.
You can see how this is done below with an example script called scraper.py. When the command runs, the process id for your script will be printed to the terminal.
$ nohup scraper.py &
You can use the process id in the event you need to interrupt your script. This can be done with the kill command.
$ kill 60692
If you have disconnected from compute and did not save the process id, you can find it by running ps -ef and then grep for the name of your script, as shown below.
$ ps -ef | grep scraper.py
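By default, nohup appends your script's output to a file named nohup.out in the directory you launched it from. If you would rather keep an explicit log, a common variant (assuming the script is invoked with python rather than being directly executable) looks like this:
$ nohup python scraper.py > scraper.log 2>&1 &
Standard output and standard error both go to scraper.log, and the trailing & still places the job in the background.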
Available in:
Green environment
Yellow environment
Cron jobs are useful for running scripts on a regular basis, such as an hourly or daily API call. To do so, you will need to request that CUSP IT add your user to the HBAC Rule "compute_cron". Please provide a brief description as to why you are requesting cron access.
If approved, you can learn more on how to use crontab here:
https://help.ubuntu.com/community/CronHowto
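For reference, a crontab entry consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below would run a script at the top of every hour; the interpreter and file paths are placeholders, so adjust them for your project:
# m h dom mon dow   command
0 * * * * /usr/bin/python /home/cuspid/scraper.py >> /home/cuspid/scraper.log 2>&1
Entries like this are added by running crontab -e and saving the file.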
Available in:
Green environment
Yellow environment
The Data Facility offers PostgreSQL and Oracle (Green environment only) as database options for projects. Databases must be requested as part of a new or existing research project using the Research Project Request form.
When requesting a database, the project requestor will be given administrative privileges over the database. However, access levels for all other collaborators must be specified in the request. The levels of access are as follows.
Please visit the Databases section on the Data Hub to learn more about using databases in your project.
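Once your database has been provisioned, connecting from the command line generally uses the standard client tools. As a rough sketch for PostgreSQL (the host and database names below are placeholders; use the connection details provided for your project):
$ psql -h <database-host> -U cuspid -d <project_database>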
If you are unsure of your needs, the Data Facility team can provide consultation based on the requirements, goals, and objectives of your project. Simply state that you are seeking guidance on what best fits your project in the Additional Support section of the form, and a member of the Data Facility team will help.
If you have forgotten your password, you will need to email CUSP IT and request that it be reset. Once processed, you will receive a new password with instructions on how to change it.
If you know your password and would simply like to change it, please visit https://serv.cusp.nyu.edu/ipa/ui/reset_password.html