This lesson is still being designed and assembled (Pre-Alpha version)

Data Pre-Processing using Python: Setup

Software setup

We will be using Jupyter Notebook for this workshop, which we install through Anaconda Navigator. Anaconda is an open-source distribution that provides one of the easiest ways to code in Python, especially for data science.

Virtual Lab

If you would prefer not to install the software for this workshop on your computer, you may use the Virtual Lab service run by Technology Services. This allows you to use a virtual machine either from your web browser or from a desktop app installed on your computer. Overall, you may have a better experience using it from the desktop app, but the browser should suffice for most workshops.

See browser instructions here
See desktop instructions here

Text Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of keywords. The default text editor on macOS and Linux is usually set to Vim, which is not famous for being intuitive. If you accidentally find yourself stuck in it, press the Esc key, type :q! (colon, lower-case 'q', exclamation mark), and then press Return to return to the shell.

nano is a basic editor and the default that instructors use in the workshop. Depending on your operating system, it is either installed along with Git or should already be pre-installed. See the Git installation video tutorial for an example of how to open nano.

Setup files:

Please download the following files to participate in the workshop:

data file
script file

Launching Jupyter on Anaconda

We can use Anaconda Navigator to access Jupyter and other tools (PyCharm, etc.) provided in Anaconda.

For Windows Users:

  1. Click Start
  2. Search and select Anaconda Navigator from the menu.
  3. Once the Navigator opens up, select Jupyter Notebook from the tools available.
  4. Jupyter will open up in a new tab in the browser.
  5. Navigate to the required destination.
  6. Click on New -> Notebook.
  7. The script file opens up.

For Mac Users:

  1. Click Launchpad and select Anaconda Navigator. Or, use Cmd+Space to open Spotlight Search and type “Navigator” to open the program.
  2. Once the Navigator opens up, select Jupyter Notebook from the tools available.
  3. Jupyter will open up in a new tab in the browser.
  4. Navigate to the required destination.
  5. Click on New -> Notebook.
  6. The script file opens up.
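
Once a notebook is open, a first cell can confirm that the environment is ready. The sketch below is only an illustration: the package names pandas and numpy are an assumption about what the workshop script imports (this page does not list them), and missing_packages is a hypothetical helper name.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list; adjust it to whatever the workshop script actually imports.
wanted = ["pandas", "numpy"]

print("Python version:", sys.version.split()[0])
print("Missing packages:", missing_packages(wanted) or "none")
```

If the last line reports any missing packages, they can usually be installed from Anaconda Navigator's Environments tab before the workshop.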

NOTE:

While we are using Jupyter Notebook for this workshop, you can also write your program in any text editor and run it from the terminal/command prompt. To do so:

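  1. Open the command prompt/terminal and type python3 --version to check that Python 3 is installed (on Windows the command may be python instead of python3). Python itself cannot be installed with pip; it comes with Anaconda, or can be installed separately from python.org.
  2. Once Python is available, open any editor and type your code in it. Make sure you save the file with a .py extension.
  3. Go to the command prompt/terminal and navigate to the folder where the file is saved.
  4. Type python3 “name of the file”.py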
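
As a minimal sketch of the steps above, the following could be saved as first_script.py (a hypothetical file name chosen for illustration) and run with python3 first_script.py. It only reports which Python version is running, which is a quick way to confirm that the terminal setup works.

```python
# first_script.py (hypothetical file name, used only for illustration)
import sys


def describe_python():
    """Return a short description of the interpreter running this script."""
    version = sys.version_info
    return f"Running Python {version.major}.{version.minor}.{version.micro}"


if __name__ == "__main__":
    # This branch runs only when the file is executed directly from the terminal.
    print(describe_python())
```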

About the Data Used in this Workshop:

The data set being used in this workshop is “auto-mpg.csv”. It contains information on various cars and their attributes. It was collected by Carnegie Mellon University. We will perform data pre-processing on this dataset in the workshop. Additionally, as homework, you will be required to perform visualization on this dataset.
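
Before the workshop, it can help to confirm that the downloaded data file opens correctly. The sketch below uses only the Python standard library and assumes that auto-mpg.csv sits in the current working directory and has a header row; preview_csv is a hypothetical helper name, and the workshop itself may load the file differently.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the header row and the first n_rows data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)                 # first line is assumed to be the header
        rows = list(islice(reader, n_rows))   # read at most n_rows further lines
    return header, rows


if __name__ == "__main__":
    data_path = Path("auto-mpg.csv")  # assumed location: current working directory
    if data_path.exists():
        header, rows = preview_csv(data_path)
        print(header)
        for row in rows:
            print(row)
    else:
        print(f"{data_path} not found; download the data file first.")
```

All values come back as strings here; converting columns to numbers is part of the pre-processing covered in the workshop.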