May 22, 2020
08:45 - 12:30 EST
Instructor: Karen Soenen
Helpers: Amber York, Brett Longworth, Stace Beaulieu
Workshops always have the cleanest, best examples of data tables to use, don’t they? These tables always seem to be immediately usable for analysis. Turning a raw table into one that is usable for analysis is a process called “data wrangling”. In this workshop we’ll show you how to get to this perfect table using Python and the package Pandas.
Doing this process correctly will not only make you more efficient, but it will also make your data easier to reuse in the future.
The workshop is sponsored by a WHOI Academic Programs Doherty Award and a DDVPR Technical Staff Training Award.
Who: This workshop is targeted towards improving project efficiency and building technical skills. The workshop is limited to 10 people at a time and will be held online through a Zoom meeting. Registration is required. Please contact stace@whoi.edu for availability.
Requirements:
Accessibility: We are dedicated to providing a positive and accessible learning environment for all. Please notify the instructors in advance of the workshop if you require any accommodations or if there is anything we can do to make this workshop more accessible to you.
Contact: Please email ksoenen@whoi.edu for more information.
The process called “data wrangling”, i.e., manipulating data into a usable form and diagnosing data quality issues, is often the most tedious and time-consuming aspect of analysis.
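As a rough illustration of what wrangling can look like in Pandas, the sketch below reads a small raw table, marks placeholder values as missing, and checks the result. The table, its column names, and the `n/a` and `-999` placeholders are all made up for this example and are not taken from the workshop materials.

```python
import io
import pandas as pd

# Hypothetical raw table (made-up station data, not from the workshop).
raw_csv = io.StringIO(
    "station,date,temperature_c\n"
    "A,2020-05-01,12.5\n"
    "A,2020-05-02,n/a\n"
    "B,2020-05-01,-999\n"
    "B,2020-05-02,13.1\n"
)

# Treat the placeholder markers as missing values while reading,
# and parse the date column into real dates.
df = pd.read_csv(raw_csv, na_values=["n/a", "-999"], parse_dates=["date"])

# Diagnose quality issues: count missing values per column and check types.
missing_per_column = df.isna().sum()
print(missing_per_column)
print(df.dtypes)

# Keep only the rows that are usable for analysis.
clean = df.dropna(subset=["temperature_c"])
print(clean)
```

Whether dropping incomplete rows is appropriate depends on the analysis; it is shown here only as one possible choice.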
This workshop is for you if you:
We will be using the Carpentries code of conduct for this workshop.
Everyone who participates in this workshop is required to conform to the Code of Conduct.
Please be sure to complete these surveys before and after the workshop.
This workshop is based on a few workshops developed by the Carpentries (See https://carpentries.org for more information about the Carpentries organisation.) and by Joe Futrelle (WHOI):
TIME | SUBJECT | TOPICS COVERED | NOTEBOOK/EXERCISE | MORE RESOURCES |
08:45 | Introduction | |||
09:00 | Formatting data tables in Spreadsheets | How do we format data in spreadsheets for effective data use? | Carpentries: data table | |
09:10 | Exercise | How can this table be improved to start analysis in Python? Exercise in breakout rooms | Datatable for exercise | Carpentries exercise and discussion |
09:35 | Date Notation | Good approaches for handling dates in spreadsheets | Carpentries: Dates | |
09:45 | Break | 15 minute break |
TIME | SUBJECT | TOPICS COVERED | NOTEBOOK/EXERCISE | MORE RESOURCES |
10:00 | Starting with Python | What is Python?; Data types; Mathematical operations; Lists | Notebook: Starting with Python | Carpentries: Intro to Python I; Carpentries: Intro to Python II; First commands, Notebook Joe Futrelle, WHOI |
10:20 | The Pandas Library | What is Pandas?; How do I import data?; What is a dataframe?; How can I access specific data within my data set? | Notebook: The Pandas Library | Carpentries: Starting with data; Carpentries: Indexing, Slicing and Subsetting DataFrames in Python; Anatomy of a DF, Notebook Joe Futrelle, WHOI |
10:35 | Exercise | |||
10:45 | Break | 15 minute break |
TIME | SUBJECT | TOPICS COVERED | NOTEBOOK/EXERCISE | MORE RESOURCES |
11:00 | Further manipulation of a dataframe | Sorting; Unique values; Logical conditions; Summary statistics; Groups; Merging dataframes | Notebook: Further manipulation of a dataframe | Carpentries: Statistics, groups and basic math; Carpentries: Merging data; Querying and merging DFs, Notebook Joe Futrelle, WHOI |
11:30 | Exercise | |||
11:45 | Break | 15 minute break | ||
12:00 | Questions? | |||
12:15 | Wrap-up |
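To give a flavour of the dataframe topics listed in the schedule above (sorting, unique values, logical conditions, summary statistics, groups, and merging), here is a minimal sketch. It is not one of the workshop notebooks; the two tables and their column names are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical tables, made up for illustration only.
samples = pd.DataFrame({
    "station": ["A", "A", "B", "B", "C"],
    "depth_m": [5, 20, 5, 20, 5],
    "temperature_c": [12.5, 9.8, 13.1, 10.2, 11.7],
})
stations = pd.DataFrame({
    "station": ["A", "B", "C"],
    "region": ["north", "north", "south"],
})

# Sorting: warmest samples first.
by_temp = samples.sort_values("temperature_c", ascending=False)

# Unique values: which stations appear in the table?
station_names = samples["station"].unique()

# Logical condition: keep only samples shallower than 10 m.
shallow = samples[samples["depth_m"] < 10]

# Summary statistics for one column (count, mean, std, min, quartiles, max).
summary = samples["temperature_c"].describe()

# Groups: mean temperature per station.
mean_by_station = samples.groupby("station")["temperature_c"].mean()

# Merging: attach the region of each station to every sample.
merged = samples.merge(stations, on="station", how="left")

print(by_temp)
print(station_names)
print(shallow)
print(summary)
print(mean_by_station)
print(merged)
```

A left merge is used here so that every sample row is kept even if a station were missing from the second table.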
To participate in this workshop, you will need an up-to-date web browser and access to a spreadsheet program (Excel, LibreOffice, ...), Python, and Jupyter notebooks.
You only need to install these programs:
Please make sure you have installed all the required packages before the start of this workshop. We will be holding an on-line data lab with Stace Beaulieu on May 20 and can help you install the packages if necessary.
A list of common issues that occur during installation is maintained as a reference for instructors on the Configuration Problems and Solutions wiki page; it may be useful if you run into problems.