Introduction to Python Workshops

Author

Python Group

Welcome🙌

This workshop series is geared toward R and SAS users who are interested in exploring Python, a powerful, versatile programming language that excels in areas such as machine learning, large-scale data processing, and broader data science applications.

Our goal is to help you build a solid foundation in Python and gain skills that you can integrate into your own work. This series will guide you through installing and setting up Python, understanding basic Python data structures, manipulating data with pandas, and, finally, applying machine learning methods using libraries such as scikit-learn.

All workshop materials, including pre-session handouts, coding demos, assignments, session slides, and video recordings, will be uploaded here. Be sure to check out the FAQ tab for common questions. If you have any questions, suggestions, or ideas for future sessions, please feel free to share them on our GitHub Discussions page. We value your feedback and aim to continually improve the workshops. We look forward to learning with you!

Workshop Information

Time and Location: Mon 2:30–4:30 pm, Wed 1:00–3:00 pm | Room 501C

Contact and Office Hours:

Session | Date | Outline | Materials
Session 1: Python Installation and Reproducible Workflow (Yufei)

Monday (4/14)

2:30–4:30 pm

Python, VS Code, and conda virtual environment overview

Building a reproducible Python project:

  • Create a GitHub repository for the workshop series
  • Set up a conda virtual environment from the command line
  • Set up Jupyter Notebook in VS Code

Installation guide html

Session Slides html

Session 2: Intro to Python Data Structures (Patrick)

Wednesday (4/16)

1:00–3:00 pm

  • Data container types: strings, lists, tuples, dictionaries, etc.
  • Handling missingness
  • Methods vs. functions
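
As a rough preview of the three topics above, here is a minimal sketch using only the Python standard library. All variable names and values are made up for illustration and are not taken from the session materials.

```python
import math

# Common built-in containers (all values here are made up for illustration).
name = "penguin"                               # string: immutable text
weights = [3.7, 4.1, None, 5.0]                # list: ordered, mutable
point = (46.2, 17.5)                           # tuple: ordered, immutable
record = {"species": "Adelie", "mass": None}   # dictionary: key-value pairs

# Missingness: plain Python uses None, while numeric libraries typically use NaN.
observed = [w for w in weights if w is not None]
mean_weight = sum(observed) / len(observed)

nan_value = float("nan")
nan_equals_itself = nan_value == nan_value   # NaN never compares equal to itself
nan_detected = math.isnan(nan_value)         # so test for it with math.isnan

# Functions take the object as an argument; methods are called on the object.
n_weights = len(weights)     # function
shouted = name.upper()       # method
weights.append(4.4)          # method that modifies the list in place

print(mean_weight, n_weights, shouted, weights)
```

For R users, None plays a role loosely similar to NA or NULL depending on context, and a method call such as name.upper() is roughly what toupper(name) would be as a function.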

Pre-reading docx

Session Slides html

Session 3: Intro to pandas (Patrick)

Monday (4/21)

2:30–4:30 pm

  • pandas for data manipulation and cleaning
  • Data management
  • great_tables for gtsummary-style tables
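
The sketch below hints at what data cleaning and manipulation with pandas can look like. It assumes pandas is installed, uses a made-up data frame with hypothetical column names, and leaves out great_tables entirely.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "value": [1.0, None, 3.0, 5.0],
    }
)

n_missing = int(df["value"].isna().sum())        # count missing values (stored as NaN)
clean = df.dropna(subset=["value"])              # drop rows with a missing value
clean = clean.assign(double=clean["value"] * 2)  # add a derived column

# Group-wise mean, roughly analogous to dplyr's group_by() + summarise().
summary = clean.groupby("group")["value"].mean()
print(summary)
```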

Session Slides html

Session 4: Object-Oriented Programming & Intro to ML Libraries (Dani)

Wednesday (4/23)

1:00–3:00 pm

Part A: Introduction to object-oriented programming (OOP)

  • Recap: classes, methods, and attributes

  • Understanding classes (base, derived, mixins)

  • Creating classes and instances

  • Understanding inheritance
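
To make the Part A vocabulary concrete, here is a small sketch with a base class, a mixin, and a derived class. The class names (Summary, DescribeMixin, MedianSummary) are hypothetical and chosen only for illustration.

```python
class DescribeMixin:
    """Mixin: adds one reusable behavior to any class that inherits it."""

    def describe(self):
        return f"{type(self).__name__} with {len(self.values)} values"


class Summary:
    """Base class: stores data (attributes) and defines behavior (methods)."""

    def __init__(self, values):
        self.values = list(values)   # attribute

    def center(self):                # method
        return sum(self.values) / len(self.values)


class MedianSummary(DescribeMixin, Summary):
    """Derived class: inherits from Summary and overrides center()."""

    def center(self):
        ordered = sorted(self.values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


mean_summary = Summary([1, 2, 10])          # instance of the base class
median_summary = MedianSummary([1, 2, 10])  # instance of the derived class
print(mean_summary.center(), median_summary.center(), median_summary.describe())
```

The derived class reuses __init__ from Summary through inheritance, replaces center() with its own version, and picks up describe() from the mixin.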

Part B: scikit-learn for machine learning

  • Clustering (k-means)

  • Prediction (k-nearest neighbors)

  • Plotting with Python (plotnine, seaborn)
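
The following sketch contrasts the two scikit-learn methods listed in Part B. It assumes numpy and scikit-learn are installed, uses a tiny made-up dataset, and does not cover plotting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated, made-up groups of points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# k-means is unsupervised: it sees only X and invents its own cluster labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = kmeans.labels_

# k-nearest neighbors is supervised: it needs known labels y to learn from.
y = np.array(["low", "low", "low", "high", "high", "high"])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
predictions = knn.predict(np.array([[1.1, 1.0], [8.1, 8.0]]))
print(clusters, predictions)
```

Note that the cluster numbers from k-means are arbitrary (which group is called 0 or 1 can vary), whereas k-nearest neighbors returns the labels it was trained on.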

Pre-read: K-Means vs KNN and Plotting Primer

Session Slides html

Other Resources