Data Science Week: Retake

February 14, 2022 – February 28, 2022

What is it?

Over the course of two weeks, you will compete with other teams of behavioural data scientists to solve a real data science case. Throughout the first week, BDSi staff will organise several seminars and workshops that introduce the various steps involved in data science. In the second week, you and your team will compete to provide the best solutions to the data science problem posed. Throughout both weeks, coaches will be available to guide you when you run into problems.

The goal of the Data Science Week is to introduce interested students and staff to data science in a fun and cooperative way, and to help create a community of data scientists at BMS. At the closing session, the best teams will be asked to present their solutions, and the winners will be presented with a suitable prize.

Sign up now!

This is a retake
Note that this is a retake of the first Data Science week, using the same data and asking the same questions.

We received a lot of feedback stating that the timing for the first Data Science Week could have been better. There was not a lot of time to prepare, and the event coincided with the start of the academic year. We've since also learned a thing or two about holding online events, and have made some updates to both the materials and schedules of the seminars.

To give everyone a chance to participate in the best possible Data Science Week, we decided to re-use the same materials and questions. Whether or not you participated in the first Data Science Week, you are more than welcome to join us in this second, even better, iteration!

Who can join?

Everyone related to the Faculty of Behavioural, Management and Social Sciences, as well as their friends and family, can join (although at least one member of your team needs to have a University of Twente account in order to sign up). You can join with friends, colleagues or even family. The event is open to novices and experts alike, and everyone in between. Three (virtual) lunch workshops will introduce the main steps in every data science project. You can join as a team or alone. If you join alone, you can choose to be assigned to a team with other data science enthusiasts.

What can I do to prepare?

Get a team

First off, get a team together and sign up now, or just sign up on your own.

Read the book (or at least pretend to)

The materials we will use are based on the freely available book An Introduction to Statistical Learning. If you’re interested in data science, statistical learning or machine learning, this book is a great place to start. BDSi also organizes a yearly reading club around this book.

Install R, RStudio, and tidyverse

As a faculty, BMS has decided to use R for statistical education. We will follow this example and use R and the tidyverse packages in the workshops and seminars. If you do not already have a preferred programming language, you may want to install R and RStudio. ModernDive has published a good primer on installing R and RStudio, which also covers the basics of working in R. If you’d like to go further, we recommend the free R for Data Science book by Hadley Wickham, a name you’ll encounter often in the R community.

What will we do?

Amin Asadi was kind enough to record a video presentation of his submission for last year's Data Science Week. Have a look!

Amin chose to meet the challenge on his own, but you don't have to. In fact, the datathon is best tackled as a group, so that you can learn from each other and explore together. BDSi staff will be available to give you a helping hand if and when you get stuck.

Schedule

The Data Science Week will start and end with a group session, on Monday the 14th and Monday the 28th, respectively. You will be free to work on the case on your own schedule, and coaches will be available for questions and feedback throughout.

Scheduling of the individual workshops and seminars will be updated in the coming weeks.

Kickoff

February 14th, 12:45 – 13:30

After a quick introduction about BDSi, we will introduce the topic, and give a description of the dataset, and the problem you will solve. We will also explain how to reach the coaches for help, and give a brief overview of the schedule.

Workshop Data Wrangling

February 15th, 12:45 – 13:30

A 45-minute guided introduction to data wrangling in R, using the ‘tidy’ data principles. Karel Kroeze will show how to prepare a ‘raw’ dataset for analysis by cleaning, reshaping and mutating the data until it gives up all its secrets.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here, or sign up directly here.

Workshop Data Visualization

February 16th, 12:45 – 13:30

A 45-minute guided overview of data visualization using the grammar of graphics. Karel Kroeze will explain the principles of creating and layering visualizations with ggplot in R, and give a quick introduction to interactive visualizations with plotly, shiny and beyond.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here, or sign up directly here.

Workshop Machine Learning

February 17th, 12:45 – 13:30

A 45-minute guided overview of basic machine learning techniques. Anna Machens will take you through the basics of model fitting, parameter selection and hyperparameter tuning, ending up with a simple but effective predictive model.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here, or sign up directly here.

Submission Deadline

February 27th, 12:00

After spending all weekend with your team fine-tuning your solutions, you will have to submit them by the deadline on Sunday. That gives us a bit of time to check your models and pick a winner. In the meantime, you can practice your victory speech, or suddenly have a brilliant idea that it’s too late to implement before submission.

Closing Session

February 28th, 12:45 – 13:30

The teams that created the best and most creative solutions will give a short presentation about their approach, and there will be time to ask questions to the winning teams as well as BDSi staff and coaches.