4.1 Bike Sharing Counts (Regression)

This dataset contains daily counts of rented bikes from the bike sharing company Capital-Bikeshare in Washington D.C., along with weather and seasonal information. The data was kindly made openly available by Capital-Bikeshare, and Fanaee-T and Gama (2013) added the weather data and the seasonal information. The goal is to predict how many bikes will be rented on a given day, depending on the weather and the day. The data can be downloaded from the UCI Machine Learning Repository.

For the examples, new features were introduced and not all original features were used. Here is the list of features that were used:

  • season: spring (1), summer (2), autumn (3), winter (4).
  • holiday: Binary feature indicating if the day was a holiday (1) or not (0).
  • yr: The year (2011 or 2012).
  • days_since_2011: Number of days since 01.01.2011 (the first day in the dataset). This feature was introduced to account for the trend over time, in this case that bike rental became more popular.
  • workingday: Binary feature indicating if the day was a working day (1) or a weekend day / holiday (0).
  • weathersit: The weather situation on that day, one of:
    • Clear, Few clouds, Partly cloudy, Cloudy
    • Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    • Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    • Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
  • temp: Temperature in degrees Celsius.
  • hum: Relative humidity in percent (0 to 100).
  • windspeed: Wind speed in km per hour.
  • cnt: Count of rented bikes, including both casual and registered users. The count is used as the target in the regression tasks.
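
To make the derived trend feature concrete, here is a minimal sketch of how days_since_2011 could be computed from a calendar date. It is an illustration, not the code used to prepare the data for the examples: the function name, the constant FIRST_DAY, and the choice to count the first day as 0 are assumptions.

```python
from datetime import date

# Assumption: 01.01.2011, the first day in the dataset, is counted as day 0.
FIRST_DAY = date(2011, 1, 1)


def days_since_2011(day: date) -> int:
    """Return the number of days elapsed between FIRST_DAY and `day`."""
    return (day - FIRST_DAY).days


# The feature grows by one per day, so a model can use it to capture a trend.
print(days_since_2011(date(2011, 1, 1)))    # 0
print(days_since_2011(date(2012, 1, 1)))    # 365
print(days_since_2011(date(2012, 12, 31)))  # 730 (2012 is a leap year)
```

Because the value increases steadily over the two years, it gives a regression model a single numeric handle on the growing popularity of bike rental, separate from the seasonal and weather features.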

  1. Fanaee-T, Hadi, and Joao Gama. 2013. “Event Labeling Combining Ensemble Detectors and Background Knowledge.” Progress in Artificial Intelligence. Springer Berlin Heidelberg, 1–15. doi:10.1007/s13748-013-0040-3.