“The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on June 24, 2017.
The version of the Instacart data that we will use in this class can be found here.
Instacart is an online grocery service that allows you to shop online from local stores. In New York City, partner stores include Whole Foods, Fairway, and The Food Emporium. Instacart offers same-day delivery, and items that users purchase are delivered within 2 hours.
“The Instacart Online Grocery Shopping Dataset 2017” is an anonymized dataset with over 3 million online grocery orders from more than 200,000 Instacart users. However, the dataset is not a random sample of products, users, or purchases. Therefore, while the data allow examination of trends in online grocery purchasing, the results may not generalize to Instacart users more broadly.
“The Instacart Online Grocery Shopping Dataset 2017” website provides some summary results of interesting findings, including:
The original data is quite extensive, and the data linked to at the top of this page for use in the class represents a cleaned and limited version of the data. The dataset contains 1,384,617 observations of 131,209 unique users, where each row in the dataset is a product from an order. There is a single order per user in this dataset.
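The structure described above (one row per product in an order, and one order per user) can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the rows are made-up toy values rather than records from the real data, and the helper name `summarize` is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows (not from the real data): one row per product in an order.
rows = [
    {"order_id": 1, "user_id": 10, "product_id": 100},
    {"order_id": 1, "user_id": 10, "product_id": 101},
    {"order_id": 2, "user_id": 20, "product_id": 100},
]


def summarize(rows):
    """Return (number of rows, number of unique users, max orders for any one user)."""
    orders_by_user = defaultdict(set)
    for row in rows:
        orders_by_user[row["user_id"]].add(row["order_id"])
    max_orders = max((len(orders) for orders in orders_by_user.values()), default=0)
    return len(rows), len(orders_by_user), max_orders


n_obs, n_users, max_orders = summarize(rows)
print(n_obs, n_users, max_orders)
```

Run on the class dataset, a check like this would be expected to report 1,384,617 rows, 131,209 users, and a maximum of one order per user, if the description above holds.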
There are 15 variables in this dataset:
order_id: order identifier
product_id: product identifier
add_to_cart_order: order in which each product was added to cart
reordered: 1 if this product has been ordered by this user in the past, 0 otherwise
user_id: customer identifier
eval_set: which evaluation set this order belongs in (note that the data for use in this class is exclusively from the “train” eval_set)
order_number: the order sequence number for this user (1=first, n=nth)
order_dow: the day of the week on which the order was placed
order_hour_of_day: the hour of the day on which the order was placed
days_since_prior_order: days since the last order, capped at 30, NA if order_number=1
product_name: name of the product
aisle_id: aisle identifier
department_id: department identifier
aisle: the name of the aisle
department: the name of the department
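
As an illustration of how these 15 variables might be represented when reading the data, here is a minimal sketch in Python (standard library only). The class name `OrderRow`, the helper `parse_row`, and the sample CSV text are all hypothetical, and the sample values are invented. The sketch also assumes that identifiers, `order_dow`, and `order_hour_of_day` are stored as integers, and that a missing `days_since_prior_order` appears as the literal string `NA`; the source does not state these encodings.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderRow:
    """One row of the dataset: a single product within a single order."""
    order_id: int
    product_id: int
    add_to_cart_order: int
    reordered: bool                        # 1 -> True, 0 -> False
    user_id: int
    eval_set: str
    order_number: int
    order_dow: int                         # assumed to be an integer code
    order_hour_of_day: int
    days_since_prior_order: Optional[int]  # None when the source value is NA
    product_name: str
    aisle_id: int
    department_id: int
    aisle: str
    department: str


def parse_row(raw: dict) -> OrderRow:
    """Convert a dict of strings (as produced by csv.DictReader) to an OrderRow."""
    days = raw["days_since_prior_order"]
    return OrderRow(
        order_id=int(raw["order_id"]),
        product_id=int(raw["product_id"]),
        add_to_cart_order=int(raw["add_to_cart_order"]),
        reordered=raw["reordered"] == "1",
        user_id=int(raw["user_id"]),
        eval_set=raw["eval_set"],
        order_number=int(raw["order_number"]),
        order_dow=int(raw["order_dow"]),
        order_hour_of_day=int(raw["order_hour_of_day"]),
        # int(float(...)) tolerates values written as "30" or "30.0" (an assumption).
        days_since_prior_order=None if days in ("NA", "") else int(float(days)),
        product_name=raw["product_name"],
        aisle_id=int(raw["aisle_id"]),
        department_id=int(raw["department_id"]),
        aisle=raw["aisle"],
        department=raw["department"],
    )


# Invented sample rows, for illustration only.
SAMPLE_CSV = """order_id,product_id,add_to_cart_order,reordered,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_name,aisle_id,department_id,aisle,department
1,100,1,0,10,train,1,4,10,NA,Example Product A,24,4,example aisle,example department
2,101,1,1,20,train,5,2,15,30,Example Product B,24,4,example aisle,example department
"""

parsed = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for row in parsed:
    print(row.order_id, row.product_name, row.days_since_prior_order)
```

Mapping NA to `None` keeps a first order (`order_number` = 1) distinct from an order placed zero days after the previous one.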