Assignments

These two assignments are open-ended applications of the work that we have done in the rest of the course, each presented as a web-written document.

Assignment 2

A trip down the Uber Lane

The data Uber collects about Pi

Pi Ko
May 6, 2022 - 1700 words . 5.7 min read

💻 Note : Please view the article on computer for best experience.

Contents

1. Introduction
- Platform economy
- Uber and I

2. Requesting Data
- Requesting
- Receiving

3. Description of Uber Data
- Individual Files Received
- Data Samples

4. Data Analysis and Visualizations
- Uber Messages
- Trip Distance
- Trip Price
- Locations
- Key Takeaways

5. Uber's Policies
- Uber Privacy Notice

6. Final Thoughts
- Personal Thoughts

1. Introduction

Platform Economy

Uber Banner

Figure : Uber Technologies, Inc., an American mobility as a service provider [ Source ]

The platform economy is the trend in commerce toward digital platform business models. Platforms are computer systems that allow consumers, entrepreneurs and businesses to meet and to buy and sell goods and services virtually. Many firms have moved to the platform business model, including Uber, the platform whose data about me I analyze in this essay.

Uber is a digital platform where drivers and delivery workers connect with riders, businesses and eaters. According to my professor, Uber is not a transportation platform but a data company. Its main asset is not the ability to transport passengers but its access to an immense amount of data about passengers, rides and drivers, which lets it connect all the stakeholders so that the tangible business transactions can take place.

Uber and I

For this assignment, my task is to identify a platform or app which collects data about me. I requested my personal data from Uber and analyzed it because I use Uber frequently for practical purposes (travelling around the city), and because any geospatial trends in the data would be easy to analyze. I consider myself a light user of the platform, since I have hired a taxi through Uber only 25 times over a 5-month period. I requested my personal data after my 25th ride.


2. Requesting Data

Requesting

The process of requesting personal data from Uber is easy and straightforward. For that reason, I chose Uber initially and did not change my choice of platform afterwards. After requesting the download, I only needed to wait 12 hours until the file became available. To request the data, we simply go to the Uber website as shown below and click the download tab. Once we have clicked, Uber sends an automated email confirming that we have requested the data.

Uber Files

Figure : Requesting User Data from Uber

Receiving

Once the data is ready after 12 hours, my personal data from Uber can be downloaded as a zip file from the website's download page. However, since this is personal data, the file is deleted after 7 days, and Uber recommends that we download it only on our personal computers.

Uber Files

Figure : Email from Uber for Requested Data Download


3. Description of Uber Data

Individual Files Received

The files extracted from the zip file are shown in the screenshot below. Most of them are comma-separated values (csv) files. There is also one readme webpage (html file) and one text (.txt) file with a notice for California users. The data is split into three folders, namely "Account and Profile", "Regional Information" and "Rider". Only the contents of the "Account and Profile" and "Rider" folders are specific to me. The contents of each file are described briefly below.

  • communications_sent-0.csv : Messages sent from me to the Uber drivers, and timestamps
  • payment_methods-0.csv : The list of credit/debit cards which I use to pay
  • profile_data.csv : My name, email, phone number, and other personal information
  • rider_eater_saved_locations.csv : No data since I do not use Uber Eats
  • readme.html : A webpage which redirects to Uber Help
  • For_California_users.txt : A data privacy notice about users in California due to California Consumer Privacy Act (CCPA)
  • rider_app_analytics-0.csv : My approximate locations (latitude, longitude), logged while I am using the Uber app. The data is not necessarily linked with my trips. It is more a record of where my phone is at particular intervals, so that the app can keep track of my whereabouts.
  • trips_data.csv : The main csv document which contains information about the taxi trips I have booked, including the start location, end location, price, distance, and addresses.
Uber Files

Figure : Files extracted from Uber Data Zip File

Personally, I was not surprised by the data obtained, because it is exactly what I was expecting. It is the same data that Uber describes on its Uber Help page. Nevertheless, I know that this is not all the data Uber has about me, because it only covers what I submitted when filling out the account registration form, plus the trip data which Uber generated automatically. There must be data that Uber has processed further than this, for marketing purposes or to combine my data with that of other passengers to predict traffic flow, consumer behaviour, etc. However, this kind of high-level data is not included in the download. Uber does not deny keeping such data to itself. On its website, it explicitly states the following about our data downloads.

“Some information is reasonably not included in your data download. This can be for security reasons or because the information is proprietary. ”
Uber Help

Data Samples

Sample data from two of the most important csv files which I am going to visualize and analyze are shown below. Important personal information specific to me is replaced with xx for privacy protection. Please note that the actual data downloaded is much larger than what is shown here.

The first csv file is trips_data.csv, which contains information about the taxi trips I have booked, including the start location, end location, price, distance, and addresses. One inconsistency in this data is that for the taxi trips I cancelled, the Begin Trip Time is stated as 1st Jan 1970. This is the Unix epoch (timestamp zero), which is probably used as a placeholder for a non-existent date, but it made the data analysis harder. I either ignored or corrected these rows during processing.

City | Request Time | Begin Trip Time | Begin Trip Lat | Begin Trip Lng | Begin Trip Address | Dropoff Time | Dropoff Lat | Dropoff Lng | Dropoff Address | Distance (miles) | Fare Amount | Fare Currency
196 | 2022-03-27 09:19:10 | 2022-03-27 09:28:24 | xx | xx | xx | 2022-03-27 09:48:45 | xx | xx | xx | xx | 46.56 | AED
196 | 2022-03-27 08:41:40 | 2022-03-27 08:52:51 | xx | xx | xx | 2022-03-27 09:10:25 | xx | xx | xx | xx | 41.95 | AED
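
The handling of the 1 January 1970 placeholder can be sketched in a few lines of Python. This is only a minimal illustration, not the exact processing used for this article: the function name `completed_trips` is hypothetical, the sample rows are made up, and the timestamp format is assumed to match the sample above.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
EPOCH_DATE = datetime(1970, 1, 1).date()  # placeholder date seen on cancelled trips


def completed_trips(csv_text):
    """Return the rows whose 'Begin Trip Time' is a real date, not the 1970 placeholder."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only the first 19 characters are parsed, so a trailing
        # timezone suffix (if present) would be ignored. This is an assumption.
        begin = datetime.strptime(row["Begin Trip Time"][:19], TIME_FORMAT)
        if begin.date() != EPOCH_DATE:
            kept.append(row)
    return kept


# Made-up sample: one completed trip and one cancelled trip.
sample = (
    "Request Time,Begin Trip Time,Distance (miles),Fare Amount\n"
    "2022-03-27 09:19:10,2022-03-27 09:28:24,7.1,46.56\n"
    "2022-03-20 10:00:00,1970-01-01 00:00:00,0.0,0.0\n"
)

kept = completed_trips(sample)
print(len(kept), "trip(s) kept")
```

Dropping the placeholder rows before plotting keeps cancelled trips from appearing as a stray point decades before the real data.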

The next csv data sample is communications_sent-0.csv, which contains the messages sent from me to the Uber drivers, and timestamps as follows. The subject field is empty in this data file. This only contains the messages I sent out to the Uber drivers, but does not include the incoming messages to me.

Time | Subject | Content | Medium | Direction
2022-02-25 05:12:55 |  | I cannot see you | intercom_text | out
2022-02-25 05:12:55 |  | I cannot see you | intercom_text | out
2022-02-25 05:11:28 |  | Yes | intercom_text | out
2022-02-25 05:11:28 |  | Yes | intercom_text | out
2022-02-25 05:10:36 |  | Coming | intercom_text | out
2022-02-25 05:10:36 |  | Coming | intercom_text | out

4. Data Analysis and Visualizations

Uber Messages

From the data obtained, I made 4 different visualizations to look for patterns and spatial trends: (1) a word cloud of the most frequent words in my Uber messages, (2) the distance travelled in each trip, (3) the price per mile travelled and (4) a scatter plot of latitude and longitude for the locations I travelled to.

Word Cloud

Figure : Most frequent words from my Uber messages

The word cloud shown above was generated from the communications_sent-0.csv file using an online tool called Voyant Tools. The visualization selects the 25 most frequent words in my chat history with Uber drivers and renders them with the size of each word proportional to its frequency. According to the word cloud, "coming" is the most used word. This is expected, because it is what I send to every driver after they have arrived, to let them know that I am on my way to the pick-up spot.
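
The counting step behind a word cloud can be approximated in Python as follows. This is a rough sketch and not how Voyant Tools works internally: the function name `top_words` is hypothetical, the messages are illustrative ones modelled on the sample data, and unlike most word cloud tools it does not remove common stop words.

```python
import re
from collections import Counter


def top_words(messages, n=25):
    """Return the n most frequent lowercase words across all messages."""
    words = []
    for message in messages:
        # Keep runs of letters and apostrophes; punctuation and digits are dropped.
        words.extend(re.findall(r"[a-z']+", message.lower()))
    return Counter(words).most_common(n)


# Illustrative messages modelled on the sample data.
messages = ["Coming", "I cannot see you", "Yes", "Coming"]

for word, count in top_words(messages, n=3):
    print(word, count)
```

A word cloud then only needs to map each count to a font size.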


Distance Travelled

I also plotted the distance travelled (in miles) against the date of each trip. For almost all the trips, I travelled around 6-7 miles, with 2 clear outliers (14 miles and 20 miles, both on 4 Feb). I only travel around the city when I need to do administrative tasks (e.g. managing bank accounts, applying for a visa). Most of the time, it seems that all my tasks in the city are within 6 to 7 miles of my residence.


Price per mile

Moreover, I plotted the price per mile for all my trips to see whether Uber charges a fair (or at least consistent) price. The result was unexpected, because I was expecting a consistent price per mile. Normally, Uber charges me an average of about 6 AED per mile. On only two days (19 March and 13 Jan), the prices were much higher (two or three times the average). Upon further analysis, I found that on these days I travelled a very short distance. A possible explanation is that a fixed part of the fare is spread over fewer miles on short trips, which could also be read as a form of price discrimination by Uber.
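
To illustrate why short trips can look expensive per mile, here is a small Python sketch. The tariff is purely hypothetical (a fixed `BASE_FARE` plus a `PER_MILE` rate, chosen only for illustration) and is not Uber's actual pricing, which is not part of the downloaded data.

```python
def price_per_mile(fare, distance_miles):
    """Fare divided by distance; None when the distance is zero or negative."""
    if distance_miles <= 0:
        return None
    return fare / distance_miles


# Hypothetical tariff, chosen only for illustration (not Uber's real pricing).
BASE_FARE = 12.0  # AED charged regardless of distance
PER_MILE = 4.0    # AED per mile travelled

for distance in (1.0, 3.0, 6.0):
    fare = BASE_FARE + PER_MILE * distance
    print(f"{distance} miles: {price_per_mile(fare, distance):.2f} AED per mile")
```

Under this assumed tariff the per-mile price falls as the trip gets longer, so a fixed fare component alone would be enough to produce the kind of spikes seen on the two short-trip days.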


Latitude-Longitude

In addition, I plotted the latitude and longitude (normalized to protect privacy) of my dropoff locations in a scatter plot to see the spatial pattern of my Uber trips. Most of the time, I took an Uber to one particular location on the right side of the map (1,0.997). These are my trips back to my residence. The other points are distributed across the map without a particular pattern (except at (0.999,0.996) and (1,0.996)).
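
The normalization can be sketched as below. The article does not state which method was used, so this is an assumption: dividing each coordinate by the largest value in its column is one simple approach that gives values at or just below 1, like the ones quoted above. The function name `normalize_by_max` and the coordinates are made up for illustration, and the sketch assumes positive coordinates.

```python
def normalize_by_max(values):
    """Scale positive values so that the largest becomes 1.0, rounded to 3 decimals."""
    largest = max(values)
    return [round(value / largest, 3) for value in values]


# Made-up coordinates, not real locations.
latitudes = [10.00, 9.98, 9.97]
longitudes = [20.00, 19.94, 19.92]

points = list(zip(normalize_by_max(latitudes), normalize_by_max(longitudes)))
print(points)
```

The relative layout of the points is preserved, which is all the scatter plot needs, while the raw coordinates no longer appear in the figure.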

Key Takeaways

The key takeaways from my data visualizations are:

  • The most frequent word I say to my Uber drivers is "coming".
  • I travel 6 miles per trip on average.
  • Uber charges more for shorter trips.
  • I always take a cab back to my residence whenever I go out by taxi.

5. Uber's Policies

Uber Privacy Notice

Uber's Privacy Notice states that "Uber does not sell or share user personal data with third parties for their direct marketing, except with users' consent." However, we have no good way of verifying this. The same document states that "Uber may share a user's personal data other than as described in this notice if we notify the user and they consent to the sharing." This wording is vague enough that it remains questionable whether Uber really does not sell the data.


6. Final Thoughts

Personal Thoughts

I think that I gained a lot more knowledge from this assignment about the platform economy and about what kind of personal data these big companies really hold. However, since they only show the data which we have given them, and not the data derived from it to make business decisions, I believe that there is a lot more personal data which is not returned.

✓ Ready to Grade - 6th May 2022