Assignments
These two assignments are open-ended applications of the work that we have done in the rest of the course, each presented as a web-written document.
The data Uber collects about Pi
Pi Ko
May 6, 2022 - 1700 words . 5.7 min read
💻 Note : Please view the article on computer for best experience.
Figure : Uber Technologies, Inc., an American mobility as a service provider [ Source ]
The platform economy is the trend in commerce toward digital platform business models. Platforms are computer systems which allow consumers, entrepreneurs and businesses to meet, buy and sell goods and services virtually. Nowadays, many firms have moved towards the platform economy business model, including Uber, the platform whose data about me I analyze in this essay.
Uber is a digital platform where drivers and delivery workers can connect with riders, businesses and eaters. According to my professor, Uber is not a transportation platform, but a data company. Its main asset is not the ability to transport passengers, but its access to immense data about passengers, rides and drivers, which lets it connect all the stakeholders to carry out the tangible business transactions.
For this assignment, my task is to identify a platform or app which collects data about me. I am requesting my personal data from Uber and analyzing it because I use it frequently for practical purposes (travelling around the city), and because any geospatial trends in the data would be easy to analyse. I would consider myself a light user of the platform, since I have hired a taxi through Uber only 25 times over a 5-month period. I requested my personal data after my 25th ride.
The process of requesting our personal data from Uber is easy and straightforward, without any obstacles. Therefore, I chose Uber initially and did not change my choice of platform afterwards. I only needed to wait 12 hours after requesting the download until the file became available. To request the data, we simply go to the Uber website as shown below and click the download tab. Once we have clicked, Uber sends us an automated email confirming that we have requested the data.
Figure : Requesting User Data from Uber
Once the data is ready after 12 hours, my personal data from Uber can be downloaded as a zip file from the website's download page. However, since this is personal data, it is to be deleted after 7 days, and Uber recommends downloading the file on personal computers only.
Figure : Email from Uber for Requested Data Download
The files extracted from the zip file are shown in the screenshot below. Most of them are comma-separated values (csv) files. There is also one readme webpage (html file) and one text (.txt) file with a notice for California users. The data is split into three folders, namely "Account and Profile", "Regional Information" and "Rider". Only the contents of the "Account and Profile" and "Rider" folders are specific to me. The contents of each file are described briefly below.
Figure : Files extracted from Uber Data Zip File
Personally, I was not surprised by the data obtained, because it was exactly what I expected: it is the same data that Uber describes on its Uber Help page. Nevertheless, I know that this is not all the data Uber has about me, because it contains only the data I submitted when filling out the account registration form, plus the trip data which Uber automatically generated. There must be data that Uber has processed further for marketing purposes, or combined with other passengers' data to predict traffic flow, consumer behaviour, etc. However, this kind of high-level data is not included in the download. Uber does not deny keeping such data private. On its website, it explicitly states the following about data downloads.
“Some information is reasonably not included in your data download. This can be for security reasons or because the information is proprietary. ”
― Uber Help
Sample data from two of the most important csv files which I am going to visualize and analyze are shown below. Important personal information specific to me is replaced with xx for privacy protection. Please note that the actual data downloaded is much larger than what is shown here.
The first csv file is trips_data.csv, which contains information about the taxi trips I have booked, including the start location, end location, price, distance, and addresses. One inconsistency in this data is that for the taxi trips I cancelled, the Begin Trip Time is given as 1st Jan 1970. That date is the Unix epoch (timestamp zero), so it is probably a placeholder for a missing date, but it made the data analysis harder. I ignored or corrected these rows during processing.
City | Request Time | Begin Trip Time | Begin Trip Lat | Begin Trip Lng | Begin Trip Address | Dropoff Time | Dropoff Lat | Dropoff Lng | Dropoff Address | Distance (miles) | Fare Amount | Fare Currency |
---|---|---|---|---|---|---|---|---|---|---|---|---|
196 | 2022-03-27 09:19:10 | 2022-03-27 09:28:24 | xx | xx | xx | 2022-03-27 09:48:45 | xx | xx | xx | xx | 46.56 | AED |
196 | 2022-03-27 08:41:40 | 2022-03-27 08:52:51 | xx | xx | xx | 2022-03-27 09:10:25 | xx | xx | xx | xx | 41.95 | AED |
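The cancelled trips with the 1970 placeholder date can be filtered out before any analysis. The following is only a minimal sketch of one way to do that, not the exact processing used for this essay. It assumes the timestamps have the format shown in the sample above and that the placeholder is written as 1970-01-01 00:00:00. The function name `completed_trips` and the second sample row are made up for illustration.

```python
import csv
import io
from datetime import date, datetime

# Assumption: cancelled trips carry the Unix epoch date as a placeholder.
PLACEHOLDER_DATE = date(1970, 1, 1)

def completed_trips(rows):
    """Keep only rows whose Begin Trip Time is not the 1970-01-01 placeholder."""
    kept = []
    for row in rows:
        begin = datetime.strptime(row["Begin Trip Time"], "%Y-%m-%d %H:%M:%S")
        if begin.date() != PLACEHOLDER_DATE:
            kept.append(row)
    return kept

# Two-row stand-in for trips_data.csv; the second row is an invented cancelled trip.
sample = io.StringIO(
    "Request Time,Begin Trip Time,Fare Amount\n"
    "2022-03-27 09:19:10,2022-03-27 09:28:24,46.56\n"
    "2022-03-20 10:00:00,1970-01-01 00:00:00,0.0\n"
)
trips = completed_trips(csv.DictReader(sample))
print(len(trips), "completed trip(s) kept")
```

Comparing only the date part, and not the full timestamp, keeps the filter working even if the placeholder has a non-zero time of day.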
The next csv data sample is communications_sent-0.csv, which contains the messages sent from me to the Uber drivers, and timestamps as follows. The subject field is empty in this data file. This only contains the messages I sent out to the Uber drivers, but does not include the incoming messages to me.
Time | Subject | Content | Medium | Direction |
---|---|---|---|---|
2022-02-25 05:12:55 | | I cannot see you | intercom_text | out |
2022-02-25 05:12:55 | | I cannot see you | intercom_text | out |
2022-02-25 05:11:28 | | Yes | intercom_text | out |
2022-02-25 05:11:28 | | Yes | intercom_text | out |
2022-02-25 05:10:36 | | Coming | intercom_text | out |
2022-02-25 05:10:36 | | Coming | intercom_text | out |
From the data obtained, I made 4 different visualizations to extract patterns and spatial trends: (1) a word cloud of the most frequent words in my Uber messages, (2) the distance travelled in each trip, (3) the price per mile travelled and (4) a scatter plot of latitude and longitude for the locations I travelled to.
Figure : Most frequent words from my Uber messages
The word cloud shown above was generated from the communications_sent-0.csv file using an online tool called voyant-tools. This visualization selects the 25 most frequent words in my chat history with Uber drivers and renders them into an infographic, with the size of each word proportional to its frequency. According to the word cloud, "coming" is the most used word. This is expected, because that is what I send to every driver after they have arrived, to notify them that I am coming to the pick-up spot.
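The counting step behind a word cloud like this can be approximated in a few lines of Python. This is a rough sketch and not how voyant-tools works internally: it simply lowercases the messages, splits them into words and counts them, without removing common stop words. The function name `top_words` is hypothetical, and the messages are the six sample rows shown earlier.

```python
import re
from collections import Counter

def top_words(messages, n=25):
    """Return the n most frequent words across all messages as (word, count) pairs."""
    words = []
    for message in messages:
        # Lowercase so that "Coming" and "coming" are counted together.
        words.extend(re.findall(r"[a-z']+", message.lower()))
    return Counter(words).most_common(n)

# Content column of the sample rows from communications_sent-0.csv.
messages = [
    "I cannot see you", "I cannot see you",
    "Yes", "Yes",
    "Coming", "Coming",
]
for word, count in top_words(messages):
    print(word, count)
```

In a word cloud, each count would then be mapped to a font size, so more frequent words appear larger.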
I also plotted the distance travelled (in miles) against the date of each trip. For almost all trips, I travel around 6-7 miles, with 2 clear outliers (14 miles and 20 miles, both on 4 Feb). I only travel around the city when I need to do administrative tasks (e.g. managing bank accounts, applying for a visa). Most of the time, it seems that all my tasks in the city are within 6 to 7 miles of my residence.
Moreover, I plotted the price per mile of all the trips I have taken, to see if Uber is charging a fair (or at least consistent) price. The result was unexpected, because I was expecting a consistent price per mile. Normally, Uber charges me an average of 6 AED per mile. On only two days (19 March and 13 Jan), the prices were much higher (two or three times the average). Upon further analysis, I found that on these days I travelled a very short distance. The short distance might explain this apparent price discrimination by Uber.
In addition, I plotted the latitude and longitude (normalized to protect privacy) of my dropoff locations as a scatter plot to see the spatial pattern of my Uber taxi trips. Most of the time, I have been taking an Uber taxi to one particular location on the right side of the map, at (1, 0.997). These are my trips back to my residence. All other points are distributed across the map without a particular pattern (except at (0.999, 0.996) and (1, 0.996)).
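One possible reason why short trips look expensive per mile is a fixed part of the fare that does not depend on distance. The sketch below illustrates this with a toy fare model. The base fare of 12 AED and the rate of 4 AED per mile are invented numbers chosen only for illustration; they are not Uber's actual pricing, and the function names are hypothetical.

```python
def price_per_mile(fare, distance_miles):
    """Fare divided by distance; None when the distance is zero or negative."""
    if distance_miles <= 0:
        return None
    return fare / distance_miles

def modelled_fare(distance_miles, base=12.0, per_mile=4.0):
    """Toy fare model (assumed): a fixed base fare plus a constant rate per mile."""
    return base + per_mile * distance_miles

# Compare a typical 6-mile trip with a very short 1-mile trip.
for distance in (6.0, 1.0):
    fare = modelled_fare(distance)
    print(distance, "miles:", price_per_mile(fare, distance), "AED per mile")
```

Under this assumed model, the fixed 12 AED is spread over fewer miles on the short trip, so its price per mile is several times higher even though the pricing rule is the same for both trips.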
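Since the plotted values are all close to 1, one way such a normalization could be done is to divide every coordinate by the largest value on that axis. The sketch below shows this idea; it is an assumption about the method, not a record of the exact steps used for the plot, and the function name and coordinates are made up.

```python
def normalize_by_max(values):
    """Scale values so that the one with the largest magnitude becomes 1."""
    if not values:
        raise ValueError("values must not be empty")
    peak = max(values, key=abs)
    if peak == 0:
        raise ValueError("cannot normalize when all values are zero")
    return [v / peak for v in values]

# Invented coordinates for illustration, not real dropoff locations.
latitudes = [10.00, 9.97, 9.96]
longitudes = [20.00, 19.98, 19.92]

points = list(zip(normalize_by_max(longitudes), normalize_by_max(latitudes)))
for x, y in points:
    print(round(x, 3), round(y, 3))
```

This hides the absolute positions while keeping the relative layout of the points, which is what the scatter plot needs.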
The key takeaways for my data visualizations are that
Uber's Privacy Notice states that "Uber does not sell or share user personal data with third parties for their direct marketing, except with users' consent." However, we have no good way of verifying this. The same document states that "Uber may share a user's personal data other than as described in this notice if we notify the user and they consent to the sharing." The statement is written in such vague terms that it is highly questionable whether Uber really does not sell the data.
I do think that I gained a lot more knowledge from this assignment about the platform economy and about what kind of personal data these big companies really hold. However, since they only show the data which we have given them, and not the data derived from it to make business decisions, I believe that there is a lot more personal data which is not returned.