Unverified commit 07a79bf0 authored by Athokshay Ashok, committed by GitHub

Add files via upload

parent 8e753a32
......@@ -68,6 +68,13 @@
"## Week 3: Segmenting and Clustering the Neighborhoods in the City of Toronto, Canada"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Part 1"
+ ]
+ },
{
"cell_type": "code",
"execution_count": 53,
......@@ -262,7 +269,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
......@@ -276,7 +283,7 @@
},
{
"cell_type": "code",
- "execution_count": 55,
+ "execution_count": 59,
"metadata": {},
"outputs": [
{
......@@ -349,7 +356,7 @@
"M7A Downtown Toronto Queen's Park, Ontario Provincial Government"
]
},
- "execution_count": 55,
+ "execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
......@@ -364,7 +371,7 @@
},
{
"cell_type": "code",
- "execution_count": 56,
+ "execution_count": 60,
"metadata": {},
"outputs": [
{
......@@ -437,7 +444,7 @@
"M7A Downtown Toronto Queen's Park, Ontario Provincial Government"
]
},
- "execution_count": 56,
+ "execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
......@@ -450,7 +457,7 @@
},
{
"cell_type": "code",
- "execution_count": 57,
+ "execution_count": 61,
"metadata": {},
"outputs": [
{
......@@ -459,7 +466,7 @@
"(103, 2)"
]
},
- "execution_count": 57,
+ "execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
......
%% Cell type:markdown id: tags:
# IBM Capstone Project
%% Cell type:code id: tags:
``` python
import numpy as np
import pandas as pd
!pip install BeautifulSoup4
!pip install requests
!pip install lxml
```
%%%% Output: stream
Requirement already satisfied: BeautifulSoup4 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (4.9.1)
Requirement already satisfied: soupsieve>1.2 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from BeautifulSoup4) (2.0.1)
Requirement already satisfied: requests in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (2.24.0)
Requirement already satisfied: idna<3,>=2.5 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (1.25.10)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (2019.11.28)
Requirement already satisfied: lxml in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (4.5.2)
%%%% Output: stream
ERROR: Could not find a version that satisfies the requirement xml (from versions: none)
ERROR: No matching distribution found for xml
%% Cell type:code id: tags:
``` python
print('Hello Capstone Project Course!')
```
%%%% Output: stream
Hello Capstone Project Course!
%% Cell type:markdown id: tags:
## Week 3: Segmenting and Clustering the Neighborhoods in the City of Toronto, Canada
%% Cell type:markdown id: tags:
### Part 1
%% Cell type:code id: tags:
``` python
from bs4 import BeautifulSoup
import requests

# Download the Wikipedia page and parse its HTML with Beautiful Soup
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
source.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(source.text, 'html.parser')

# Find the first table with class "wikitable" and read it row by row:
# the first row holds the column headers, every later row holds data
data = []
columns = []
table = soup.find(class_='wikitable')
for index, tr in enumerate(table.find_all('tr')):
    section = []
    for td in tr.find_all(['th', 'td']):
        # rstrip() drops trailing whitespace such as newlines
        section.append(td.text.rstrip())
    if index == 0:
        columns = section
    else:
        data.append(section)

canada_df = pd.DataFrame(data=data, columns=columns)
canada_df.head()
```
%%%% Output: execute_result
   Postal Code           Borough              Neighbourhood
0          M1A      Not assigned               Not assigned
1          M2A      Not assigned               Not assigned
2          M3A        North York                  Parkwoods
3          M4A        North York           Victoria Village
4          M5A  Downtown Toronto  Regent Park, Harbourfront
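%% Cell type:markdown id: tags:
The scraping loop above treats the first `<tr>` as the header row and every later row as data. As a dependency-free illustration of that same idea, the sketch below swaps Beautiful Soup for the standard library's `html.parser.HTMLParser` and runs on a small hard-coded snippet instead of the live Wikipedia page. The names `WikitableParser` and `table_to_frame` and the sample HTML are hypothetical; the sketch reads only the first table whose class list contains `wikitable` and does not handle nested tables or `rowspan`/`colspan`.
%% Cell type:code id: tags:
``` python
from html.parser import HTMLParser

import pandas as pd


class WikitableParser(HTMLParser):
    """Hypothetical helper: collect cell text from the first <table> whose class includes "wikitable"."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_table = False  # currently inside the target table?
        self._done = False      # only the first matching table is read
        self._cell = None       # text fragments of the open <th>/<td>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if not self._done and "wikitable" in classes:
                self._in_table = True
        elif self._in_table and tag == "tr":
            self.rows.append([])
        elif self._in_table and tag in ("th", "td"):
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("th", "td") and self._cell is not None:
            # Same clean-up as the notebook: drop trailing whitespace
            self.rows[-1].append("".join(self._cell).rstrip())
            self._cell = None
        elif tag == "table":
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # Text is only kept while a <th>/<td> is open
        if self._cell is not None:
            self._cell.append(data)


def table_to_frame(html):
    """Parse `html` into a DataFrame: first row = column names, remaining rows = data."""
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return pd.DataFrame(data=body, columns=header)


# Hypothetical snippet shaped like the Wikipedia table (not real page content)
sample = """
<table class="wikitable sortable">
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
df = table_to_frame(sample)
print(df)
```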
%% Cell type:code id: tags:
``` python
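# Remove all rows whose Borough is 'Not assigned'
# (.copy() returns an independent DataFrame, so later modifications do not act on a slice)
canada_df = canada_df[canada_df['Borough'] != 'Not assigned'].copy()
canada_df.head()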
```
%%%% Output: execute_result
   Postal Code           Borough                                Neighbourhood
2          M3A        North York                                    Parkwoods
3          M4A        North York                             Victoria Village
4          M5A  Downtown Toronto                    Regent Park, Harbourfront
5          M6A        North York             Lawrence Manor, Lawrence Heights
6          M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government
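%% Cell type:markdown id: tags:
Boolean indexing keeps only the rows where the condition is `True` and preserves their original row labels, which is why the filtered output above starts at label 2. A minimal sketch on a hypothetical three-row frame (not the scraped data) shows the effect; the trailing `.copy()` is an optional addition that makes the result an independent DataFrame, so later in-place edits do not trigger pandas' `SettingWithCopyWarning`.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical sample rows with the same columns as the scraped table
raw = pd.DataFrame(
    {
        "Postal Code": ["M1A", "M3A", "M5A"],
        "Borough": ["Not assigned", "North York", "Downtown Toronto"],
        "Neighbourhood": ["Not assigned", "Parkwoods", "Regent Park, Harbourfront"],
    }
)

# Keep rows whose Borough is assigned; .copy() detaches the result from `raw`
assigned = raw[raw["Borough"] != "Not assigned"].copy()
print(assigned)  # row labels 1 and 2 are kept, label 0 is gone
```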
%% Cell type:code id: tags:
``` python
# More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page,
# M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park.
# Such rows should be combined into one row with the neighborhoods separated by a comma, as shown in row 11 of
# the example table.
# No action was needed here: the scraped table already has one row per postal code, with all of its
# neighborhoods in a single comma-separated cell (e.g. M5A: Regent Park, Harbourfront).
```
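%% Cell type:markdown id: tags:
Had the table listed a postal code once per neighborhood, the duplicate rows could be merged by grouping on the postal code and joining the neighborhood names with a comma. A sketch under two assumptions: the input rows below are hypothetical, and each postal code belongs to a single borough (so taking the first borough per group is safe). `sort=False` keeps the postal codes in order of first appearance.
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical "one row per neighbourhood" input
rows = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M5A", "M3A"],
        "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
        "Neighbourhood": ["Harbourfront", "Regent Park", "Parkwoods"],
    }
)

combined = (
    rows.groupby("Postal Code", sort=False)
    .agg(
        Borough=("Borough", "first"),                 # assumes one borough per postal code
        Neighbourhood=("Neighbourhood", ", ".join),   # "A, B" for each postal code
    )
    .reset_index()
)
print(combined)
```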
%% Cell type:code id: tags:
``` python
# Use the postal code as the index (the guard keeps the cell safe to re-run)
if canada_df.index.name != 'Postal Code':
    canada_df = canada_df.set_index('Postal Code')
canada_df.head()
```
%%%% Output: execute_result
                      Borough                                Neighbourhood
Postal Code
M3A                North York                                    Parkwoods
M4A                North York                             Victoria Village
M5A          Downtown Toronto                    Regent Park, Harbourfront
M6A                North York             Lawrence Manor, Lawrence Heights
M7A          Downtown Toronto  Queen's Park, Ontario Provincial Government
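%% Cell type:markdown id: tags:
The `if` guard matters because, once 'Postal Code' has been moved into the index, it is no longer a column, and a second `set_index('Postal Code')` would raise a `KeyError`. A small sketch of the same guard wrapped in a hypothetical helper `ensure_index`, run twice on a tiny stand-in frame:
%% Cell type:code id: tags:
``` python
import pandas as pd


def ensure_index(df, column):
    """Hypothetical helper: return df indexed by `column`, unchanged if that is already the index."""
    if df.index.name == column:
        return df
    return df.set_index(column)


df = pd.DataFrame({"Postal Code": ["M3A", "M4A"], "Borough": ["North York", "North York"]})
once = ensure_index(df, "Postal Code")
twice = ensure_index(once, "Postal Code")  # no KeyError on the second call
print(twice)
```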
%% Cell type:code id: tags:
``` python
# If a row has a borough but a 'Not assigned' neighborhood, use the borough name as the neighborhood.
not_assigned = canada_df['Neighbourhood'] == 'Not assigned'
canada_df.loc[not_assigned, 'Neighbourhood'] = canada_df.loc[not_assigned, 'Borough']
canada_df.head()
```
%%%% Output: execute_result
                      Borough                                Neighbourhood
Postal Code
M3A                North York                                    Parkwoods
M4A                North York                             Victoria Village
M5A          Downtown Toronto                    Regent Park, Harbourfront
M6A                North York             Lawrence Manor, Lawrence Heights
M7A          Downtown Toronto  Queen's Park, Ontario Provincial Government
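%% Cell type:markdown id: tags:
The rule above is a conditional copy: wherever Neighbourhood is 'Not assigned', take the value from Borough. The head shown above contains no such row, so the step leaves it unchanged. A sketch of the same rule using `Series.mask`, on a hypothetical frame that does contain an unassigned neighborhood (the postal codes and borough names here are made up for illustration):
%% Cell type:code id: tags:
``` python
import pandas as pd

# Hypothetical data, indexed by postal code like canada_df
df = pd.DataFrame(
    {
        "Borough": ["Sample Borough", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    },
    index=pd.Index(["X1A", "X2A"], name="Postal Code"),
)

# mask() swaps in the index-aligned Borough value wherever the condition is True
df["Neighbourhood"] = df["Neighbourhood"].mask(
    df["Neighbourhood"] == "Not assigned", df["Borough"]
)
print(df)
```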
%% Cell type:code id: tags:
``` python
canada_df.shape
```
%%%% Output: execute_result
(103, 2)
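%% Cell type:markdown id: tags:
The shape confirms 103 postal codes and two columns, but not that the cleaning rules held. A few assertions can check that each postal code appears once and that no 'Not assigned' values remain. A sketch of such checks as a hypothetical helper `check_clean`, shown on a tiny stand-in frame rather than `canada_df`:
%% Cell type:code id: tags:
``` python
import pandas as pd


def check_clean(df):
    """Hypothetical helper: raise AssertionError if the cleaned table breaks an expected invariant."""
    assert df.index.name == "Postal Code", "index should be the postal code"
    assert df.index.is_unique, "each postal code should appear once"
    assert not (df == "Not assigned").any().any(), "'Not assigned' values remain"
    return True


clean = pd.DataFrame(
    {"Borough": ["North York"], "Neighbourhood": ["Parkwoods"]},
    index=pd.Index(["M3A"], name="Postal Code"),
)
print(check_clean(clean))
```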
%% Cell type:code id: tags:
``` python
```
......