Athokshay Ashok / Coursera_Capstone / Commits / 07a79bf0
Commit 07a79bf0 (Unverified), authored Aug 01, 2020 by Athokshay Ashok
Committed by GitHub, Aug 01, 2020
Add files via upload
parent 8e753a32
Changes: 1
Week 3 Part 1.ipynb @ 07a79bf0
...
...
@@ -68,6 +68,13 @@
"## Week 3: Segmenting and Clustering the Neighborhoods in the City of Toronto, Canada"
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Part 1"
+]
+},
{
"cell_type": "code",
"execution_count": 53,
...
...
@@ -262,7 +269,7 @@
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
...
...
@@ -276,7 +283,7 @@
},
{
"cell_type": "code",
-"execution_count": 55,
+"execution_count": 59,
"metadata": {},
"outputs": [
{
...
...
@@ -349,7 +356,7 @@
"M7A Downtown Toronto Queen's Park, Ontario Provincial Government"
]
},
-"execution_count": 55,
+"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -364,7 +371,7 @@
},
{
"cell_type": "code",
-"execution_count": 56,
+"execution_count": 60,
"metadata": {},
"outputs": [
{
...
...
@@ -437,7 +444,7 @@
"M7A Downtown Toronto Queen's Park, Ontario Provincial Government"
]
},
-"execution_count": 56,
+"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
...
...
@@ -450,7 +457,7 @@
},
{
"cell_type": "code",
-"execution_count": 57,
+"execution_count": 61,
"metadata": {},
"outputs": [
{
...
...
@@ -459,7 +466,7 @@
"(103, 2)"
]
},
-"execution_count": 57,
+"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
...
...
%% Cell type:markdown id: tags:
# IBM Capstone Project
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd
!pip install BeautifulSoup4
!pip install requests
!pip install lxml
```
%%%% Output: stream
Requirement already satisfied: BeautifulSoup4 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (4.9.1)
Requirement already satisfied: soupsieve>1.2 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from BeautifulSoup4) (2.0.1)
Requirement already satisfied: requests in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (2.24.0)
Requirement already satisfied: idna<3,>=2.5 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (1.25.10)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (from requests) (2019.11.28)
Requirement already satisfied: lxml in c:\users\athok\miniconda3\envs\ml135_env\lib\site-packages (4.5.2)
%%%% Output: stream
ERROR: Could not find a version that satisfies the requirement xml (from versions: none)
ERROR: No matching distribution found for xml
%% Cell type:code id: tags:
```python
print('Hello Capstone Project Course!')
```
%%%% Output: stream
Hello Capstone Project Course!
%% Cell type:markdown id: tags:
## Week 3: Segmenting and Clustering the Neighborhoods in the City of Toronto, Canada
%% Cell type:markdown id: tags:
### Part 1
%% Cell type:code id: tags:
```python
from bs4 import BeautifulSoup
import requests

#Use Beautiful Soup to extract page text
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(source.text, 'html.parser')

#Find table in HTML and extract all data into rows
data = []
columns = []
table = soup.find(class_='wikitable')
for index, tr in enumerate(table.find_all('tr')):
    section = []
    for td in tr.find_all(['th', 'td']):
        section.append(td.text.rstrip())
    if (index == 0):
        columns = section
    else:
        data.append(section)

canada_df = pd.DataFrame(data=data, columns=columns)
canada_df.head()
```
%%%% Output: execute_result
   Postal Code  Borough           Neighbourhood
0  M1A          Not assigned      Not assigned
1  M2A          Not assigned      Not assigned
2  M3A          North York        Parkwoods
3  M4A          North York        Victoria Village
4  M5A          Downtown Toronto  Regent Park, Harbourfront
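The row-extraction loop in the cell above can be tried without a network call. What follows is a minimal sketch, not the notebook's own code: `SAMPLE_HTML` is a hand-written stand-in for the Wikipedia page and `table_to_frame` is a hypothetical helper name introduced only for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; only the table structure matters.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal Code </th><th>Borough </th><th>Neighbourhood </th></tr>
  <tr><td>M1A </td><td>Not assigned </td><td>Not assigned </td></tr>
  <tr><td>M3A </td><td>North York </td><td>Parkwoods </td></tr>
</table>
"""


def table_to_frame(html, css_class="wikitable"):
    """Turn the first table with the given CSS class into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(class_=css_class)
    if table is None:
        raise ValueError(f"no element with class {css_class!r} found")
    # One list of cell strings per <tr>; header and data cells are both read.
    rows = [
        [cell.get_text().strip() for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    # The first row is treated as the header, the rest as data.
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


sample_df = table_to_frame(SAMPLE_HTML)
print(sample_df)
```

Raising an explicit error when the table is missing is a design choice of this sketch; it gives a clearer message than the `AttributeError` that would otherwise follow if the page layout changed.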
%% Cell type:code id: tags:
```python
#Remove all rows where borough is not assigned
canada_df = canada_df[canada_df['Borough'] != 'Not assigned']
canada_df.head()
```
%%%% Output: execute_result
   Postal Code  Borough           Neighbourhood
2  M3A          North York        Parkwoods
3  M4A          North York        Victoria Village
4  M5A          Downtown Toronto  Regent Park, Harbourfront
5  M6A          North York        Lawrence Manor, Lawrence Heights
6  M7A          Downtown Toronto  Queen's Park, Ontario Provincial Government
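The filter keeps the original row labels, which is why the output above starts at index 2. The sketch below reproduces that on a few rows copied from the earlier output; the name `assigned` and the added `.copy()` are assumptions of this example, not part of the commit. Taking a copy makes the filtered frame independent of the original, which is commonly done before modifying it in later cells.

```python
import pandas as pd

# Illustrative rows copied from the output above; not the full table.
sample_df = pd.DataFrame(
    {
        "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
        "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
        "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
    }
)

# Boolean mask: True for rows whose borough has been assigned.
has_borough = sample_df["Borough"] != "Not assigned"

# .copy() gives an independent frame that is safe to modify later.
assigned = sample_df[has_borough].copy()
print(assigned)
```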
%% Cell type:code id: tags:
```python
# More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page,
# you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park.
# These two rows will be combined into one row with the neighborhoods separated with a comma, as shown in row 11
# in the above table.
# No action was needed here: the scraped table already lists each postal code once, with all of its
# neighborhoods in a single comma-separated cell.
```
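Had the scraped table still listed one row per neighbourhood, the combining step described in the comment could look roughly like this. It is a sketch under that assumption only: `long_df` is hypothetical input invented for illustration, and the notebook itself did not need this step.

```python
import pandas as pd

# Hypothetical input in a one-row-per-neighbourhood layout.
long_df = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M5A", "M3A"],
        "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
        "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
    }
)

# Group rows that share a postal code and borough, then join the
# neighbourhood names of each group with a comma.
combined = (
    long_df.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

Passing `sort=False` keeps the groups in order of first appearance rather than sorting them by postal code.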
%% Cell type:code id: tags:
```python
#Update index to be postcode
if (canada_df.index.name != 'Postal Code'):
    canada_df = canada_df.set_index('Postal Code')
canada_df.head()
```
%%%% Output: execute_result
             Borough           Neighbourhood
Postal Code
M3A          North York        Parkwoods
M4A          North York        Victoria Village
M5A          Downtown Toronto  Regent Park, Harbourfront
M6A          North York        Lawrence Manor, Lawrence Heights
M7A          Downtown Toronto  Queen's Park, Ontario Provincial Government
%% Cell type:code id: tags:
```python
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
canada_df['Neighbourhood'].replace("Not assigned", canada_df["Borough"], inplace=True)
canada_df.head()
```
%%%% Output: execute_result
             Borough           Neighbourhood
Postal Code
M3A          North York        Parkwoods
M4A          North York        Victoria Village
M5A          Downtown Toronto  Regent Park, Harbourfront
M6A          North York        Lawrence Manor, Lawrence Heights
M7A          Downtown Toronto  Queen's Park, Ontario Provincial Government
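The first five rows shown above are the same as in the previous cell's output, so the fallback rule is not visible in them. One way to express the same rule row by row is `Series.mask`, sketched below on hypothetical rows; the sample values and the name `missing` are assumptions made for illustration, and this is an alternative formulation rather than the code in the commit.

```python
import pandas as pd

# Hypothetical rows: the second one has a borough but no neighbourhood.
sample_df = pd.DataFrame(
    {
        "Borough": ["North York", "Queen's Park"],
        "Neighbourhood": ["Parkwoods", "Not assigned"],
    },
    index=pd.Index(["M3A", "M7A"], name="Postal Code"),
)

# True where the neighbourhood is missing.
missing = sample_df["Neighbourhood"] == "Not assigned"

# mask() replaces values where the condition is True with the aligned
# value from the other Series, here the borough of the same row.
sample_df["Neighbourhood"] = sample_df["Neighbourhood"].mask(
    missing, sample_df["Borough"]
)
print(sample_df)
```

Assigning the result back to the column avoids relying on `inplace=True` applied to a column selected from a filtered frame.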
%% Cell type:code id: tags:
```python
canada_df.shape
```
%%%% Output: execute_result
(103, 2)
%% Cell type:code id: tags:
```python
```
...
...