1. Download Python
https://www.python.org/downloads/
2. Install Python
Create a Python folder and run the installer executable from it (the file name is likely to differ)
C:\Users\arrge\Python>C:\Users\arrge\Downloads\python-3.9.1-amd64.exe
3. Get PIP
https://bootstrap.pypa.io/get-pip.py
Save it somewhere (here, the Python folder from step 2) and run it
C:\Users\arrge\Python>py get-pip.py
C:\Users\arrge\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
warnings.warn(
Collecting pip
Downloading pip-21.0.1-py3-none-any.whl (1.5 MB)
|████████████████████████████████| 1.5 MB 2.2 MB/s
Collecting wheel
Downloading wheel-0.36.2-py2.py3-none-any.whl (35 kB)
Installing collected packages: wheel, pip
WARNING: The script wheel.exe is installed in 'C:\Users\arrge\AppData\Local\Programs\Python\Python39\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Attempting uninstall: pip
Found existing installation: pip 20.2.3
Uninstalling pip-20.2.3:
Successfully uninstalled pip-20.2.3
WARNING: The scripts pip.exe, pip3.9.exe and pip3.exe are installed in 'C:\Users\arrge\AppData\Local\Programs\Python\Python39\Scripts' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed pip-21.0.1 wheel-0.36.2
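The warnings above say the Scripts folder is not on PATH. A minimal sketch of how that could be checked from Python itself is below. The helper name scripts_dir_on_path is hypothetical (not part of pip or Python), and it assumes the interpreter you run it with is the one that was just installed.

```python
import os
import sysconfig


def scripts_dir_on_path(scripts_dir=None, path_value=None):
    """Return True if the Scripts folder appears in the PATH value."""
    # Default to this interpreter's own scripts folder and the current PATH
    if scripts_dir is None:
        scripts_dir = sysconfig.get_path("scripts")
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise case and separators so equivalent spellings compare equal
    target = os.path.normcase(os.path.normpath(scripts_dir))
    entries = [
        os.path.normcase(os.path.normpath(p))
        for p in path_value.split(os.pathsep)
        if p
    ]
    return target in entries


if __name__ == "__main__":
    print("Scripts folder:", sysconfig.get_path("scripts"))
    print("On PATH:", scripts_dir_on_path())
```

If this prints False, running pip through the launcher as "py -m pip install ..." should still work, since that form does not rely on pip.exe being found on PATH.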
4. Install Beautiful Soup and Requests
pip install beautifulsoup4
pip install requests
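To confirm both installs worked before writing the script, something like the following sketch may help. The helper name installed_version is an assumption for illustration; it uses importlib.metadata from the standard library, which is available in Python 3.8 and later.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used with pip install in step 4
    for name in ("beautifulsoup4", "requests"):
        found = installed_version(name)
        print(name, found if found else "not installed")
```

Save it as a .py file in the Python folder and run it with py, the same way as get-pip.py.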
5. Write script test.py
from bs4 import BeautifulSoup
import requests

# Loop over sheet numbers 2 to 135
for i in range(2, 136):
    # Zero-pad the sheet number to three digits, e.g. 2 -> "002"
    xstr = str(i).zfill(3)
    url = "http://www.thearsenalhistory.com/stat/aftlu_files/sheet" + xstr + ".htm"
    print(url)
    page = requests.get(url)
    data = page.text
    soup = BeautifulSoup(data, features="html.parser")
    # Name the output file after two consecutive years, e.g. 1886_1887.htm for sheet 2
    filename = str(1884 + i) + "_" + str(1885 + i) + ".htm"
    with open(filename, 'w', encoding="utf-8") as f:
        tables = soup.find_all('table')
        for tb in tables:
            # Write each table to the file, with hyphens replaced by tildes
            print(tb.prettify().replace("-", "~"), file=f)
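The sheet-number padding and the output file name in the script can be checked without touching the network. The sketch below pulls both into small functions; the names BASE_URL, sheet_url and output_filename are made up for illustration and are not in test.py.

```python
BASE_URL = "http://www.thearsenalhistory.com/stat/aftlu_files/sheet"


def sheet_url(i):
    """Build the page URL, zero-padding the sheet number to three digits."""
    return f"{BASE_URL}{i:03d}.htm"


def output_filename(i):
    """Build the output file name from the sheet number, as test.py does."""
    return f"{1884 + i}_{1885 + i}.htm"


if __name__ == "__main__":
    # Print the mapping for the first sheet, a two-digit sheet and the last sheet
    for i in (2, 10, 135):
        print(sheet_url(i), "->", output_filename(i))
```

The format spec 03d does the same job as prefixing "00" or "0" by hand, and keeping the two mappings in functions makes it easy to spot an off-by-one in the years before downloading all 134 pages.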
6. Run script test.py
C:\Users\arrge\Python>py test.py
http://www.thearsenalhistory.com/stat/aftlu_files/sheet002.htm
http://www.thearsenalhistory.com/stat/aftlu_files/sheet003.htm
http://www.thearsenalhistory.com/stat/aftlu_files/sheet004.htm