BeautifulSoup for scraping Coronavirus case information

Ketut Artayasa
Sharing while learning
2 min read · Jun 16, 2020


Photo by Nhu Nguyen on Unsplash

During the COVID-19 pandemic, many websites have reported Coronavirus cases by country, province, and district. Some of them also provide an API for other developers. For example, https://kawalcorona.com/api provides a free API.

I tried to get Coronavirus case data from http://infocorona.baliprov.go.id/, which has the most complete Coronavirus case information for my area. A browser is really all you need to read this website, but I wanted to try getting the information a different way.

This is my scenario:

  1. Get information from https://infocorona.baliprov.go.id.
  2. Display the information on an ESP8266 (MicroPython supported) with a 2x16 LCD.

Because of the limits of this tiny hardware and its urequests module, I can't scrape https://infocorona.baliprov.go.id directly from the ESP8266. The scraper also needs BeautifulSoup (a Python module for scraping) and requests. So I need a more powerful machine to run the scraper script and then push or serve the information to the ESP8266. Whether that should be an API service or an MQTT broker is our next problem. First, let's find a way to get the information from https://infocorona.baliprov.go.id.
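Here is a minimal sketch of the "API service" option. It assumes the scraper runs on a PC on the same network as the ESP8266. It uses Python's standard-library `http.server` to expose a small HTTP endpoint that returns the latest count as JSON. The `/cases` path, port 8000, the `confirmed` field, and the `get_confirmed()` stub are all hypothetical placeholders. None of them come from the original setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_confirmed():
    """Hypothetical stand-in: call your scraper here (e.g. the tarik_covid.py logic)."""
    return 760  # placeholder value, not live data


def build_payload():
    """Serialize the latest count as UTF-8 encoded JSON bytes."""
    return json.dumps({"confirmed": get_confirmed()}).encode("utf-8")


class CasesHandler(BaseHTTPRequestHandler):
    """Answers GET /cases with the JSON payload; any other path gets a 404."""

    def do_GET(self):
        if self.path != "/cases":
            self.send_error(404)
            return
        body = build_payload()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever answering requests; 0.0.0.0 listens on all network interfaces."""
    HTTPServer(("0.0.0.0", port), CasesHandler).serve_forever()
```

To start the server, call `serve()`. On the ESP8266, MicroPython's `urequests` could then poll something like `http://<relay-ip>:8000/cases`. Plain HTTP on the local network may avoid the HTTPS and parsing work that the microcontroller struggles with. As written, every request triggers a fresh scrape. Caching the last value would be kinder to the source site.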

To get the information I need, I will create a simple Python script that uses the requests, lxml, and bs4 modules. First, we need to install the requirements. I use the easiest way, pip:

pip install beautifulsoup4 requests lxml

Then I created my tiny script, tarik_covid.py:

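import requests
from bs4 import BeautifulSoup  # 'lxml' below selects the lxml parser; no separate import needed

# Browser-like User-Agent so the request looks like a normal page visit
headerku = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'
}
URL2 = 'https://infocorona.baliprov.go.id/API/good_api_lagi.php'
r = requests.post(URL2, headers=headerku, timeout=10)
r.raise_for_status()  # stop early if the server returns an HTTP error
soup = BeautifulSoup(r.text, 'lxml')
# Found by trial and error :) the 8th <div> in the page, then its 2nd nested <div>
t7 = soup.find_all('div')[7]
t71 = t7.find_all('div')[1]
positif_corona = t71.find('h3').text.strip()  # e.g. "760 Orang"
print("Confirmed cases: " + positif_corona)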

Output ("Orang" is Indonesian for "people"):

Confirmed cases: 760 Orang
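The fixed indices `[7]` and `[1]` break as soon as the page layout shifts. Below is a sketch of a somewhat sturdier approach. It assumes the count sits in an `<h3>` right before a label element: search for the label text, read the neighbouring `<h3>`, and turn "760 Orang" into the integer 760. The sample markup, the labels, and the helper names `find_count` and `to_int` are hypothetical, and the real page structure may differ. The sketch uses BeautifulSoup's built-in `"html.parser"` backend, so it runs without lxml.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not the real page's structure.
SAMPLE_HTML = """
<div class="summary">
  <div><h3>760 Orang</h3><p>Confirmed</p></div>
  <div><h3>650 Orang</h3><p>Recovered</p></div>
</div>
"""


def to_int(text):
    """Extract the first number from text like '760 Orang' or '1.234 Orang'."""
    match = re.search(r"\d[\d.,]*", text)
    if match is None:
        return None
    # Drop thousands separators ('.' or ',') before converting
    return int(re.sub(r"[.,]", "", match.group()))


def find_count(html_text, label):
    """Return the number in the <h3> just before the <p> whose text equals label."""
    soup = BeautifulSoup(html_text, "html.parser")
    for tag in soup.find_all("p"):
        if tag.get_text(strip=True) == label:
            h3 = tag.find_previous_sibling("h3")
            if h3 is not None:
                return to_int(h3.get_text())
    return None


print(find_count(SAMPLE_HTML, "Confirmed"))  # 760
```

With this approach, a change in how many `<div>`s come before the summary no longer matters. Only the label text and the `<h3>`/`<p>` pairing need to stay the same.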

Now I have the Coronavirus case information. My next problem is how to send this data to the ESP8266 and display it on the 2x16 LCD attached to it.
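For the display step, note that a 2x16 character LCD shows at most 16 characters per row. Each line therefore has to be cut or padded to that width before it is written. Padding with spaces also overwrites leftover characters from a previous, longer value without clearing the whole screen. Below is a minimal sketch, where `fit_lcd` is a hypothetical helper. It uses only slicing and string repetition, so it should also run under MicroPython, though I have not tested it on the board. Writing the result to the LCD depends on the driver library you use and is not covered here.

```python
def fit_lcd(lines, cols=16, rows=2):
    """Return exactly `rows` strings, each exactly `cols` characters wide.

    Longer text is truncated, shorter text is padded with spaces,
    and missing rows become blank lines.
    """
    out = []
    for i in range(rows):
        text = lines[i] if i < len(lines) else ""
        text = text[:cols]
        out.append(text + " " * (cols - len(text)))
    return out


for row in fit_lcd(["Confirmed cases:", "760 Orang"]):
    print(repr(row))
```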

That's enough for today. See you!

NB: This is just a tiny, simple web scraping scenario. You can adapt it to your own problems and hardware limitations, keeping in mind the legality and ethics of web scraping.
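One concrete courtesy check before scraping is to read the site's robots.txt. Below is a minimal sketch using the standard library's `urllib.robotparser`. The sample rules, the example.com URLs, and the helper name `allowed` are made up for illustration. Against a live site you would call `set_url()` and `read()` instead of `parse()`. robots.txt only states what the site operator asks crawlers to avoid, and it does not answer legal questions by itself.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed(SAMPLE_ROBOTS, "https://example.com/private/data"))  # False
print(allowed(SAMPLE_ROBOTS, "https://example.com/news"))  # True
```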
