COVID19 Lockdown Dev Log – Day 19, Save Webscraped Data In ‘.CSV’ Files
What I Worked On
Saving web-scraped data in a CSV file.
What I Learned
Note: This post is a follow-up to yesterday’s post 😀
In Node.js you can use the ‘Stream’ API together with the ‘fs’ (file system) API to create files. I won’t go into depth on the APIs; just know that we will use them to create a ‘.CSV’ file out of our scraped HTML data.
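For comparison, here is a minimal sketch of two ways to produce the same file with the ‘fs’ module: “fs.writeFileSync()” writes one complete string in a single call, while a write stream accepts data piece by piece, which suits a scraper that produces rows one at a time. The sample rows and the temp-folder file names are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical rows standing in for scraped data.
const rows = ['Mest læste', 'Seneste nyt'];

// Option 1: build the whole text in memory and write it in one call.
const syncPath = path.join(os.tmpdir(), 'sync-demo.csv');
fs.writeFileSync(syncPath, rows.map((row) => `${row}\n`).join(''));

// Option 2: open a write stream and hand it the data one piece at a time.
const streamPath = path.join(os.tmpdir(), 'stream-demo.csv');
const stream = fs.createWriteStream(streamPath);

// 'close' fires after end() has flushed the data and the file is closed.
const streamDone = new Promise((resolve, reject) => {
  stream.on('error', reject);
  stream.on('close', resolve);
});

for (const row of rows) {
  stream.write(`${row}\n`);
}
stream.end(); // signal that no more data is coming
```

Both files end up with the same content; the stream version simply never needs the whole text in memory at once.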
In ‘index.js’ we import the ‘fs’ (file system) module and use “createWriteStream()” to create a stream that writes to the file system:
```javascript
// index.js
const fs = require('fs');
const writeStream = fs.createWriteStream('articles.csv');
//...more code
```
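Note that the first argument to “createWriteStream()” is the path of the file to write to. ‘articles.csv’ is a relative path, so the file is created in the directory you run “node” from. The ‘.csv’ extension is simply part of the file name; it does not change how the data is written.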
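“createWriteStream()” also takes an optional second argument, an options object. A minimal sketch of its ‘flags’ option: the default ‘w’ truncates the file every time the stream is opened, while ‘a’ appends to it. The helper ‘writeWithFlag()’ and the file ‘articles-demo.csv’ are hypothetical, and the demo writes to the OS temp folder so the project is left untouched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical demo file in the OS temp folder, so the project folder is untouched.
const demoPath = path.join(os.tmpdir(), 'articles-demo.csv');

// Open a stream with the given flag, write `text`, and resolve once the file is closed.
function writeWithFlag(file, text, flags) {
  return new Promise((resolve, reject) => {
    const stream = fs.createWriteStream(file, { flags });
    stream.on('error', reject);
    stream.on('close', resolve);
    stream.end(text); // write the text, then end the stream
  });
}

async function demo() {
  await writeWithFlag(demoPath, 'first run\n', 'w'); // 'w' (the default) truncates
  await writeWithFlag(demoPath, 'second run\n', 'a'); // 'a' appends
  return fs.readFileSync(demoPath, 'utf8');
}

const demoResult = demo();
demoResult.then((content) => console.log(content));
```

Appending is handy if you want to scrape several times and keep every run in one file; for a fresh snapshot on each run, the default is what you want.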
Next, let’s loop through the HTML elements, scrape their text and save the data to the ‘articles.csv’ file:
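```javascript
// index.js
const fs = require('fs');
const writeStream = fs.createWriteStream('articles.csv');
//...more code (yesterday's scraping code, which defines `$` and `lists`)

lists.each((index, element) => {
  const title = $(element).find('.list__title').text();
  // No g flag: replace() only removes the first "  \n" match.
  const list = $(element).find('ul').text().replace(/\s\s\n/, '');

  // Skip the "Seneste plus" list; returning from the callback moves on to the next element.
  if (title === 'Seneste plus') {
    return;
  }

  writeStream.write(`${title} \n ${list} \n`);
});

// Close the stream once everything has been written.
writeStream.end();
```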
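Calling “write()” inside the loop works fine for a handful of lists, but “write()” also returns a boolean: ‘false’ means the stream’s internal buffer has reached its limit (the data is still queued) and you should wait for the ‘drain’ event before writing more. A minimal sketch of that pattern for larger scrapes, using Node’s “events.once()”; the helper ‘writeAll()’ and the demo rows are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { once } = require('events');

// Write each chunk in order, pausing whenever write() reports a full buffer.
async function writeAll(file, chunks) {
  const stream = fs.createWriteStream(file);
  for (const chunk of chunks) {
    // write() returns false once the buffered data reaches the stream's
    // highWaterMark; the chunk is still queued, but we should stop for now.
    if (!stream.write(chunk)) {
      await once(stream, 'drain'); // resume once the buffer has been flushed
    }
  }
  stream.end();
  // 'close' is emitted after the remaining data is flushed and the file is closed.
  await once(stream, 'close');
}

// Hypothetical demo: 10,000 short rows written to a file in the OS temp folder.
const bigPath = path.join(os.tmpdir(), 'backpressure-demo.csv');
const bigRows = Array.from({ length: 10000 }, (_, i) => `row ${i}\n`);
const bigDone = writeAll(bigPath, bigRows);
```

Waiting for ‘drain’ keeps memory use flat no matter how many rows the scraper produces.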
Using “writeStream.write()” we add each title and its list to our ‘articles.csv’ file and voilà! The result looks like this:
Mest læste
Svendborg
Med sang, afstand og nyvaskede hænder: Forårs-sfo blev fejret i Lundby
//...more articles
Seneste nyt
20.07
Lokal erhvervshjælp: Politikere enige om mere markedsføring af Svendborg
//...more articles
I know what you’re thinking: “That is a lot of whitespace!” Agreed 😀 The next step is to use regex to clean up the data and remove the extra whitespace. That will be in tomorrow’s blog post 😀
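As a preview, here is one possible cleanup, sketched as an assumption rather than the approach tomorrow’s post will take. A regex with the ‘g’ flag handles every whitespace run, not just the first match (the “replace(/\s\s\n/, '')” call above has no ‘g’). The sketch also quotes each field in the RFC 4180 CSV style, since the file written above has a ‘.csv’ name but contains no commas yet. All helper names are hypothetical.

```javascript
// Collapse every run of whitespace (spaces, tabs, newlines) into a single space.
// The g flag makes replace() handle every match instead of only the first one.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Alternative that keeps one item per line: trim each line and drop the empty ones.
function compactLines(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Quote one CSV field: wrap it in double quotes and double any quotes inside it,
// so commas and line breaks stay within a single cell.
function toCsvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Build one CSV record; RFC 4180 uses CRLF as the line ending.
function toCsvRow(fields) {
  return fields.map(toCsvField).join(',') + '\r\n';
}

// Hypothetical sample shaped like the scraped output above.
const sampleTitle = 'Mest læste';
const sampleList = '  Svendborg  \n\n  Med sang, afstand og nyvaskede hænder  \n';

console.log(toCsvRow([sampleTitle, compactLines(sampleList)]));
```

Inside the scraping loop this could replace the plain write with something like “writeStream.write(toCsvRow([title, compactLines(list)]))”, giving one quoted row per list that spreadsheet programs can open.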
What Distracted Me During The Day
- Phone calls
- Messenger
- Nordnet
Resources
- NodeJs Stream – https://nodejs.org/api/stream.html
- createWriteStream – https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options