COVID19 Lockdown Dev Log – Day 19, Save Webscraped Data In ‘.CSV’ Files

What I Worked On

Saving data from a webscrape in a CSV file.

What I Learned

Note: This post is a follow-up to yesterday’s post 😀

In Node.js you can use the ‘Stream’ API and the ‘fs’ API to create files. I won’t go into depth with the APIs; just know that we will use them to create a ‘.CSV’ file from our scraped HTML data.

In ‘index.js’ we import the ‘fs’ (file system) module and call “createWriteStream()” to create a stream that we will use to write to the file system:

// index.js

const fs = require('fs');
const writeStream = fs.createWriteStream('articles.csv');

//...more code

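Note that the argument for “createWriteStream()” is a string with the path of the file to write, including its extension. Here it is just a file name, so ‘articles.csv’ is created in the directory the script is run from. If the file already exists, it is overwritten by default.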
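
To make the life cycle of a write stream a bit more concrete, here is a minimal sketch of opening a stream, writing to it and closing it again. The file name ‘articles-example.csv’, the temp directory location and the two sample rows are made up for illustration; they are not part of the scraper.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical output location (assumption): a file in the OS temp directory,
// so the example does not touch the real 'articles.csv'.
const filePath = path.join(os.tmpdir(), 'articles-example.csv');
const writeStream = fs.createWriteStream(filePath);

// 'finish' fires after end() has been called and all data has been flushed.
writeStream.on('finish', () => {
  console.log(`Wrote ${filePath}`);
});

// 'error' fires if the file cannot be opened or written.
writeStream.on('error', (err) => {
  console.error(`Could not write ${filePath}: ${err.message}`);
});

// Made-up sample rows (assumption), each ending with a newline.
writeStream.write('Section,Articles\n');
writeStream.write('Example section,Example headline\n');

// Signal that no more data will be written.
writeStream.end();
```

Calling “end()” is optional in a short script like the scraper, because Node.js flushes the pending writes before the process exits, but it makes the intent explicit.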

Let’s loop through the HTML elements, scrape them and save the data to the ‘articles.csv’ file:

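// index.js

const fs = require('fs');
const writeStream = fs.createWriteStream('articles.csv');

//...more code

lists.each((index, element) => {
    // Heading of the current list, e.g. "Mest læste"
    const title = $(element).find('.list__title').text();
    // Text of all list items. Without the 'g' flag only the first
    // match of "two whitespace characters + newline" is removed.
    const list = $(element).find('ul').text().replace(/\s\s\n/, '');
    // Skip the "Seneste plus" list and continue with the next element
    if (title === "Seneste plus") {
        return;
    }
    // Append the heading and the list text to 'articles.csv'
    writeStream.write(`${title} \n ${list} \n`);
});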

Using “writeStream.write()” we add the data to our ‘articles.csv’ file, and voilà! The result looks like this:

Mest læste 
 
                                              
                        
                            
                                
                                    
                                        Svendborg
                                    
                                    Med sang, afstand og nyvaskede hænder: Forårs-sfo blev fejret i Lundby
                                
                            
                        
                    
                
//...more articles                            
                    
                        
                            
                                
                                    
                                       
Seneste nyt 
 
                                              
                        
                            
                                
                                    
                                        20.07
                                    
                                    Lokal erhvervshjælp: Politikere enige om mere markedsføring af Svendborg
                                
                            
//...more articles

I know what you’re thinking: “That is a lot of whitespace!” Agreed 😀 The next step is to use a regular expression to clean up the data and strip the extra whitespace. That will be in tomorrow’s blog post 😀
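
Until then, here is a rough sketch of one way the cleanup could go. It is not necessarily what tomorrow’s post will do. The helper names “collapseWhitespace()”, “toCsvField()” and “toCsvRow()” are my own inventions, and the input string is a made-up stand-in for the scraped text. The idea is to collapse every run of whitespace into a single space and then quote each value, so that a comma inside a headline does not split the row into extra columns.

```javascript
// Collapse every run of whitespace (spaces, tabs, newlines) into one space
// and remove leading/trailing whitespace.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Wrap a value in double quotes and double any quotes inside it,
// so commas and quotes in the value stay within one CSV field.
function toCsvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Join the quoted fields with commas and end the row with a newline.
function toCsvRow(fields) {
  return fields.map(toCsvField).join(',') + '\n';
}

// Made-up input (assumption) that mimics the indented text from the scrape.
const rawList = '\n      Svendborg\n\n      Med sang, afstand og nyvaskede hænder\n   ';
const row = toCsvRow(['Mest læste', collapseWhitespace(rawList)]);

console.log(row);
// prints: "Mest læste","Svendborg Med sang, afstand og nyvaskede hænder"
```

In the scraper, a row built like this could be passed to “writeStream.write()” in place of the template string, which would give one line per list in the ‘.CSV’ file.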

What Distracted Me During The Day

  • Phone calls
  • LinkedIn
  • Messenger
  • Nordnet

Resources