Node shell scripting

You want some menial job automated? Some research-y task done? Parse some CSV files, do something with the data? Batch process? Well, why not use the language you already know instead of learning bash, shell, PHP scripting or whatever?

Nodejs

Grab the latest version from https://nodejs.org/en/download/current/ and let's go!

Check, check, 1, 2, 3

Open your favorite command prompt/terminal/shell (Mac users, go to Spotlight search, type "Terminal"), then type:

$ node -v

(I use $ to denote something you type in the terminal. You need not type the $ itself.)

You should see the version of Node you just installed displayed in the terminal.

v

Hello script

Now let's create the simplest script and make sure it works.

$ touch hello.js
$ open hello.js

In your editor of choice, type the following in your hello.js file:

console.log(__dirname);

Save the file, now run it:

$ node hello.js

Ta-da!

/Users/stoyanstefanov/Desktop/nodeshell

As you can see

  • You run a shell script with $ node scriptname.js
  • You can print results back to the user with console.log
  • You can figure out the directory where your script is located with the constant __dirname. BTW, similarly __filename gives you the full path and the name of the script, in this example /Users/stoyanstefanov/Desktop/nodeshell/hello.js

__dirname is useful when you have for example a bunch of files you want to fiddle with in a subdirectory. You don't want to limit where your script can be run from. So it's best to avoid relative paths but always start with __dirname.

Read a directory

Now, imagine you need a script that does something with a list of files you conveniently put in a data subdirectory together with your script.

Let's list all the files in this directory using a new script you can call dirlist.js.

$ touch dirlist.js

You start by including two libraries (modules): fs (as in "file system") and path.

const fs = require('fs');
const path = require('path');

path helps you deal with differences in Windows vs Unix slashes and backslashes and so on. It's always best if you avoid string concatenation and use path's methods instead to make your scripts more reusable and resilient.

fs gives you a number of methods to read, write, delete files and directories.

For the purpose of listing a directory you can use fs.readdir or fs.readdirSync. Let's go with the second. Why? Well, it's simpler. You'll notice several methods have synchronous and asynchronous versions. The async versions are definitely better because they don't block your scripts with one-after-the-other synchronous operations. They tend to make your scripts run faster because you can do several things in parallel. However, they are a bit more complicated because you need to provide a callback function to be notified when the async operation is complete.

To keep things simple, sync all the way!

And so the function that gives you an array of files in a directory is:

const readDir = fs.readdirSync;

Which directory to list?

const dataDir = path.resolve(__dirname, 'data');

Using resolve you can concatenate the current directory and the data subdirectory without any backslashes, etc.

Now, reading the directory gives you an array and you can simply print the name of each file in the directory in a forEach loop.

readDir(dataDir).forEach(f => console.log(f));

So the whole script is:

const fs = require('fs');
const path = require('path');

const readDir = fs.readdirSync;

const dataDir = path.resolve(__dirname, 'data');

readDir(dataDir).forEach(f => console.log(f));

And the script in action....

$ node dirlist.js 
1.txt
hello.txt
what.cvs

You probably want the full path to each file if you're going to read it, so just another path.resolve() call should do the trick:

readDir(dataDir).forEach(f => 
  console.log(path.resolve(dataDir, f))
);

Filtering out things you don't care about

Say you run dirlist.js and it spits out something like:

$ node dirlist.js 
/Users/stoyanstefanov/Desktop/nodeshell/data/.DS_Store
/Users/stoyanstefanov/Desktop/nodeshell/data/1.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data
/Users/stoyanstefanov/Desktop/nodeshell/data/what.cvs

Darn Dot-files

There's a .DS_Store file in there, some Mac OS garbage you've no use for. Let's remove it from the list of files. You can simply check if the file name starts with a dot (.), like so...

Before:

readDir(dataDir).forEach(f => 
  console.log(path.resolve(dataDir, f))
);

After:

readDir(dataDir).forEach(f => {
  if (f.startsWith('.')) {
    return; // no .DS_Store etc, thank you
  }
  const file = path.resolve(dataDir, f);
  console.log(file);
});

Files only

How about removing the moar-data directory from the list? You only care about files. There's a stat() method in the fs module that can help here. More specifically, the sync version, statSync().

const stat = fs.statSync;

Calling this method on a file gives you properties (such as file creation time) and methods (such as isDirectory()). Here's how you can filter out directories:

readDir(dataDir).forEach(f => {
  if (f.startsWith('.')) {
    return; // no .DS_Store etc, thank you
  }

  const file = path.resolve(dataDir, f);
  const stats = stat(file);
  
  if (stats.isDirectory()) {
    return; // files only shall pass
  }

  console.log(file);
});

Now the list contains interesting files only:

$ node dirlist.js 
/Users/stoyanstefanov/Desktop/nodeshell/data/1.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/what.cvs

Tip: stats.size gives you file size.

txt files only

And what if this script only works with text files? Your friend is path.extname(file).

if (path.extname(file) !== '.txt') {
  return; // actually text files only
}

Tip: explore what path.parse() can do for you. E.g.

console.log(
  require('path').parse(__filename)
);

You should see something like:

{
  root: '/',
  dir: '/Users/stoyanstefanov/Desktop/nodeshell',
  base: 'dirlist.js',
  ext: '.js',
  name: 'dirlist' 
}
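Here is a short sketch of a few extname edge cases, plus one thing parse makes easy: swapping a file's extension. The helper name swapExtension is made up for illustration; path.format is the real counterpart of path.parse.

```javascript
const path = require('path');

// Hypothetical helper: same file, different extension, e.g. hello.txt -> hello.md
function swapExtension(file, newExt) {
  const {dir, name} = path.parse(file);
  // path.format ignores name and ext when base is present, so base is left out.
  return path.format({dir, name, ext: newExt});
}

console.log(path.extname('hello.txt'));      // .txt
console.log(path.extname('archive.tar.gz')); // .gz (only the last extension counts)
console.log(path.extname('.DS_Store'));      // empty string: a leading dot is not an extension
console.log(swapExtension(path.join('data', 'hello.txt'), '.md'));
```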

Recursive directory listing

What if the subdirectory moar-data contains additional files you're interested in? Let's list your main directory and its subdirectories and their subdirectories. In other words, list recursively.

All you need to do is put your forEach loop into a function listFiles so it can be reused. This function takes any directory as input (listFiles(dir)), lists it and if it finds that one of the items in the list is another directory (stats.isDirectory()) it navigates into the new directory, instead of ignoring it like before.

const fs = require('fs');
const path = require('path');

const readDir = fs.readdirSync;
const stat = fs.statSync;

const dataDir = path.resolve(__dirname, 'data');

function listFiles(dir) {
  
  readDir(dir).forEach(f => {
    if (f.startsWith('.')) {
      return; // no .DS_Store etc, thank you
    }

    const file = path.resolve(dir, f);
    const stats = stat(file);
  
    if (stats.isDirectory()) {
      return listFiles(file);
    }
    
    if (path.extname(file) !== '.txt') {
      return; // actually text files only
    }
    
    console.log(file);      
  });
}

listFiles(dataDir);

The script in action:

$ node dirlist-recursive.js 
/Users/stoyanstefanov/Desktop/nodeshell/data/1.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data/2.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data/bye.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data/even-more/nomore.txt

Reading files

/* TODO */

Parsing file contents

/* TODO */

Creating directories

/* TODO */

Writing files

To create a blank file called hello.md in a directory called data/ (a subdirectory of the directory where your script lives) you go:

const fs = require('fs');
const path = require('path');
const outputDir = path.resolve(__dirname, './data');
// build the file path with path, not string concatenation
fs.writeFile(path.resolve(outputDir, 'hello.md'), '', () => {});

The call to fs.writeFile() takes arguments:

  • file name
  • contents to write to the file, in this case - nothing
  • optional options, e.g. file permissions, in this case skipped completely
  • a callback function (in this case empty) you can use to tell if there was an error creating this file
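Here is a sketch of the same call with all four arguments filled in and a callback that actually checks for an error. To keep it runnable anywhere, it writes into a throwaway temp directory instead of data/; that directory and the file contents are assumptions for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: a throwaway temp directory stands in for the data/ folder.
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'nodeshell-'));
const outputFile = path.join(outputDir, 'hello.md');

// Options are the third argument; 'utf8' is already the default encoding,
// it is spelled out here only to show where options go.
fs.writeFile(outputFile, '# Hello\n', {encoding: 'utf8'}, err => {
  if (err) {
    console.error('could not write file:', err.message);
    process.exit(1);
  }
  console.log('wrote', outputFile);
});
```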

process

Where am I?

There's a global `process` that can give such vital advice as: "what is the Current Working Directory?"

console.log('Starting here: ' + process.cwd());

I'm done!

Another thing hanging off `process` is `exit()` which kills the execution of the current script. You can also provide exit codes in case your script A is run by another script B and B wants to know whether A managed to finish successfully.

Generally exit code 0 means all is fine, 1 and above is an error.

// yikes!
process.exit(1)
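To see an exit code from the point of view of script B, here is a minimal sketch using spawnSync from Node's child_process module. The helper exitCodeOf is hypothetical; it runs a snippet of JavaScript in a child Node process, standing in for script A, and reports the code that child exited with.

```javascript
const {spawnSync} = require('child_process');

// Hypothetical helper: run a snippet in a child Node process and
// return its exit code, the way a calling script would see it.
function exitCodeOf(source) {
  // process.execPath is the node binary running this script
  const result = spawnSync(process.execPath, ['-e', source]);
  return result.status;
}

console.log(exitCodeOf('process.exit(0)')); // 0
console.log(exitCodeOf('process.exit(1)')); // 1
console.log(exitCodeOf('console.log("no explicit exit")')); // 0
```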

Executing other command-line tools

/* TODO */

The Web

Dealing with URLs

/* TODO */

cURL

/* TODO */

Scraping with PhantomJS

/* TODO */

Reading JSON files

Reading JSON files is as simple as just require()-ing them. When you require a JSON file, it's read, parsed (e.g. with JSON.parse()) and the JS object is returned to you.

Here's how to read the first element of the first row of the array stored in each JSON file in ./data/json:

const fs = require('fs');
const path = require('path');

const readDir = fs.readdirSync;

const dataDir = path.resolve(__dirname, './data/json');

readDir(dataDir).forEach(f => {
  if (f.startsWith('.')) {
    return; // no .DS_Store etc, thank you
  }
  const file = path.resolve(dataDir, f);
  const jsonData = require(file);
  console.log(jsonData[0][0]);
});

Usage:

$ node json-read.js 
A cappella
Affettuoso

And here's what the contents of the first JSON file we just read look like:

[
  [
    "A cappella",
    "in chapel style", 
    "Sung with no (instrumental) accompaniment, has lots of harmonizing"
  ],
  ....
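One thing worth knowing about require() is that it caches what it loads. The sketch below compares it with reading and parsing the file by hand. The helper readJson and the temp file are assumptions for illustration, not part of the scripts above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: a small temp file stands in for one of the files in data/json.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'json-read-'));
const file = path.join(dir, 'terms.json');
fs.writeFileSync(file, JSON.stringify([['A cappella', 'in chapel style']]));

// Hypothetical helper: roughly what require() does for a .json file, minus the cache
function readJson(jsonFile) {
  return JSON.parse(fs.readFileSync(jsonFile, 'utf8'));
}

console.log(require(file)[0][0]);  // A cappella
console.log(readJson(file)[0][0]); // A cappella

// require() caches: a later change on disk is invisible to it
fs.writeFileSync(file, JSON.stringify([['Affettuoso']]));
console.log(require(file)[0][0]);  // A cappella (cached)
console.log(readJson(file)[0][0]); // Affettuoso (fresh read)
```

For a script that reads each file once, require() is the shortest route. If the same file is read again after being changed, a fresh read like readJson is the safer choice.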

Writing JSON files

While reading some JSON files (example above) you notice that they contain arrays (rows) and each row's second element (index 1) is lowercase. How about updating all these JSON files and capitalizing the first letter?

After reading the contents of the file into an object with const jsonData = require(file);, all you do is loop over the data, massage the element at index 1 and update jsonData. After the loop, you have the updated data. Time to turn that data object into a pretty string with 2 spaces of indentation via JSON.stringify(jsonData, null, 2). Finally, use writeFileSync() from the file system module to write the string to a file on the disk:

fs.writeFileSync('path/to/file', stringContents);

Here's the complete example:

const fs = require('fs');
const path = require('path');

const readDir = fs.readdirSync;

const dataDir = path.resolve(__dirname, './data/json');

readDir(dataDir).forEach(f => {
  if (f.startsWith('.')) {
    return; // no .DS_Store etc, thank you
  }
  const file = path.resolve(dataDir, f);
  const jsonData = require(file);
  jsonData.forEach((row, idx) => {
    jsonData[idx][1] = row[1].charAt(0).toUpperCase() + row[1].substring(1);    
  });
  fs.writeFileSync(file, JSON.stringify(jsonData, null, 2));
});
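The capitalizing step can be pulled out into a small function, which makes it easy to try without touching any files. This is a sketch: capitalize is a hypothetical helper, and the guard for rows without a second element is an extra precaution the script above does not take.

```javascript
// Hypothetical helper: uppercase the first character, leave the rest alone.
function capitalize(str) {
  return str.charAt(0).toUpperCase() + str.substring(1);
}

const rows = [
  ['A cappella', 'in chapel style'],
  ['Affettuoso'], // a short row with no second element
];

rows.forEach(row => {
  // Guard: only touch rows that actually have a string at index 1
  if (typeof row[1] === 'string') {
    row[1] = capitalize(row[1]);
  }
});

console.log(JSON.stringify(rows, null, 2));
```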

Using third party modules

/* TODO */

Taking command line arguments

Arguments array

You can read arguments passed to your script using process.argv. The first two arguments are:

  1. the node executable
  2. the path to the script

To test, put this in arg.js

console.log(process.argv);

Running as always...

$ node arg.js 
[ '/usr/local/bin/node',
  '/Users/stoyanstefanov/Desktop/nodeshell/arg.js' ]

Running with more arguments...

$ node arg.js hello "hi there"
[ '/usr/local/bin/node',
  '/Users/stoyanstefanov/Desktop/nodeshell/arg.js',
  'hello',
  'hi there' ]

You rarely need the first two arguments. To get the "real" ones, you simply access the elements of the arguments array as any other JavaScript array.

For example you can require the user to provide you with 2 arguments or refuse to do anything otherwise.

if (process.argv.length < 4) {
  console.log('this script requires two arguments');
  process.exit(1);
} else {
  console.log('Arg 1: ', process.argv[2]);
  console.log('Arg 2: ', process.argv[3]);
}

Testing...

$ node arg-idx.js
this script requires two arguments

$ node arg-idx.js hello there
Arg 1:  hello
Arg 2:  there

An array of interesting args only

You can slice away the first two arguments as they are mostly useless and focus on the rest. Easy:

const allArgs = process.argv.slice(2);
console.log(allArgs);

Testing:

As you can see, any quotes are conveniently gone.
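A minimal sketch of the same idea as a function. The name interestingArgs is made up; taking the full array as a parameter means you can try it with a simulated command line instead of a real one.

```javascript
// Hypothetical helper: takes a full argv array so it can be tried without a shell.
function interestingArgs(argv) {
  return argv.slice(2);
}

// Destructuring gives the arguments names; a missing one falls back to its default.
const [first, second = 'default value'] = interestingArgs(process.argv);
console.log('first: ', first);
console.log('second:', second);

// Simulating: node arg.js hello "hi there"
console.log(
  interestingArgs(['/usr/local/bin/node', '/path/to/arg.js', 'hello', 'hi there'])
);
// prints [ 'hello', 'hi there' ]
```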

commander

Bells & whistles

Colored output

Printing out text with special escape codes can produce output with different background or foreground colors, underline, bold, blink... Let's see an example.

// print the word "Error" in red
console.log('\x1B[31mError\x1B[39m');

// "Success" is green 
console.log('\x1B[32mSuccess\x1B[39m');

Unicode

You can also add Unicode characters to spice things up:

// print the word "Error" in red
console.log('\x1B[31m✖ Error\x1B[39m');

// "Success" is green 
console.log('\x1B[32m✔ Success\x1B[39m');

Result:

ANSI escape codes

These weird characters surrounding the words Error and Success are ANSI escape codes. You can learn a lot more about what they mean and how to use them here.
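Typing escape codes by hand gets old quickly, so a small wrapper can help. The sketch below uses hypothetical helpers, colorize and paint, built on the same codes as above: 31 for red, 32 for green and 39 to reset the foreground color, plus 33 for yellow. The process.stdout.isTTY check skips the codes when the output is piped to a file or another program.

```javascript
// Foreground color codes: red and green as used above, plus yellow.
const COLORS = {red: 31, green: 32, yellow: 33};
const RESET_FOREGROUND = 39;

// Hypothetical helper: wrap text in "set color" and "reset color" escape codes.
function colorize(text, color) {
  return `\x1B[${COLORS[color]}m${text}\x1B[${RESET_FOREGROUND}m`;
}

// Only emit escape codes when writing to a terminal, so piped output stays clean.
function paint(text, color) {
  return process.stdout.isTTY ? colorize(text, color) : text;
}

console.log(paint('✖ Error', 'red'));
console.log(paint('✔ Success', 'green'));
```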

Project: DIY unit testing scripts

/* TODO */

Publishing your script to NPM

/* TODO */

You can find me on twitter - @stoyanstefanov