Node shell scripting

You want some menial job automated? Some research-y task done? Parse some CSV files, do something with the data? Batch process? Well, why not use the language you already know instead of learning bash, shell, PHP scripting or whatever?

Nodejs

Grab the latest version from https://nodejs.org/en/download/current/ and let's go!

Check, check, 1, 2, 3

Open your favorite command prompt/terminal/shell (Mac users, go to Spotlight search, type "Terminal"), then type:

$ node -v

(I use $ to denote the terminal prompt; type only what follows it, not the $ itself.)

You should see the version of Node you just installed displayed in the terminal.


Hello script

Now let's create the simplest script and make sure it works.

$ touch hello.js
$ open hello.js

In your editor of choice, type the following in your hello.js file:

console.log(__dirname);

Save the file, now run it:

$ node hello.js

Ta-da!


As you can see:

  • You run a shell script with $ node scriptname.js
  • You can print results back to the user with console.log
  • You can figure out the directory where your script is located with the module-level variable __dirname. Similarly, __filename gives you the full path including the script's file name, in this example /Users/stoyanstefanov/Desktop/nodeshell/hello.js

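__dirname is useful when, for example, you have a bunch of files you want to fiddle with in a subdirectory. You don't want to limit where your script can be run from, and a relative path is resolved against the directory you run node from (the current working directory), not against the script's own directory. So it's best to avoid bare relative paths and always start from __dirname.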

Read a directory

Now, imagine you need a script that does something with a list of files you conveniently put in a data subdirectory next to your script.


Let's list all the files in this directory using a new script you can call dirlist.js.

$ touch dirlist.js

You start by including two libraries (modules): fs (as in "file system") and path.

const fs = require('fs');
const path = require('path');

path helps you deal with differences between Windows and Unix paths, such as backslashes vs. forward slashes. It's always best to avoid string concatenation and use path's methods instead, to make your scripts more portable and resilient.

fs gives you a number of methods to read, write, delete files and directories.

For the purpose of listing a directory you can use fs.readdir or fs.readdirSync. Let's go with the second. Why? Well, it's simpler. You'll notice several methods have synchronous and asynchronous versions. The async versions are generally better because they don't block your script while it waits for the file system, so it can do several things at once instead of strictly one after the other, which tends to make scripts faster. However, they are a bit more complicated because you need to provide a callback function to be notified when the async operation is complete.

To keep things simple, sync all the way!

And so the function that gives you an array of the files in a directory is:

const readDir = fs.readdirSync;

Which directory to list?

const dataDir = path.resolve(__dirname, 'data');

path.resolve() joins the script's directory and the data subdirectory into one absolute path, so you don't have to worry about slashes vs. backslashes.

Now, reading the directory gives you an array and you can simply print the name of each file in the directory in a forEach loop.

readDir(dataDir).forEach(f => console.log(f));

So the whole script is:

const fs = require('fs');
const path = require('path');

const readDir = fs.readdirSync;

const dataDir = path.resolve(__dirname, 'data');

readDir(dataDir).forEach(f => console.log(f));

And the script in action...

$ node dirlist.js 
1.txt
hello.txt
what.cvs

You probably want the full path to each file if you're going to read it, so just another path.resolve() call should do the trick:

readDir(dataDir).forEach(f => 
  console.log(path.resolve(dataDir, f))
);


Filtering out things you don't care about

Say you run dirlist.js and it spits out something like:

$ node dirlist.js 
/Users/stoyanstefanov/Desktop/nodeshell/data/.DS_Store
/Users/stoyanstefanov/Desktop/nodeshell/data/1.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data
/Users/stoyanstefanov/Desktop/nodeshell/data/what.cvs

Darn Dot-files

There's a .DS_Store file in there, some Mac OS garbage you have no use for. Let's remove it from the list of files. You can simply check whether the file name starts with a dot (.), like so...

Before:

readDir(dataDir).forEach(f => 
  console.log(path.resolve(dataDir, f))
);

After:

readDir(dataDir).forEach(f => {
  if (f.startsWith('.')) {
    return; // no .DS_Store etc, thank you
  }
  const file = path.resolve(dataDir, f);
  console.log(file);
});

Files only

How about removing the moar-data directory from the list? You only care about files. There's a stat() method in the fs module that can help here. More specifically, the sync version, statSync().

const stat = fs.statSync;

Calling this method on a file returns an fs.Stats object with properties (such as birthtime, the file creation time) and methods (such as isDirectory()). Here's how you can filter out directories:

readDir(dataDir).forEach(f => {
  if (f.startsWith('.')) {
    return; // no .DS_Store etc, thank you
  }

  const file = path.resolve(dataDir, f);
  const stats = stat(file);
  
  if (stats.isDirectory()) {
    return; // files only shall pass
  }

  console.log(file);
});

Now the list contains interesting files only:

$ node dirlist.js 
/Users/stoyanstefanov/Desktop/nodeshell/data/1.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/what.cvs

Tip: stats.size gives you the file size in bytes.

txt files only

And what if this script only works with text files? Your friend is path.extname(file).

if (path.extname(file) !== '.txt') {
  return; // actually text files only
}

Tip: explore what path.parse() can do for you. E.g.

console.log(
  require('path').parse(__filename)
);

You should see something like:

{
  root: '/',
  dir: '/Users/stoyanstefanov/Desktop/nodeshell',
  base: 'dirlist.js',
  ext: '.js',
  name: 'dirlist' 
}
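The parsed pieces go back together with path.format(), which is handy for deriving an output file name from an input one. A sketch with a hypothetical changeExt() helper; note that path.format() ignores name and ext when base is set, so the helper leaves base out.

```javascript
const path = require('path');

// Hypothetical helper: same directory and name, different extension
function changeExt(file, newExt) {
  const { dir, name } = path.parse(file);
  return path.format({ dir, name, ext: newExt });
}

// On macOS/Linux this prints /Users/stoyanstefanov/Desktop/nodeshell/data/hello.md
console.log(changeExt('/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt', '.md'));
```

Only the last extension is swapped, so archive.tar.gz becomes archive.tar.zip, not archive.zip.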

Recursive directory listing

What if the subdirectory moar-data contains additional files you're interested in? Let's list your main directory and its subdirectories and their subdirectories. In other words, list recursively.

All you need to do is put your forEach loop into a function listFiles so it can be reused. This function takes any directory as input (listFiles(dir)) and lists it; if it finds that one of the items in the list is another directory (stats.isDirectory()), it calls itself on that directory instead of ignoring it like before.

const fs = require('fs');
const path = require('path');

const readDir = fs.readdirSync;
const stat = fs.statSync;

const dataDir = path.resolve(__dirname, 'data');

function listFiles(dir) {
  
  readDir(dir).forEach(f => {
    if (f.startsWith('.')) {
      return; // no .DS_Store etc, thank you
    }

    const file = path.resolve(dir, f);
    const stats = stat(file);
  
    if (stats.isDirectory()) {
      return listFiles(file);
    }
    
    if (path.extname(file) !== '.txt') {
      return; // actually text files only
    }
    
    console.log(file);      
  });
}

listFiles(dataDir);

The script in action:

$ node dirlist-recursive.js 
/Users/stoyanstefanov/Desktop/nodeshell/data/1.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/hello.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data/2.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data/bye.txt
/Users/stoyanstefanov/Desktop/nodeshell/data/moar-data/even-more/nomore.txt

Reading files

/* TODO */

Parsing file contents

/* TODO */

Creating directories

/* TODO */

Writing files

/* TODO */

process

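Where am I?

process.cwd() returns the current working directory, i.e. the directory you ran node from. That's not necessarily where the script lives; for that, use __dirname.

process.cwd()

I'm done!

process.exit() ends the script right away. By convention an exit code of 0 means success and anything else means failure, which is how your shell and other tools can tell something went wrong.

process.exit(0) // all good
process.exit(1) // something went wrong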

Executing other command-line tools

/* TODO */

The Web

Dealing with URLs

/* TODO */

cURL

/* TODO */

Scraping with PhantomJS

/* TODO */

Reading JSON files

/* TODO */

Writing JSON files

Using third party modules

/* TODO */

Taking command line arguments

Arguments array

You can read the arguments passed to your script from the process.argv array. Its first two elements are:

  1. the node executable
  2. the path to the script

To test, put this in arg.js

console.log(process.argv);

Running as always...

$ node arg.js 
[ '/usr/local/bin/node',
  '/Users/stoyanstefanov/Desktop/nodeshell/arg.js' ]

Running with more arguments...

$ node arg.js hello "hi there"
[ '/usr/local/bin/node',
  '/Users/stoyanstefanov/Desktop/nodeshell/arg.js',
  'hello',
  'hi there' ]

You rarely need the first two elements. To get the "real" arguments, simply access the elements from index 2 onwards, as with any other JavaScript array.
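A common idiom is to slice the first two elements off once and work with what's left:

```javascript
// Drop the node executable and the script path, keep only the user's arguments
const args = process.argv.slice(2);

console.log(`Got ${args.length} argument(s)`);
args.forEach((arg, i) => console.log(`Arg ${i + 1}: ${arg}`));
```

With $ node args.js hello "hi there" this should report 2 arguments, hello and hi there, since the shell keeps the quoted words together.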

For example, you can require the user to provide two arguments and refuse to do anything otherwise (save this as arg-idx.js):

if (process.argv.length < 4) {
  console.log('this script requires two arguments');
  process.exit(1);
} else {
  console.log('Arg 1: ', process.argv[2]);
  console.log('Arg 2: ', process.argv[3]);
}

Testing...

$ node arg-idx.js
this script requires two arguments

$ node arg-idx.js hello there
Arg 1:  hello
Arg 2:  there
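Once you want flags such as --verbose, counting positions gets fiddly. Recent Node versions (18.3+, also 16.17+) ship a small built-in parser, util.parseArgs(). Here's a sketch with made-up options: --verbose/-v as a flag and --out/-o taking a value.

```javascript
const { parseArgs } = require('util');

// Option definitions: --verbose/-v is a flag, --out/-o takes a value
const options = {
  verbose: { type: 'boolean', short: 'v' },
  out: { type: 'string', short: 'o' },
};

// Strict mode (the default) throws on unknown options;
// allowPositionals lets plain arguments such as file names through
const { values, positionals } = parseArgs({
  args: process.argv.slice(2),
  options,
  allowPositionals: true,
});

console.log('verbose:', Boolean(values.verbose));
console.log('out:', values.out || '(none)');
console.log('files:', positionals);
```

Running $ node flags.js -v --out report.txt 1.txt hello.txt should report verbose as true, out as report.txt, and the two .txt names as positionals.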

commander

Project: DIY unit testing scripts

/* TODO */

Publishing your script to NPM

/* TODO */

You can find me on Twitter: @stoyanstefanov