xargs vs parallel

Have you ever used xargs on Linux? Have you ever been frustrated that it doesn’t behave the way you’d expect with the find command?

What do I mean?

Let’s assume we have five .csv files (1.csv to 5.csv) in a directory and we run the following command:

find ./ -name "*.csv" | xargs echo

You might expect it to run echo once for each filename. But it doesn’t! Instead it prints:

./1.csv ./2.csv ./3.csv ./4.csv ./5.csv

That’s because xargs packs as many arguments as it can into each command line, so here echo is called only once, with all five filenames.
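A small sketch to reproduce this in a scratch directory. The mktemp directory and the sort step are my additions (find doesn’t promise any particular order, so sort keeps the output predictable). Adding -n 1 tells xargs to pass at most one argument per command, which gives the one-echo-per-file behavior we were expecting:

```shell
#!/bin/sh
# Scratch directory with the five sample files (1.csv .. 5.csv).
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 1.csv 2.csv 3.csv 4.csv 5.csv

# Default xargs: as many arguments as fit go into a single echo call,
# so this prints one line: ./1.csv ./2.csv ./3.csv ./4.csv ./5.csv
find . -name "*.csv" | sort | xargs echo

# -n 1: at most one argument per call, so echo runs five times
# and prints one filename per line.
find . -name "*.csv" | sort | xargs -n 1 echo
```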

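A fix you’ll often see suggested is to have find end each filename with a NUL character (-print0) and tell xargs to use NUL as the delimiter (-0):

find ./ -name "*.csv" -print0 | xargs -0 echo

That is worth doing, because it keeps filenames containing spaces or newlines in one piece, but on its own it doesn’t change the batching: echo is still called once with all five names. For one echo per file you also need -n 1, which limits each command to a single argument:

find ./ -name "*.csv" -print0 | xargs -0 -n 1 echo

It works, but it’s not very elegant and it’s easy to forget.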
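To see what -print0 and -0 protect against, here is a sketch with a single made-up file named my data.csv. By default xargs splits its input on blanks and newlines, so that name turns into two arguments; with NUL delimiters it stays whole. The -n 1 is only there so each argument gets its own echo line:

```shell
#!/bin/sh
# Scratch directory with one file whose name contains a space.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch "my data.csv"

# Blank/newline-delimited: xargs splits the name at the space,
# so echo runs twice and prints "./my" and then "data.csv".
find . -name "*.csv" | xargs -n 1 echo

# NUL-delimited: the name reaches echo intact as "./my data.csv".
find . -name "*.csv" -print0 | xargs -0 -n 1 echo
```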

Well, today I came across a comment on Stack Overflow about the parallel command:

find ./ -name "*.csv" | parallel echo

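GNU parallel does much the same job as xargs, with two differences that matter here. First, it runs the command once per input line, so each filename gets its own echo; and because it splits its input on newlines rather than on any whitespace, a filename with spaces still arrives as a single argument. Second, it runs those commands in parallel! By default it starts one job per CPU core, so the more cores your machine has, the more files it works through at once.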
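A sketch of the same job with GNU parallel, using the five sample files again. Because the jobs run concurrently, their output can come back in a different order than the input; -k (--keep-order) restores input order, -j 4 caps it at four simultaneous jobs instead of one per core, and {} marks where each filename goes. The guard at the top is my addition so the script exits cleanly where GNU parallel isn’t installed (some systems ship an unrelated parallel from moreutils with different options):

```shell
#!/bin/sh
# Skip cleanly unless the installed parallel is GNU parallel.
if ! parallel --version 2>/dev/null | grep -q "GNU parallel"; then
    echo "GNU parallel not found; skipping" >&2
    exit 0
fi

dir=$(mktemp -d)
cd "$dir" || exit 1
touch 1.csv 2.csv 3.csv 4.csv 5.csv

# One echo per file, at most 4 running at once (-j 4),
# output printed in input order (-k). {} is replaced by each filename.
find . -name "*.csv" | sort | parallel -k -j 4 echo {}
```

If you prefer NUL-delimited input here too, GNU parallel accepts -0 (--null) to pair with find -print0, just like xargs.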

This is particularly handy when you are doing lots of uploads (that’s what I was doing today).

Note: you might need to install parallel on your machine first (on Ubuntu: sudo apt-get install parallel).
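If installing parallel isn’t an option, both GNU and BSD xargs have -P, which runs up to that many commands at the same time. A sketch for an upload-style job, where the echo inside sh -c is a hypothetical stand-in for whatever upload command you actually use. Since uploads mostly wait on the network rather than the CPU, running more jobs than you have cores can make sense. Unlike parallel -k, xargs -P gives you no control over output order:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 1.csv 2.csv 3.csv 4.csv 5.csv

# -print0 / -0: safe with odd filenames.
# -n 1: one file per command.
# -P 4: up to four commands running at the same time.
# The echo is a placeholder for a real upload command (hypothetical);
# the trailing "sh" becomes $0, so each filename arrives as $1.
find . -name "*.csv" -print0 |
    xargs -0 -n 1 -P 4 sh -c 'echo "uploading $1"' sh
```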
