Finding duplicate files with Go

For some time now, I've been messing around with the Go programming language. I like the idea of a systems language with a modern take on the standard library (built-in support for JSON and HTTP calls, and built-in vectors called slices), concurrency, easy use of third-party dependencies, and fast, clean code.

As a fun experiment to see how easy it is to traverse a filesystem, hash files, check file sizes and parse arguments, among other things, I've implemented a small and simple tool for finding duplicate files in a list of directories (supplied as arguments to the application).

It goes something like this:

Gather all filenames and sizes in a map, where the key is the size of the files and the value is a slice of filenames as strings.
For all the sizes with multiple filenames, hash up to the first 1024 bytes of the files and add to a similar map.
For all hashes with multiple filenames, print them as duplicates and let the user choose what to do.

The code (in the newest version) can be found here: https://bitbucket.org/dennishedegaard/duplifinder/src/master/duplifinder.go

In my day job I usually spend my time programming Python. Python is a nice language, but no language is perfect. It is extremely nice for writing working code quickly, making very readable code and in general doing things at a higher level of abstraction. Go, on the other hand, seems to put much more emphasis on clean code, being nearer the metal and having more control.

The difference is clear when building Go code: it will not compile if you have declared a variable that is not used anywhere. The same goes for an unused import or mismatched types. Python, on the other hand, will run pretty much anything and stops only at the first bad line of code, which makes it hard to validate up front whether there is a mistake in the code, such as a misspelled name or a wrong type. Python has third-party tools for checking these things (pyflakes, pylint, pep8 etc.), but they do not catch nearly as much as the built-in checks in the Go compiler.

Go is a statically typed language, but it infers types like many other modern languages. Python is dynamically typed, and it can be a challenge to figure out what a variable is pointing to in a large codebase.
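A small sketch of what the inference looks like in practice. The variable names and the describe function are made up for the example; the %T verb prints the static type the compiler picked for each variable.

```go
package main

import "fmt"

// describe declares a few variables with := and reports the types
// the compiler inferred for them.
func describe() string {
	count := 3                          // inferred as int
	ratio := 0.5                        // inferred as float64
	names := []string{"a.txt", "b.txt"} // inferred as []string
	sizes := map[int64][]string{}       // inferred as map[int64][]string
	return fmt.Sprintf("%T %T %T %T", count, ratio, names, sizes)
}

func main() {
	fmt.Println(describe()) // prints "int float64 []string map[int64][]string"
}
```

The types are fixed at compile time, so assigning a string to count later on would be a build error rather than a surprise at runtime.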

Another difference is that Go has no exceptions (it has panics, but they are used differently, more like serious crashes). Instead, most functions that can fail return two results (like when you return a tuple in Python): the result and an error. If an error occurred, the result is usually nil (or another zero value) while the error is an error object, and vice versa.

When you write a Go program you usually run it by calling "go run <file>", which builds and runs the program. When you're ready to deploy you can simply call "go build <file>" and a binary is built. I have tried moving binaries to systems without Go installed, and they still work.

I will probably keep messing around with Go, especially since it's so different from Python. I've always had a weak spot for systems languages like C and C++, but hated the somewhat small standard libraries and APIs, and the constant checking for bad pointers and memory leaks with valgrind every now and then. Go seems to have the perfect blend of simplicity, power and expressiveness without the history that shaped older languages.

Go also features APIs for doing web development. For now I've done most of that using the web.go framework, which seems a bit far away from my usual framework (Django).

I will probably keep on coding Go for various projects in the future. It's a nice language with some new and interesting ideas. I tried Rust as well, but the unstable API and lack of documentation are still keeping me away for now.