This is part 2 of a series of simple Haskell tutorials. See the first one here.
Now that we are able to parse log lines, let’s turn the parser into a simple command line tool. We’ll be working with IO and do notation now, so make sure you at least know the basics of both. If you’re unsure, check out Learn You A Haskell.
Looking at our use case, let’s assume we have a single uncompressed log file. The file consists of lines, and our parser takes lines. So let’s run it down. We need to:
- Read the file.
- Split it into lines.
- Parse each line.
- Show the results!
Step one is handled by readFile, which is in Prelude (that means it is automatically imported into each Haskell file). readFile has the type FilePath -> IO String. What this means for us is that the String it produces can’t leave the context of IO. If you are confused by this, please go read the chapter linked above!
    main = do
        file <- readFile "logfile.txt"
The main function, or action, is the starting place of any Haskell program. We start by binding the result of readFile "logfile.txt" to the name file. The type of file is now String, which is exactly the type that lines from Prelude takes! However, while readFile returns in the IO monad, lines does not, so we use a let binding instead of <-:
    main = do
        file <- readFile "logfile.txt"
        let logLines = lines file
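The difference between the two kinds of binding can be sketched in a small standalone program. The names sample and countLines are made up for this illustration, and pure sample only stands in for readFile "logfile.txt" so that the sketch runs without a log file on disk.

```haskell
-- Hypothetical sample input, standing in for the contents of logfile.txt.
sample :: String
sample = "first line\nsecond line\nthird line\n"

-- A pure helper: there is no IO in its type, so inside a do block
-- its result is bound with `let`.
countLines :: String -> Int
countLines = length . lines

main :: IO ()
main = do
    -- An IO action: its result is bound with `<-`.
    -- (`pure sample` stands in for `readFile "logfile.txt"`.)
    contents <- pure sample
    -- An ordinary function call: bound with `let`.
    let logLines = lines contents
    mapM_ putStrLn logLines
    print (countLines contents)
```

Note that lines drops the newline characters, and a trailing newline does not produce an extra empty line at the end.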
So far so good! If you try to compile the file now you will get an error, because the last statement in a do block must be an expression, not a let binding. We’ll be fixing this in a moment. But before that, let’s look at parse from Parsec. I won’t put the type of parse here, since it looks awfully scary. Fortunately, it’s still quite easy to use. The basic usage is:
    parse line "(test)" testLine
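So it takes a Parser of something, a title String (used as the source name in error messages) and the String to be parsed. This line will return an Either ParseError LogLine, where LogLine is the data type from part one. However, in our use case we have a [String], so we need to map the parse function over the list of log lines. Since parse is a pure function that returns an Either rather than an IO action, plain map with a let binding is what we want here; mapM with <- would not type check inside main.
    main = do
        file <- readFile "logfile.txt"
        let logLines = lines file
        let result = map (parse line "(test)") logLines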
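To try parse without the parser from part one, here is a minimal sketch built around a hypothetical stand-in parser. The entry parser, its "word number" format and the parseAll helper are assumptions made up for this example; only parse, the combinators and the Parser type come from Parsec itself.

```haskell
import Text.Parsec (ParseError, parse, many1, digit, letter, char, eof)
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the `line` parser from part one:
-- accepts "<word> <number>" and returns the two fields.
entry :: Parser (String, Int)
entry = do
    name <- many1 letter
    _ <- char ' '
    digits <- many1 digit
    eof
    return (name, read digits)

-- `parse` is pure, so plain `map` is enough to run it over every line.
parseAll :: [String] -> [Either ParseError (String, Int)]
parseAll = map (parse entry "(test)")

main :: IO ()
main = mapM_ print (parseAll ["alice 3", "bob x"])
```

The first input parses to a Right value, while the second yields a Left holding a ParseError, because x is not a digit.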
The type of result is [Either ParseError LogLine]. To extract the values, let’s use either, also in Prelude.
either takes two functions: in the case of a Left it applies the first, and in the case of a Right it applies the second. We will need to map this as well, as we have a list of Eithers, and since printing is an IO action we use mapM_, which runs an action for each element and discards the results. ParseError is printable, so let’s just use print for both cases:
    main = do
        file <- readFile "logfile.txt"
        let logLines = lines file
        let result = map (parse line "(test)") logLines
        mapM_ (either print print) result
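If either is new to you, the following sketch shows it on its own, away from Parsec. The describe function and its labels are invented for this illustration, and a plain String stands in for ParseError on the Left side.

```haskell
-- Collapse both outcomes into one printable String.
-- The first function handles Left, the second handles Right.
describe :: Either String Int -> String
describe = either ("failed: " ++) (\n -> "parsed: " ++ show n)

main :: IO ()
main = mapM_ (putStrLn . describe) [Right 42, Left "unexpected end of input"]
```

Using two different functions like this is handy when errors and results should be presented differently, for example by sending errors to stderr.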
There we are! This will print out each parse result, whether an error or the LogLine data type we created in the first part.
Without even really trying, we have done something that can be pretty hard in a lot of languages: this script will run in constant space! Through the magic of laziness, each line read will be parsed, printed and then freed to be garbage collected without having to wait for its friends. That said, this is not an efficient implementation, and in a later tutorial we will look at how to optimize this.
Edit: this post was translated to Japanese by Hiroyuki Fudaba and is available here.
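One way to get a feel for this laziness is to run lines and map over an input that never ends. The sketch below is an illustration only, not a memory measurement; endlessLog and firstLengths are names made up for the example.

```haskell
-- An endless "log": the same line repeated forever.
endlessLog :: String
endlessLog = cycle "GET /index.html\n"

-- Only as much of the input as the consumer demands is ever split and
-- measured, so this terminates even though the input does not.
firstLengths :: Int -> [Int]
firstLengths n = take n (map length (lines endlessLog))

main :: IO ()
main = print (firstLengths 3)
```

A strict implementation would have to read the whole input before producing the first line, which is impossible here. The same demand-driven behaviour is what lets the log tool handle one line at a time.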