Report abuse

import java.util.regex._
import scalax.data.Implicits._
import scalax.io.Implicits._
import scalax.control.ManagedSequence

/**
 * I've made this immutable, representing some fixed accumulation of times for some
 * fixed number of instances. Having this class immutable will make our lives a lot
 * easier if we ever decide to make this program concurrent. Of course, this is
 * probably IO-bound, so concurrency probably won't help unless we're reading from
 * multiple log files at the same time.

 * I've also added a + method to the class, so I can add two Times together and get
 * a new Time that represents the accumulation of both of the previous time. Yes,
 * this means I'll be creating and gargabe collecting lots of objects, but the JVM
 * GC is actually really good at disposing of short-lived objects, so the performance
 * penalty is near-zero.
 */
case class Time(instances: Int, totalTime: Int, viewTime: Int, dbTime: Int) {
    def this() = this(0, 0, 0, 0)

    def +(other: Time) =
        Time(instances + other.instances, totalTime + other.totalTime, viewTime + other.viewTime, dbTime + other.dbTime)

    def avg(time: Int) = time.toFloat/instances

    override def toString = "Total Time: " + totalTime + "ms (View: " + viewTime +"ms DB:" + dbTime + "ms )"
    def avgToString =  "Average Time: " + avg(totalTime) + "ms (View: " + avg(viewTime) +"ms DB:" + avg(dbTime) + "ms )"
}

object LogParser {
    def main(args: Array[String]) {
        val results = parseLogFile(args(0))

        println("Total URIs: " + results.size)

        for ((uri, time) <- results) {
            println(uri + " => " + time.instances)
            println("\t" + time)
            println("\t" + time.avgToString)
        }
    }

    def parseLogFile(filename: String): Map[String, Time] = {
        // Pattern is this: "Completed in 100ms (View: 25, DB: 75) | 200 OK [http://app.domain.com?params=here]"
        val p = Pattern.compile("Completed in (\\d+)ms \\(View: (\\d+), DB: (\\d+)\\) \\| (\\d+) OK \\[http://app.domain.com(.*)\\?")

        /**
         * The next two lines have some deep Scalax magic. Well, not really, but it might
         * look like deep magic if you've never used Scalax. The rest of this expression
         * is fairly straight-forward.
         *
         * First, "toFile" is a method that Scalax adds to any String to turn it into
         * a java.io.File. Finally, "lines" is a method that Scalax adds to any
         * java.io.File to turn it into a ManagedSequence[String]. ManagedSequence does
         * some Automatic Resource Management for us. It will take care of opening and
         * closing the file properly (even if an exception is thrown while processing it!)
         * and making sure that only one line is processed at a time (that is, the entire
         * file is processed in O(1) space).
         * 
         * In order to make this work, ManagedSequence is lazy. That is, defining my "times"
         * variable doesn't actually read anything from, or even open, the file. All of the
         * IO happens only once I do something with the stuff inside the ManagedSequence.
         * In this case, that happens when I call foldLeft on my ManagedSequence later on.
         */
        val times: ManagedSequence[(String, Time)] = for {
            line <- filename.toFile.lines
            val m = p.matcher(line)
                // This line filters only those lines which match our pattern.
                // Non-matching lines get skipped.
                if m.find
            val uri = m.group(5)
            val time = Time(1, m.group(1).toInt, m.group(2).toInt, m.group(3).toInt)
        } yield (uri, time)

        /**
         * This is just a standard empty Map. I've made it immutable instead of mutable,
         * in part because I prefer immutable data structures, but also because we might
         * want to add concurrency later and if we do immutability will turn out to be a
         * big bonus. I'm adding a default value to my map. That is, if I try to access
         * a key that is not in the map, I'll get an empty (zeroed out) Time instead
         * of an error.
         */
        val emptyMap: Map[String, Time] =
            Map.empty.withDefaultValue(new Time)

        /**
         * If you're not familiar with foldLeft, this can look a little scary at first.
         * Don't worry, folds are easy. Here is your most basic fold, which adds up all
         * the numbers in a List:
         *
         *   List(1, 2, 3).foldLeft(0)(_ + _) === (((0 + 1) + 2) + 3) === 6
         * 
         * The second argument to foldLeft is a function of two arguments. If we wanted
         * to name the arguments, we could expand the above code to:
         *
         *   List(1, 2, 3).foldLeft(0)((sum, n) => sum + n)
         *
         * As you can see, the second argument to foldLeft takes a running sum, which starts
         * off at zero, and the next number (n = 1, 2, or 3) and returns the new running sum
         *
         * We're going to take the same approach, but instead of keeping a running sum we're
         * going to keep a running map of Uri -> Time. We start off with the emptyMap,
         * then we provide a function which takes a running map and a pair of Uri -> Time
         * (from our ManagedSequence) and returns a new running map.
         *
         * So how do we return the new running map? Our running map is immutable, so how come
         * it looks like we're changing it? Well, let's look a closer look at the line that
         * returns our new running map.
         *
         *   map(uri) += time
         *
         * First, map(uri) would typically return a Time. However, Time has no += method defined
         * on it. Instead, Scala transforms += into a sequence of = and +, like so:
         *
         *   map(uri) = map(uri) + time
         *
         * Next, Scala turns expressions of the form a(b) = x into method calls of the form
         * a.update(b, x). It's another piece of syntactic convenience, just like a(b) is turned
         * into a.apply(b).
         *
         *   map.update(uri, map.apply(uri) + time)
         * 
         * Now this starts to make sense. Map's apply method returns the Time for that uri, if no Time
         * is found we get our default value, the empty Time. Then Time + Time returns the sum
         * of both the older Times. Finally, map's update method returns a new map with a modified
         * value for the uri key.
         *
         * (To be fair, that last line was a little too sugary even for me. I probably would have
         * written it as map(uri) = map(uri) + time. You still have to know how that translates
         * into apply and update, but that's a little more common and less prone to bugs like += is.)
         *
         * Since we're keeping a running map, every (uri, time) pair in our ManagedSequence will
         * get added to the running map we're keeping. The eventual return result will be an
         * immutable Map[String, Time]. Neat!
         *
         * Note that this is where the laziness of ManagedSequence breaks down and it's forced to
         * do all the work it's been putting off. As soon as our foldLeft starts, the file gets
         * opened and each line gets regex matched and turned into a pair of (uri, time) and then
         * incorporated into a running map of Uris -> Times. All of this is done one line at
         * a time, to make sure we don't blow up our memory if the file is really big. Once the
         * foldLeft is over, ManagedSequence takes care to close the file properly.
         *
         * Because everything is immutable, we'll be creating a lot of intermediate data structures
         * like small Times and partial Maps. However, they're so short lived (all thanks to the
         * laziness of ManagedSequence!) that they'll hopefully never leave the GC's nursery and
         * get collected fairly efficiently. If performance is a concern I'd profile the program,
         * but I would be very surprised if didn't run almost as fast as the optimal imperative version.
         */
        times.foldLeft(emptyMap) { case (map, (uri, time)) =>
            map(uri) += time
        }
    }
}