import java.util.regex._
import scalax.data.Implicits._
import scalax.io.Implicits._
import scalax.control.ManagedSequence

/**
 * I've made this immutable, representing some fixed accumulation of times for some
 * fixed number of instances. Having this class immutable will make our lives a lot
 * easier if we ever decide to make this program concurrent. Of course, this is
 * probably IO-bound, so concurrency probably won't help unless we're reading from
 * multiple log files at the same time.
 * I've also added a + method to the class, so I can add two Times together and get
 * a new Time that represents the accumulation of both of the previous Times. Yes,
 * this means I'll be creating and garbage collecting lots of objects, but the JVM
 * GC is actually really good at disposing of short-lived objects, so the performance
 * penalty is near-zero.
 */
case class Time(instances: Int, totalTime: Int, viewTime: Int, dbTime: Int) {
  def this() = this(0, 0, 0, 0)
  def +(other: Time) = {
  def main(args: Array[String]) {
    val results = parseLogFile(args(0))
    println("Total URIs: " + results.size)
    for ((uri, time) <- results) {
      println(uri + " => " + time.instances)
      println("\t" + time)
      println("\t" + time.avgToString)
    }
  }

  def parseLogFile(filename: String): Map[String, Time] = {
    // Pattern is this: "Completed in 100ms (View: 25, DB: 75) | 200 OK [http://app.domain.com?params=here]"
    val p = Pattern.compile("Completed in (\\d+)ms \\(View: (\\d+), DB: (\\d+)\\) \\| (\\d+) OK \\[http://app.domain.com(.*)\\?")

    /**
     * The next two lines have some deep Scalax magic. Well, not really, but it might
     * look like deep magic if you've never used Scalax. The rest of this expression
     * is fairly straightforward.
     *
     * First, "toFile" is a method that Scalax adds to any String to turn it into
     * a java.io.File. Finally, "lines" is a method that Scalax adds to any
     * java.io.File to turn it into a ManagedSequence[String]. ManagedSequence does
     * some Automatic Resource Management for us. It will take care of opening and
     * closing the file properly (even if an exception is thrown while processing it!)
     * and making sure that only one line is processed at a time (that is, the entire
     * file is processed in O(1) space).
     *
     * In order to make this work, ManagedSequence is lazy. That is, defining my "times"
     * variable doesn't actually read anything from, or even open, the file. All of the
     * IO happens only once I do something with the stuff inside the ManagedSequence.
     * In this case, that happens when I call foldLeft on my ManagedSequence later on.
     */
    val times: ManagedSequence[(String, Time)] = for {
      line <- filename.toFile.lines
      val m = p.matcher(line)
      // This line filters only those lines which match our pattern.
      // Non-matching lines get skipped.
      if m.find
      val uri = m.group(5)
      val time = Time(1, m.group(1).toInt, m.group(2).toInt, m.group(3).toInt)
    } yield (uri, time)

    /**
     * This is just a standard empty Map. I've made it immutable instead of mutable,
     * in part because I prefer immutable data structures, but also because we might
     * want to add concurrency later and, if we do, immutability will turn out to be a
     * big bonus. I'm adding a default value to my map. That is, if I try to access
     * a key that is not in the map, I'll get an empty (zeroed out) Time instead
     * of an error.
     */
    val emptyMap: Map[String, Time] = Map.empty.withDefaultValue(new Time)

    /**
     * If you're not familiar with foldLeft, this can look a little scary at first.
     * Don't worry, folds are easy. Here is your most basic fold, which adds up all
     * the numbers in a List:
     *
     * List(1, 2, 3).foldLeft(0)(_ + _) === (((0 + 1) + 2) + 3) === 6
     *
     * The second argument to foldLeft is a function of two arguments. If we wanted
     * to name the arguments, we could expand the above code to:
     *
     * List(1, 2, 3).foldLeft(0)((sum, n) => sum + n)
     *
     * As you can see, the second argument to foldLeft takes a running sum, which starts
     * off at zero, and the next number (n = 1, 2, or 3) and returns the new running sum.
     *
     * We're going to take the same approach, but instead of keeping a running sum we're
     * going to keep a running map of Uri -> Time. We start off with the emptyMap,
     * then we provide a function which takes a running map and a pair of Uri -> Time
     * (from our ManagedSequence) and returns a new running map.
     *
     * So how do we return the new running map? Our running map is immutable, so how come
     * it looks like we're changing it? Well, let's take a closer look at the line that
     * returns our new running map.
     *
     * map(uri) += time
     *
     * First, map(uri) would typically return a Time. However, Time has no += method defined
     * on it. Instead, Scala transforms += into a sequence of = and +, like so:
     *
     * map(uri) = map(uri) + time
     *
     * Next, Scala turns expressions of the form a(b) = x into method calls of the form
     * a.update(b, x). It's another piece of syntactic convenience, just like a(b) is turned
     * into a.apply(b).
     *
     * map.update(uri, map.apply(uri) + time)
     *
     * Now this starts to make sense. Map's apply method returns the Time for that uri; if no
     * Time is found, we get our default value, the empty Time. Then Time + Time returns the sum
     * of both the older Times. Finally, map's update method returns a new map with a modified
     * value for the uri key.
     *
     * (To be fair, that last line was a little too sugary even for me. I probably would have
     * written it as map(uri) = map(uri) + time. You still have to know how that translates
     * into apply and update, but that's a little more common and less prone to bugs than += is.)
     *
     * Since we're keeping a running map, every (uri, time) pair in our ManagedSequence will
     * get added to the running map we're keeping. The eventual return result will be an
     * immutable Map[String, Time]. Neat!
     *
     * Note that this is where the laziness of ManagedSequence breaks down and it's forced to
     * do all the work it's been putting off. As soon as our foldLeft starts, the file gets
     * opened and each line gets regex matched and turned into a pair of (uri, time) and then
     * incorporated into a running map of Uris -> Times. All of this is done one line at
     * a time, to make sure we don't blow up our memory if the file is really big. Once the
     * foldLeft is over, ManagedSequence takes care to close the file properly.
     *
     * Because everything is immutable, we'll be creating a lot of intermediate data structures
     * like small Times and partial Maps. However, they're so short-lived (all thanks to the
     * laziness of ManagedSequence!) that they'll hopefully never leave the GC's nursery and
     * get collected fairly efficiently. If performance is a concern I'd profile the program,
     * but I would be very surprised if it didn't run almost as fast as the optimal imperative version.
     */
    times.foldLeft(emptyMap) { case (map, (uri, time)) =>