A Python script inline timer

I’ve been learning Python recently (a revisit, really, since I learned and used some Python more than 10 years ago) and found Google’s tutorial easy to pick up. My first solution to one of the exercises, “word count”, took far too long to analyze a 600K text file, which didn’t seem right.

I wanted a profiling tool to locate the bottleneck, and found this article pretty good. After using its simple timing context manager, I found the problem: I had written
if key in dict.keys() because I had seen this usage before:
for key in sorted(dict.keys()):

However, in Python 2 dict.keys() returns a list, so this is a linear lookup that throws away the constant-time access of the underlying hash table. The correct way is to use if key in dict.
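
To make the difference concrete, here is a small sketch that times both membership tests. It is written for Python 3, where dict.keys() returns a view with hash-based lookup, so the list is built explicitly with list() to reproduce what Python 2 hands back. The names counts, keys_as_list and missing are made up for this illustration, and the timings will vary by machine.

```python
import timeit

# A dictionary with 10,000 keys, standing in for the word counts.
counts = {str(i): i for i in range(10000)}

# In Python 2, dict.keys() returns a list like this one.
keys_as_list = list(counts.keys())

# Worst case for the list: the key is absent, so every element is compared.
missing = 'not-a-key'

list_time = timeit.timeit(lambda: missing in keys_as_list, number=200)
dict_time = timeit.timeit(lambda: missing in counts, number=200)

print('list scan  : %.5f s' % list_time)
print('dict lookup: %.5f s' % dict_time)
```

Both tests give the same answer; only the cost differs, and the gap grows with the number of keys.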

The problem was solved, but I wasn’t totally satisfied with the timing context manager. Using it means indenting code blocks, and moving the timer to a different place in the script means a lot of re-indenting.
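
For reference, a timing context manager usually looks something like the sketch below. This is not the code from the article, just a minimal version under my own assumptions (the name timed and the label argument are invented here), written for Python 3. It shows why the measured code has to move one level to the right.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, in seconds."""
    begin = time.time()
    try:
        yield
    finally:
        print('%s: %.5f s' % (label, time.time() - begin))


# Everything to be measured must be indented under the with statement.
with timed('sum of squares'):
    total = sum(i * i for i in range(100000))
```

Moving the measurement to cover a different range of lines means dedenting one block and indenting another, which is the chore described above.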

After some time I produced this inline timer tool for scripts. To use it, just import the module and place times.start and times.end at the positions you want in your script. It will print the total time elapsed between these two points in seconds. The number of decimal places printed is set by the digit parameter of times.start.

If you want finer-grained time measurement, one way is to put any number of
times.last_seg calls in the script between start and end. Each one prints the time elapsed since the previous timing point.

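Another way is to put segment start and stop calls in pairs around any code block. You don’t need to indent existing code, just put times.seg_start and times.seg_stop before and after the code block. Because every call measures from the most recently recorded point, the pairs should not be nested or interleaved with times.last_seg.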

All the functions except times.start accept a string parameter to label the time point.

times.py

import time

__author__ = 'draco'
# measure script time duration in segments
# the clock value list: start, segments, end
T = []
Digit = [7]


def start(digit=7):
    """Timer start. digit sets the number of digits printed after the decimal point"""
    del T[:]  # clean up first
    Digit[0] = digit
    T.append(time.time())
    print '==>| Timer start | set to', Digit[0], 'digits after decimal point'


def last_seg(s='since last point'):
    """calculate the duration between last point till this one"""
    T.append(time.time())
    duration = T[-1] - T[-2]
    print "=> | %.*f s" % (Digit[0], duration), s


def seg_start(s='start...'):
    """set a segment start, always used with seg_stop in pairs"""
    T.append(time.time())
    print "=> << | 0", ' ' * (Digit[0] + 3), s


def seg_stop(s='...stop'):
    """set a segment end, always used with seg_start in pairs"""
    T.append(time.time())
    duration = T[-1] - T[-2]
    print "      | %.*f s " % (Digit[0], duration), s, ' >>'

def end(s='since last point. Timer end.'):
    """Timer end. Print the last segment duration and the total time elapsed"""
    T.append(time.time())
    duration = T[-1] - T[-2]
    total = T[-1] - T[0]
    print "=> | %.*f s" % (Digit[0], duration), s
    print "==>| %.*f s" % (Digit[0], total), 'Total time elapsed'
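
The listing above is Python 2 (print statements). If you are on Python 3, a port could look roughly like the sketch below. It keeps the same function names and output format, but two things are my own changes rather than part of the original module: the small helper _lap, and the use of time.perf_counter() in place of time.time(), since it is a monotonic clock meant for measuring short durations.

```python
import time

# Clock readings in order: start, segment points, end.
T = []
Digit = [7]


def _lap():
    """Record a new reading and return the seconds since the previous one."""
    T.append(time.perf_counter())
    return T[-1] - T[-2]


def start(digit=7):
    """Reset the timer; digit is the number of decimal places printed."""
    del T[:]
    Digit[0] = digit
    T.append(time.perf_counter())
    print('==>| Timer start | set to', Digit[0], 'digits after decimal point')


def last_seg(s='since last point'):
    """Print the time elapsed since the previous recorded point."""
    print('=> | %.*f s' % (Digit[0], _lap()), s)


def seg_start(s='start...'):
    """Mark the start of a segment; pair with seg_stop."""
    T.append(time.perf_counter())
    print('=> << | 0', ' ' * (Digit[0] + 3), s)


def seg_stop(s='...stop'):
    """Mark the end of a segment; pair with seg_start."""
    print('      | %.*f s ' % (Digit[0], _lap()), s, ' >>')


def end(s='since last point. Timer end.'):
    """Print the last segment duration and the total time elapsed."""
    duration = _lap()
    total = T[-1] - T[0]
    print('=> | %.*f s' % (Digit[0], duration), s)
    print('==>| %.*f s' % (Digit[0], total), 'Total time elapsed')
```

As in the original, start must be called first; calling any other function on an empty list of readings raises an IndexError.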

Using the timer in a script:

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Wordcount exercise
Google's Python class
"""

import sys
import times

###

# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.

def count_words(filename):
    """helper method for the other 2, return dict"""
    # read file into words
    times.seg_start('count words start')
    f = open(filename, 'rU')
    word_count = {}
    for line in f:
        words = line.split()
        for word in words:
            word = word.lower()
            if word in word_count:      # in dict.keys() could be slow, didn't use hash. 0.069 for 640k
            #if word_count.get(word): # 0.03s for 250k text, 0.084 for 640k
                word_count[word] += 1
            else:
                word_count[word] = 1
    times.seg_stop('count words stop')
    return word_count

def print_words(filename):
    "count words"
    #sort dict and print
    word_count = count_words(filename)
    for word in sorted(word_count.keys()):
        print word, word_count[word]


def print_top(filename):
    word_count = count_words(filename)
    word_count_pairs = word_count.items()
    times.seg_start('sorting start')
    word_count_pairs = sorted(word_count_pairs, key = lambda pair: pair[-1], reverse=True)
    times.seg_stop('sorting stop')
    for i in range(3):
        print word_count_pairs[i][0], word_count_pairs[i][1]
    times.last_seg()

def main():
  if len(sys.argv) != 3:
    print 'usage: ./wordcount.py {--count | --topcount} file'
    sys.exit(1)
  times.start(5)
  option = sys.argv[1]
  filename = sys.argv[2]
  if option == '--count':
    print_words(filename)
  elif option == '--topcount':
    print_top(filename)
  else:
    print 'unknown option: ' + option
    sys.exit(1)
  times.end()


if __name__ == '__main__':
  main()

The output (run with the --topcount option):

==>| Timer start | set to 5 digits after decimal point
=> << | 0          count words start
      | 0.06900 s  count words stop  >>
=> << | 0          sorting start
      | 0.00800 s  sorting stop  >>
the 5027
to 3353
and 2831
=> | 0.00100 s since last point
=> | 0.00300 s since last point. Timer end.
==>| 0.08600 s Total time elapsed

 

 
