Find the bug – The case of the degrading system – Answer

In my previous post I showed the following code, and asked what the bug was, and what the implications of that would be.

class Program
{
    private Timer nextcheck;
    public event EventHandler ServerSigFailed;
    static void Main(string[] args)
    {
        var program = new Program();
        if(program.ValidateServerSig() == false) 
            return;
        program.DoOtherStuff();
    }
    public bool ValidateServerSig()
    {
        nextcheck = new Timer(state => ValidateServerSig());
        var response = DoRequest("http://remote-srv/signature");
        if(response.Failed)
        {
            var copy = ServerSigFailed;
            if(copy!=null) copy(this, EventArgs.Empty);
            return false;
        }
        var result = Utils.CheckPublicKeySignatureMatches(response);
        if(result.Valid == false)
        {
            var copy = ServerSigFailed;
            if(copy!=null) copy(this, EventArgs.Empty);
            return false;
        }
        
        // setup next check
        nextcheck.Change(TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(15));
        return true;
    }
}

The answer is quite simple: look at the first line of the ValidateServerSig function. It sets up a new timer whose callback calls this function again, and the previous timer is never disposed. So every time a timer fires, it creates yet another timer, and 15 seconds after startup we have 2 timers running this function.

And 15 seconds after that, we are going to have 4 timers, and 15 seconds after that…

But the interesting thing here is that those are timers, so their callbacks get scheduled on the thread pool. So while the growth rate of the number of tasks is phenomenal (on paper, doubling every 15 seconds means 80 doublings in 20 minutes), in practice they don't all run together. After 20 minutes of this, the function will have run only about 800 times. Because the thread pool grows slowly, we end up with a much slower growth rate, but eventually the thread pool queues are going to be filled with nothing but this task. And since we typically also process requests on the thread pool, eventually requests cannot be handled, because the threads are always busy running the validation code.

In the real codebase, the timing was 5 minutes, and this issue typically manifested itself only after several weeks of continuous running.
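One way to avoid the pile-up is to create the timer exactly once and only re-arm it from the validation function. The following is a minimal sketch of that idea, not the fix that went into the real codebase. The ServerSigChecker class and the check delegate are hypothetical names; the delegate stands in for the DoRequest call and the signature check from the code above.

```csharp
using System;
using System.Threading;

// Sketch: a single timer for the lifetime of the checker, created once.
class ServerSigChecker : IDisposable
{
    private readonly Timer nextcheck;
    // Hypothetical stand-in for the remote request plus signature check.
    private readonly Func<bool> check;
    public event EventHandler ServerSigFailed;

    public ServerSigChecker(Func<bool> check)
    {
        this.check = check;
        // Created here and nowhere else. Infinite due time and period
        // mean the timer does not fire until Change() is called.
        nextcheck = new Timer(state => ValidateServerSig(), null,
            Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    public bool ValidateServerSig()
    {
        if (check() == false)
        {
            var copy = ServerSigFailed;
            if (copy != null) copy(this, EventArgs.Empty);
            return false;
        }

        // setup next check: re-arm the one existing timer
        // instead of allocating a new one on every call
        nextcheck.Change(TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(15));
        return true;
    }

    public void Dispose()
    {
        nextcheck.Dispose();
    }
}
```

With this shape, no matter how many times ValidateServerSig runs, there is only ever one timer, so at most one validation gets queued to the thread pool per interval.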
