'Emergency' Down Time

Any board related changes or announcements can be found here.

Postby isdr » Thu 28 Jan 2010 2:54 pm

Over the last couple of weeks, resource utilization on the server hosting UCC has spiked dramatically. I'm finally at a point where I need to test which process(es) are using these resources.

For that reason, it will be necessary very soon to shut down the board for about an hour, so I can confirm or rule out UCC as the problem.

I don't know *exactly* when that hour will come. I'm trying to coordinate with Tom. But I need it to be ASAP (preferably today) during an otherwise normal peak usage period.

More info will be forthcoming.

Scott Dale Robison
isdr
Novice
 
Posts: 24
Joined: Tue 01 Sep 2009 3:35 pm

Re: 'Emergency' Down Time

Postby isdr » Thu 28 Jan 2010 5:23 pm

My apologies. The one hour down time turned into two hours.

That being said, I got the information I needed.

Thank you for your time.

Scott Dale Robison
isdr
Novice
 
Posts: 24
Joined: Tue 01 Sep 2009 3:35 pm

Re: 'Emergency' Down Time

Postby ian husford » Fri 29 Jan 2010 11:40 am

isdr wrote: My apologies. The one hour down time turned into two hours.

That being said, I got the information I needed.

Thank you for your time.

Scott Dale Robison


You mean I missed a two-hour down time of this board?

I must not be on the board often enough then. :lolbang:

ian
The First and Second Amendments are so intertwined that we can not have one without the other. And if there comes a time when we lose both, the United States will cease as we know it.
ian husford
Sniper
 
Posts: 1098
Joined: Fri 30 Nov 2007 10:18 am
Location: East Mill Creek, Utah

Re: 'Emergency' Down Time

Postby Rupper » Fri 29 Jan 2010 2:05 pm

isdr wrote: My apologies. The one hour down time turned into two hours.

That being said, I got the information I needed.

Thank you for your time.

Scott Dale Robison


Us computer geek types are dying to know what the problem was. OK, I shouldn't speak for everyone else, but at least I am. :crown:
Rupper
Marksman
 
Posts: 263
Joined: Thu 10 Dec 2009 11:22 pm
Location: Riverton, UT

Re: 'Emergency' Down Time

Postby isdr » Fri 29 Jan 2010 2:10 pm

Well, I don't know for sure that I've solved it yet. I enabled some extra logging in MySQL to track slow queries, and found one that seems to be a troublemaker. It was in the syndication module. While the query worked, the database wasn't indexed properly for it to run efficiently.

The good news: In the last few minutes I added a new index to the database, and now syndication is almost instant.

The bad news: I have a hard time believing that was the source of the problem, unless no one ever used syndication until the last two or three weeks.

Time will tell.

Edit: The excessive resource utilization that I referenced yesterday was all disk IO related. For the two hours the board was unavailable, virtually no disk IO was recorded. So I was able to confirm that UCC was the source of the trouble. In any case, I'll stay on it until I'm happy with the disk IO levels.

Another Edit: I am cautiously optimistic, as disk IO usage is practically flat (especially the last half hour compared to the previous 20+ hours). Perhaps the software or database was updated? Regardless, I'm happy with what I'm seeing so far.

Yet Another Edit: Almost two hours in, and disk IO is still very low. {knock on wood}

SDR
isdr
Novice
 
Posts: 24
Joined: Tue 01 Sep 2009 3:35 pm

Re: 'Emergency' Down Time

Postby RustyShackleford » Fri 29 Jan 2010 7:52 pm

isdr wrote: Well, I don't know for sure that I've solved it yet. I enabled some extra logging in MySQL to track slow queries, and found one that seems to be a troublemaker. It was in the syndication module. While the query worked, the database wasn't indexed properly for it to run efficiently.

The good news: In the last few minutes I added a new index to the database, and now syndication is almost instant.

The bad news: I have a hard time believing that was the source of the problem, unless no one ever used syndication until the last two or three weeks.

Time will tell.

Edit: The excessive resource utilization that I referenced yesterday was all disk IO related. For the two hours the board was unavailable, virtually no disk IO was recorded. So I was able to confirm that UCC was the source of the trouble. In any case, I'll stay on it until I'm happy with the disk IO levels.

Another Edit: I am cautiously optimistic, as disk IO usage is practically flat (especially the last half hour compared to the previous 20+ hours). Perhaps the software or database was updated? Regardless, I'm happy with what I'm seeing so far.

Yet Another Edit: Almost two hours in, and disk IO is still very low. {knock on wood}

SDR

To quote the Farmer from the movie "Napoleon Dynamite"
"I don't understand a word you just said."...
But I do appreciate your great computer hacking skills!
The board is running great today.
If you consider yourself an American, and are not yet on a government watchlist...You're Not Trying.
RustyShackleford
Sniper
 
Posts: 2661
Joined: Wed 26 Aug 2009 8:55 am
Location: St. George UT

Re: 'Emergency' Down Time

Postby isdr » Sat 30 Jan 2010 10:46 am

To help explain a little better what I was talking about above, here is a picture of our disk IO utilization for the past month:

[Image: disk-io-20100130-171453.png. Disk IO utilization of the UCC server for the last 30 days.]

As you can plainly see, we used very little disk IO until 17 or 18 days ago. At that point our disk usage went up significantly. For the month of January, we've averaged 4030.65 bytes per second. By way of comparison, for the month of December, we averaged only 30.83 bytes per second.

Understand that disk IO levels under 20 kilobytes per second aren't *bad* (and all the spikes in the averages above were well under that). But when I compared the January numbers with December, it was obvious that something was wrong. Since nothing had changed in the server configuration from one day to the next (and since I'm not a full-time system / database administrator), it took some time to figure out what the problem was.

When the syndication module is accessed for RSS or Atom feeds, it has to find the 50 most recent posts that have been approved (which is true for all posts by default, I believe). There are two ways for the database server to do that. One is to look at each and every post, discard any that are not approved, sort the remainder by post date/time, and then discard all but the 50 most recent. The more posts there are, the longer the process takes, and the more data there is to read from the disk. The other way is to use an index to quickly find the exact data needed. In a book you can find data quickly by looking it up in the index, *unless* the publisher neglected to include the information you want in the index. If that is the case, you are forced to read every page to find the information you want.

That is essentially what happened here. The database didn't include an appropriate index, so the only recourse was to read everything to find what was needed. By adding an extra index to the database, we eliminate a lot of disk IO, and accessing the syndication feeds has gone from being a 30+ second operation to an under 1 second operation.
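For the computer geek types, the two approaches described above can be sketched in a few lines. This is only an illustration, not the actual change made to the board: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table, column, and index names (posts, approved, post_time, idx_posts_approved_time) are hypothetical.

```python
import sqlite3

# Hypothetical schema: table, column, and index names are illustrative only,
# and SQLite stands in for the MySQL server the board actually runs on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " post_id INTEGER PRIMARY KEY,"
    " approved INTEGER NOT NULL,"
    " post_time INTEGER NOT NULL,"
    " body TEXT)"
)

# 20,000 fake posts; every tenth one is left unapproved.
conn.executemany(
    "INSERT INTO posts (approved, post_time, body) VALUES (?, ?, ?)",
    [(0 if i % 10 == 0 else 1, i, "post body %d" % i) for i in range(20000)],
)

# The kind of query a feed needs: the 50 most recent approved posts.
FEED_QUERY = (
    "SELECT post_id FROM posts WHERE approved = 1 "
    "ORDER BY post_time DESC LIMIT 50"
)


def query_plan(connection, sql):
    """Return SQLite's description of how it intends to run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without a suitable index the whole table is scanned, filtered, and sorted.
plan_before = query_plan(conn, FEED_QUERY)
newest_before = [row[0] for row in conn.execute(FEED_QUERY)]

# With an index on (approved, post_time) the newest approved rows can be
# read directly from the index instead of from every row in the table.
conn.execute(
    "CREATE INDEX idx_posts_approved_time ON posts (approved, post_time)"
)
plan_after = query_plan(conn, FEED_QUERY)
newest_after = [row[0] for row in conn.execute(FEED_QUERY)]

print("before:", plan_before)
print("after: ", plan_after)
```

The query returns the same 50 posts either way; only the amount of data the database has to read changes. MySQL has its own EXPLAIN statement for the same purpose, though its output looks different.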

Hopefully those words were more understandable. If not ... well, you'll just have to continue enjoying the board anyway. :)

SDR

Edited to fix lousy grammar in one place. There may be more, but I only noticed the one after posting. :)
isdr
Novice
 
Posts: 24
Joined: Tue 01 Sep 2009 3:35 pm

Re: 'Emergency' Down Time

Postby Jeff Johnson » Sat 30 Jan 2010 3:33 pm

Thanks for finding and fixing that problem! :thumbsup:
IANAL => I Am Not A Lawyer
NRA Life Member | USSC | CCRKBA | UtahConcealedCarry.com
Jeff Johnson
Site Admin
 
Posts: 5616
Joined: Mon 03 May 2004 7:48 pm
Location: Cache Valley

Re: 'Emergency' Down Time

Postby isdr » Sat 30 Jan 2010 4:05 pm

Glad to help (as it helps my sites that run on the same server as well).

And for the record, I just realized I've been saying bytes per second (or variations thereof) when I meant operations per second. As I said, I'm not a professional system administrator. :)

SDR
isdr
Novice
 
Posts: 24
Joined: Tue 01 Sep 2009 3:35 pm

Re: 'Emergency' Down Time

Postby Rupper » Sat 30 Jan 2010 6:00 pm

isdr wrote: As I said, I'm not a professional system administrator. :)

SDR


You sound like you are doing a fine job to me. I think you need to give yourself more credit.
Rupper
Marksman
 
Posts: 263
Joined: Thu 10 Dec 2009 11:22 pm
Location: Riverton, UT

Re: 'Emergency' Down Time

Postby Jarubla » Sun 31 Jan 2010 5:54 am

Yep, good catch on the I/O utilization and updating the index.

I am curious, so if you'll humor me: what sort of I/O are you seeing now with the new index?

-Jay
Give a man a fish and feed him for a day, teach a man to fish and you'll never see him on the weekends
Jarubla
Sharp Shooter
 
Posts: 856
Joined: Sun 04 Jan 2009 2:23 pm
Location: in ur fridge

Re: 'Emergency' Down Time

Postby isdr » Sun 31 Jan 2010 12:00 pm

The average operations per second for the past 24 hours is 53.68 and the maximum is 2068.90. Compare that to the January average of 4031.00 and maximum of 13573.90. Since basically half the month was low and half the month was high in disk IO utilization, I would estimate an average of between 9000 and 10000 operations per second during the excessive disk IO period.
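The arithmetic behind an estimate like that can be sketched as a weighted average. This is a rough check only: it assumes the month splits cleanly into a quiet period and a busy period, and it borrows the post-fix 24-hour average as the quiet-period rate. The function name and the split fractions are illustrative choices, not measured values.

```python
def busy_period_average(month_avg, quiet_avg, busy_fraction):
    """Solve month_avg = f * busy + (1 - f) * quiet for the busy-period rate.

    busy_fraction (f) is the share of the month spent in the busy period.
    """
    if not 0 < busy_fraction <= 1:
        raise ValueError("busy_fraction must be in (0, 1]")
    return (month_avg - (1 - busy_fraction) * quiet_avg) / busy_fraction


JANUARY_AVG = 4031.00  # operations per second, from the post above
QUIET_AVG = 53.68      # assumption: the quiet period looked like the last 24 hours

# Try a few assumed splits, since the exact length of the busy period is unknown.
for busy_fraction in (0.40, 0.45, 0.50):
    estimate = busy_period_average(JANUARY_AVG, QUIET_AVG, busy_fraction)
    print("busy for %d%% of the month: about %.0f operations per second"
          % (round(busy_fraction * 100), estimate))
```

Under these assumptions, an exact half-and-half split works out to roughly 8000 operations per second, while a busy period covering closer to 40 percent of the month gives a figure near 10000, so the result is quite sensitive to where the split is drawn.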

Edit: Oh, and regarding giving myself more credit ... I give myself plenty in the areas I'm skilled in (computer programming). It's the system administration where I just don't have much experience, but I get by. :)

SDR
isdr
Novice
 
Posts: 24
Joined: Tue 01 Sep 2009 3:35 pm

