Utah Guns Forum

'Emergency' Down Time

#1 ·
Over the last couple of weeks, resource utilization on the server hosting UCC has spiked dramatically. I've finally reached the point where I need to determine which process(es) are using these resources.

For that reason, it will soon be necessary to shut down the board for about an hour so I can prove or disprove whether UCC is the problem.

I don't know *exactly* when that hour will come. I'm trying to coordinate with Tom. But I need it to be ASAP (preferably today) during an otherwise normal peak usage period.

More info will be forthcoming.

Scott Dale Robison
 
#3 ·
isdr said:
My apologies. The one-hour downtime turned into two hours.

That being said, I got the information I needed.

Thank you for your time.

Scott Dale Robison
You mean I missed a two-hour down time of this board?

I must not be on the board often enough then. :lolbang:

ian
 
#5 ·
Well, I don't know for sure that I've solved it yet. I enabled some extra logging in MySQL to track slow queries and found one in the syndication module that seems to be a troublemaker. The query worked, but the database wasn't indexed properly for it to run efficiently.
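For readers unfamiliar with slow-query logging: the database records any statement whose execution time crosses a configured threshold, which is how a single badly indexed query stands out among thousands of fast ones. The sketch below only illustrates that idea. It uses Python's built-in `sqlite3` module rather than MySQL, and the `run_logged` helper and its `threshold_s` parameter are hypothetical, not part of MySQL or of the forum software.

```python
import sqlite3
import time


# Hypothetical helper that mimics the *idea* of MySQL's slow query log
# (record statements that take too long) using sqlite3 instead of MySQL.
def run_logged(conn, sql, params=(), threshold_s=1.0, slow_log=None):
    """Execute `sql`, fetch all rows, and append (seconds, sql) to
    `slow_log` if the query took at least `threshold_s` seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if slow_log is not None and elapsed >= threshold_s:
        slow_log.append((elapsed, sql))
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    log = []
    # A threshold of 0 logs every query, just to show what an entry looks like.
    run_logged(conn, "SELECT COUNT(*) FROM t", threshold_s=0.0, slow_log=log)
    for seconds, sql in log:
        print(f"{seconds:.6f}s  {sql}")
```

In practice the threshold would be set high enough that only the genuinely slow statements, like the unindexed syndication query, end up in the log.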

The good news: in the last few minutes I added a new index to the database, and now syndication is almost instant.

The bad news: I have a hard time believing that was the source of the problem, unless no one ever used syndication until the last two or three weeks.

Time will tell.

Edit: The excessive resource utilization that I referenced yesterday was all disk IO related. For the two hours the board was unavailable, virtually no disk IO was recorded. So I was able to confirm that UCC was the source of the trouble. In any case, I'll stay on it until I'm happy with the disk IO levels.

Another Edit: I am cautiously optimistic, as disk IO usage is practically flat (especially the last half hour compared to the previous 20+ hours). Perhaps the software or database was updated? Regardless, I'm happy with what I'm seeing so far.

Yet Another Edit: Almost two hours in, and disk IO is still very low. {knock on wood}

SDR
 
#6 ·
To quote the Farmer from the movie "Napoleon Dynamite"
"I don't understand a word you just said."...
But I do appreciate your great computer hacking skills!
The board is running great today.
 
#7 · (Edited by Moderator)
To help explain a little better what I was talking about above, here is a picture of our disk IO utilization for the past month:

[Image: disk IO utilization for the past month (disk-io-20100130-171453.png)]

As you can plainly see, we used very little disk IO until 17 or 18 days ago. At that point our disk usage went up significantly. For the month of January, we've averaged 4030.65 bytes per second. By way of comparison, for the month of December, we averaged only 30.83 bytes per second.
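To put the two monthly averages side by side, a quick ratio shows January running at roughly 130 times December's rate. Per the correction later in this thread, the unit is operations per second rather than bytes per second. The `fold_increase` name is just for illustration.

```python
# Monthly averages quoted in the post. The author later clarifies these
# are disk operations per second, not bytes per second.
DEC_AVG = 30.83
JAN_AVG = 4030.65


def fold_increase(new, old):
    """How many times larger `new` is than `old`."""
    return new / old


if __name__ == "__main__":
    print(f"January is about {fold_increase(JAN_AVG, DEC_AVG):.0f}x December")
```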

Understand that disk IO levels under 20 kilobytes per second aren't *bad* (and all the spikes in the averages above were well under that). But when I compared the January numbers with December's, it was obvious that something was wrong. Since nothing had changed in the server configuration from one day to the next, and since I'm not a full-time system/database administrator, it took some time to figure out what the problem was.

When the syndication module is accessed for RSS or Atom feeds, it has to find the 50 most recent posts that have been approved (which is true for all posts by default, I believe). There are two ways for the database server to do that. One is to look at each and every post, discarding any that are not approved, then sorting the remainder by post date/time, and discarding all but the 50 most recent. The more posts there are, the longer the process takes, and the more data there is to read from the disk. The other way is to use an index to quickly find the exact data needed. In a book you can find data quickly by looking it up in the index, *unless* the publisher neglected to include the information you want in the index. If that is the case, you are forced to read every page to find the information you want.

That is essentially what happened here. The database didn't include an appropriate index, so the only recourse was to read everything to find what was needed. Adding an extra index eliminated a lot of disk IO, and accessing the syndication feeds has gone from a 30+ second operation to an under-1-second operation.
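The book-index analogy can be reproduced on a small scale. This is a sketch under stated assumptions: SQLite (via Python's `sqlite3`) stands in for MySQL, and the `posts` table with its `approved` and `posted_at` columns, the query, and the index name `idx_posts_approved_posted` are all hypothetical, since the thread doesn't show the actual schema or the exact index that was added. `EXPLAIN QUERY PLAN` shows a full table scan plus a temporary sort before the index exists, and an index search afterwards.

```python
import sqlite3

# Hypothetical "50 most recent approved posts" query; the real
# syndication module's SQL is not shown in the thread.
QUERY = (
    "SELECT id FROM posts WHERE approved = 1 "
    "ORDER BY posted_at DESC LIMIT 50"
)


def query_plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def build_demo(n_posts=10_000):
    """Create an in-memory posts table where every post is approved."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " approved INTEGER NOT NULL,"
        " posted_at INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO posts (approved, posted_at) VALUES (1, ?)",
        [(t,) for t in range(n_posts)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print("before:", query_plan(conn, QUERY))
    # Equality column first, sort column second, so the engine can walk
    # the index newest-first and stop after 50 entries.
    conn.execute(
        "CREATE INDEX idx_posts_approved_posted ON posts (approved, posted_at)"
    )
    print("after: ", query_plan(conn, QUERY))
```

Putting `approved` first and `posted_at` second lets the engine read the newest 50 entries straight out of the index instead of sorting every post; with every post approved, an index on `approved` alone would not help much. The exact plan wording varies between SQLite versions.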

Hopefully those words were more understandable. If not ... well, you'll just have to continue enjoying the board anyway. :)

SDR

Edited to fix lousy grammar in one place. There may be more, but I only noticed the one after posting. :)
 


#9 ·
Glad to help (as it helps my sites that run on the same server as well).

And for the record, I just realized I've been saying bytes per second (or variations thereof) when I meant operations per second. As I said, I'm not a professional system administrator. :)

SDR
 
#11 ·
Yep, good catch on the I/O utilization and updating the index.

I'm curious, so if you'll humor me: what sort of I/O are you seeing now with the new index?

-Jay
 
#12 ·
The average operations per second for the past 24 hours is 53.68 and the maximum is 2068.90. Compare that to the January average of 4031.00 and maximum of 13573.90. Since basically half the month was low and half the month was high in disk IO utilization, I would estimate an average of between 9000 and 10000 operations per second during the excessive disk IO period.
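The step behind the "half low, half high" estimate is a weighted average: if a fraction f of the month ran at a low rate and the rest at a high rate, the monthly average is f * low + (1 - f) * high, which can be solved for the high rate. Taking December's 30.83 as the low rate and an exact 50/50 split (both assumptions), this works out to about 8000 operations per second, somewhat under the 9000 to 10000 estimate; the result is sensitive to how many days the elevated period actually lasted. The `high_period_rate` helper is hypothetical.

```python
def high_period_rate(blended_avg, low_rate, low_fraction):
    """Solve blended = f*low + (1 - f)*high for `high`.

    `low_fraction` (f) is the share of the period spent at `low_rate`.
    """
    if not 0 <= low_fraction < 1:
        raise ValueError("low_fraction must be in [0, 1)")
    return (blended_avg - low_fraction * low_rate) / (1 - low_fraction)


if __name__ == "__main__":
    # Assumptions: December's average stands in for the "low" half of
    # January, and the month splits exactly 50/50.
    print(round(high_period_rate(4031.00, 30.83, 0.5), 2))  # 8031.17
```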

Edit: Oh, and regarding giving myself more credit ... I give myself plenty in the areas I'm skilled in (computer programming). It's the system administration where I just don't have much experience, but I get by. :)

SDR
 