Avsim again...



ryanbatc
March 6th, 2014, 17:06
Sorry to keep wondering, but are the Avsim forums down? I can access their libraries but haven't been able to access the forums for 2 days.

TuFun
March 6th, 2014, 17:11
"Just conversed with the Forums Manager at Avsim who asked to spread the word that they have suffered a disk failure on a RAID array on their primary server that has taken down the entire site. Expect to be back online late Friday PM or early Saturday AM." per miatamariner over at flightsim.com

Milton Shupe
March 6th, 2014, 17:29
That is interesting. In every redundant array I have managed in business, I never had a system-wide failure. One disk goes out; unplug it, put in another; no impact.

henrystreet
March 6th, 2014, 18:22
A straight RAID 0 will give the highest performance but is very susceptible to a single drive failure. A RAID 5 without distributed parity is susceptible to a single drive failure if the parity drive is the one to fail. Neither of these configurations should ever be used on a server. We will likely get more details after the forums are back online.

Geomitrak
March 6th, 2014, 20:50
I can't get on Avsim half the time anyway, so I didn't notice the difference. I just thought it was 'situation normal'.

bstolle
March 6th, 2014, 20:58
That's true! I've been looking for an alternative forum for years (I don't like the way Tom and a few of his staff are 'managing' the forums at all) but haven't really found one yet, although simviation.com is quite OK.

Dumonceau
March 6th, 2014, 23:35
Milton Shupe wrote: "That is interesting. In every redundant array I have managed in business, I never had a system-wide failure. One disk goes out; unplug it, put in another; no impact."

I agree 150%! Restore time can sometimes be quite long, with degraded performance, but in a redundant array the impact should be minimal. Strange indeed!

Dumonceau

alehead
March 7th, 2014, 02:53
henrystreet wrote: "A straight RAID 0 will give the highest performance but is very susceptible to a single drive failure. A RAID 5 without distributed parity is susceptible to a single drive failure if the parity drive is the one to fail. Neither of these configurations should ever be used on a server. We will likely get more details after the forums are back online."

I am an IT admin myself and work with RAID1/RAID5-based systems locally... our datacentre stuff is a totally different kettle of fish.

Given that avsim hosts official support fora for companies like PMDG, they need better failover and/or redundancy than what I am currently seeing. They have been battling with performance for months now; avsim is one of the slowest fora I have ever seen. TA goes on about the number of users simultaneously connected to avsim; the numbers are even on the homepage when logged in. They force login to view the fora: with more than 250 "guests" online you can see nothing, not even the news items...

They need to get with it and get some serious hardware running with good 24/7 support, particularly if they want to keep the official support forum customers like PMDG happy. If I were R Randazzo and co, I would seriously be looking into a new support forum host...

I know that disk errors can happen, but they have been offline for over 2 days now. I was only using avsim for support these days, as news sites like airdailyx and Aerosoft's ASN have long since superseded avsim. In pure terms of news delivered, I would say airdailyx is about the most avid and up-to-date...

A

n4gix
March 7th, 2014, 06:22
alehead wrote: "They need to get with it and get some serious hardware running with good 24/7 support, particularly if they want to keep the official support forum customers like PMDG happy. If I were R Randazzo and co, I would seriously be looking into a new support forum host..."

Just to put some perspective on the situation, the server rack is 100% Hewlett-Packard-provided equipment. At the time of installation, about six years ago, the total cost was just under $100,000. I'd certainly classify that as "serious hardware."

AVSIM also purchased a service contract (HP Care Package*) that will fortunately cover this under their extended warranty protection. A service tech is supposed to be at the data center sometime today, and the latest word is that the forum server should be up no later than Saturday afternoon.

The tech will also be investigating just why a single disk failure brought the entire RAID 5 array to a screeching halt. It should not have, since it was configured as a distributed-parity RAID 5 array.

The latest word from Tom was this:

Also, I would like you to know that we are taking this opportunity to increase our storage capacity on that server by 50%. We will have more disks delivered next week, and they will be installed as soon as we can get HP Care Package folks on-site.

* http://www8.hp.com/us/en/business-services/it-services.html?compURI=1077422

Paul J
March 7th, 2014, 08:29
Milton Shupe wrote: "That is interesting. In every redundant array I have managed in business, I never had a system-wide failure. One disk goes out; unplug it, put in another; no impact."

That's exactly the way RAID 5 arrays go, Milton: over the last 20 years of deep involvement (https://www.dropbox.com/sh/0sq7yfsb197828l/Pc85x9XokI) with several data centers, I've had a number of single-disk RAID 5 failures on the old HSG80, plus a number on the HP EVA7000, plus one memorable multi-disk failure on the EVA, caused by buying bulk SCSI 320 drives (they were cheap, and they vibrated).
Almost every modern SAN has hot-swap drives. In addition, a "state of the art" commercial system should not only use hot-swap drives but also be mirrored to a standby site with a fairly short, predictable downtime; the data should have a nightly backup in operation, and there should be spare disks in stock at the data center.
Someone mentioned RAID 5 shouldn't be used??? RAID 5 is the single most common array type in the SAN world, and parity is spread over the whole array, not placed on one disk. There are newer spin-offs too, notably RAID 6; RAID 50 is pretty common, but very often the cost is prohibitive. In addition, iSCSI, SSDs and SATA drives are bringing the cost of a SAN down.

But to be down for three days... my job would be on the line. There aren't many commercial enterprises around with a "three strikes and you're out!" attitude that would tolerate a three-day business interruption.

All the Best,

pj
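To make the parity-placement point concrete, here is a minimal Python sketch contrasting a RAID 4 layout, where every stripe's parity sits on one dedicated disk, with a RAID 5 layout, where parity rotates across all disks. The function names are made up for illustration, and the RAID 5 rotation order (a simple left-style rotation) is an assumption; real controllers offer several layouts.

```python
def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Return the index of the disk holding parity for a given stripe.

    RAID 4: parity always lives on the last disk (a dedicated parity disk).
    RAID 5: parity rotates across all disks, one stripe at a time
    (a simple left-style rotation, assumed here for illustration).
    """
    if n_disks < 3:
        raise ValueError("RAID 4/5 need at least three disks")
    if level == 4:
        return n_disks - 1
    if level == 5:
        return (n_disks - 1 - stripe) % n_disks
    raise ValueError("only levels 4 and 5 are modelled")


def layout(n_stripes: int, n_disks: int, level: int) -> list:
    """Render one text row per stripe: 'P' marks parity, 'D' marks data."""
    rows = []
    for s in range(n_stripes):
        p = parity_disk(s, n_disks, level)
        rows.append(" ".join("P" if d == p else "D" for d in range(n_disks)))
    return rows


if __name__ == "__main__":
    for level in (4, 5):
        print(f"RAID {level}:")
        for row in layout(4, 4, level):
            print("  " + row)
```

With four disks, the RAID 4 rows all end in "P", so every parity write lands on the same disk, while the RAID 5 rows move the "P" one disk to the left per stripe and spread that load over the whole array.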

Dangerousdave26
March 7th, 2014, 08:44
Paul J wrote: "Someone mentioned RAID 5 shouldn't be used??? RAID 5 is the single most common array type in the SAN world, and parity is spread over the whole array, not placed on one disk."

I saw that as well. It was definitely an incorrect statement but RAID can be confusing at times. The poster was probably thinking of RAID 4.


RAID 5 comprises block-level striping with distributed parity. Unlike in RAID 4, parity information is distributed among the drives. It requires that all drives but one be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost. RAID 5 requires at least three disks.
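The parity arithmetic behind "subsequent reads can be calculated from the distributed parity" is plain XOR. A minimal sketch, assuming byte-string blocks and one stripe at a time (real controllers work on fixed-size chunks and also track which disk holds parity for each stripe); the function names are hypothetical:

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (how RAID 5 parity is formed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Return one stripe: the data blocks followed by their XOR parity block."""
    if len({len(b) for b in data_blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a single missing block (marked None) by XOR-ing the survivors.

    The XOR of every block in a stripe (data + parity) is zero, so the XOR of
    the surviving blocks equals the missing one, whether it held data or parity.
    """
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) != 1:
        raise ValueError("RAID 5 can rebuild exactly one missing block per stripe")
    survivors = [b for b in stripe if b is not None]
    restored = list(stripe)
    restored[missing[0]] = xor_blocks(survivors)
    return restored


if __name__ == "__main__":
    stripe = make_stripe([b"AVSIM", b"forum", b"RAID5"])
    degraded = list(stripe)
    degraded[1] = None                    # simulate one failed drive
    print(rebuild(degraded) == stripe)    # True
```

Marking a second block in the same stripe as missing makes `rebuild` raise, which mirrors why RAID 5 survives only one failed drive at a time.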

henrystreet
March 7th, 2014, 08:58
Thanks, Dave, for clarifying; I did mean RAID 4, i.e. WITHOUT distributed parity.

Timbohobo
March 7th, 2014, 16:45
No great loss. It might just be me, but I find Avsim way too emotional, full of grown men chucking tantrums like 9-year-olds with toys; at least here fully grown men chuck warplanes at each other.

I wonder if the few companies who use it primarily for their support forums have ever considered that, as a guest, you cannot see any of the forums, so their support forums are invisible to the world. I certainly do not buy anything that only has a support forum there and nowhere else.

alehead
March 7th, 2014, 23:30
Thanks for the info on the hardware setup, n4gix (sorry, but I cannot see your name through the Tapatalk interface).
This is the first time I have heard of a single drive failure in RAID 5 taking down the entire array. The $100k of hardware you mention, albeit bought 6 years ago, doesn't help much if you have no failover. 3+ days of downtime in business is serious... I wish them all the best in getting back up to speed.

I fully understand Timbohobo's comments. I am sure a number of hosted developer teams over there may already be looking for a different solution to their forum hosting needs, as well as the advertising partners...

Anyway...


Andrew Entwistle

Timbohobo
March 11th, 2014, 14:08
I wonder how long until the 'It was hacked' story appears and the hand goes out again (I got banned for 60 years for saying that on AVSIM, btw, so it must be a topic that pushes some guilty button somewhere, haha).

n4gix
March 12th, 2014, 08:05
UPDATE as of 11:08 PM 3/11/2014: Restoration from the backup system is underway. Estimated time to complete: 1 day. Whether it will be successful or not is yet to be determined.

There's no mystery here. It was a case of hardware failure. Nothing more; nothing less... :triumphant:

scott967b
March 12th, 2014, 15:39
Thanks. No idea if it's the case here, but this isn't a good time to find out how your "restore" feature works.

scott s.

odourboy
March 13th, 2014, 06:41
Well, sort of... It doesn't appear to be accepting new posts (at least not from me, even though I'm logged in). Definitely a step in the right direction though! :applause:

I got this:
The administrator has limited the number of new posts you can submit within a short time frame. Please wait 171230889 seconds before replying or posting a new topic.

I only have to wait 23 years to post! LOL
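For the record, the wait in that message works out to roughly 5.4 years rather than 23. A quick conversion, assuming an average 365.25-day year:

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
DAYS_PER_YEAR = 365.25                  # average (Julian) year, an assumption


def seconds_to_years(seconds: int) -> float:
    """Convert a duration in seconds to average-length years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


wait = 171230889                        # figure from the forum's error message
print(f"{wait / SECONDS_PER_DAY:,.0f} days")     # 1,982 days
print(f"{seconds_to_years(wait):.1f} years")     # 5.4 years
```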

ViperPilot2
March 13th, 2014, 07:41
When you last visited... what was the date? I was there about 20 minutes ago, and all of the dates said the year 2339!
I got the same message when trying to post, too.

:dizzy:

Geomitrak
March 13th, 2014, 08:12
Just posted there - everything seems back to normal now.

Phantom88
March 14th, 2014, 06:15
Interesting... It's not accepting my posts for some reason.

n4gix
March 14th, 2014, 07:10
When the forum server was brought back online, the database was initialized with a date sometime in 2008. When Tom reset the time and date, he mistyped 14 for the day instead of 13, so the forum server's date was now 24 hours "fast..."

Tom reset the forum server's time and date early this morning, but apparently the re-sync won't occur until sometime this afternoon.

I cannot post or reply there myself until later today...
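Why would a wrong clock block posting? A hypothetical sketch, not the forum software's actual code (which the thread doesn't show): a naive flood check computes the remaining wait from the stored time of your last post, so if that timestamp was written by a clock running ahead, the whole clock error is added to the wait. The 60-second interval and the function name are assumptions.

```python
from datetime import datetime, timedelta

FLOOD_INTERVAL = timedelta(seconds=60)   # hypothetical minimum gap between posts


def seconds_until_allowed(last_post: datetime, now: datetime) -> int:
    """Naive flood check: the user may post again at last_post + FLOOD_INTERVAL.

    If last_post was stamped by a clock running ahead of 'now', the result
    is inflated by the whole clock error, not just the flood interval.
    """
    remaining = (last_post + FLOOD_INTERVAL) - now
    return max(0, int(remaining.total_seconds()))


now = datetime(2014, 3, 13, 12, 0, 0)
print(seconds_until_allowed(now - timedelta(minutes=5), now))   # 0 (normal case)
print(seconds_until_allowed(now + timedelta(days=1), now))      # 86460 (clock was a day fast)
```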

Tom posted some details about AVSIM's server farm that frankly surprised me. I knew it was "big" but really had no idea of its true size...

We run two main servers that serve over 11 terabytes of files and data a month. Each of those servers runs 8 processors with 32 GB of RAM. Our MySQL database is over 7 GB in size.

We have looked at those "cloud" systems, and they would at least quadruple our costs (actually it could be five to six times more, or greater). Each month we would be unable to predict our final cost because of their variable pricing for bandwidth and other charges. Our advertising and donations do not pay our costs now. Doing that would just cause us to shut the doors sooner rather than later.

Go to the MS Azure site and price out a comparable Linux-based system. When I looked, their Linux-based solution was over $4,000 a month before you priced in bandwidth and database size. Factor in 11 terabytes a month and the size of the database, and I shudder to think what the monthly cost would be. I can say this with certainty: the community would not be willing to pay for it.
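To put 11 terabytes a month in perspective, here is the average transfer rate it implies, assuming decimal terabytes and a 30-day month with traffic spread evenly (real traffic is peakier, so peak rates would be well above this average):

```python
TB = 10**12                            # decimal terabyte (assumption)
SECONDS_PER_MONTH = 30 * 24 * 3600     # 30-day month (assumption)


def average_rate(bytes_per_month: float) -> tuple:
    """Return the average transfer rate as (megabytes/s, megabits/s)."""
    bytes_per_second = bytes_per_month / SECONDS_PER_MONTH
    return bytes_per_second / 10**6, bytes_per_second * 8 / 10**6


mb_s, mbit_s = average_rate(11 * TB)
print(f"{mb_s:.2f} MB/s average, about {mbit_s:.0f} Mbit/s")
# 4.24 MB/s average, about 34 Mbit/s
```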

Phantom88
March 14th, 2014, 08:04
Bill, thank you for the info.