  OOPS
@comcast.net | Callcentric Down in Miami Florida
More than two hours have passed since the outage and the TA still can't register with Callcentric. | |
|
 cyclone_z
join:2006-06-19 Ames, IA | DDoS? Is it a denial of service attack?
From what I've heard cable VoIP isn't much better in the reliability department. | |
|
 |
 |   dcurrey Premium join:2004-06-29 | Re: I don't think so! Absolutely no danger of me going to cable phone options. Too many good voip providers out there with better features and far better pricing. | |
|
 |  |  |
 AVonGauss Premium,MVM join:2007-11-01 Boynton Beach, FL
| To me... I don't believe anybody outside CallCentric knows what happened at this point, but to me, this just illustrates in general the need for good multiple paths of customer communication and probably a bit of additional disaster recovery planning. What concerns me about this event is not so much the event itself, but the lack of ability for the customer to get information regarding the event. | |
|
 |  JSRoman Premium join:2005-03-10 Callahan, FL | Re: To me... This happened 2 hours ago. Aren't you jumping the gun a little? -- »www.seabee.navy.mil | |
|
 |  |  AVonGauss Premium,MVM join:2007-11-01 Boynton Beach, FL | Re: To me... How so? | |
|
 |  |  |  JSRoman Premium join:2005-03-10 Callahan, FL
| Re: To me... How much info do they need to provide besides "We are sorry and we are working on it; no ETA at this time. Once we know more, you'll know more"? The priority should be fixing the problem, and constant communication takes away from that. -- »www.seabee.navy.mil | |
|
 |  |  |  |  AVonGauss Premium,MVM join:2007-11-01 Boynton Beach, FL | Re: To me... Until they posted on this forum over an hour later, most customers did not even know that much information. | |
|
 |  |  |  |  |   bender Bite my shiny metal ass Premium join:2005-03-19 Evanston, IL clubs: | Re: To me... Don't you just have to try to make a call and have it not work to know there's an outage? | |
|
  Mainah
@rr.com | Still working here I just started with Callcentric this week, due in part to reviews on BBR. I have not been down today - have continued to get inbound calls. I don't use them for outbound, though. | |
|
 |
 |  Fisamo Premium join:2004-02-20 Apex, NC
·VOIPo
·AT&T CallVantage
| Re: not good So one outage is automatically considered 'unreliable service'? Seems to me that CallCentric has done an excellent job keeping their systems up and running for quite a while (almost NO forum complaints that I've EVER seen, not counting anything associated with today's event).
Agreed--a multi-hour outage is not a good thing, for anyone. However, as has been stated by others, cable operators, telcos (POTS), and others have experienced multi-hour and multi-day outages, and you don't generally hear much about it. However, this one event does not cause me to automatically rate this provider as 'unreliable', especially given their track record of UPTIME, customer service, good value, etc.
FWIW, I say this as a non-customer... For my needs, I felt that a different provider (Voipo) would be best. But I would still recommend CallCentric without hesitation. | |
|
 |  |   crazyk4952 Premium join:2002-02-04 united state clubs: | Re: not good If this is their only outage in the next few months, then I do not think I would allow this incident to affect my opinion of them. However, if another incident happens soon, then I will really start to question their reliability. | |
|
 |  |
 |   Tweak Premium join:2002-06-08 Oklahoma City, OK
·Cox HSI
| I am a Callcentric customer I have an international direct dial number. I have been very pleased with Callcentric. If you need reliability, go with a true CLEC or ILEC, your local cable company or telco. Due to the massive complexity of these systems, it's too much to ask for 100% uptime. Even a government-regulated telco or cableco isn't expected to have that much uptime, due to the nature of how these services operate. If you need more uptime, you might want to consider a carrier-class, non-Internet VoIP provider. Yes, it's more expensive, but you get what you pay for. | |
|
 |  |  |
 |  |  |   Tweak Premium join:2002-06-08 Oklahoma City, OK 1 edit | Re: You expect too much from Internet VoIP You can't expect anything to have 100% uptime. You pay more for the higher reliability. | |
|
 beachnik
join:2004-01-03 Manhattan Beach, CA | today's outage...
I've been with CALLcentric since around March 09. While today's outage was a little annoying, my overall experience with them has been positive. | |
|
  Tweak Premium join:2002-06-08 Oklahoma City, OK
·Cox HSI
| Wow, I'm impressed with the level of detail Many customers have asked us about what caused the outage today, and what we are doing to prevent future outages. Below we will provide a summary of the cause of the outage, and what we are doing to improve our network. As the cause of this issue was technical, we have tried to provide a basic overview of the issue that occurred.
As background, which is relevant to this outage: Callcentric developed and designed our systems in-house; and we continue to maintain and improve on our services in-house as well. When we launched Callcentric publicly in July 2005 we built our systems in a scalable fashion in order to accommodate the growth of customers and traffic on our network. Most of the systems and hardware on our network including our core database infrastructure (which runs on Sun Solaris Cluster) has been operational since September 2004 as we developed our service and opened it to the public. While we spent a great deal of time building our infrastructure before we launched our service, we've found over the last 4+ years that there are some areas of our network that contain bottlenecks that we've been working to resolve over the last 1 year as our customer base and traffic have grown. This is in addition to having an open network and supporting an ever growing list of customer provided software and hardware as well as customer network architectures which has put additional strain on our network over the years. While some of these bottlenecks we should have anticipated better in the past; the incredible and unexpected growth of our customer base has exceeded the expectations we had when our systems architecture was designed 5 years ago.
As was announced a few weeks ago, we plan to perform a maintenance window on Monday October 5, 2009 from 03:00 AM to 07:00 AM US Eastern time (07:00 GMT/UTC to 11:00 GMT/UTC). This maintenance window is being done to replace a core component of our systems - our primary database. Due to the complexity involved in this change it requires our network to be taken offline while this work is performed. This maintenance is the first of a total of three major changes we plan to make to our core network infrastructure over the coming months.
The reason we are performing this maintenance window is related to the outage that occurred today. As was mentioned above, our core database has been running since 2004 with very few issues over the years. However, about 1 year ago we began planning to replace this infrastructure for many reasons including processing power, memory, and storage space. We've spent the last year moving many of our systems around and adding additional systems for non-real-time database activities in order to off-load our primary and "real-time" database. The changes and planning we've done over the last year were also for the purpose of being able to perform the changes that will occur during the maintenance window in a timely manner.
Unfortunately, over the last few days we've begun to have some serious issues with our core database. This included 3 other temporary (less than 1 minute) failures which went unnoticed by customers because redundant systems took over. Our engineers and developers have spent the last few days trying to decipher the causes of these failures, and this morning, about 20 minutes before the outage started, they identified the issue as a memory leak in one of our applications. Our engineers were planning how best to mitigate this memory leak when the service outage began, before they were able to take action to correct it.
Because we are approaching what's known as an "edge" condition on parts of our systems (which our maintenance window next week is designed to resolve), there was a rolling and cascading effect caused by one system's failure which affected our core database, and in turn our application servers, proxy servers, web servers, load balancers, and session border controllers. In essence, loads constantly shifted from one part of our network to another, causing incredibly high loads on our infrastructure which quickly took everything down. The outage lasted as long as it did due to the complexity of the load shifts that were occurring while we were trying to stabilize each part of the system enough to bring the entire system back online.
We believe at this time that we have our systems back on-line in a way that we can spend the next few days without service affecting issues until we perform the maintenance window on October 5th, which should mitigate the issues that occurred today and allow the growth of our customer base and traffic going forward for the long-term.
In addition to the outage that occurred today, one other item we did not perform well on today was customer notification. Due to the way our systems failed we couldn't get our web site up quickly to display a message that we were experiencing a service outage and that we were working on the problem. While we were immediately aware of the outage due to both internal and external (third party) monitoring, we didn't have a good way to notify our customers that we were aware of the issue and working to correct it. As the service outages we've had in the past have not been frequent and generally did not last as long as this outage; customer notification is an issue we didn't spend enough time considering in the past even though we should have. We have a number of internal ideas as well as customer suggested ideas we will begin investigating so that we can provide better and more timely information to customers in the event of future outages or problems; which of course we are also trying to prevent in general.
Finally, thank you to both the customers that sent in polite and encouraging comments today during and after this outage, as well as to the customers that were furious and used fairly explicit language. Both groups provided us with some good ideas and motivation to work even harder. We sincerely apologize for the outage that occurred today. We work very hard to try and avoid any downtime on our services, and will continue to try and do an even better job going forward. We greatly appreciate all of our customers' business and hope to keep your business going forward.
Sincerely, Greg Blumstein VP Operations Callcentric, Inc. | |
|
 crashoverride
join:2009-09-05
| No Excuse
That still does not excuse them for not having an update on their site or at least posting somewhere what was going on. All they did was release a statement after the fact. At least when PP was down, their site was still up with a status report, live chat was available, and there was a post in the PP forum. | |
|
 |   Tweak Premium join:2002-06-08 Oklahoma City, OK
·Cox HSI
| Re: No Excuse If you read the statement, they apologized about that. You know, I really would rather a company focus on fixing a problem than on posting an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices. | |
|
 |  |  crashoverride
join:2009-09-05
1 edit | Re: No Excuse said by Tweak: If you read the statement, they apologized about that. You know, I really would rather a company focus on fixing a problem than on posting an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices.
Still not an EXCUSE... When PP had a meltdown, don't you think they rectified the problem as well? The only difference is they were smart enough not to put all their eggs in one basket. CC could have done better, and I feel they really let us down, especially since they never gave an update until after the outage. Sorry, not feeling much love for CC right now... | |
|
 |  |  |   Tweak Premium join:2002-06-08 Oklahoma City, OK
·Cox HSI
1 edit | Re: No Excuse You actually don't know that Callcentric put all its eggs in one basket. Really, what good does posting an outage notification do for the customer? You can pick up the phone and realize you have no dial tone. The 2 or 3 minutes it takes to type up the notification is 2 or 3 minutes less spent working on resolving the problem. Really, what good does it do to know about an update? What is PP? | |
|
 |  |  |  |
 |  |  |  |  crashoverride
join:2009-09-05
| Re: No Excuse said by ptrowski: said by crashoverride: said by Tweak: If you read the statement, they apologized about that. You know, I really would rather a company focus on fixing a problem than on posting an update. When we start demanding carrier-class service from these types of companies, we will start paying the higher prices.
Still not an EXCUSE... When PP had a meltdown, don't you think they rectified the problem as well? The only difference is they were smart enough not to put all their eggs in one basket. CC could have done better, and I feel they really let us down, especially since they never gave an update until after the outage. Sorry, not feeling much love for CC right now...
You forgot that it was TWO outages in TWO days for Phonepower.
I'm not concerned with the outage so much as the lack of communication. | |
|