I tend to use a flexible-inverted-cone approach to any new project.
This serves three purposes:
Once the funnel is defined, we move on to the next phase: thought-prototyping. This is more than wiring up some code and pushing it to some resources. The prototyping phase consists of several pragmatic steps that evaluate the feasibility of a solution before any code is written or any physical resources are allocated. Many of these steps exist simply to provoke thought and to draw on the understanding of the key stakeholders, the knowledge of the team, and a clear picture of which facilities are available now and which may become available in the future. Among my favorite methods are:
Rather than focus on specific technologies, I look at what the need is from a contextual standpoint and what facilities are available to satisfy the need. Technology comes and goes, but a good architecture is readily adaptable and affords the customer a timely and consistent experience over time.
There are two primary routes to take when considering a solution:
Managing a service has several parts: development, delivery, operation, and refinement.
Within each part, numerous processes need to be implemented and measured to make sure that the service is satisfying its mandate.
I like the CI/CD process here, but I would modify it a bit. For example, there are several areas that CI/CD doesn't take into account, at least not directly; these include:
Incident severity can vary; however, the goal should be to address every incident with the same level of care. That level of care correlates strongly with how much automation you have in place and how robustly that automation has been tested against FMEAs (failure mode and effects analyses). Having a process that is well understood and curated is important for giving the customer the best possible experience when something does go wrong...because it will. At a certain point, solutions must depend on things that are outside the service's control. For example, all service owners must understand the importance of their service relative to the other services: just because your service is down doesn't mean it is the most important one overall...many times it's not, and that needs to be understood and communicated to the stakeholders. A mitigation may simply be to put your service in a holding pattern while the other services are brought back online, and understanding that provides a navigation point for resilience.
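To make the holding-pattern idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Service` and `State` types, the convention that a lower `priority` number means a more important service, and the example service names are all hypothetical. It only shows the shape of the reasoning: restore in order of relative importance, and let a service hold while something it depends on is still unavailable.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    HOLDING = "holding"  # paused, waiting for dependencies to come back
    DOWN = "down"


@dataclass
class Service:
    name: str
    priority: int  # assumed convention: lower number = more important
    depends_on: list = field(default_factory=list)
    state: State = State.DOWN


def restore_order(services):
    """Sort services so the most important ones are handled first."""
    return sorted(services, key=lambda s: s.priority)


def next_state(service, states):
    """Run only if every dependency is running; otherwise hold."""
    if all(states.get(dep) is State.RUNNING for dep in service.depends_on):
        return State.RUNNING
    return State.HOLDING


def recover(services):
    """One recovery pass in priority order.

    A dependency that is unknown (for example, an external system we do
    not control) is treated as not running, so its dependents hold.
    Call again once more dependencies are back to release held services.
    """
    states = {s.name: s.state for s in services}
    log = []
    for svc in restore_order(services):
        svc.state = next_state(svc, states)
        states[svc.name] = svc.state
        log.append((svc.name, svc.state))
    return log


if __name__ == "__main__":
    # Hypothetical services; "warehouse" is an external dependency that is still down.
    fleet = [
        Service("reports", priority=3, depends_on=["billing", "warehouse"]),
        Service("billing", priority=2, depends_on=["identity"]),
        Service("identity", priority=1),
    ]
    for name, state in recover(fleet):
        print(name, state.value)
```

In this sketch `reports` ends up holding rather than failing outright, because one of its dependencies is outside its control. The priority ordering is deliberately simple; a real system would likely also need to reconcile priority with dependency order.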
Below are some observations from the trenches that a service operator needs to be aware of and incorporate when refining their offering: