Category: Software Architecture

Pre-Launch Prep

I’ve been advising a few pre-launch startups that are getting ready for their first-ever product launch. From first-hand experience I know that the first product launch can be nerve-racking. You expect the product to be pixel perfect, and all the features to be fully functional and bug-free. But there is such a thing as spending too much time perfecting the product and forgetting that the system also needs to support load. What good is a shiny product that people can’t even sign up for because a server has gone down, or because the signup process takes minutes or hours?

Here are a few tips I’ve been giving people to help alleviate their pre-launch preoccupations:

  1. Logging. For good developers this should be a no-brainer, because they will already have added logging while testing and debugging their code. For Java I use log4j; Ruby has its own Logger class. Also make sure you are rotating the logs daily.
  2. Keep an Eye on Exceptions and Errors. Once you’ve got your logging in place, create a simple shell script to grep the logs and tally up errors and exceptions. You can set this up in cron to deliver the tally to your email every hour or day.
  3. Fix Showstoppers. These are bugs that prevent a user from signing up, or that are glaringly obvious and leave a negative, lasting impression on new users.
  4. Monitor Your Servers. Use tools like Nagios or Ganglia to watch server health and resource usage.
  5. Monitor the Drop-Off Points. There are typically three: coming to the site, signing up, and performing the first action (depending on your product’s features). To do this you will need an analytics tool such as Google Analytics.
  6. Load Test. Test the limits of your system so that you know how much load it can handle. JMeter is a good tool for this.
  7. Load Balancer. If you are responsible for your own servers then I’d highly recommend putting a load balancer in front of them. Check out this intro lesson on load balancing.
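
The error tally from tip 2 can be sketched in a few lines. This is only an illustration, written in Python rather than as a shell script; the function name `tally_errors` and the sample log lines are hypothetical, and it assumes errors show up in the log either as the word ERROR or as a class name ending in "Exception".

```python
import re
from collections import Counter

# Matches the log level ERROR or any class name ending in "Exception".
# Assumption: your log format marks problems this way; adjust to taste.
PATTERN = re.compile(r"\bERROR\b|\b\w+Exception\b")

def tally_errors(lines):
    """Count ERROR markers and exception class names across log lines."""
    counts = Counter()
    for line in lines:
        for match in PATTERN.findall(line):
            counts[match] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "INFO signup started",
        "ERROR signup failed: java.lang.NullPointerException",
        "ERROR db timeout",
    ]
    for name, count in tally_errors(sample).most_common():
        print(name, count)
```

Run from cron, a script like this could read the current log file and mail the printed tally instead of the raw log.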

Hold off on all new feature requests! First impressions count for a lot, and what’s more important than being feature rich is having a site that is up and running. What good are more features if users can’t even get through your front door? Focus on supporting the new and incoming users and giving them a worthwhile and timely experience. Stay tuned for a follow-up on what to do the morning after…

Think About Scale from the Start

If you are thinking about scaling a web application or service, congratulations: it means you have users who liked you or were curious enough to sign up and stick around! You will of course be acquiring more users shortly. While the trajectory of user growth is unknown, and depends a lot on your usage model (viral social network vs. word-of-mouth individual user service), there are a few things you need to address:

  1. Capacity - your site will need to handle more concurrent users. Signing up users alone can generate a lot of load on the system, even before they get to using the product.
  2. Reliability - users will want to use the service on their own time. The site needs to be up and running 24 hours a day with limited maintenance windows.
  3. Scalability - if users can generate data on your site, you will have more data to store and retrieve.

If you’re growing too fast, a common way to solve #1 and #3 is to throw hardware at the problem. A startup focuses on creating the MVP (minimum viable product), which means the prototype has just enough functionality to add significant value to the lives of users and convince them to sign up and use it for a while. Putting the product out there initially means you’re testing the product/market fit, and as a result you’re unsure how many users will sign up and what their usage patterns will be. Let’s say you are cocky and a cheapskate: you know you’ll have users, but you don’t want to solve problems by buying hardware all the time. If you’re cautious you will do the following:

  1. Start performance tuning and load testing even before releasing the product!
  2. Create a restricted alpha and beta, which allows you to control the growth rate.
  3. Measure the adoption rates and usage patterns for your alpha and beta users.
  4. Use the measured adoption rate to anticipate how many servers you can afford.
  5. Monitor spikes in adoption due to press releases or other news events. And be ready to re-route traffic (failover) in the event of a server failing.
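
Step 4 is mostly arithmetic, so here is a rough sketch of it. Every input is an assumption you would have to measure yourself: the share of users online at peak comes from your alpha and beta usage, the per-server capacity comes from load testing, and the 25% headroom is an arbitrary placeholder. The function names are made up for this example.

```python
import math

def projected_users(current_users, weekly_growth_rate, weeks):
    """Project the user count, assuming steady compound weekly growth."""
    return current_users * (1 + weekly_growth_rate) ** weeks

def servers_needed(total_users, peak_concurrency_ratio, users_per_server,
                   headroom=0.25):
    """Estimate how many servers carry peak load with some spare capacity.

    peak_concurrency_ratio: fraction of users online at the busiest moment.
    users_per_server: concurrent users one server handled in load tests.
    headroom: extra capacity kept in reserve (assumed value, not a rule).
    """
    peak_users = total_users * peak_concurrency_ratio
    required = peak_users * (1 + headroom) / users_per_server
    return max(1, math.ceil(required))

if __name__ == "__main__":
    users = projected_users(current_users=2000, weekly_growth_rate=0.10, weeks=8)
    print(round(users), servers_needed(users, 0.10, 200))
```

Comparing the result against your budget tells you whether the growth rate you are allowing in the beta is one you can actually afford.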

These are the top 5 ways you can initially think about scaling your app without a whole lot of code rewrites. But there will come a time when you will need to redo a lot of the prototype’s code base. We’ll save that for another post…

Presentation for Code Camp ‘08

Part III. Rapid Development

I’ve covered three of the areas that are very important to becoming a web service (latency, throughput, and quality), and I’m sure this seems daunting or overwhelming. But keep in mind I’m talking about how Mint’s code and service evolved; we didn’t do everything at once because we did not have the resources or the time. As Mint started maturing there were two areas that we stressed:

    1. Manageability: keeping the code and database clean and extensible in case features are cut, added, or revised over time. It’s very important to start a project thinking about manageability, or how the feature will evolve within the application.
      • Code manageability: refactor, don’t introduce a lot of complexity, and use the tiered architecture to figure out where certain pieces logically fit (e.g. persistence, business logic).
      • Database manageability: when designing tables, foreign key associations, and data retrieval, consider how quickly a data set is going to grow and how frequently the data is accessed.

    2. Optimization: improving performance of code at runtime in order to satisfy latency and throughput requirements. While this is important, it is not something that one should focus on from the beginning.
      • Do not make architectural decisions that are too long term; do what you need for the next 6 months. Why? Because it’s a startup, and the product will continue to evolve in approximately 6-month cycles. Don’t waste time optimizing everything, or optimizing before you see demand for a feature. Remember it’s a startup; resources are scarce and time is critical.
      • E.g., why didn’t we cache user data from the start? Initially we aggregated data nightly, because synchronizing data across nodes was difficult and we had no mechanism for centralized locking. Once that was in place we switched to loading data on demand (during user login) and then aggregating and caching it (in the future we might show only the most recent data instead of all data).
      • Why didn’t we shard databases from the start? Sharding carries a huge amount of overhead, and our engineering resources needed to be allocated to more pressing issues.
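
To make the "load on demand, then cache" idea concrete, here is a minimal sketch. It is not Mint's implementation: the class name `OnDemandCache` and the `aggregate_fn` callback are hypothetical, and the in-process lock is only a single-machine stand-in for the centralized locking described above.

```python
import threading

class OnDemandCache:
    """Aggregate a user's data on first request (e.g. at login), then reuse it."""

    def __init__(self, aggregate_fn):
        self._aggregate_fn = aggregate_fn  # hypothetical expensive aggregation
        self._cache = {}
        # Single-process stand-in for a centralized lock shared across nodes.
        self._lock = threading.Lock()

    def get(self, user_id):
        """Return cached data, aggregating it first if it is not cached yet."""
        with self._lock:
            if user_id not in self._cache:
                self._cache[user_id] = self._aggregate_fn(user_id)
            return self._cache[user_id]

    def invalidate(self, user_id):
        """Drop cached data so the next login re-aggregates it."""
        with self._lock:
            self._cache.pop(user_id, None)
```

Holding one lock for the whole cache keeps the sketch short; a real service would more likely lock per user so one slow aggregation does not block every login.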

Presentation for Code Camp ‘08

Part II. Web Application to Web Service

Creating a prototype is very challenging, but it’s not sufficient. Many companies fail to actually create a service, because they simply take a prototype and add more features to satisfy the demands of the users. Transforming a prototype into a product is what I call the app-to-service phase, and it takes more than just piling on features. An application is just a point tool that a user uses to complete a few simple tasks, which is why I liken it to a prototype, and that is fine if that’s your goal. But if your goal is to create a product that has an ever-growing user base then you have to broaden your thinking from features to logistics.

Whatever your product’s goal is, there are inevitably three areas that will influence the evolution of your software’s architecture:

1. The growth rate of your user base
2. Data processing and storage
3. Company growth, and more specifically how quickly you grow your engineering team

Presentation for Code Camp ‘08

I will be giving a presentation at Code Camp in about one month. The title of my presentation is “The Evolution of a Scrappy Startup to a Successful Web Service”. In the following posts I will attempt to flesh out some of the ideas I plan on presenting. Please feel free to comment on my ideas and provide feedback.

Part I: Prototyping

I remember Mint’s alpha launch as if it were yesterday, even though it was almost two years ago. The main purpose of our launch was to get a prototype out for our friends, family, and investors to try out. We had pinned down our mission statement, “Mint: do more with your money,” and we wanted to convey this message in our prototype. Thus, the feature set we chose was deliberately limited. It consisted of showing users their transactions and aggregating the balances of their checking and savings accounts. That was it! Nothing more, nothing less. Why? Because the basic definition of a prototype in software development is a rudimentary working model of a product or information system, built for demonstration purposes or as part of the development process. We planned on developing more features, but we wanted to demonstrate what we were going to do as a product. Starting out with a small feature set and growing it from there was the best way to go.

1. How did we arrive at this feature set? We started out by fleshing out what was going to differentiate our product from the rest: account aggregation, that is, showing people their checking and savings accounts and categorized transactions on one web site.

2. What doesn’t belong in prototyping? Everyone wastes time trying to spec out a complete feature set, with all the bells and whistles they’d like to have in their product. We tried to limit that process. Once we had pinned down our feature set we went from there. The first step in prototyping is to figure out which critical problems you are trying to solve, and which ones you will encounter along the way. Since we were a financial web app we had to handle security, and some concurrency amongst our 100+ users. We also had a few algorithms that drove our business differentiation, but none of these were completely fleshed out. We coded up the bare-bones implementation for each of these. We also set up a simple unit test framework, but nothing too fancy and no system tests, because the focus was to get the product out there and have real users test it with real data!

3. So what did it look like? Our prototype consisted of several Java modules but no real architecture, because we only had a couple hundred users and fewer than a handful of engineers to build it. So we made do with what we had. But we had good separation amongst our modules, keeping the core business logic apart from the data processing engines, and the UI apart from the server-side logic.

Performance: Part II Address Scalability Before It’s Too Late

As your product and user base grow you want to ensure that your customers, both old and new, have a good user experience. You want their experience to improve, not stagnate or diminish over time, so scalability is another key element to address to ensure the success of your website. Scalability is defined as the capacity to keep pace with changes and growth.

Maintaining a scalable website requires thinking from a business perspective. You want to understand how rapidly your site is growing, and what the frequency of usage is. These two factors serve as metrics for predicting how to allocate resources. You can also use historical usage as an indicator of how much activity to expect in the future. Press releases and media events can increase the user base by unexpected amounts over the course of a few days or even a week. Ideally, the number of users increases at a steady, predictable rate week over week. Depending on the type of site you’re running you can figure out what the peak hours of use are, or whether user activity increases during certain seasons of the year. Knowing the frequency and peak hours of usage also helps you schedule maintenance, new feature releases, and cron jobs so that they won’t interfere with the user’s experience.
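
Finding peak and quiet hours is a simple counting exercise once you have request timestamps. The sketch below assumes you can extract a `datetime` per request from your access logs; the function names are invented for this example.

```python
from collections import Counter
from datetime import datetime

def requests_per_hour(timestamps):
    """Count requests by hour of day (0-23) from datetime objects."""
    return Counter(ts.hour for ts in timestamps)

def peak_hour(timestamps):
    """Hour of day with the most requests (lowest hour wins ties)."""
    counts = requests_per_hour(timestamps)
    return max(range(24), key=lambda hour: counts.get(hour, 0))

def quietest_hour(timestamps):
    """Hour of day with the fewest requests: a candidate maintenance window."""
    counts = requests_per_hour(timestamps)
    return min(range(24), key=lambda hour: counts.get(hour, 0))

if __name__ == "__main__":
    # Hypothetical traffic: three requests at 9am, one at 8pm.
    sample = [datetime(2024, 1, 1, 9, m) for m in (0, 15, 30)]
    sample.append(datetime(2024, 1, 1, 20, 5))
    print(peak_hour(sample), quietest_hour(sample))
```

With real traffic you would bucket by day of week or month as well, to catch the seasonal patterns mentioned above.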

Continuing to think in terms of business, your site is operating on a fixed budget, hence the amount of resources you have is directly proportional to your operating costs. Until you receive another round of funding or bring in revenue you have to make do with what you have. Therefore, you must understand the limits of your resources in terms of response time, throughput, and concurrency in order to allocate resources efficiently but still guarantee quality. Load testing* is the best way to predict the limits of each of these.
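
To show what a load test actually measures, here is a toy harness that reports response times and throughput at a given concurrency. It is a sketch only: a real test would use a dedicated tool such as JMeter against the real site, and `request_fn` here is a hypothetical stand-in for one request.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(request_fn, total_requests, concurrency):
    """Call request_fn total_requests times using `concurrency` threads.

    Returns (latencies_in_seconds, requests_per_second).
    """
    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started
    return latencies, total_requests / elapsed

def percentile(values, fraction):
    """Nearest-rank percentile, e.g. fraction=0.95 for the 95th percentile."""
    ordered = sorted(values)
    index = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[index]

if __name__ == "__main__":
    # Stand-in request that just sleeps for 5 ms.
    latencies, rps = load_test(lambda: time.sleep(0.005), 100, 10)
    print("p95 seconds:", percentile(latencies, 0.95), "req/s:", rps)
```

Repeating the run while raising `concurrency` until latency degrades gives you a rough picture of where the limits are.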

Now let’s switch back to thinking like a software engineer. From a code base perspective, web applications should be tier based. Here is a simple tiered approach:

UI -> Business Logic -> Persisted Object -> DB

You might also have another tiered data model that runs in parallel, which could be used to process or retrieve data that is not in immediate use by the user. Usually a messaging or remote-invocation mechanism such as JMS or RMI is used as the means of communication between these parallel data model tiers. The benefit of a tier-based architecture is that you can cache data that doesn’t change frequently across tiers, thereby limiting the number of expensive DB calls made. Moreover, as the number of concurrent users increases, securing data across users becomes pivotal. With a tier-based approach only certain tiers can manipulate and persist data.
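
Here is a stripped-down sketch of the tiers with a cache in the business logic tier. All class and method names are invented for illustration, and the "database" is just a dictionary that counts how often it is queried.

```python
class Database:
    """Stand-in for the DB tier; counts how often it is queried."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def fetch(self, key):
        self.query_count += 1
        return self._rows[key]

class PersistenceTier:
    """The only tier allowed to talk to the database."""

    def __init__(self, db):
        self._db = db

    def load(self, key):
        return self._db.fetch(key)

class BusinessLogicTier:
    """Caches slow-changing data so repeated requests skip the DB."""

    def __init__(self, persistence):
        self._persistence = persistence
        self._cache = {}

    def get_category_name(self, category_id):
        if category_id not in self._cache:
            self._cache[category_id] = self._persistence.load(category_id)
        return self._cache[category_id]

if __name__ == "__main__":
    db = Database({1: "Groceries"})
    logic = BusinessLogicTier(PersistenceTier(db))
    for _ in range(3):
        logic.get_category_name(1)
    print("DB queries:", db.query_count)
```

The UI tier would call only `BusinessLogicTier`, which is also where you would enforce which user is allowed to see which data.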

I’m sure we’ve all learned from our intro computer architecture class that CPU-bound processes are the fastest and can be parallelized, whereas I/O-bound processes are the bottleneck. In the case of a website, accessing the DB is the slowest I/O process. However, you can speed up access to data by sharding the database. Sharding breaks a large database into smaller pieces; the pieces may contain some redundant information, or a parent DB can map data to separate DBs.
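
One simple way to do the mapping is to hash a key such as the user ID and take it modulo the number of shards. That is just one scheme among several, chosen here because it fits in a few lines; the names `shard_for` and `ShardDirectory` are hypothetical, and plain dictionaries stand in for the separate databases.

```python
import hashlib

def shard_for(user_id, shard_count):
    """Map a user ID to a shard index with a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

class ShardDirectory:
    """Plays the 'parent DB' role: routes each user to one of several stores."""

    def __init__(self, shard_count):
        # Each dict is a stand-in for a separate database.
        self._shards = [{} for _ in range(shard_count)]

    def _shard(self, user_id):
        return self._shards[shard_for(user_id, len(self._shards))]

    def put(self, user_id, record):
        self._shard(user_id)[user_id] = record

    def get(self, user_id):
        return self._shard(user_id).get(user_id)
```

The catch with hash-modulo routing is that changing the shard count remaps most users, which is part of the overhead that makes sharding expensive to adopt.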

The last and priciest technique is having multiple servers. Configuring a load balancer to take incoming requests and then send them to each server is one way of improving throughput and response times for users.
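
The simplest distribution policy is round robin: hand each request to the next server in the list and skip any server known to be down. The sketch below shows only that selection logic, under the assumption that something else (a health check, say) tells the balancer which servers are down; real load balancers do far more.

```python
class RoundRobinBalancer:
    """Pick servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._down = set()
        self._next = 0

    def mark_down(self, server):
        self._down.add(server)

    def mark_up(self, server):
        self._down.discard(server)

    def next_server(self):
        """Return the next healthy server, or raise if none is available."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no healthy servers available")
```

Skipping servers that are marked down is a crude form of failover: traffic keeps flowing to the remaining machines while the failed one is repaired.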

Improving the scalability of your website is a good problem to have, because it means your site is growing! But you don’t want to wait until a server crashes or a db thrashes. A little forethought will continue to grow your user base and keep them coming back for more!

* Future article on load testing.

Performance: Part I Develop a Monitoring Scheme

Two years ago Aaron Patzer was frustrated with Quicken and Money, because setting up the service alone took over an hour. His painful experience led him to ditch both products and create his own, Mint.com, in hopes of delivering a faster and more useful personal financial service. Unfortunately, not everyone is a programmer who can automate a tedious task, and not all tasks can be automated through software. Hence bad products continue to pervade the marketplace, and users are left waiting: in a line, for a site to load, or for a better service to come along. But sometimes their prayers are answered and a better service does come along. However, it’s only a matter of time until this service becomes plagued by inefficiencies. So how do we keep a web service performant? In the realm of software I believe there are three basic principles for keeping a site up and running with crowd-pleasing response times:

1. Develop a monitoring scheme
2. Address scalability before it’s too late
3. Re-write code

The first computer I ever had was a 486. While I initially believed that the thrill of just having a computer was enough, I was soon annoyed with the machine when I installed my first modem and the damn thing would take minutes to connect. Fortunately, my dad’s frustration with it grew as well, and we soon replaced it with a Pentium I, II, III, and so on. So being a very green software engineer, I believed that a fast CPU makes life faster. However, I learned a very valuable lesson in my graduate computer architecture class at Stanford: that “what Intel giveth, Gates taketh away.” Throwing hardware at a performance problem is easy, but it’s expensive and only a temporary solution. And even if your company can buy a faster server, it doesn’t mean that the consumer can afford to do the same!

So how does a web service stay efficient over time? Monitoring! Specifically, monitoring features in the context of data flow. Any given feature fits into one of two categories: either it has data that will be displayed to the user, or the user generates data while using the feature and that data must be stored. Here are the two simple flows:

a. server -> data -> user
b. server -> user -> data -> server

In flow a it’s important to get the data from the server to the user ASAP. In flow b it’s important to get the data from the user back to the server to process it, store it, and in some cases return the processed data to the user. Now that we know where the data will be exchanged, those are the places we need to monitor.

For flow a, we need to figure out how large a data set we are dealing with, and how long it will take to retrieve the data set from the server. Once we have the data set we need to figure out how long it takes until the user sees the complete data set.
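
On the server side, both numbers for flow a (data set size and retrieval time) can be captured by wrapping the retrieval call. The decorator below is a minimal sketch of that idea; `monitored`, the `metrics` list, and `fetch_transactions` are all hypothetical names, and a real setup would ship the measurements to a monitoring system rather than keep them in a list.

```python
import functools
import time

def monitored(metrics):
    """Decorator that records elapsed time and result size for each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            metrics.append({
                "name": fn.__name__,
                "seconds": time.perf_counter() - start,
                "items": len(result),  # assumes the result is a sized collection
            })
            return result
        return wrapper
    return decorator

metrics = []

@monitored(metrics)
def fetch_transactions(count):
    # Hypothetical stand-in for a server-side query returning `count` rows.
    return list(range(count))
```

Measuring how long it takes until the user actually sees the data needs client-side timing as well, which this sketch does not cover.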

For flow b, we might need to return data to the user once it’s processed, which is similar to flow a. The harder part is taking in the data from the user, processing it quickly, and then responding to the user with the new data. Unfortunately, the round trip associated with this flow is an unavoidable evil. But there are ways to optimize it…
