Feather Weight Hibernate Objects

In my previous post, Cheap Tricks to Fulfill Your Need for Speed, I talked about how you could reduce the memory footprint of your database query by using a SQL query instead of a Hibernate query and retrieving only the columns that you need. However, there may be times when you actually want to use a POJO, and retrieving columns and storing them in an object array is insufficient. One solution is to create a feather weight Hibernate object. A feather weight Hibernate object contains a subset of the original Hibernate object’s properties (i.e. a limited set of the database columns), based on which properties you specify. You still map to the same database table, but your Hibernate mapping file contains fewer columns, so each query retrieves data from fewer columns and runs faster. You can continue to use Hibernate objects as POJOs by retrieving the feather weight object through a Criteria query or an HQL statement.

Another reason to use feather weight objects is when creating database tables to store data from third-party vendors. In many cases the vendors provide you with a lot of data fields, which you will want to store in the table because your data needs might change based on your feature set. But you might not need to retrieve all the fields for the current feature set. Using feather weight Hibernate objects allows you to continue storing all the fields, but only retrieve those that are the most essential.


Presentation for Code Camp ‘08

Part III. Rapid Development

I’ve covered three of the areas that are very important to becoming a web-service (latency, throughput, and quality), and I’m sure this seems daunting or overwhelming. But keep in mind I’m talking about how Mint’s code and service evolved; we didn’t do everything at once because we did not have the resources or the time. As Mint started maturing there were two areas that we stressed:

  1. Manageability: keeping the code and database clean and extensible in case features are cut, added, or revised over time. It’s very important to start a project thinking about manageability, or how the feature will evolve within the application.
    • Code manageability: refactor, don’t introduce a lot of complexity, and focus on the tiered architecture to figure out where certain pieces logically fit (e.g. persistence, business logic).
    • Database manageability: when designing tables and foreign key associations, consider how quickly a data set is going to grow, how its data will be retrieved, and how frequently it will be accessed.

  2. Optimization: improving performance of code at runtime in order to satisfy latency and throughput requirements. While this is important, it is not something that one should focus on from the beginning.
    • Do not make architectural decisions that are too long term; do what you need for the next 6 months. Why? Because it’s a startup, and the product will continue to evolve in approximately 6-month cycles. Don’t waste time optimizing everything, or optimizing before you see a demand for a feature. Remember it’s a startup; resources are scarce and time is critical.
    • E.g. why didn’t we cache user data from the start? Initially we aggregated data nightly, because synchronizing data across nodes was difficult and we had no mechanism for centralized locking. Once that was in place, we switched to loading data on demand (during user login) and then aggregating and caching it (in the future we might only show the most recent data instead of all data).
    • Why didn’t we shard databases from the start? Sharding carries a huge amount of overhead, and the engineering resources needed to be allocated to more pressing issues.


Cheap Tricks to Fulfill Your Need for Speed

As I’ve mentioned countless times, the performance of a website can really make or break a user’s experience. Everyone places emphasis on tuning and sharding databases, and buying multi-threaded and multi-core processors. These are all viable solutions but they are either pricey or labor intensive. So what are some quick and clean ways to improve the performance of your web app or web service?

Start with the slowest moving part and work from there: disk I/O, or more specifically database access times. What impacts database access time? Aside from mechanical parts… slow queries. OK, what are some slow queries? Queries that perform joins and table scans. The biggest and quietest culprit in producing slow queries is ORM (object-relational mapping) software such as Hibernate. Hibernate’s job is to make life easier for Java developers by abstracting away the relationship of a POJO (plain old Java object) to a database object. However, it has a few pitfalls which cause slow queries. Let’s explore just two for today:

1. Non-lazy loading of non-nullable objects.
2. Retrieving every column in a row.

If you have a database object A that has a non-nullable foreign key relationship to another object B, Hibernate will load B even when only object A is requested. One way to avoid this is to skip Hibernate queries such as the Criteria query and instead use a straight-up JDBC call to retrieve object A.

Suppose only the name field of object A is desired. While this is possible to do in Hibernate, a straight-up JDBC call to retrieve a single column from object A is much faster.
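A minimal sketch of such a single-column JDBC lookup might look like the following. The table name `a`, the columns `id` and `name`, and the class `NameLookup` are all hypothetical; only the standard `java.sql` API is assumed, and the caller is expected to supply an open connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

// Hypothetical helper: table "a" and its columns "id" and "name" are assumptions.
final class NameLookup {
    // Selects one column only, so no other columns or related rows are fetched.
    static final String SELECT_NAME_SQL = "SELECT name FROM a WHERE id = ?";

    private NameLookup() {}

    // Returns the name for the given id, or empty if no such row exists.
    static Optional<String> findName(Connection connection, long id) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SELECT_NAME_SQL)) {
            statement.setLong(1, id);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next()
                        ? Optional.ofNullable(rows.getString("name"))
                        : Optional.empty();
            }
        }
    }
}
```

The try-with-resources blocks close the statement and result set even if the query fails, which matters more once you bypass the ORM and manage these resources yourself.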

Using ORM software APIs is easy, but you should understand the price you pay in terms of performance. Use Hibernate queries when you need relationships to be managed amongst objects, like when you want to traverse a class hierarchy in Java. But if all you want is a small piece of data, straight JDBC calls are faster and will get you the data you need when you need it.


Presentation for Code Camp ‘08

Part II. Web Application to Web Service

Creating a prototype is very challenging, but it’s not sufficient. Many companies fail to actually create a service because they simply take a prototype and add more features to satisfy the demands of the users. Transforming a prototype into a product is what I call the app-to-service phase, and it takes more than just piling on features. An application is just a point tool that a user uses to complete a few simple tasks, which is why I liken it to a prototype, and that is fine if that’s your goal. But if your goal is to create a product that has an ever-growing user base, then you have to broaden your thinking from features to logistics.

Whatever your product’s goal is, there are inevitably three areas that will influence the evolution of your software’s architecture:

1. The growth rate of your user base
2. Data processing and storage
3. Company growth, and more specifically how quickly you grow your engineering team


Trade-offs in Unit Testing – Part II Stubs

After spending a couple days implementing test cases using Mock objects, I switched to testing using stubs, because I still needed to verify the functionality of the code.

It took me only a few hours to write the test cases I needed using stubs, and with an IDE it’s even easier. But I don’t think it’s fair to compare Mocks to Stubs, because each tool is used for a different purpose (behavioral vs. functional).

However, I did notice one very obvious disadvantage to using stubs: if your application is dynamic and the functionality is changing, test cases become brittle quickly and require you to add additional members or functions.

But I would still advocate using stubs in writing test cases, because they are simple to write and let you abstract away or limit dependent classes.
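To make the idea concrete, here is a small sketch of a stub-based functional test. Every name in it (`BalanceSource`, `BalanceAggregator`, `StubBalanceSource`) is hypothetical and only illustrates the pattern; it is not code from the project described above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical dependency that would normally hit a database or remote service.
interface BalanceSource {
    long balanceInCents(String accountId);
}

// Hypothetical class under test: sums balances across a set of accounts.
final class BalanceAggregator {
    private final BalanceSource source;

    BalanceAggregator(BalanceSource source) {
        this.source = source;
    }

    long total(List<String> accountIds) {
        long sum = 0;
        for (String accountId : accountIds) {
            sum += source.balanceInCents(accountId);
        }
        return sum;
    }
}

// Stub: returns canned answers and verifies nothing about how it is called.
final class StubBalanceSource implements BalanceSource {
    private final Map<String, Long> canned;

    StubBalanceSource(Map<String, Long> canned) {
        this.canned = canned;
    }

    @Override
    public long balanceInCents(String accountId) {
        return canned.getOrDefault(accountId, 0L);
    }
}
```

A test then checks only the result, for example that `new BalanceAggregator(stub).total(...)` returns the expected sum. The brittleness shows up when `BalanceSource` gains a method: every stub implementing it has to be updated, even if the test does not care about the new method.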


Presentation for Code Camp ‘08

I will be giving a presentation at Code Camp in about one month. The title of my presentation is “The Evolution of a Scrappy Startup to a Successful Web Service”. In the following posts I will attempt to flesh out some of the ideas I plan on presenting. Please feel free to comment on my ideas and provide feedback.

Part I: Prototyping

I remember Mint’s alpha launch as if it were yesterday, even though it was almost two years ago. The main purpose of our launch was to get a prototype out for our friends, family, and investors to try out. We had pinned down our mission statement: “Mint: do more with your money.” We wanted to convey this message in our prototype. Thus, the feature set we chose was deliberately limited. Our main feature set consisted of showing users their transactions and aggregating the balances of their checking and savings accounts. That was it! Nothing more, nothing less. Why? Because the basic definition of a prototype in software development is a rudimentary working model of a product or information system, built for demonstration purposes or as part of the development process. We planned on developing more features, but we wanted to demonstrate what we were going to do as a product. Starting out with a small feature set and growing it from there was the best way to go.

1. How did we arrive at this feature set? We started out by fleshing out what was going to differentiate our product from the rest. That was account aggregation: showing people their checking and savings accounts and categorized transactions on one web site.

2. What doesn’t belong in prototyping? Everyone wastes time trying to spec out a complete feature set, and all the bells and whistles they’d like to have in their product. We tried to limit that process. Once we had pinned down our feature set we went from there. The first step in prototyping is to figure out the critical problems you are trying to solve and the problems you will encounter in trying to solve them. Since we were a financial web app we had to handle security, and some concurrency amongst our 100+ users. We also had a few algorithms that drove our business differentiation, but none of these were completely fleshed out. We coded up a bare-bones implementation of each. We also set up a simple unit test framework, but nothing too fancy and no system tests, because the focus was to get the product out there and have real users test it with real data!

3. So what did it look like? Our prototype consisted of several Java modules but no real architecture, because we only had a couple hundred users and fewer than a handful of engineers to build it. So we made do with what we had. But we had good separation amongst our modules: the core business logic was separate from the data processing engines, and the UI from the server-side logic.


Territorial Women

When I was a senior at Duke I was determined to be my own woman. In my mind I thought it would be the last year of my life that I would be living my life on my own terms and not have anyone to answer to; I had the freedom to set my own goals, and work towards my own success.

At the time I thought I was making the right decision. Having my own life and my own territory was the only thing that mattered to me; no one was going to trespass. I was reminded of this recently when I was watching an episode of Lipstick Jungle. Mary Tyler Moore, who plays the mother of Brooke Shields’s character Wendy, berates Wendy for helping her daughter with her homework and making the family dinner instead of working long hours at her company. Wendy’s mom is trying to help Wendy realize that someday Wendy may not be the president of her company; she won’t be the leader or own her career. Wendy’s mom lost ownership of her career years ago when she made the decision to build a family. Now she’s trying desperately to restart the career she left, but she’s over 60, and no one believes she has the drive of a younger woman. Wendy’s mom and I both believe that building one’s career is about carving out a space for yourself. It’s how we measure our success and our happiness, because it’s the only area in your life that you can control and make decisions in. And when you’re younger you have more freedom because you’re only responsible for yourself. Losing one’s career is tantamount to losing one’s purpose, and having purpose is what keeps us productive and makes us happy.

To some, Wendy’s mom’s situation might seem bleak, but I don’t think that’s the point. Freedom is about making choices, and yes, sometimes those choices mean giving up certain values like a career. And while it might take years to rebuild what you once gave up, that doesn’t mean it isn’t worth the effort. In the words of Dido, “If my life is for rent and I don’t learn to buy, then I deserve nothing more than I get, cause nothing I have is truly mine.”


Dukie Femgineers

I recently gave a talk to SWE (Society of Women Engineers) at my alma mater, Duke University. The purpose of the talk was to give young women engineers insight into how college, specifically Duke, did and didn’t prepare me to work as an engineer for a startup like Mint.com.

I thought the talk was well received, and the girls asked a lot of good, thoughtful questions regarding startups, funding, marketing, and what employers were looking for in potential candidates.

Here is the gist of the talk I gave:

Hi, I’m Poornima Vijayashanker. I graduated from Duke as an ECE and CS major in ‘04. I left Duke to move to the Bay Area and work as a software engineer. For the past two years I’ve been working at Mint.com. Mint.com came about after the founder and CEO, Aaron Patzer, became frustrated with existing financial software. As a young and active individual he didn’t want to spend hours budgeting and tracking his finances. He founded Mint.com as a way to help people save time and do more with their money. I have known about Mint.com since its inception. I came up with the name, and I’ve helped Aaron build his engineering team over the past two years.

My interest in working for Mint.com grew from my desire to be more than just an engineer. I wanted to have the freedom to create a product that would change people’s lives, and I also wanted to experience the evolution of a company and a product.

Duke prepared me for the startup environment. For starters, Duke’s curriculum is very challenging. The CS courses teach students good software fundamentals and coding skills which apply directly to industry. Duke’s engineering school teaches the principles of problem solving that can be continually applied to any business or engineering problem.

When I first came to Duke I thought I was accepted by mistake. I was certainly not the smartest person there, and found the first two years very difficult. But then I started to see how my peers solved problems, and I began to think about engineering problems differently. Collaborating on projects and working as a TA exposed me to different methods of writing code. I became smarter just by being surrounded by smart kids.

Finally, Duke’s professors are very supportive of their students, which is exhibited by the countless teaching awards they win year after year. My own engineering professor, Lisa Huettel, is an endless source of inspiration to me both as a female and as an engineer. Her patience year after year and boundless enthusiasm for engineering keep me motivated.

However, I think there are a few areas for improvement.

At Duke, every year is sort of the same. Sure each class is slightly different; there are those that are based on projects, others that have problem sets and tests, and some that you write papers for. And each year gets more and more challenging, but overall you have a very similar routine once you understand the system.

But at a startup you are constantly dealing with change. In the two years I’ve been at Mint I’ve seen the company grow from 3 people to 27, which meant that I played a pivotal role in its growth, including hiring my own boss and explaining the architecture of the system to new employees. I wasn’t just coding every day. I’ve also had to learn how to support a growing user base. During all these changes I had to learn to adapt and accommodate them in order to be successful.

Duke didn’t teach me how to make tradeoffs, which is a highly valuable skill required in business and engineering because of time and resource constraints. In every class you are assigned a set of problems or a project, which you have to complete fully. In industry, by contrast, you might be given three features but only have time to implement one or two of them, and then you spend the remaining release cycle unit testing them, because you want to deliver a quality solution to customers, whose satisfaction and approval are your final grade.

Also, when writing a piece of code it’s not enough for it to be correct; it has to be scalable (handle multiple incoming and outgoing requests), not generate too much load on the database, and be robust enough to avoid security breaches. Thinking in these terms comes with time and experience, but it could be incorporated into the existing curriculum by giving students more freedom when it comes to software or hardware design.

There was a lot that Duke did for me, but I would say the one thing it did best was teach me how to have fun. Not many people can say they’ve experienced an NCAA basketball championship, or been tenting with classmates in K-ville. Duke does an especially good job of keeping its students happy and instilling an element of work/life balance in them that I haven’t seen at many other institutions. I think this transcends everything else, because ultimately in working for a startup you want to be enthusiastic and passionate about the company and the product, and having fun every step of the way is the only way to ensure a happy and successful engineer.

In closing, I enjoy working at Mint.com, and the thousands of positive testimonials represent how successful it has been. I would like to thank Duke for playing a part in its and my own success.


Performance: Part II Address Scalability Before It’s Too Late

As your product and user base grow, you want to ensure that your customers, both old and new, have a good user experience. You want their experience to improve and not stagnate or diminish over time; scalability is another key element to address to ensure the success of your website. Scalability is defined as the capacity to keep pace with changes and growth.

Maintaining a scalable website requires thinking from a business perspective. You want to understand how rapidly your site is growing, and what the frequency of usage is. These two factors serve as metrics for predicting how to allocate resources. You can also use historical usage as an indicator of how much activity you expect in the future. Press releases and media events can increase the user base by unexpected amounts over the course of a few days or even a week. Ideally, the number of users is increasing at a steady, predictable rate week over week. Depending on the type of site you’re running, you can figure out what the peak hours for use are, or whether user activity increases during certain seasons of the year. Also, knowing the frequency or peak hours of usage helps you schedule maintenance, new feature releases, and cron jobs so that they won’t interfere with the user’s experience.

Continuing to think in terms of business, your site is operating on a fixed budget, hence the amount of resources you have is directly proportional to your operating costs. Until you receive another round of funding or bring in revenue you have to make do with what you have. Therefore, you must understand the limits of your resources in terms of response time, throughput, and concurrency in order to allocate resources efficiently but still guarantee quality. Load testing* is the best way to predict the limits of each of these.

Now let’s switch back to thinking like a software engineer. From a code base perspective, web applications should be tier based. Here is a simple tiered approach:

UI -> Business Logic -> Persisted Object -> DB

You might also have another tiered data model that runs in parallel, which could be used to process or retrieve data that is not in immediate use by the user. Usually a mechanism such as JMS or RMI is used as a means of communication between these parallel data model tiers. The benefit of having a tier-based architecture is that you can cache data that doesn’t change frequently across tiers, thereby limiting the number of expensive DB calls made. Moreover, as the number of concurrent users increases, securing data across users becomes pivotal. With a tier-based approach only certain tiers can manipulate and persist data.
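As an illustration of caching between tiers, here is a minimal read-through cache that a business-logic tier could place in front of the persistence tier. The class `TierCache`, the time-to-live policy, and the injected clock are assumptions made for this sketch; the `loader` function stands in for whatever expensive DB call sits underneath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical read-through cache with a fixed time-to-live per entry.
final class TierCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // stands in for the persistence tier
    private final long ttlMillis;
    private final LongSupplier clock;      // injectable so expiry can be tested

    TierCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value if still fresh; otherwise reloads it once and caches it.
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, existing) ->
                existing != null && existing.expiresAtMillis > now
                        ? existing
                        : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }
}
```

In production you would pass `System::currentTimeMillis` as the clock. A time-based expiry is only one possible policy; data that changes on known events may be better served by explicit invalidation.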

I’m sure we’ve all learned from our intro computer architecture class that CPU-bound processes are the fastest and can be parallelized, whereas I/O processes are the bottleneck. In the case of a website, accessing the DB is the slowest I/O process. However, you can speed up access to data by sharding the database. Sharding breaks up a large database into smaller pieces. Each piece can contain some redundant information, or a parent DB can map data to separate DBs.
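The "parent DB maps data to separate DBs" variant can be sketched as a small directory that remembers which shard each user lives on. This is only an illustration: `ShardDirectory`, the in-memory map standing in for the parent DB, and the modulo rule for first-time assignment are all assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shard directory; the map below stands in for a parent DB lookup table.
final class ShardDirectory {
    private final List<String> shardNames;
    private final Map<Long, Integer> assignments = new ConcurrentHashMap<>();

    ShardDirectory(List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardNames = List.copyOf(shardNames);
    }

    // Looks up the user's shard, assigning one by modulo the first time the user is seen.
    String shardFor(long userId) {
        int index = assignments.computeIfAbsent(
                userId, id -> (int) Math.floorMod(id, (long) shardNames.size()));
        return shardNames.get(index);
    }

    // Moves a user to another shard, e.g. when rebalancing; copying the data is out of scope here.
    void reassign(long userId, int shardIndex) {
        Objects.checkIndex(shardIndex, shardNames.size());
        assignments.put(userId, shardIndex);
    }
}
```

Keeping the mapping in a directory, rather than recomputing a hash on every request, is what lets you move individual users between shards later without changing the routing rule for everyone else.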

The last and priciest technique is having multiple servers. Configuring a load balancer to handle requests and then send them to each server is one way of improving throughput and response times for users.
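Real load balancers are usually dedicated hardware or software, but the core idea of spreading requests across servers can be shown with a round-robin sketch. The class `RoundRobinBalancer` and the server names are hypothetical, and round-robin is just one simple policy chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical round-robin selector: each request goes to the next server in the list.
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    // Thread-safe: the atomic counter hands out one ticket per request.
    String nextServer() {
        long ticket = counter.getAndIncrement();
        return servers.get((int) Math.floorMod(ticket, (long) servers.size()));
    }
}
```

This sketch ignores server health and current load; a production balancer would also stop routing to servers that fail health checks.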

Improving the scalability of your website is a good problem to have, because it means your site is growing! But you don’t want to wait until a server crashes or a db thrashes. A little forethought will continue to grow your user base and keep them coming back for more!

* Future article on load testing.


Performance: Part I Develop a Monitoring Scheme

Two years ago Aaron Patzer was frustrated with Quicken and Money, because setting up the service alone took over an hour. His painful experience led him to ditch both these products and create his own product, Mint.com, in hopes of delivering a faster and more useful personal financial service. Unfortunately, not everyone is a programmer who can automate a tedious task, and not all tasks can be automated through software. Hence bad products continue to persist in the marketplace, and users are left waiting: in a line, for a site to download, or for a better service to come along. But sometimes their prayers are answered and a better service does come along. However, it’s only a matter of time till this service becomes plagued by inefficiencies. So how do we keep a web service performant? In the realm of software I believe there are three basic principles to keep a site up and running with crowd-pleasing response times:

1. Develop a monitoring scheme
2. Address scalability before it’s too late
3. Re-write code

The first computer I ever had was a 486. While I initially believed that the thrill of just having a computer was enough, I was soon annoyed with the machine when I installed my first modem, and the damn thing would take minutes to connect. Fortunately, my dad’s frustration with it grew as well, and we soon replaced it with a Pentium I, II, III, and so on. So being a very green software engineer, I believed that a fast CPU makes life faster. However, I learned a very valuable lesson in my graduate computer architecture class at Stanford: “What Intel giveth, Gates taketh away.” Throwing hardware at the problem to improve the performance of your machine is easy, but it’s expensive and only a temporary solution. And even if your company can buy a faster server, it doesn’t mean that the consumer can afford to do the same!

So how does a web service stay efficient over time? Monitoring! Specifically, monitoring features in the context of data flow. Any given feature fits into one of two categories: it either has data that will be displayed to the user, or as the user uses the feature, they generate data that must be stored. Here are the two simple flows:

a. server -> data -> user
b. server -> user -> data -> server

In flow a it’s important to get the data from the server to the user ASAP. In flow b it’s important to get the data from the user back to the server to process it, store it, and in some cases return the processed data back to the user. Now that we know where the data will be exchanged, those are the places we need to monitor.

For flow a, we need to figure out how large a data set we are dealing with, and how long it will take to retrieve the data set from the server. Once we have the data set we need to figure out how long it takes until the user sees the complete data set.

For flow b, we might need to return data to the user once it’s processed, which is similar to flow a, but the harder part is taking in the data from the user, processing it quickly, and then responding to the user with the new data. Unfortunately, the round trip associated with this flow is an unavoidable evil. But there are ways to optimize it …
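As a rough sketch of the server-side half of this monitoring, the helper below records how many rows a retrieval returned and how long it took. The class `FlowMonitor` and the feature label are hypothetical, and the sketch only covers the time spent on the server; measuring when the user actually sees the complete data set would need client-side timing as well.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical monitor: records data set size and retrieval time per feature.
final class FlowMonitor {
    static final class Sample {
        final String feature;
        final int rowCount;
        final long elapsedNanos;

        Sample(String feature, int rowCount, long elapsedNanos) {
            this.feature = feature;
            this.rowCount = rowCount;
            this.elapsedNanos = elapsedNanos;
        }
    }

    private final List<Sample> samples = new ArrayList<>();

    // Runs the retrieval, records its size and duration, and returns the data unchanged.
    <T extends Collection<?>> T timeRetrieval(String feature, Supplier<T> retrieval) {
        long start = System.nanoTime();
        T data = retrieval.get();
        long elapsed = System.nanoTime() - start;
        synchronized (samples) {
            samples.add(new Sample(feature, data.size(), elapsed));
        }
        return data;
    }

    // Returns a snapshot copy so callers can inspect samples without locking.
    List<Sample> samples() {
        synchronized (samples) {
            return new ArrayList<>(samples);
        }
    }
}
```

Wrapping a query, for example `monitor.timeRetrieval("transactions", () -> loadTransactions(userId))` with a hypothetical `loadTransactions`, gives you a running record of whether a feature is slowing down as its data set grows.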
