After we released a new version of our application a performance issue appeared. And I got an idea, how to locate the cause.
Every HTTP request ends with a log statement similar to this:
08:56:23,123 INFO [cz.softeu.xyz] Total_processing_time=12 ms, sql_count=6 (restore_time=1 ms | process_validation_time=0 ms | update_model_time=0 ms | invoke_app_time=1 ms | render_time=10 ms), url=https://server.cz/app/entity/123/action/?arg=456
When the app is slow, the
Total_processing_time (i.e. how long it takes to finish that request) will be higher. How about I count the average
Total_processing_time that our application needs for every kind of
url and then compare those average times before the last version was released with average times after the release? It must be clear what kinds of requests suddenly take much longer time to finish. And that will be the place where we can start looking for the problem.
I had 60 log files (60 days of history), each with 500 000 lines, a reasonable knowledge of Ruby and I was ready to start.
Because everything related to monitoring and performance is Jindra’s area of expertise, I asked him for his opinion first.
Jindra told me about a trick he used before in a similar situation:
Don’t write a script that goes through our log files and counts something for you. Instead, write a script that turns those logs into SQL database. And then - let the database do the work for you.
I liked it immediately.
Here is the process I used in the end:
log files (~30 000 000 lines) -> grep out important lines (~1 200 000 lines) -> ruby script that turns logs into SQL inserts -> SQLite database with all the data -> queries, limited only by my imagination (counts, averages ... very fast)
It turned out that this particular performance problem was caused by Jindra’s monitoring of the performance itself, it was totally unrelated to what kind of URL was accessed and Jindra fixed it when I was half way through this experiment :-)
I finished it anyway and learned a lot.
Database may be a better tool for your analysis than plain text files.10. 10. 2013, kategorie: it