Monday, May 21, 2012

More Tips

It's nice to be learning new stuff. I'm just going to write line by line of some more stuff that I learned ..

Always use UTF-8 as the default charset in Mysql. 

I added this in my.cnf (/etc/)
[mysqld
default-character-set=utf8 
default-collation=utf8_general_ci 
character-set-server=utf8 
collation-server=utf8_general_ci 
init-connect='SET NAMES utf8' 
[client
default-character-set=utf8

If you are trying to sync between DBs, then take chunks of data.

Say you are trying to sync between DBs. And you want to do some transformation before sync (May be rename some columns). Then you would want to do a select query to get a subset of data (limit n) and then insert it in bulk using values (n values).

If there is a possibility of updates instead of inserts, then use the 'ON DUPLICATE KEY UPDATE' clause.
It's a good idea to update based on timestamps. What I did was that I touched a file everytime I do an update, so that the next time I know where I need to start from.

If a transaction fails because of Lock Errors write a retry routine in Code.

Lock might occur. Especially on DBs where there are a lot of reads/writes. So I would say try writing for x times and then raise an exception.

Optimize Active Admin if using with a DB with many records.

I have had this problem where the data in the table was HUGE and there were associations with another table with HUGE number of records. This caused the application to load very slowly and also made the writes to fail.

The fix that I made was to take out thos associations and write getters and setters instead of the associations. Some extra code, but make a HUGE difference to performance.

Use S3 when you want to store data.

The application had to generate reports and store them for future viewing. The flow in implementation is as follows


  1. User Creates Report
  2. He sets the report to execute on a specific time (he could also execute this report manually)
  3. On triggering of the report, a GeneratedReport is created and the job is pushed to delayed_job
  4. When the delayed_job gets to it, the report is executed and the result is written to a tmp file
  5. This tmp file is uploaded to S3 as a private file and a url is made with an expiration of months.

More to Come.