Recently i was assigned to integrate thinking sphinx in a messaging application and the messaging table was having more than 20 Million records.
Though i have configured thinking sphinx in different project so the estimation was that it will take some couple of hours to add it on server but this time i was wrong Because ...
In thinking sphinx the query we used to add delta filed is some thing like this
alter table table_name add column delta boolean default true
But when ever you run the rake command to build the indexes it will first execute the following query
update table_name set detla = false where delta = true
Wow what a beautiful query as in my case this query takes around 4 hours to complete and during this time whole table was locked and server remains down.
why this happens ?
Some projects nature is to introduce search in very initial level or with very little number of records in table so the second query runs very fast and after its completion indexing process starts. As the all db records are being getting qualified for where clause in second query and its going to update all records again so table gets lock.
The solution for this problem is to alter all records delta value back to false first in chunks, then run the rake task to index all records and then after wards index your records.
some thing like this
TableModel.find_in_batches do |records|
TableModel.update_all {:delta => true}, ["id in (?)", records.map(&:id)]
end
and then afterwards running
rake thinking_sphinx:index
cheer :)
Though i have configured thinking sphinx in different project so the estimation was that it will take some couple of hours to add it on server but this time i was wrong Because ...
In thinking sphinx the query we used to add delta filed is some thing like this
alter table table_name add column delta boolean default true
But when ever you run the rake command to build the indexes it will first execute the following query
update table_name set detla = false where delta = true
Wow what a beautiful query as in my case this query takes around 4 hours to complete and during this time whole table was locked and server remains down.
why this happens ?
Some projects nature is to introduce search in very initial level or with very little number of records in table so the second query runs very fast and after its completion indexing process starts. As the all db records are being getting qualified for where clause in second query and its going to update all records again so table gets lock.
The solution for this problem is to alter all records delta value back to false first in chunks, then run the rake task to index all records and then after wards index your records.
some thing like this
TableModel.find_in_batches do |records|
TableModel.update_all {:delta => true}, ["id in (?)", records.map(&:id)]
end
and then afterwards running
rake thinking_sphinx:index
cheer :)
No comments:
Post a Comment