Hello,

I'm working on data modeling for a Pinterest-like project. There are
two main concepts, Pin and Board, just like Pinterest: a pin is an item
containing an image, a description, and some other information such as
a like count, and each board contains a sorted list of pins.
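
To make this concrete, here is a rough CQL sketch of the pin table as I
picture it (I'm assuming a Cassandra-style store since that's what our
key layout looks like; table and column names are just illustrative):

    CREATE TABLE pins (
        pin_id      uuid PRIMARY KEY,
        image_url   text,
        description text,
        like_count  bigint  -- a real counter column would need its own
                            -- table, since counters can't be mixed with
                            -- regular columns
    );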

The board can be modeled with the primary key (board_id, created_at,
pin_id), where created_at sorts the pins of a board by date. The
question is whether I should denormalize pin details into the board
table, or just retrieve a page of pin_ids (page size around 10~20) and
then multi-get by pin_id to obtain the details.
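
In the normalized variant, the board table would hold only pin_ids,
roughly like the sketch below (again assuming CQL, illustrative names):

    CREATE TABLE boards (
        board_id   uuid,
        created_at timestamp,
        pin_id     uuid,
        PRIMARY KEY (board_id, created_at, pin_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC);

    -- read one page of a board, newest pins first
    SELECT pin_id FROM boards WHERE board_id = ? LIMIT 20;

    -- then multi-get the details for that page of ids
    SELECT pin_id, image_url, description, like_count
      FROM pins WHERE pin_id IN (?, ?, ...);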

Since some boards are accessed very often (like the home board),
denormalization seems a reasonable choice to improve read performance.
However, whenever a pin is updated we would then have to update not
only the pin table but also every row in the board table that contains
the pin, and some updates (such as bumping the like count) can be quite
frequent. Since a pin may be contained in many boards (possibly
thousands), denormalization seems to add a lot of load on the write
side, as well as application code complexity, as sketched below.
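
For reference, the denormalized variant I have in mind would look
something like this (a sketch with illustrative names; we would also
have to maintain our own reverse index from pin_id to boards in order
to find the rows to update):

    CREATE TABLE boards_denorm (
        board_id    uuid,
        created_at  timestamp,
        pin_id      uuid,
        image_url   text,    -- pin details copied into the board row
        description text,
        like_count  bigint,
        PRIMARY KEY (board_id, created_at, pin_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC);

    -- updating a pin's like count now costs one write per containing
    -- board, repeated for every board that holds the pin (possibly
    -- thousands of rows):
    UPDATE boards_denorm SET like_count = ?
     WHERE board_id = ? AND created_at = ? AND pin_id = ?;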

Any suggestions on whether our data model should go the denormalized
route, or the normalized/multi-get route, which would then perhaps need
a separate caching layer for reads?

Thanks,

Ziju
