Saturday, May 01, 2021

handling duplicates in an sql query

There was a database in which we needed to join three tables, asset id -> subtheme id -> subtheme and the subtheme ids had duplicates as well as multiple subthemes mapped to an asset id. Getting a go-ahead from the content lead, the solution chosen was to just take a single subtheme, using select distinct - something like the first method at https://www.sisense.com/blog/4-ways-to-join-only-the-first-row-in-sql/  

left join (select distinct on (asset_id) * 
         from asset_subtheme_xref
     order by asset_id asc
) as asx on asx.asset_id = tma.asset_id
        left join subtheme_master lsm on lsm.subtheme_id = asx.subtheme_id

Edit: Later on, multiple subthemes were desired. Then, instead of using select distinct in this way, we would need some sort of subquery similar to the one used below for another field, using concat to add all the different subthemes together with CHR(10) - linefeed - between them - all enclosed in parentheses - 

(select string_agg(concat(btwo.name,',',stwo.name,',',sutwo.NAME,',',ctwo.name), CHR(10))
as "Asset tagged to Board Standard Chapter"
from asset_bssc_xref tabxr 
left join chapters ctwo on ctwo.chapter_id = tabxr.chapter_id
left join board_master btwo on btwo.board_id = ctwo.board_id 
left join standard_master stwo on stwo.standard_id = ctwo.standard_id
left join subject_master sutwo on sutwo.subject_id = ctwo.subject_id 
where 
tabxr.asset_id = tma.asset_id) as "Asset Tagged to"


No comments:

Post a Comment