Number of files downloaded per user query
The following was my procedure for making corrections to this ad-hoc Moodle query:
Expanding out the sub-query and running it in DBeaver, could see the discrepancies in the count. Changed the select count() to just select, then went step by step to identify which rows were causing discrepancies, copy-pasting the filenames to adjacent columns on a temporary google sheet for easy comparison.
It turns out the bugs in the query were due to some assumptions which I made about the prefix_logstore_standard_log table:
1. I assumed each file download led to a single log entry - no, there are multiple log entries for each download
2. I assumed that the shorter query which I used for counting the number of files will be sufficient, apparently it is not.
3. There were also overcounting in case the user was downloading the same file multiple times.
So, I have made some modifications to both the queries, now they shows the number of files downloaded which are of non-zero size. The files listing only shows one filename for each download and does not indicate multiple downloads, so now I have modified the number of files counted per user also to ignore multiple downloads of the same file.
In the case of these users I tested, one user has download some files many times, like
Even after this, there were mismatches in the count. The reason is that was, the query "Number of files downloaded per user" counted only distinct filenames.
Now, there are some files with the same name, but with different content, like Fractions.pdf which occurs with different content in different courses. The listing of files shows these separately.
I can correct for this by choosing
count(distinct f2.contenthash) instead of
count(distinct f2.filename)
But still the numbers won't match. This is because there are some files with exactly the same name and same content, in different courses.
For example,
VI Maths || Fractions || VC_Way of Sharing from the Whole.pptx || 779583
VI TNSSE Maths || Fractions Term III || VC_Way of Sharing from the Whole.pptx || 779583
which have both been downloaded by the user we checked.
So, in order to count these files also separately, I will count
count(distinct concat(f2.contenthash,l2.contextinstanceid) instead of simply
count(distinct f2.contenthash)
Then it matches for this user.
Files downloaded per user query
The reason for the "short name of the course, resource reviewed and files downloaded mismatch" for the other query was wrong mapping of resourceid in the log.
In our query, we have
left join vv_resource r on l.objectid = r.id
i.e. we are mapping the objectid to resourceid. But this seems to be wrong.
To see why, I went through all the columns in the logstore log table.
There is a column which says objecttable. For those entries where objecttable says resource, the mapping is correct. For example, pdf files. The contextinstanceid seems to be unique to each course section.
So, I modified the query so that instead of "Resource viewed", I have put the column "Course Module" which shows the correct course module -
vcs.name, from
left join {course_modules} vcm on l.contextinstanceid = vcm.id
left join {course_sections} vcs on vcm.section = vcs.id
Then the query gives correct results for pdf files as well as other file types like pptx.