Saturday, November 21, 2020

exporting users from a large google group

We wanted to export all the current subscribers to a large google group (>30k), and the export members functionality doesn't seem to work even for owners of the group*. So, brute-forcing to the rescue. Used SikuliX to automate the process of going to each page of subscribers, select all, copy, paste into a text file, save, and move to the next page of 100 users after sorting the users by join date earliest first. Since I didn't have opencv installed using apt on this machine and didn't want to break any opencv3.x configuration I might have done, used hard-coded locations on screen instead of Sikuli's feature detection. So, I had to manually click popups to ensure proper wait times after each operation. The script looked like this - 
 
i=18 #the page from which we're starting
while(True):  
  dragDrop(Location(721,265), Location(821,365)) #drag at the top end
  type("a",KeyModifier.CTRL)        #Ctrl a - select all
  type("c",KeyModifier.CTRL)        #Ctrl c - copy
  click(Location(1301,565))         #click inside text editor
  type("v",KeyModifier.CTRL)        #Ctrl v - paste
  popup("waiting for paste")
  click(Location(1336,450))         #click save toolbar icon
  popup("waiting for save dialog")
  click(Location(726,102))          #click on save dialog
  type(str(i))                      #save filename = pagenumber
  popup("waiting for save")
  click(Location(1311,720))
  popup("waiting for creating new file")
  click(Location(1276,450))
  i=i+1
  click(Location(1207,232))
  popup("waiting for pageload")

Then the following python script helped parse it into a csv - 
#!/usr/bin/env python3
import sys
count = 0
for line in sys.stdin:
    #print(line)
    if line.startswith("Profile image"):
        Profileim = line
        username = next(sys.stdin).strip()
        ExtOrRadiosaiId = next(sys.stdin).strip()
        if (ExtOrRadiosaiId == "External"):
            emailid = next(sys.stdin).strip()
        else:
            emailid = ExtOrRadiosaiId
        print('{},{}'.format(username, emailid))
        count = count + 1
        
print(count)

by calling it as 
cat * | parsedumpinfile.py > mycsvfilename.csv

Earlier, I had thought about importing the txt files into a spreadsheet and processing the fields there, creating a formula to copy every sixth (email id) to column next to username, taking care to select six + six + six before dragging down, using a modified method from
to select every sixth row. Or using autoformat in LibreOffice Calc, which looks like this.



But the python script above is much easier and cleaner, of course. 

*Edit - For a group inside a GSuite account, this method using Directory API might work. But GSuite api and googlegroups.com api are different. Only brute-forcing works for googlegroups.com :)

No comments:

Post a Comment