#plugin #data #scraping #bizdev_utils #pe_utils #_2023

This is a hacky Brave plugin that extracts the links from a page, saves them as .csv files, and (optionally) writes them to a SQLite database. I use it on sites where I would otherwise have to copy and paste lots of links (e.g. email addresses, domains, etc.).

(Note: this is on my GitHub but posted here too for quick reference.)

The plugin can be adapted for any browser, but it is set up specifically for Brave. To install and use it, follow these instructions:

- [ ] Install [Brave](https://brave.com/)
- [ ] Clone this repo (or make a directory on your computer and add the background.js, content.js, and manifest.json files to it)
- [ ] Open the Brave browser and navigate to brave://extensions.
- [ ] In the top-right corner, enable "Developer mode" by clicking the toggle switch.
- [ ] Click the "Load unpacked" button.
- [ ] Navigate to this extension directory (this repo) and click Select.

Once the plugin is installed, click its toolbar button on any page to download that page's links as links.csv. Tabs that were already open when the plugin was loaded need to be reloaded first, because the content script is only injected when a page loads.

To change the default save location for the CSV downloads:

- [ ] Open the Brave browser and navigate to brave://settings/downloads
- [ ] Update the download location to the directory you prefer (usually the same directory this repo was cloned into)

Optionally, you can also run upload_csvs_to_sqlite.py to upload all of the CSV files in that directory to a SQLite database (link_scraper.db). The script requires pandas and reads the CSV files from the directory it is run from.
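Since the usual goal is to pull out email addresses or domains, here is a minimal sketch of post-processing one downloaded CSV with only Python's standard library. It assumes the file has the single `links` header column that the plugin writes. The function name `split_links` is hypothetical, and treating `mailto:` links as emails and the host of each http(s) link as a domain is my assumption about what counts.

```python
import csv
import io
from urllib.parse import unquote, urlparse


def split_links(csv_text):
    """Split the text of a plugin CSV (header row: links) into emails and domains.

    Hypothetical helper: mailto: links are treated as email addresses and the
    host of each http(s) link as a domain (with any leading "www." dropped).
    Returns two sorted, de-duplicated lists: (emails, domains).
    """
    emails, domains = set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        link = (row.get("links") or "").strip()
        try:
            parsed = urlparse(link)
        except ValueError:
            continue  # e.g. a malformed IPv6 host; skip it
        if parsed.scheme == "mailto":
            # mailto:a@b.com,c@d.com?subject=hi -> the path holds the address list
            for address in unquote(parsed.path).split(","):
                if address.strip():
                    emails.add(address.strip().lower())
        elif parsed.scheme in ("http", "https") and parsed.hostname:
            host = parsed.hostname  # urllib already lower-cases the hostname
            domains.add(host[4:] if host.startswith("www.") else host)
    return sorted(emails), sorted(domains)
```

To use it on a real download, read the file as text, e.g. `split_links(open("links.csv", encoding="utf-8").read())`. Using a set before sorting means the same address scraped from several pages only shows up once.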
background.js

```js
// background.js
// Manifest V3 service worker: on toolbar click, ask the tab's content script to scrape links.
chrome.action.onClicked.addListener(function (tab) {
  chrome.tabs.sendMessage(tab.id, { message: "clicked_browser_action" })
    // Rejects on pages without the content script (brave:// pages, tabs opened before install)
    .catch(function (err) { console.warn("Link Scraper: content script not reachable", err); });
});
```

content.js

```js
// content.js
chrome.runtime.onMessage.addListener(
  function (request, sender, sendResponse) {
    if (request.message === "clicked_browser_action") {
      var links = [];
      var anchors = document.getElementsByTagName('a');
      for (var i = 0; i < anchors.length; i++) {
        var href = anchors[i].href;
        // Skip anchors with no href (and SVG <a> elements, whose href is not a string)
        if (typeof href === 'string' && href !== '') {
          // Quote each value (doubling embedded quotes) so commas in URLs don't split the row
          links.push('"' + href.replace(/"/g, '""') + '"');
        }
      }

      // Create a CSV string with a "links" header row
      var csv = 'links\n' + links.join('\n');

      // Create a Blob object from the CSV string
      var blob = new Blob([csv], { type: 'text/csv;charset=utf-8;' });

      // Create a temporary link element and click it to start the download
      var url = URL.createObjectURL(blob);
      var link = document.createElement('a');
      link.href = url;
      link.download = 'links.csv';
      link.click();

      // Clean up: revoke the Object URL once the download has had time to start
      setTimeout(function () { URL.revokeObjectURL(url); }, 1000);
    }
  }
);
```

manifest.json (uses Manifest V3, since Chromium-based browsers such as Brave are phasing out Manifest V2)

```json
{
  "manifest_version": 3,
  "name": "Link Scraper",
  "version": "1.0",
  "permissions": ["activeTab"],
  "background": {
    "service_worker": "background.js"
  },
  "action": {
    "default_title": "Link Scraper"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"]
    }
  ]
}
```

upload_csvs_to_sqlite.py

```python
# upload_csvs_to_sqlite.py
# Requires pandas. Run it from the directory that holds the CSV downloads.
import os
import glob
import sqlite3

import pandas as pd

# Get the current working directory
current_dir = os.getcwd()

# Connect to (or create) the SQLite database
conn = sqlite3.connect('link_scraper.db')

# Get a list of all CSV files in the current working directory
csv_files = glob.glob(os.path.join(current_dir, '*.csv'))

# For each CSV file, read its contents into a DataFrame and append it to the 'links' table.
# Note: every run appends all CSVs again, so re-running the script duplicates rows.
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    # Skip CSVs that weren't produced by the plugin (no 'links' column)
    if 'links' not in df.columns:
        continue
    df.to_sql('links', conn, if_exists='append', index=False)

# Close the connection to the database
conn.close()
```
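Because upload_csvs_to_sqlite.py appends every CSV each time it runs, re-running it duplicates rows in the `links` table. If that matters, one alternative (my swap, not the original script) is to drop pandas and let SQLite de-duplicate with a `UNIQUE` column and `INSERT OR IGNORE`. This is a minimal sketch using only the standard library; the `unique_links` table and the `load_csvs` function are hypothetical names, and it assumes each plugin CSV has the `links` header row.

```python
import csv
import glob
import os
import sqlite3


def load_csvs(conn, directory):
    """Insert every link from directory/*.csv into unique_links, skipping duplicates.

    Hypothetical helper. CSVs without a 'links' column are ignored.
    Returns the number of rows actually added (ignored duplicates are not counted).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS unique_links (link TEXT NOT NULL UNIQUE)")
    before = conn.total_changes
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            rows = [(row["links"],) for row in csv.DictReader(f) if row.get("links")]
        # OR IGNORE silently skips any link that is already in the table
        conn.executemany("INSERT OR IGNORE INTO unique_links (link) VALUES (?)", rows)
    conn.commit()
    return conn.total_changes - before
```

A typical run would be `conn = sqlite3.connect("link_scraper.db")`, then `load_csvs(conn, os.getcwd())`, then `conn.close()`. Running it a second time over the same folder adds nothing, which is the point of the `UNIQUE` constraint.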