#plugin #data #scraping #bizdev_utils #pe_utils #_2023
This is a hacky Brave plugin that extracts every link from a page and saves them as a .csv file; a companion Python script can then load those files into a sqlite database. I use it on sites where I would otherwise be copying and pasting lots of links (e.g. email addresses, domains, etc.). (Note: this is on my GitHub but posted here too for quick reference.)
This plugin can be adapted for any browser, but is specifically set up for Brave. To install and make use of the plugin, follow these instructions:
- [ ] Install [Brave](https://brave.com/)
- [ ] Clone this repo (or make a directory on your computer and add the background.js, content.js, and manifest.json files to it)
- [ ] Open the Brave browser and navigate to brave://extensions.
- [ ] In the top-right corner, enable "Developer mode" by clicking the toggle switch.
- [ ] Click on the "Load unpacked" button.
- [ ] Navigate to this extension directory (this repo) and click Select.
Once the plugin has been installed, clicking its toolbar button on a page downloads that page's links as links.csv. You can update the default save location for the csv downloads by following these instructions:
- [ ] Open the Brave browser and navigate to brave://settings/downloads
- [ ] Change the download location to the directory you prefer (usually the same directory where this repo has been cloned)
Optionally, you can also run the upload_csvs_to_sqlite.py script from that directory to load all of the csv files in it into a sqlite database (link_scraper.db).
background.js
```js
// background.js
// When the toolbar button is clicked, ask the content script in that tab to collect links
chrome.browserAction.onClicked.addListener(function(tab) {
  chrome.tabs.sendMessage(tab.id, {"message": "clicked_browser_action"});
});
```
content.js
```js
// content.js
chrome.runtime.onMessage.addListener(
  function(request, sender, sendResponse) {
    if (request.message === "clicked_browser_action") {
      // Collect the href of every anchor on the page
      var links = [];
      var anchors = document.getElementsByTagName('a');
      for (var i = 0; i < anchors.length; i++) {
        var href = anchors[i].href;
        // Skip anchors with no href (and SVG anchors, whose href is not a string)
        if (typeof href === 'string' && href) {
          // Quote each value so URLs containing commas stay in a single CSV column
          links.push('"' + href.replace(/"/g, '""') + '"');
        }
      }
      // Create a CSV string with a single 'links' column
      var csv = 'links\n' + links.join('\n');
      // Create a Blob object from the CSV string
      var blob = new Blob([csv], { type: 'text/csv;charset=utf-8;' });
      // Create a link element and click it to start the download
      var url = URL.createObjectURL(blob);
      var link = document.createElement('a');
      link.href = url;
      link.download = 'links.csv';
      link.click();
      // Clean up by revoking the Object URL and removing the link element
      URL.revokeObjectURL(url);
      link.remove();
    }
  }
);
```
manifest.json
```json
{
  "manifest_version": 2,
  "name": "Link Scraper",
  "version": "1.0",
  "permissions": ["activeTab"],
  "background": {
    "scripts": ["background.js"],
    "persistent": false
  },
  "browser_action": {
    "default_title": "Link Scraper"
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"]
    }
  ]
}
```
upload_csvs_to_sqlite.py
```python
# upload_csvs_to_sqlite.py
import os
import glob
import sqlite3
import pandas as pd

# Get the current working directory (run this script from the folder holding the CSVs)
current_dir = os.getcwd()

# Connect to SQLite database (the file is created if it does not exist yet)
conn = sqlite3.connect('link_scraper.db')

# Get a list of all CSV files in the current directory
csv_files = glob.glob(os.path.join(current_dir, '*.csv'))

# For each CSV file, read its contents into a DataFrame and append that to the 'links' table in the database
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    df.to_sql('links', conn, if_exists='append', index=False)

# Close the connection to the database
conn.close()
```
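
Once the links are in the database, the email addresses and domains mentioned at the top can be pulled back out with a short query. The following is only a rough sketch and not part of the repo: it assumes the `link_scraper.db` file and the `links` table (with its single `links` column) created by the script above, and `split_links` is a hypothetical helper name. Because the upload script appends on every run, the query uses `DISTINCT` to ignore rows that were loaded more than once.

```python
import sqlite3
from urllib.parse import urlsplit, unquote


def split_links(db_path="link_scraper.db"):
    """Return (emails, domains) found in the 'links' table, sorted and de-duplicated.

    Hypothetical helper: the table and column names are assumed to match
    what upload_csvs_to_sqlite.py writes.
    """
    conn = sqlite3.connect(db_path)
    try:
        # DISTINCT drops duplicate rows left behind by repeated uploads
        rows = conn.execute(
            "SELECT DISTINCT links FROM links WHERE links IS NOT NULL"
        ).fetchall()
    finally:
        conn.close()

    emails, domains = set(), set()
    for (link,) in rows:
        parts = urlsplit(str(link))
        if parts.scheme == "mailto":
            # mailto:someone@example.com?subject=hi -> someone@example.com
            emails.add(unquote(parts.path))
        elif parts.hostname:
            # Keep only the host name, without port, path, or query string
            domains.add(parts.hostname)
    return sorted(emails), sorted(domains)
```

Calling `emails, domains = split_links()` from the directory that holds `link_scraper.db` returns two plain Python lists, which can then be printed or written back out as needed.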