Sorry for resurrecting an ancient thread, but I recently wanted to do this, and I wanted a 100% portable bash script to do it. So here's my solution using only grep and sed. The below was bashed out very quickly, and so could be made much more elegant, but I'm really just getting started with sed/awk etc.

curl "" 2>/dev/null | grep -i -e ']*>/\n/Ig' | sed 's/]*>//Ig' | sed 's/^]*>\|]*>$//Ig' | sed 's/]*>]*>/,/Ig'

As you can see I've got the page source using curl, but you could just as easily feed in the table source from elsewhere.

Get the Contents of the URL using cURL, dump stderr to null (no progress meter)
curl "" 2>/dev/null

I only want Table elements (return only lines with TABLE,TR,TH,TD tags)
| grep -i -e '

... with newline
| sed 's/]*>/\n/Ig'

Remove TABLE and TR tags
| sed 's/]*>//Ig'

Note that if any of the table cells contain commas, you may need to escape them first, or use a different delimiter.

Here's a short Python program I wrote to complete this task. It was written in a couple of minutes, so it can probably be made better.
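That program is not reproduced here. As a stand-in, the following is a minimal sketch of the same table-to-CSV conversion using only Python's standard library (html.parser and csv). The names TableToCSV and table_to_csv are hypothetical, chosen for illustration, and the sketch assumes a single, non-nested table whose cells have explicit closing tags. Because the csv module quotes any cell that contains a comma, it sidesteps the escaping problem noted for the sed pipeline.

```python
import csv
import io
from html.parser import HTMLParser


class TableToCSV(HTMLParser):
    """Collect the text of every TD/TH cell, grouped by TR row.

    Hypothetical helper for illustration; assumes one flat table
    (no nested tables) with explicit closing tags on cells and rows.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside a TR
        self._cell = None  # text pieces of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TD> and <td> both match.
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Keep text only while inside a TD/TH cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableToCSV()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # The default quoting mode wraps cells containing commas in double quotes.
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

To use it, pass the page source as a string, for example the output of the curl command above read from standard input: print(table_to_csv(sys.stdin.read()), end="").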