An iTunes update recently enabled a 'feature' that automatically deletes podcasts older than a few days (note to self - never update that software again). If you're a year behind on some casts you might be annoyed by this kind of thing. Fortunately, it didn't delete anything older than about one year, so the gap should be manageable.
First step: check the source to see if we can download stuff from the website. iTunes only has a few weeks of stuff, but checking around adamcarolla.com I see there's an archive. Looks like they keep the downloadable links going back about 9 months. That's better than nothing, so let's write a quick script to page through the archive and download anything we can find that will cover as much of our time frame as possible.
$ cat archive_retriever.sh
#!/bin/bash +x
BASE_DIR="/home/wwwknapster"
ARCH_DIR="archive"
URL_BASE="https://adamcarolla.com/category/podcasts/page/"
MAX_PAGE=31
MIN_PAGE=3
# Walk the archive index pages
for XX in $(seq $MIN_PAGE $MAX_PAGE)
do
    THISPAGE="${URL_BASE}${XX}/"
    echo "Processing $THISPAGE"
    # Each episode page is linked from an <a itemprop="url"> tag on the index page
    for SUBPAGE in $(curl -s "$THISPAGE" | grep '^\s*<a itemprop="url" href="https:\/\/adamcarolla\.com\/[a-zA-Z0-9_-]*/\" title=.*>$' | sed -e 's/^\s*<a itemprop="url" href="//' | sed -e 's/".*$//')
    do
        echo "Check for download links on ${SUBPAGE}"
        for DLURL in $(curl -s "${SUBPAGE}" | grep noxsolutions.com | grep download | sed -e 's/^\s*<a href="//' | sed -e 's/".*$//')
        do
            echo "Downloading $DLURL"
            # Strip the URL prefix and the query string to get the bare filename
            FNAME=$(echo "${DLURL}" | sed -e 's/^https:\/\/aw\.noxsolutions\.com\/launchpod\/adswizz\/[0-9]*\///' | sed -e 's/\.mp3?.*$/.mp3/')
            echo "curl -L -b cookies.txt \"${DLURL}\" -o \"${BASE_DIR}/${ARCH_DIR}/${FNAME}\""
            # -o already writes the file, so no stdout redirect is needed
            curl -L -b cookies.txt "${DLURL}" -o "${BASE_DIR}/${ARCH_DIR}/${FNAME}"
        done
    done
done
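The filename extraction is the most fragile part of that loop, so it's worth trying in isolation before kicking off a few hundred downloads. Here's a minimal sketch using bash parameter expansion instead of sed. It assumes the download links have the shape the sed expressions expect; the function name and the sample URL are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: derive the local filename from a download URL.
# Assumes the URL shape implied by the script:
#   https://aw.noxsolutions.com/launchpod/adswizz/<digits>/<name>.mp3?<query>
fname_from_url() {
    local url="$1"
    url="${url%%\?*}"            # drop the query string, if any
    printf '%s\n' "${url##*/}"   # keep only the text after the last slash
}

# Made-up sample URL, for illustration only.
fname_from_url 'https://aw.noxsolutions.com/launchpod/adswizz/1234/2020-05-15_ACS_HoustonCurtis_1.mp3?awCollectionId=1234'
# prints 2020-05-15_ACS_HoustonCurtis_1.mp3
```

Stripping the query string first means a slash inside the query can't confuse the path handling.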
Ok so now we have a few hundred mp3s, but they have zero id3 tag info so iTunes won't be able to make sense of them. We'll have to install something called kid3 (ignore the other search results, they are for software that is way out of date). On Fedora that's 'sudo dnf install kid3', which installs a UI and a cli we can use to reconstruct the correct tags. Most importantly, it seems iTunes sorts by a tag called RELEASE DATE, which appears to be called 'TDRL' in the id3 world and is in the format YYYY-mm-DDT00:00:00Z (which is insanely poorly documented). So we need to parse the filename to figure out the title and release date, and use kid3-cli to convert the ID3v2.3 tags to ID3v2.4 so we can write the TDRL tag along with the usual artist, title, etc.
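Since the release date is the tag that matters most, here's a rough sketch of turning a filename into a TDRL value, with a sanity check so a stray file doesn't get a garbage date. The function name is hypothetical, and it assumes every filename starts with a YYYY-MM-DD date followed by an underscore.

```shell
#!/bin/bash
# Hypothetical helper: turn the leading date of a filename into a TDRL value.
release_date_from_fname() {
    local fname="$1"
    local raw_date="${fname%%_*}"   # text before the first underscore
    # Reject anything that is not shaped like YYYY-MM-DD.
    if [[ ! $raw_date =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "unexpected date in ${fname}" >&2
        return 1
    fi
    printf '%sT00:00:00Z\n' "$raw_date"
}

release_date_from_fname "2020-05-15_ACS_HoustonCurtis_1.mp3"
# prints 2020-05-15T00:00:00Z
```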
In this case, the filenames have some strange naming conventions we can script around. We know nearly all ACS podcasts are divided into two parts in this era of ACS, but they frequently number their 2nd part as '4' instead of '2'. If there's no guest (and thus no title) we can safely assume it's just an Adam, Gina, Bryan show. A lot of this code is fairly ACS specific.
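To make those rules concrete, here's a small sketch of the title logic on its own, written with bash parameter expansion rather than sed. The function name is made up, and it only covers the filename shapes described above (date, optional ACS marker, optional guest, optional single-digit part).

```shell
#!/bin/bash
# Hypothetical helper: build the tag title from a filename such as
# 2019-11-22_ACS_VinnieTortorich_4.mp3
title_from_fname() {
    local base="${1%.mp3}"
    local part=0
    # A trailing _<digit> is the part number; anything above 1 means part 2.
    if [[ $base =~ _([0-9])$ ]]; then
        part="${BASH_REMATCH[1]}"
        if (( part > 1 )); then part=2; fi
        base="${base%_*}"
    fi
    # Drop the leading date, then the optional ACS marker.
    local guest=""
    if [[ $base == *_* ]]; then guest="${base#*_}"; fi
    if [[ $guest == "ACS" ]]; then guest=""; fi
    guest="${guest#ACS_}"
    # No guest in the filename: assume the regular hosts.
    if [[ -z $guest ]]; then guest="Adam,Gina,Bryan"; fi
    local title="ACS: ${guest}"
    if (( part > 0 )); then title="${title} Part ${part}"; fi
    printf '%s\n' "$title"
}

title_from_fname "2019-11-22_ACS_VinnieTortorich_4.mp3"
# prints ACS: VinnieTortorich Part 2
```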
$ cat id3tagger.sh
#!/bin/bash +x
# files look like 2020-05-15_ACS_HoustonCurtis_1.mp3
WORKDIR="$1"
if [ -z "${WORKDIR}" ]
then
    echo "Need a work dir"
    exit 1
fi
shift
SCRIPT_ACTION="$1"
if [ -z "${SCRIPT_ACTION}" ]
then
    echo "Need an arg (fix or test)"
    exit 1
fi
if [ "${SCRIPT_ACTION}" == "fix" ] || [ "${SCRIPT_ACTION}" == "test" ]
then
    echo "Running ${SCRIPT_ACTION} on mp3s in ${WORKDIR}"
    for FNAME in $(find "$WORKDIR" -maxdepth 1 -type f -name "*.mp3" -printf '%f\n')
    do
        echo "$FNAME"
        # Reset for every file so a name without a part suffix doesn't inherit the previous value
        PARTNUM=0
        # The trailing _N is the part number; the 2nd part is often numbered 4
        PARTCHECK=$(echo "${FNAME}" | sed -E -e 's/^.*_//' | sed -e 's/\.mp3$//')
        if [[ $PARTCHECK =~ ^[0-9]$ ]]
        then
            if [ $PARTCHECK -gt 1 ]
            then
                PARTNUM=2
            elif [ $PARTCHECK -eq 1 ]
            then
                PARTNUM=1
            fi
        fi
        RAW_DATE=$(echo "${FNAME}" | sed -e 's/_.*$//')
        RELEASE_DATE="${RAW_DATE}T00:00:00Z"
        PLAIN_DATE=$(echo "${RAW_DATE}" | sed -e 's/-.*$//')
        RAW_TITLE=$(echo "${FNAME}" | sed -E -e 's/^[0-9]{4}-[0-9]+-[0-9]+(_ACS)?_//' | sed -E -e 's/(_[0-9])?\.mp3$//')
        #echo "$RAW_TITLE ${#RAW_TITLE}"
        # No guest in the filename leaves at most the part digit behind
        if [ ${#RAW_TITLE} -le 1 ]
        then
            RAW_TITLE="Adam,Gina,Bryan"
        fi
        TITLE="ACS: ${RAW_TITLE}"
        if [ $PARTNUM -gt 0 ]
        then
            TITLE="${TITLE} Part ${PARTNUM}"
        fi
        echo "kid3-cli -c to24 -c \"set TDRL '${RELEASE_DATE}'\" -c \"set title '${TITLE}'\" -c \"set Artist 'Adam Carolla Show'\" -c \"set Album 'Adam Carolla Show'\" -c \"set genre 'Podcast'\" -c \"set date '${PLAIN_DATE}'\" -c \"set 'Podcast Feed' 'http://feeds.feedburner.com/TheAdamCarollaPodcast'\" \"${WORKDIR}/${FNAME}\""
        if [ "${SCRIPT_ACTION}" == "fix" ]
        then
            kid3-cli -c to24 -c "set TDRL '${RELEASE_DATE}'" -c "set title '${TITLE}'" -c "set Artist 'Adam Carolla Show'" -c "set Album 'Adam Carolla Show'" -c "set genre 'Podcast'" -c "set date '${PLAIN_DATE}'" -c "set 'Podcast Feed' 'http://feeds.feedburner.com/TheAdamCarollaPodcast'" "${WORKDIR}/${FNAME}"
        fi
    done
fi
Now this looks better:
$ kid3-cli -c "get all" acs/2019-11-22_ACS_VinnieTortorich_4.mp3
File: MPEG 1 Layer 3 64 kbps 44100 Hz 1 Channels 1:10:11
Name: 2019-11-22_ACS_VinnieTortorich_4.mp3
Tag 2: ID3v2.4.0
Title ACS: VinnieTortorich Part 2
Artist Adam Carolla Show
Album Adam Carolla Show
Date 2019
Genre Podcast
Encoder Settings Lavf57.41.100
Release Date 2019-11-22T00:00:00Z
Podcast Feed http://feeds.feedburner.com/TheAdamCarollaPodcast
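Eyeballing one file is fine, but with a few hundred of them a quick loop is less painful. Here's a rough sketch that flags any mp3 whose "get all" output lacks a Release Date line in the expected format. The function names are hypothetical, and it assumes kid3-cli prints the tag the same way as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if kid3-cli "get all" output (read from stdin)
# contains a Release Date line in the expected format.
has_release_date() {
    grep -Eq 'Release Date[[:space:]]+[0-9]{4}-[0-9]{2}-[0-9]{2}T00:00:00Z'
}

# Report every mp3 in the given directory that is missing the tag.
check_dir() {
    local dir="$1" f
    for f in "$dir"/*.mp3; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        kid3-cli -c "get all" "$f" | has_release_date || echo "Missing release date: $f"
    done
}

# Usage: check_dir acs
```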
Now copy all these to the Windows machine where iTunes screwed the pooch. Add the files using the iTunes add files dialog, then select all the imported tracks in iTunes and right click 'song info'. Then go to options -> media kind -> change to 'podcast'. If the artist matches the original podcast it should just get seamlessly added to the list of entries for that podcast.
In my case I also had some older podcasts on my iPod. Apple is evidently run by idiots so there's no built-in way to copy from iPod -> iTunes, but I was able to use something called 'sharepod' that does support this and was pretty easy to use.
Between these two methods I recovered the missing year of podcasts. I really need to set up a reliable backup for these ephemeral types of media...