Some notes about this pattern before we start. If you just want to use the pattern (included in the installer from v0.6 on), skip down to the notes delimiter header.
A user had a My Clippings file that included dates and times in the following format:
Added on Monday, 23 April 12 22:51:41 GMT+01:00
This differs significantly from the US Kindle Touch format and, unfortunately, is not supported by the library used to parse dates. My solution was to leave the “GMT+01:00” part out of the captured date, on the assumption that DaleyKlippings would correct for it by using the user’s local time. Even if this isn’t true, the difference should be trivial in the greater scheme of things.
A few other differences are worth noting, especially if you’re using these samples to learn:
- This pattern excludes the phrase “Your ” (including the trailing space) in front of the word Note/Highlight/Bookmark
- This pattern includes bookmarks by adding the phrase inside the Type tag, e.g. (Highlight|Note|Bookmark)
- Location was presented as Loc. so this was added to the matching text as a valid option
- Note that “GMT” is outside the (?P<Date>…) tag. The regular expression still has to match the offset, but the offset is left out of the captured date.
- Because the year has only two digits and there is a bug in our standard date/time object (QDateTime, which uses this formatting), the date format for this pattern is written with Python’s datetime format codes instead (see the Python datetime documentation for the full list). The code should automatically recognize and use the right parser based on your date/time pattern.
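
The last two points can be tried outside the application. The following is a minimal sketch using only Python's standard re and datetime modules; it is not DaleyKlippings code. DATE_PART is a hypothetical, reduced version of the date portion of the pattern, and the sample line is the one quoted earlier. It assumes an English locale, since %A and %B match locale-specific day and month names.

```python
import re
from datetime import datetime

# Hypothetical reduced pattern: only the date portion of the full pattern.
DATE_PART = re.compile(
    r"""
    Added\ on\                  # literal prefix
    (?P<Date>(.*))              # date & time, captured
    (\ GMT(\+|\-)[0-9:]*)       # offset: must match, but sits outside Date
    """,
    re.VERBOSE | re.UNICODE,
)

# Date format used with this pattern; %y is the two-digit year.
DATE_FORMAT = "%A, %d %B %y %H:%M:%S"

line = "Added on Monday, 23 April 12 22:51:41 GMT+01:00"
date_text = DATE_PART.search(line).group("Date")

# The offset was dropped, so the result is a naive datetime (no time zone).
parsed = datetime.strptime(date_text, DATE_FORMAT)

print(date_text)
print(parsed.isoformat())
```

The greedy (.*) first swallows the whole line, then backtracks until the GMT padding can match, which is why the captured text ends just before " GMT".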
# Import notes and highlights from "My Clippings.txt" and ignore
# bookmarks. Warnings with information on ignored bookmarks
# will be added to the log - this is the app normal behaviour
# Note that VERBOSE and UNICODE options are always on
^\s*                                    #
(?P<Book>.*?)                           # Book name
(\s*\((?P<Author>[^\(]*)\))?            # Author name (optional)
\s*-\                                   #
(?P<Type>(Highlight|Note|Bookmark))     # Clipping types accepted
(\ on\ Page\                            #
(?P<Page>[\d-]*)\ \|)?                  # Page (optional)
(.*(Location|Loc\.)\                    #
(?P<Location>[\d-]*))?                  # Location (optional)
.*?Added\ on\                           #
(?P<Date>(.*))                          # Date & time
(\ GMT(\+|\-)[0-9:]*)                   # Padding to exclude GMT and +/- number
\s*                                     #
(?P<Text>.*?)                           # Text
\s*$                                    #
%A, %d %B %y %H:%M:%S
UTF-8 (all languages)
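
To check how the pattern treats the clipping type and the “Loc.” spelling, it can be run directly in Python. This is a sketch under stated assumptions: the two clippings are made-up examples in the layout described above (no “Your ” prefix, “Loc.” for the location), and the pattern is compiled with only the VERBOSE and UNICODE flags mentioned in its comments. How DaleyKlippings itself splits the file and applies the pattern is not shown here.

```python
import re

# The pattern from this page, without its leading comment lines.
PATTERN = re.compile(
    r"""
    ^\s*                                    #
    (?P<Book>.*?)                           # Book name
    (\s*\((?P<Author>[^\(]*)\))?            # Author name (optional)
    \s*-\                                   #
    (?P<Type>(Highlight|Note|Bookmark))     # Clipping types accepted
    (\ on\ Page\                            #
    (?P<Page>[\d-]*)\ \|)?                  # Page (optional)
    (.*(Location|Loc\.)\                    #
    (?P<Location>[\d-]*))?                  # Location (optional)
    .*?Added\ on\                           #
    (?P<Date>(.*))                          # Date & time
    (\ GMT(\+|\-)[0-9:]*)                   # Padding to exclude GMT and +/- number
    \s*                                     #
    (?P<Text>.*?)                           # Text
    \s*$                                    #
    """,
    re.VERBOSE | re.UNICODE,
)

# Hypothetical clippings, invented for illustration only.
highlight = (
    "Example Book (Jane Doe)\n"
    "- Highlight Loc. 100-101 | Added on Monday, 23 April 12 22:51:41 GMT+01:00\n"
    "\n"
    "Some highlighted text\n"
)
bookmark = (
    "Example Book (Jane Doe)\n"
    "- Bookmark Loc. 250 | Added on Monday, 23 April 12 22:52:10 GMT+01:00\n"
    "\n"
    "\n"
)

h = PATTERN.match(highlight).groupdict()
b = PATTERN.match(bookmark).groupdict()

for fields in (h, b):
    print(fields["Type"], fields["Location"], repr(fields["Date"]), repr(fields["Text"]))
```

The bookmark matches with an empty Text group, and Page is None in both cases because neither sample contains an “on Page” part.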