Non-English My Clippings Files

As of version 1.3.0, DaleyKlippings has significantly improved support for non-English “My Clippings” files.  This tutorial will help you use a built-in pattern (German or Spanish) or build a custom pattern for your language.  It refers to the file downloaded from the Kindle as “My Clippings” throughout, although your file name may be different (e.g. “meus recortes” in Portuguese).

Basic Settings

Whether you’re using a built-in pattern or customizing a pattern for your own purposes, the first thing you must do is set the “Language” settings on the “Application” tab:

Language Settings

NOTE:  These settings will make NO DIFFERENCE unless you are using an Import Pattern customized for your language.

If these settings are not correct, DaleyKlippings won’t know what to do with each entry in the My Clippings file so you may not get any lines imported correctly.  Please set the following settings:

  • Date Interpreter – If your date includes words for the days of the week or the months and these words are not in English, you will need to adjust this setting.  Note that this setting will not automatically interpret helper words like the Spanish ‘de’.  These words are either already included in the default pattern or are handled as discussed in greater detail below.
  • Range Delimiter – By default, DaleyKlippings assumes that page/location ranges use a hyphen (e.g. 213-525).  If ranges on your Kindle use a different pattern (e.g. the Portuguese 213 a 525), you can enter that value here.
  • Highlight, Note, and Bookmark – In each field, enter the translation of the term as it appears in your My Clippings file.  This will ensure that DaleyKlippings recognizes and knows how to handle each entry in the My Clippings file.

Using an Existing Pattern

If you are using a built-in pattern, updating the basic settings should be enough.  If you continue to have issues, there may be small differences between the format of your My Clippings file and the format expected by the import pattern.  Different versions of the Kindle device generate slightly different My Clippings files and may require minor changes to the pattern.  The next section includes details on this process.

Customizing a Pattern

If none of the built-in patterns work for you, you probably need to make some additional language-related changes.

CUSTOMIZING THE DATE

If the My Clippings entries seem to be matching correctly, but dates are not showing up, you probably need to adjust the date pattern.

  • First, verify that you have set the “Date Interpreter” (described under Basic Settings) to the correct language.
  • Next, confirm that you are using a Qt pattern.  DaleyKlippings localizes dates using a Qt library so it will not work correctly with a Python pattern.
    • If your pattern includes a two-digit year (e.g. ’05’ for ‘2005’), Qt will not correctly match years after 1999 (as discussed in our Qt pattern section).  We don’t currently have a solution for these users.  Please contact us and we’ll try to provide a work-around.
  • Finally, ensure that the pattern matches the date in your My Clippings file.  Qt date patterns are discussed in the Qt help.
    • If your date pattern includes helper words like ‘de’, pay particular attention to the section clarifying “All other input characters will be ignored. Any sequence of characters that are enclosed in singlequotes will be treated as text and not be used as an expression.”  As demonstrated in the sample Spanish version, the letter ‘d’ in the helper word ‘de’ would have matched a day number.  By enclosing the word in single quotes, we ensure that this does not occur.
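The effect of quoting can be illustrated with a short Python sketch.  It is a simplified, hypothetical model of how a Qt-style date format is split into expressions and literal text, not Qt's actual parser; the function name tokenize and the list of expressions are assumptions made for this illustration only.

```python
import re

# Simplified list of Qt-style date/time expressions (an assumption for this
# sketch; see the Qt help for the authoritative list).
QT_EXPR = re.compile(
    r"dddd|ddd|dd|d|MMMM|MMM|MM|M|yyyy|yy|hh|h|HH|H|mm|m|ss|s|zzz|z|AP|ap"
)


def tokenize(fmt):
    """Split a format string into ("expr", ...) and ("text", ...) tokens."""
    tokens = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "'":
            # Everything up to the closing single quote is literal text.
            end = fmt.find("'", i + 1)
            if end == -1:
                end = len(fmt)
            tokens.append(("text", fmt[i + 1:end]))
            i = end + 1
            continue
        match = QT_EXPR.match(fmt, i)
        if match:
            tokens.append(("expr", match.group()))
            i = match.end()
        else:
            # Any other character is treated as plain text in this model.
            tokens.append(("text", fmt[i]))
            i += 1
    return tokens


def expressions(fmt):
    """Return only the parts of the format that are read as expressions."""
    return [value for kind, value in tokenize(fmt) if kind == "expr"]


unquoted = expressions("d de MMMM")    # the 'd' of "de" is read as a day number
quoted = expressions("d 'de' MMMM")    # the quoted helper word stays plain text
print(unquoted)
print(quoted)
```

In this model the unquoted format yields an extra day expression from the helper word, while the quoted format yields only the day and the month.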

CUSTOMIZING THE REGULAR EXPRESSION

If a sample Import Pattern does not exist for your language or lines are not importing using the sample pattern, you will need to make changes to the Import Pattern.  These patterns are regular expressions as discussed in the tutorial.  The key changes when translating an Import Pattern follow:

  • All of the phrases in the pattern need to be adjusted to match your local file.  Use the German and Spanish examples to see how this is done.
    • Every space in the pattern that is meant to match a space in the My Clippings text must be preceded by a backslash ‘\’.  The regex engine ignores any spaces in the pattern that are not preceded by a backslash when matching text.
    • The extra ‘Mi’ after the author in the Spanish example shows you how to add extra words when required.
    • In some places, several words are divided by the pipe character ‘|’.  Each word in that list is a possible word in that location.  You MUST use this syntax for the (Highlight|Note|Bookmark) part of the pattern.  In other places (like the location), you can replace this list of options with a single translated word.
  • If your language does not use a hyphen for a page or location range, the (?P<Page>… and (?P<Location>… patterns will need to be updated.
    • By default, these sections match numbers (using the ‘\d’ pattern) and the hyphen (together ‘[\d-]’).  This will not work for languages that don’t use a hyphen because it doesn’t match the helper words OR the spaces that are used.
    • For the Portuguese ‘213 a 543’ it may be enough to add the space and the letter ‘a’ to the matched characters with the pattern ‘[\d\ a-]’.  Note that we want to explicitly include the space character so we precede it with a backslash, and that the hyphen is placed last so that it is read as a literal hyphen rather than as the start of a character range.
    • For a language with a more complicated connector, however, this approach could backfire.  If the text around a range is “Location 213 a 543 and”, the pattern ‘[\d\ a-]’ will actually match the space after the range PLUS the first letter of the word “and”.  This could result in an empty range or could cause the pattern to fail to match specific rows.
  • The (?P<Date>… section may need to be updated.  This is one of the most difficult parts of the pattern.
    • Most patterns match any character (denoted explicitly by ‘.*’ and implicitly by ‘[^\r\n]*’) only stopping when they find a pattern indicating a date end. These patterns will be most forgiving to different dates because they will automatically match helper words like the Spanish ‘de’.
    • Some of the sample English patterns use the end-of-line character to find the end of the date.  The block [^\r\n]* means “include any character until you hit \r or \n” where ‘\r’ and ‘\n’ are special codes for the end of a line.
    • Other patterns have used part of the date like (AM|PM) or the GMT value to determine the end of the date.
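
The points above can be sketched in Python.  This is a minimal illustration, not one of the built-in Import Patterns: the sample line is a made-up, simplified entry, the names SAMPLE, ENTRY, LOOSE and STRICT are hypothetical, and the sketch assumes the pattern is compiled with Python's re.VERBOSE flag, which is consistent with the space handling described above.

```python
import re

# Made-up, simplified entry line; real My Clippings lines differ by device
# and language.
SAMPLE = "- Highlight Location 213 a 543 | Added on Monday, March 5, 2012\r\n"

# With re.VERBOSE, unescaped whitespace in the pattern is ignored, so every
# literal space is written as a backslash followed by a space.
ENTRY = re.compile(r"""
    -\ (?P<Type>Highlight|Note|Bookmark)            # translated entry types go here
    \ Location\ (?P<Location>\d+(?:\ a\ \d+)?)      # one number, or two joined by ' a '
    \ \|\ Added\ on\ (?P<Date>[^\r\n]*)             # date: everything up to the line end
""", re.VERBOSE)

match = ENTRY.match(SAMPLE)
print(match.group("Type"), "|", match.group("Location"), "|", match.group("Date"))

# A loose character class (hyphen placed last so it is a literal hyphen)
# keeps matching into the following text.
LOOSE = re.compile(r"Location\ (?P<Location>[\d\ a-]+)", re.VERBOSE)
# Spelling out the connector stops at the end of the range.
STRICT = re.compile(r"Location\ (?P<Location>\d+(?:\ a\ \d+)?)", re.VERBOSE)

text = "Location 213 a 543 and"
loose_range = LOOSE.search(text).group("Location")
strict_range = STRICT.search(text).group("Location")
print(repr(loose_range))
print(repr(strict_range))
```

Writing the connector out explicitly, as in the STRICT pattern, is one possible way to avoid the over-matching described for languages whose connector shares letters with the surrounding words.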

If you manage to get rows to import but continue to have issues with the date, please refer back to the “Customizing the Date” section above.