Reverse-engineering the EMZ karaoke format, or watch out for the API calls

While adding support for the EMZ karaoke format to the Karaoke Player application, I came across another good reverse-engineering technique that I would like to share.

EMZ is a karaoke format similar to the old Karafun, based on a password-protected ZIP archive. Unlike Karafun, the password is not embedded into the archive, but is derived from some kind of “login”. The editor can create the password-protected files and open them without an Internet connection, so the password derivation algorithm must be built into the software rather than stored on a remote server. This is a classic example of “security through obscurity”, and, as usual, it does not work well.

Normally, finding out the algorithm would require reverse-engineering the editor binary. It is written in Delphi, so this would be a less than pleasant way to spend the weekend. But because Delphi makes heavy use of the Win32 API, this is not needed at all! Leave your IDA and OllyDbg on the shelf, and see how easily this can be done.

Enter Wine. No, not the drinkable wine, but the open source not-an-emulator of Microsoft Windows. Its primary purpose is to run Windows applications on non-Windows platforms such as Linux. But few people know that it also has great tracing capabilities, which for this kind of job can outdo IDA and OllyDbg.

If you have never used WINEDEBUG, I suggest you start by reading this Wiki entry. An important switch is +relay, which logs all the Win32 API calls together with their arguments and return values. The log will be quite large, but also very complete. Even more important, Wine logs the return address for each call, which allows you, if necessary, to attach a debugger to the running application and quickly set a breakpoint at that specific address, no matter how obfuscated, packed or encrypted the binary is!

So let’s run the binary with the debug logging enabled:

WINEDEBUG=+relay,+snoop wine KaraokeEditor.exe 2> log

Then create new lyrics and save them in EMZ format using the login “qqww”. The result is, as expected, a password-protected ZIP archive. The login itself is not the password (that would make the protection extremely weak).

Now let’s take a look at what kind of information the log gives us:

0009:Call KERNEL32.WideCharToMultiByte(0000fde9,00000000,00a62f0c L"Tahoma",00000006,0033ecc9,000000ff,00000000,00000000) ret=0040d388
0009:Ret KERNEL32.WideCharToMultiByte() retval=00000006 ret=0040d388

Here you can see the function called, its arguments (including dereferenced strings where necessary), and the return address at the end. Now let’s scroll down to where our file is being saved:

0009:Call KERNEL32.CreateFileW(00aab53c L"C:\\dump\\test.emz",c0000000,00000000,00000000,00000002,00000080,00000000) ret=00423226
0009:Ret  KERNEL32.CreateFileW() retval=000000e4 ret=00423226

and look just a little bit up:

0009:Call KERNEL32.MultiByteToWideChar(000004e4,00000000,00aab1cc "505fefb6ac1925225879a3cdedae7f15",00000020,0033e950,000007ff) ret=0040d3a8

Why is it here, right before the encrypted file is written? Could it be the archive password? Testing it: yes, it is. It also looks like an MD5 checksum. However, it is not the MD5 of the string ‘qqww’, as we can quickly check:

echo -n 'qqww' | md5sum -
587b5c5a7899d8652cf24850a60e73e8  -
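
The password line above was spotted by eye, which gets tedious in a large relay log. As a rough sketch (not something the original investigation used), Call lines of the shape shown in this post could be filtered for string arguments that look like a 32-digit hex hash. The regular expressions are assumptions based only on the excerpts here, other Wine versions may format relay lines differently, and the helper names are hypothetical:

```python
import re

# Shape of a "+relay" Call line as it appears in the excerpts in this post.
# This is an assumption; it is not taken from Wine's documentation.
CALL_RE = re.compile(
    r"^(?P<tid>[0-9a-f]{4}):Call\s+"
    r"(?P<module>[^.\s]+)\.(?P<func>\w+)"
    r"\((?P<args>.*)\)\s+ret=(?P<ret>[0-9a-f]{8})\s*$"
)
# Quoted string arguments, with an optional L prefix for wide strings.
STRING_RE = re.compile(r'L?"((?:[^"\\]|\\.)*)"')
# Anything that looks like a hex-encoded 128-bit hash.
HASH_RE = re.compile(r"[0-9a-fA-F]{32}")


def parse_call(line):
    """Return a dict describing one relay Call line, or None for other lines."""
    match = CALL_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["strings"] = STRING_RE.findall(info["args"])
    return info


def find_hash_like_strings(lines):
    """Yield (function, string, return address) for hash-like string arguments."""
    for line in lines:
        call = parse_call(line)
        if call is None:
            continue
        for text in call["strings"]:
            if HASH_RE.fullmatch(text):
                yield call["func"], text, call["ret"]


# Two Call lines copied from the log excerpts in this post.
sample = [
    r'0009:Call KERNEL32.CreateFileW(00aab53c L"C:\\dump\\test.emz",c0000000,00000000,00000000,00000002,00000080,00000000) ret=00423226',
    r'0009:Call KERNEL32.MultiByteToWideChar(000004e4,00000000,00aab1cc "505fefb6ac1925225879a3cdedae7f15",00000020,0033e950,000007ff) ret=0040d3a8',
]

for func, text, ret in find_hash_like_strings(sample):
    print(func, text, "called from", ret)
```

In practice the lines would come from the log file written by the WINEDEBUG run rather than from a hard-coded list, and the reported return address is what you would hand to a debugger.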

Knowing that the password looks like an MD5 hash gets us a little closer, but not close enough: we still need to find out how the archive password is derived from the login. So let’s look further up in the log for the string ‘qqww’; maybe we will find more clues there. And yes, we do:

0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00a648ac L"qqww",00000004,00a6af1c,00000004,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000004 ret=0040d388
0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00a63b8c L"QQWW",00000004,00000000,00000000,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000004 ret=0040d388
0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00a6adfc L"50",00000002,00000000,00000000,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000002 ret=0040d388
0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00aab1cc L"505F",00000004,00000000,00000000,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000004 ret=0040d388
0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00aab1cc L"505F",00000004,00a6ae5c,00000004,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000004 ret=0040d388
0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00a6456c L"505FEF",00000006,00a6ae5c,00000006,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000006 ret=0040d388
0009:Call KERNEL32.WideCharToMultiByte(000004e4,00000000,00a8df0c L"505FEFB6",00000008,00000000,00000000,00000000,00000000) ret=0040d388
0009:Ret  KERNEL32.WideCharToMultiByte() retval=00000008 ret=0040d388

It seems to be pretty obvious what happens here – the string is uppercased before the MD5 is calculated. And indeed the checksum matches:

echo -n 'QQWW' | md5sum -
505fefb6ac1925225879a3cdedae7f15  -

So to open EMZ archives, a karaoke player that wants to support this format needs to:

  1. Ask the user for the login the original song was encoded with (it is required), for example “qqww”;
  2. Uppercase the login, making it “QQWW”;
  3. Calculate the MD5 checksum of this string and render it as a lowercase hex string;
  4. Use this string as the password for zip_open or whatever API you use to access the archives.
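
The steps above can be sketched in a few lines of Python. This is only an illustration of the derivation observed in the log, not the player's actual code. The function names are hypothetical, and encoding the login as code page 1252 is an assumption taken from the 000004e4 argument in the trace; only ASCII logins were actually tested in this post.

```python
import hashlib
import zipfile


def emz_password(login: str) -> str:
    """Derive the archive password: lowercase hex MD5 of the uppercased login."""
    # Assumption: the editor encodes the login with code page 1252
    # (0x4e4 in the logged calls). For ASCII logins the encoding is irrelevant.
    uppercased = login.upper().encode("cp1252")
    return hashlib.md5(uppercased).hexdigest()


def read_emz_member(path: str, login: str, member: str) -> bytes:
    """Read one file from an EMZ archive (hypothetical helper).

    Python's zipfile module only decrypts traditional ZIP encryption, so this
    works only if the archive uses that scheme.
    """
    password = emz_password(login).encode("ascii")
    with zipfile.ZipFile(path) as archive:
        return archive.read(member, pwd=password)


# The login used in this post yields the password seen in the relay log.
print(emz_password("qqww"))
```

Because the login is uppercased first, “qqww” and “QQWW” produce the same password.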

As a side note, this is also a poor security design because it makes brute forcing the archive much easier: while a “32-character password” sounds impressive, it only uses 16 possible characters, so it can never carry more than 128 bits, and an attacker does not even need to search that space, since guessing the (uppercased) login is enough. A successful brute force would also give access to every single song in one’s collection, so it would certainly be worth the effort.
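
To put rough numbers on the character-set point, the size of a password space can be expressed in bits as length times log2 of the alphabet size. The sketch below compares a 32-character hex password with a 32-character password over the 95 printable ASCII characters, and with a purely hypothetical login of 8 letters and digits (case does not matter, since the login is uppercased). The login shape is an assumption for illustration only; the post does not say what logins look like in practice.

```python
import math


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Size of the search space, in bits, for strings of a fixed length."""
    return length * math.log2(alphabet_size)


# 32 hex digits: what the derived archive password actually is.
hex_bits = keyspace_bits(16, 32)

# 32 characters drawn from printable ASCII, for comparison.
printable_bits = keyspace_bits(95, 32)

# Hypothetical login: 8 characters from A-Z and 0-9 after uppercasing.
login_bits = keyspace_bits(36, 8)

print(f"hex password:       {hex_bits:.1f} bits")
print(f"printable password: {printable_bits:.1f} bits")
print(f"example login:      {login_bits:.1f} bits")
```

The hex password tops out at 128 bits, against roughly 210 bits for printable ASCII, but the number that matters to an attacker is the last one: under this assumption it is only about 41 bits.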

Lessons learned:

  1. When you are using a high-level language, consider how much information is exposed through API calls while processing sensitive data. In this case the amount exposed is clearly excessive, and most likely the original source code could be rewritten to limit or eliminate this exposure (there certainly should be no need to convert MD5 hex characters to/from multibyte one by one!).
  2. Password derivation algorithms are there for a reason; use them instead of plain MD5. The “derivation” in this case is clearly inadequate.
  3. Sometimes reverse engineers just get lucky: what would otherwise have taken a few days’ work took less than fifteen minutes.
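
As an illustration of the second lesson, here is a minimal sketch of what a purpose-built derivation looks like, using PBKDF2 from Python's standard library. This is not what the EMZ editor does, and it would not be compatible with existing EMZ files; the salt size, iteration count and function name are assumptions chosen for the example.

```python
import hashlib
import os


def derive_key(secret: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a secret with PBKDF2-HMAC-SHA256."""
    # The salt makes precomputed tables useless, and the iteration count
    # makes every single guess deliberately expensive for an attacker.
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), salt, iterations, dklen=32
    )


# A random salt would be stored alongside the protected data.
salt = os.urandom(16)
key = derive_key("qqww", salt)
print(key.hex())
```

Compared with a single unsalted MD5, each guess costs an attacker hundreds of thousands of hash computations, and the secret is not case-folded before hashing.
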
This entry was posted in reverse engineering, Uncategorized.

2 Responses to Reverse-engineering the EMZ karaoke format, or watch out for the API calls

  1. Eleanor Rosevelt says:

    I just discovered Karlyriceditor 15 minutes ago, and the ONLY reason it took me so long to install was due to me reading through the Q/A comments others had posted. (I’m a “read the manual first/avoid things not working” kind of person).

    I had everything installed and Karlyriceditor running in 60 seconds!
    (ps: The Ubuntu Software Center has this program available for those who do not wish to compile from source. Go to Software Center / Karlyriceditor / Install. Also, after installation it can be found in your applications. If you are unable to find it, simply open a terminal, type Karlyriceditor and the program will instantly open!)

    As of reading this page I have become a fan. Thank you for your hard work! Looking at the versions which have come out, I see that you have spent years perfecting this program!

    I personally first created custom CD+G as early as 2005. The software was buggy, and the process was complicated. (especially if you didn’t feel like paying for “payware”).

    My original move from the DOS (barely)operating system(s) has proven to be a sound one, and programs like Karlyriceditor reinforce that decision! (I’m creating my first CD+G now as I’m typing this haha! So far so GREAT!!! 🙂 )

    Sorry this was so long, but a lot of people simply do not understand the devotion developers have, especially ones who do NOT receive any financial fruit for their labor.

    For newcomers to linux, please understand that developers are like marijuana farmers: They spend a TREMENDOUS amount of time reading and researching code (similar to studying the anatomy of plants, how they interact with their environment, biology, microbiology, chemistry, and the great ones who study physics/anything else that can make their end product AMAZING) just to learn HOW things work… BEFORE they EVEN BEGIN authoring the first line of their own code (harvest their crop/trim it/cure it/etc). THEN they spend an EVEN MORE TREMENDOUS AMOUNT OF TIME reading, researching, testing, testing, and oh yeah, testing some more before publishing their work (as the farmer continues striving to perfect their craft/end product).

    Newcomers to the linux scene, please understand that when a marijuana farmer has dedicated YEARS of their lives perfecting a consumable product, there IS an inherent level of frustration when they are asked questions by common street thugs such as “you got some dro/loud/fire/kush?” having no idea what the difference between crap marijuana without seeds it vs high quality vs the manner in which it was grown vs a million other variables. …newcomers to linux are akin to these very street level thugs, uninformed and most of the time unknowingly making really stupid inquiries that a mere search engine query could answer; if you do NOT understand basics like APT or GEDIT or MAKE, the developer is the tired farmer who is too busy to answer questions like HOW DOES CHLOROPHYLL WORK? WHY DO PLANTS NEED SUNSHINE? HOW DO I MAKE A BOWL? DO YOU KNOW HOW TO PASS A DRUG TEST? Srsly you may as well go ask Stephen Hawking to teach you how to multiply. Because he has NOTHING better to do, right?
    Software developers are not compensated financially. More often than not they are DRIVEN by a passion. Most of the time THEY ARE willing to help people figure out even the most basic questions (HOW DO I KNOW IF MY COMPUTER DOES NOT TURN ON?)
    because they DO believe in the spirit of open source, realize that they once had stupid questions of their own and WANT people to enjoy the software they create.

    In conclusion, this software works AWESOME and everyone needs to show some gratitude to the purely genius developer who created this program, the very one who has at least 150+ more IQ points than all of us combined.

    Once again, thank you and I hope this message DOES show that there ARE people in the open source community WHO DO appreciate the level of dedication and selflessness
    that software developers have! Keep up the great work! 🙂

    • Eleanor Rosevelt says:

      EDIT: When I said that “the software was buggy … in 2005” I wanted to make it clear that I was NOT referring to Karlyriceditor; the program I was using was called something like SUPERSPECTACULARKaraokeCDGAMAZINGAWESOME+G and was created for the DOS operating system.

      In sharp contrast, Karlyriceditor IS NOT BUGGY and actually DOES WORK and WORKS GREAT!!!
      ok, end edit
