Reverse-engineering the KaraFun file format. Part 2, the directory

In the first part we worked out the header format, and found that it does not tell us where the directory is. However, we know there must be a directory, as the KaraFun application must know exactly where in the archive each file is stored and how large it is. At minimum there should be the directory offset and either the total size or the number of files. At first glance the DIFW header value might contain the number of files, and the MUSL value the directory offset (its value is 0x11D, which is after 0x117). However, if we check other KaraFun files on the same page, we see that for some files the MUSL value is less than the header length. Therefore it cannot be the offset, and is probably the music length in seconds. Nor is DIFW the number of files: a quick search for the JPEG signature “JFIF” finds at least three JPG files, so there are more than two files in this archive.
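The signature search mentioned above can be sketched in a few lines of Java. This is only an illustration and not part of the dumper: the class and method names (SignatureScan, countOccurrences) are made up for this example, and it assumes the whole file fits in memory.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class SignatureScan
{
	// Counts how many times the byte pattern occurs in data
	static int countOccurrences( byte[] data, byte[] pattern )
	{
		if ( pattern.length == 0 )
			return 0;

		int count = 0;

		for ( int i = 0; i + pattern.length <= data.length; i++ )
		{
			boolean match = true;

			for ( int j = 0; j < pattern.length; j++ )
			{
				if ( data[i + j] != pattern[j] )
				{
					match = false;
					break;
				}
			}

			if ( match )
				count++;
		}

		return count;
	}

	public static void main( String[] args ) throws Exception
	{
		if ( args.length == 0 )
		{
			System.out.println( "Usage: SignatureScan <KFN file>" );
			return;
		}

		byte[] data = Files.readAllBytes( Paths.get( args[0] ) );
		byte[] jfif = "JFIF".getBytes( StandardCharsets.US_ASCII );
		System.out.println( "JFIF found " + countOccurrences( data, jfif ) + " times" );
	}
}
```

The count is only a hint about how many images are inside, but it is enough to rule out a candidate value for the number of files.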

So where is the directory? Since the header length varies (because it contains variable-length strings), the directory could be in one of two places. Either it is at the end of the file (not the case as we saw above), or it directly follows the header. Let’s look carefully at the bytes following the header:

 

00000110 XX XX XX XX XX XX XX 09 00 00 00 1b 00 00 00 30 |...............0|
00000120 39 2d 43 68 75 61 20 62 69 65 74 20 72 6f 2d 68 |9-Chua biet ro-h|
00000130 61 74 2d 6d 69 78 2e 6d 70 33 02 00 00 00 bc ac |at-mix.mp3......|
00000140 45 00 00 00 00 00 bc ac 45 00 00 00 00 00 18 00 |E.......E.......|
00000150 00 00 30 39 2d 20 43 68 75 61 20 62 69 65 74 20 |..09- Chua biet |
00000160 72 6f 2d 6d 69 78 2e 6d 70 33 02 00 00 00 5e ae |ro-mix.mp3....^.|
00000170 45 00 bc ac 45 00 5e ae 45 00 00 00 00 00 10 00 |E...E.^.E.......|
00000180 00 00 4e 67 61 69 20 62 69 65 74 20 72 6f 2e 6a |..Ngai biet ro.j|
00000190 70 67 03 00 00 00 01 66 01 00 1a 5b 8b 00 01 66 |pg.....f...[...f|
000001a0 01 00 00 00 00 00 08 00 00 00 6e 62 72 31 2e 6a |..........nbr1.j|
000001b0 70 67 03 00 00 00 71 50 01 00 1b c1 8c 00 71 50 |pg....qP......qP|
000001c0 01 00 00 00 00 00 08 00 00 00 6e 62 72 32 2e 6a |..........nbr2.j|
000001d0 70 67 03 00 00 00 b2 bd 00 00 8c 11 8e 00 b2 bd |pg..............|
000001e0 00 00 00 00 00 00 08 00 00 00 6e 62 72 33 2e 6a |..........nbr3.j|
000001f0 70 67 03 00 00 00 6c cb 00 00 3e cf 8e 00 6c cb |pg....l...>...l.|
00000200 00 00 00 00 00 00 08 00 00 00 6e 62 72 34 2e 6a |..........nbr4.j|
00000210 70 67 03 00 00 00 26 a1 01 00 aa 9a 8f 00 26 a1 |pg....&.......&.|
00000220 01 00 00 00 00 00 08 00 00 00 6e 62 72 35 2e 6a |..........nbr5.j|
00000230 70 67 03 00 00 00 88 92 00 00 d0 3b 91 00 88 92 |pg.........;....|
00000240 00 00 00 00 00 00 08 00 00 00 53 6f 6e 67 2e 69 |..........Song.i|
00000250 6e 69 01 00 00 00 f3 12 00 00 58 ce 91 00 f3 12 |ni........X.....|
00000260 00 00 00 00 00 00 ff fa 92 60 9e d6 00 00 02 45 |.........`.....E|
                           ^^ ^^ MP3 file start

This is the complete directory, since at offset 0x266 the MP3 file starts with its FF FA signature. Looking at the beginning, we see the following (offsets are relative to the directory start at 0x117):

Offset 0x00: the value is 0x00000009 - not sure what it is. It cannot be a length or an offset; some information about the file?
Offset 0x04: 0x0000001B followed by the file name. Considering our experience with the string storage in the header, we assume this is the string length. The length of "09-Chua biet ro-hat-mix.mp3" is 27, which is 0x1B.
Offset 0x23: 0x00000002 - also makes no sense, cannot be a length. Leave it for now.
Offset 0x27: 0x0045ACBC - looks like the file length. Let's verify it. We know the MP3 starts at 0x266, so it should end at 0x266 + 0x0045ACBC = 0x0045AF22, and the next file should start right there. Here is the dump from just past that point:

0045b0c0 20 71 91 8c ff fa 92 60 75 c7 73 00 01 2c 02 c7 | q.....`u.s..,..|

At 0x0045B0C4, which is 0x1A2 bytes past 0x0045AF22, we find the same FF FA 92 60 bytes that open the MP3 at 0x266, so MP3 data continues past the end of the first file. Which makes sense because the second filename in the directory is also an MP3 file. On top of that, the second directory entry contains the very same value 0x0045ACBC (at 0x172), as we would expect if the second file starts where the first one ends. Therefore we can be fairly sure this is the length.

Offset 0x2B: 0x00000000 - another set of flags? Leave it for now.
Offset 0x2F: 0x0045ACBC - another length. This is not good, as it means the file may be packed, in which case one of the lengths is the unpacked length and the other one is the packed length. This file seems to be fine though.
Offset 0x33: 0x00000000 - unclear for now. May be the start of the next file's information (like 0x00000009 above).
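Reading these fields by hand means decoding little-endian double words. A minimal sketch of that step, checking the name length field against the actual file name, could look like this. DwordCheck and readDwordLE are names invented for this example, and the four input bytes are copied from offset 0x11B of the dump above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class DwordCheck
{
	// Decodes 4 bytes starting at pos as an unsigned little-endian 32-bit value
	static long readDwordLE( byte[] data, int pos )
	{
		return ByteBuffer.wrap( data, pos, 4 ).order( ByteOrder.LITTLE_ENDIAN ).getInt() & 0xFFFFFFFFL;
	}

	public static void main( String[] args )
	{
		// The four bytes stored at 0x11B, right before the first file name
		byte[] raw = { 0x1b, 0x00, 0x00, 0x00 };
		String name = "09-Chua biet ro-hat-mix.mp3";

		long stored = readDwordLE( raw, 0 );
		int actual = name.getBytes( StandardCharsets.US_ASCII ).length;

		// Both values are 27 for this entry
		System.out.println( "stored length: " + stored + ", name length: " + actual );
	}
}
```

The same helper works for every other 4-byte field in the directory, since they are all stored least significant byte first.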

After that comes 0x00000018 and another file name, meaning the information for our first file ends somewhere before it. We need to find out where exactly the information about the first file ends and the information about the second file starts. For this we look at the Song.ini entry (the last one, so we know it ends where the directory ends) to see how much information follows the file name:

At 0x0252: 0x00000001
At 0x0256: 0x000012F3 (we know this is the file length)
At 0x025A: 0x0091CE58 - looks like an offset. However, if we seek there, Song.ini doesn't start at this offset. Now if we look at the same field for the first MP3 file, the value is 0. But obviously the MP3 file is not stored at offset 0; it starts at 0x266. So we can assume the file offsets are counted from where the directory ends, which in our case is 0x266. Adding 0x266 + 0x0091CE58 we get 0x0091D0BE, and it is obvious from the hex dump that we're correct:

 0091d0a0 a2 e3 0f 81 fc fd 9f f0 a2 ba bf 03 f8 1e e7 c2 |................|
 0091d0b0 d7 d1 fe ec f9 19 1d be ed 15 2c 67 ff d9 5b 47 |..........,g..[G|
 0091d0c0 65 6e 65 72 61 6c 5d 0d 0a 54 69 74 6c 65 3d 43 |eneral]..Title=C|

At 0x025E: 0x000012F3 again; this is the length, either packed or unpacked, we don’t care for now
At 0x0262: 0x00000000 - some kind of information. This is the last field.
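The offset arithmetic is simple enough to double check in code. The snippet below is just that calculation written out; OffsetRebase and toAbsolute are hypothetical names used only here, and 0x266 is the directory end for this particular file.

```java
class OffsetRebase
{
	// Converts an offset stored in the directory into an absolute position in the KFN file
	static long toAbsolute( long directoryEnd, long storedOffset )
	{
		return directoryEnd + storedOffset;
	}

	public static void main( String[] args )
	{
		// The directory of this particular file ends at 0x266
		long directoryEnd = 0x266;

		// Prints 0x0091D0BE, where the [General] section of Song.ini begins in the dump
		System.out.printf( "Song.ini starts at 0x%08X%n", toAbsolute( directoryEnd, 0x0091CE58L ) );

		// Prints 0x00000266, the position of the first FF FA signature
		System.out.printf( "First MP3 starts at 0x%08X%n", toAbsolute( directoryEnd, 0 ) );
	}
}
```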

So we can see that each file name is followed by exactly 20 bytes. Go back to the first file entry and count 20 bytes after its name: you end up at the length of the second file name. Therefore each directory entry has the following structure:

4 bytes string length followed by the string
4 bytes some information (file type)
4 bytes file length, we call it length1
4 bytes offset (counted from the end of the directory)
4 bytes another file length (length2)
4 bytes some other information (file flags)

All the entries match this structure except the first one, which starts with 0x00000009 followed by the name length. Now, could this be the total number of files? After all, the directory ends without any end marker, so we need to know how many files there are. And indeed, if we count the file names, there are 9 of them. So the first 4 bytes of the directory specify the number of files in the archive.
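Before touching the real dumper, the layout can be tried out on an in-memory buffer. The following is a rough sketch under the assumptions made so far (a file count followed by entries of name length, name and five double words); DirectorySketch and its Entry class are names made up for this example, and the test data is a one-entry directory built from the Song.ini bytes in the dump.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DirectorySketch
{
	static class Entry
	{
		String name;
		int type, length1, offset, length2, flags;
	}

	// Parses a directory laid out as described above; buf must be positioned at the file count
	static List<Entry> parse( ByteBuffer buf )
	{
		buf.order( ByteOrder.LITTLE_ENDIAN );
		int count = buf.getInt();
		List<Entry> entries = new ArrayList<Entry>();

		for ( int i = 0; i < count; i++ )
		{
			Entry e = new Entry();
			byte[] name = new byte[ buf.getInt() ];
			buf.get( name );

			// ISO-8859-1 maps every byte to a character, so decoding never fails; it is only a placeholder
			e.name = new String( name, StandardCharsets.ISO_8859_1 );
			e.type = buf.getInt();
			e.length1 = buf.getInt();
			e.offset = buf.getInt();
			e.length2 = buf.getInt();
			e.flags = buf.getInt();
			entries.add( e );
		}

		return entries;
	}

	public static void main( String[] args )
	{
		// A one-entry directory built from the Song.ini bytes in the dump (the real count is 9)
		byte[] raw = {
			0x01, 0x00, 0x00, 0x00,
			0x08, 0x00, 0x00, 0x00, 'S', 'o', 'n', 'g', '.', 'i', 'n', 'i',
			0x01, 0x00, 0x00, 0x00,
			(byte) 0xf3, 0x12, 0x00, 0x00,
			0x58, (byte) 0xce, (byte) 0x91, 0x00,
			(byte) 0xf3, 0x12, 0x00, 0x00,
			0x00, 0x00, 0x00, 0x00
		};

		for ( Entry e : parse( ByteBuffer.wrap( raw ) ) )
			System.out.println( e.name + " type=" + e.type + " length1=" + e.length1 + " offset=" + e.offset );
	}
}
```

A corrupted count or name length would make this sketch throw, which is acceptable for an experiment but not for a player.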

So let’s modify the dumper so it prints the information about the directory:

import java.io.File;
import java.io.IOException;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class KFNDumper
{
	public boolean parse( String kfnFilename )
	{
		try
		{
			m_file = new RandomAccessFile( kfnFilename, "r" );

			// Read the file signature
			String signature = new String( readBytes(4) );

			if ( !signature.equals("KFNB") )
				return false;

			// Parse the header fields
			while ( true )
			{
				signature = new String( readBytes(4) );
				int type = readByte();
				int len_or_value = readDword();

				switch ( type )
				{
					case 1:
						break;

					case 2:
						byte[] buf = readBytes( len_or_value );
						break;
				}

				if ( signature.equals("ENDH") )
					break;
			}

			// Read the number of files in the directory
			int numFiles = readDword();

			// Parse the directory
			for ( int i = 0; i < numFiles; i++ )
			{
				int filenameLen = readDword();
				byte[] filename = readBytes( filenameLen );
				int file_type = readDword();
				int file_length1 = readDword();
				int file_offset = readDword();
				int file_length2 = readDword();
				int file_flags = readDword();

				System.out.println( "File " + Charset.forName( "UTF-8" ).decode( ByteBuffer.wrap( filename ) ).toString() 
                                                   + ", type: " + file_type + ", length1: " + file_length1 + ", length2: " 
                                                   + file_length2 + ", offset: " + file_offset + ", flags: " + file_flags  );
			}

			System.out.println( "Directory ends at offset " + m_file.getFilePointer() );
			return true;
		}
		catch (IOException e)
		{
			// Most likely a corrupted or truncated KFN file
			return false;
		}
	}

	// KFN file; must be seekable
	private RandomAccessFile m_file = null;

	// Helper I/O functions
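	private int readByte() throws IOException
	{
		// read() returns -1 at the end of the file; masking it with 0xFF would hide that as 255
		int b = m_file.read();

		if ( b < 0 )
			throw new IOException( "Unexpected end of file" );

		return b;
	}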

	private int readWord() throws IOException
	{
		int b1 = readByte();
		int b2 = readByte();

		return b2 << 8 | b1;
	}

	private int readDword() throws IOException
	{
		int b1 = readByte();
		int b2 = readByte();
		int b3 = readByte();
		int b4 = readByte();

		return b4 << 24 | b3 << 16 | b2 << 8 | b1;
	}

	private byte [] readBytes( int length ) throws IOException
	{
		byte [] array = new byte [ length ];

		// readFully() either fills the whole array or throws EOFException
		m_file.readFully( array );

		return array;
	}

	private String dumpHex( byte [] array )
	{
		String out = "";

		for ( int i = 0; i < array.length; i++ )
		{
			if ( i > 0 )
				out += " ";

			out += String.format("%02X", array[i] & 0xFF);
		}

		return out;
	}

	private String readUtf8String( int length ) throws IOException
	{
		// Allocate the buffer and read into it
		byte[] buf = readBytes( length );

		// And decode the UTF-8 string
		return Charset.forName( "UTF-8" ).decode( ByteBuffer.wrap( buf ) ).toString();
	}

	private String readUtf8String() throws IOException
	{
		// First four bytes define the length
		return readUtf8String( readDword() );
	}

    public static void main( String [] args ) throws Exception
    {
		if ( args.length == 0 )
		{
			System.out.println( "Usage: app <KFN file>\n" );
			return;
		}

		KFNDumper dumper = new KFNDumper();
		dumper.parse( args[0] );
    }
}

Compile it and run, and we get the following output:

 File 09-Chua biet ro-hat-mix.mp3, type: 2, length1: 4566204, length2: 4566204, offset: 0, flags: 0
 File 09- Chua biet ro-mix.mp3, type: 2, length1: 4566622, length2: 4566622, offset: 4566204, flags: 0
 File Ngai biet ro.jpg, type: 3, length1: 91649, length2: 91649, offset: 9132826, flags: 0
 File nbr1.jpg, type: 3, length1: 86129, length2: 86129, offset: 9224475, flags: 0
 File nbr2.jpg, type: 3, length1: 48562, length2: 48562, offset: 9310604, flags: 0
 File nbr3.jpg, type: 3, length1: 52076, length2: 52076, offset: 9359166, flags: 0
 File nbr4.jpg, type: 3, length1: 106790, length2: 106790, offset: 9411242, flags: 0
 File nbr5.jpg, type: 3, length1: 37512, length2: 37512, offset: 9518032, flags: 0
 File Song.ini, type: 1, length1: 4851, length2: 4851, offset: 9555544, flags: 0
 Directory ends at offset 614

It is clear now that the first double word after the file name is the file type: the value 2 means a music file, the value 3 an image file, and the value 1 the song file. Looking at other KFN files we can also find that the value 4 represents a font file and the value 5 a video file. Also, the file name is stored in the native file system encoding and not in UTF-8, which makes it impossible to recover the original filename without knowing (or guessing) the original encoding. However, since we have the file type, we don’t need the file name for the player.
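If the names do matter, guessing the encoding can be sketched as follows: try strict UTF-8 first and fall back to a charset supplied by the caller when the bytes are not valid UTF-8. This is an assumption-laden illustration, not something the format tells us: NameDecoder is a made-up helper, the fallback charset is a pure guess, and a name that happens to be valid UTF-8 by coincidence would still be misread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class NameDecoder
{
	// Decodes raw as strict UTF-8; if the bytes are not valid UTF-8, uses the guessed charset instead
	static String decode( byte[] raw, Charset guess )
	{
		try
		{
			return StandardCharsets.UTF_8.newDecoder()
				.onMalformedInput( CodingErrorAction.REPORT )
				.onUnmappableCharacter( CodingErrorAction.REPORT )
				.decode( ByteBuffer.wrap( raw ) )
				.toString();
		}
		catch ( CharacterCodingException e )
		{
			return new String( raw, guess );
		}
	}

	public static void main( String[] args )
	{
		// 0xE9 is a single accented letter in ISO-8859-1, but an incomplete sequence in UTF-8
		byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };
		System.out.println( decode( raw, StandardCharsets.ISO_8859_1 ) );
	}
}
```

Plain ASCII names, like the ones in this archive, decode to the same string either way.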

Let’s modify our dumper so it can extract the files as well:

import java.io.File;
import java.io.IOException;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class KFNDumper
{
	public static final int TYPE_SONGTEXT = 1;
	public static final int TYPE_MUSIC = 2;
	public static final int TYPE_IMAGE = 3;
	public static final int TYPE_FONT = 4;
	public static final int TYPE_VIDEO = 5;

	// KFN file; must be seekable
	private RandomAccessFile m_file = null;

	class Entry
	{
		public int type;
		public String filename;
		public int length1;
		public int length2;
		public int offset;
		public int flags;
	};

	public KFNDumper( String kfnFilename ) throws IOException
	{
		m_file = new RandomAccessFile( kfnFilename, "r" );
	}

	public List<Entry> list() throws IOException
	{
		List<Entry> files = new ArrayList<Entry> ();

		// Read the file signature
		String signature = new String( readBytes(4) );

		if ( !signature.equals("KFNB") )
			return new ArrayList<Entry> ();

		// Parse the header fields
		while ( true )
		{
			signature = new String( readBytes(4) );
			int type = readByte();
			int len_or_value = readDword();

			switch ( type )
			{
				case 1:
					break;

				case 2:
					byte[] buf = readBytes( len_or_value );
					break;
			}

			if ( signature.equals("ENDH") )
				break;
		}

		// Read the number of files in the directory
		int numFiles = readDword();

		// Parse the directory
		for ( int i = 0; i < numFiles; i++ )
		{
			Entry entry = new Entry();

			int filenameLen = readDword();
			byte[] filename = readBytes( filenameLen );

			// This is definitely not correct as the native encoding is used, but that's the best we can come up with
			entry.filename = Charset.forName( "UTF-8" ).decode( ByteBuffer.wrap( filename ) ).toString();

			entry.type = readDword();
			entry.length1 = readDword();
			entry.offset = readDword();
			entry.length2 = readDword();
			entry.flags = readDword();

			files.add( entry );
		}

		// Since all the offsets are based on the end of directory, readjust them
		for ( int i = 0; i < files.size(); i++ )
			files.get(i).offset += m_file.getFilePointer();

		return files;
	}

	public void extract( final Entry entry, String outfilename ) throws IOException
	{
		// Seek to the file beginning
		m_file.seek( entry.offset );

		// Create the output file; try-with-resources closes it even if the copy fails
		try ( FileOutputStream output = new FileOutputStream( outfilename ) )
		{
			byte[] buffer = new byte[8192];
			int totalRead = 0;

			while ( totalRead < entry.length1 )
			{
				int toRead = Math.min( buffer.length, entry.length1 - totalRead );
				int bytesRead = m_file.read( buffer, 0, toRead );

				// A truncated archive would otherwise pass -1 to write()
				if ( bytesRead < 0 )
					throw new IOException( "Unexpected end of file while extracting " + outfilename );

				output.write( buffer, 0, bytesRead );
				totalRead += bytesRead;
			}
		}
	}

	// Helper I/O functions
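	private int readByte() throws IOException
	{
		// read() returns -1 at the end of the file; masking it with 0xFF would hide that as 255
		int b = m_file.read();

		if ( b < 0 )
			throw new IOException( "Unexpected end of file" );

		return b;
	}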

	private int readDword() throws IOException
	{
		int b1 = readByte();
		int b2 = readByte();
		int b3 = readByte();
		int b4 = readByte();

		return b4 << 24 | b3 << 16 | b2 << 8 | b1;
	}

	private byte [] readBytes( int length ) throws IOException
	{
		byte [] array = new byte [ length ];

		// readFully() either fills the whole array or throws EOFException
		m_file.readFully( array );

		return array;
	}

    public static void main( String [] args ) throws Exception
    {
		if ( args.length == 0 )
		{
			System.out.println( "Usage: app <KFN file>\n" );
			return;
		}

		KFNDumper kfnfile = new KFNDumper( args[0] );
		List<Entry> entries = kfnfile.list();

		for ( Entry entry : entries )
		{
			System.out.println( "File " + entry.filename + ", type: " + entry.type + ", length1: " 
                                           + entry.length1 + ", length2: " + entry.length2 + ", offset: " 
                                           + entry.offset + ", flags: " + entry.flags  );
			kfnfile.extract( entry, entry.filename );
		}
    }
}

And once we run it, all the files are extracted and we can look at them. Now we need to find out how the lyrics are encoded and timed so we can play them. For that we take a look at the Song.ini file, which is covered in part 3.

