Reverse-engineering the KaraFun file format. Part 3, the Song.ini file

This part is quite simple. Looking at the Song.ini file, it is immediately obvious where the text and the timing information are, as those are the only lines with enough numbers.

Text2=Ngài biết rõ
Text3=những nhu cầu
Text4=của đời sống tôi

As we can see, the timing is stored separately from the text, so we need to find a way to merge them. Let’s count how many Sync numbers there are in total:

grep -a '^Sync' Song.ini | sed -e 's/,/\n/g' | wc -l

So we have 217 timing marks and only 79 text lines. Obviously more than one timing mark applies to each text line, which is reasonable. Let’s assume each timing mark applies to a word. To check this, we need to count the total number of words in the Text fields:

grep -ae '^Text[0-9]*=' Song.ini | sed -e 's/[ \/]/\n/g' | wc -l

Close, but not exact. The numbers do not match, so evidently some text fields do not use the timings. Looking at the dump above we see the empty “Text1=” line. Does it make sense to have a timing mark for an empty line? Not really. Let’s exclude such lines from the calculation:

grep -aE '^Text[0-9]*=\w' Song.ini | sed -e 's/ /\n/g'| wc -l
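
The two counts can also be put side by side. The following is only a minimal shell sketch and not part of the original analysis: the helper names count_syncs and count_words are made up for illustration, and it assumes that Sync lines look like "SyncN=a,b,c" and that words are separated by spaces or slashes. Unlike the commands above, it strips the "TextN=" key before counting and treats any Text line starting with a non-space character as non-empty.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from KaraFun.

# Count timing marks: strip the "SyncN=" key, put each
# comma-separated value on its own line, count lines with digits.
count_syncs() {
    grep -a '^Sync[0-9]*=' "$1" | sed -e 's/^[^=]*=//' | tr ',' '\n' | grep -ac '[0-9]'
}

# Count words: skip empty Text lines, strip the "TextN=" key,
# split on spaces and slashes, count the non-empty pieces.
count_words() {
    grep -aE '^Text[0-9]*=[^[:space:]]' "$1" | sed -e 's/^[^=]*=//' | tr ' /' '\n\n' | grep -ac '[^[:space:]]'
}
```

With these loaded into the shell, running "count_syncs Song.ini" and "count_words Song.ini" prints the two numbers to compare.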

Almost there. Let’s convert Song.ini to an LRC file and play it to check whether the result is valid. One remaining issue is to guess how the time is encoded. This is quite easy, however: the largest time value is 26736, so it is clearly in hundredths of a second (i.e. divide by 100 to get seconds, which gives 267.36 s, or about 4:27). Any other divisor gives an unreasonable value: dividing by 1000 would make the song under 27 seconds long, and dividing by 10 would make it almost 45 minutes.
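
As a quick check of that arithmetic, here is a small shell sketch that turns a value in hundredths of a second into an LRC-style mm:ss.xx stamp. The function name cs_to_lrc is hypothetical, and the unit is the guess made above rather than anything documented.

```shell
#!/bin/sh
# Convert a time value in hundredths of a second to mm:ss.xx.
# cs_to_lrc is a hypothetical helper name; the unit is an assumption.
cs_to_lrc() {
    cs=$1
    printf '%02d:%02d.%02d\n' $((cs / 6000)) $((cs % 6000 / 100)) $((cs % 100))
}

cs_to_lrc 26736   # the largest time value found in Song.ini
```

For 26736 this prints 04:27.36, a plausible length for a song.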

Here’s the converter script written in Perl:


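use warnings;
use strict;

die "Usage: $0 <file>\n" if !defined $ARGV[0];

open my $fh, "<", $ARGV[0] or die "Couldn't open $ARGV[0]: $!\n";
binmode $fh, ":utf8";
my @content = <$fh>;
close $fh;

# Collect the timing marks and the words
my (@syncs, @text);

foreach my $line ( @content )
{
	# Normalize CRLF line endings
	$line =~ s/\r/\n/;
	$line =~ s/\n+/\n/;

	# Add the sync markers into the sync array
	push @syncs, split( /,/, $1 ) if $line =~ /^Sync\d+=(.*)$/;

	# Skip empty Text lines; split the rest into words on spaces and slashes
	if ( $line =~ /^Text\d+=(\w+.*)$/ )
	{
		# Drop empty pieces so that double separators are not taken for line ends
		push @text, grep { $_ ne "" } split( /[ \/]/, $1 );
		push @text, ""; # end of line
	}
}

# Print a fake LRC header
binmode STDOUT, ":utf8";
print "[ti: test]\n[ar: test]\n";
my $last_time = "00:00.00";

foreach my $word ( @text )
{
	# End-of-line marker: close the line. Assumption: it does not consume
	# a timing mark, since each mark is taken to belong to a word.
	if ( $word eq "" )
	{
		print "[$last_time]\n";
		next;
	}

	# Convert the time (hundredths of a second) to mm:ss.xx
	my $time = shift @syncs;
	last if !defined $time; # ran out of timing marks
	my $min = int ( $time / 6000 );
	my $sec = int ( ($time - ($min * 6000)) / 100 );
	my $csec = int ( $time - ($min * 6000 + $sec * 100) );
	$last_time = sprintf( "%02d:%02d.%02d", $min, $sec, $csec );
	print "[$last_time]$word ";
}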

We test it, and voilà: everything works fine. We have reverse-engineered the format, and we can integrate it into the player!

Some files, however, are encrypted. How do we deal with encryption? See part 4!

This entry was posted in android, reverse engineering.

One Response to Reverse-engineering the KaraFun file format. Part 3, the Song.ini file

  1. MadLord says:

    Hi, I found something strange…
    I have a KFN file with a 3:46 track. I exported a Song.ini file from it.
    It has 72 lines of text (no empty lines). If we count all the markers (spaces and slashes), there are about 430 of them.
    And in the Sync lines, there are more than 500 timing marks. Moreover, there are timing marks that go beyond the track length (4:10 and more).
    Have you come across this?
