Archive for the 'rosetta' Category

Nasty printing bug fixed

Thursday, January 4th, 2007

Sometimes printing would crash Find It! Keep It!… The problem was the “sometimes”. It happened extremely rarely on my Mac mini, but often on my Intel MacBook. Clearly Rosetta was to blame!

Well, no. Finally someone other than me experienced it so I decided to dig in deeply.

The first problem is that debugging a PPC process under Rosetta is an order of magnitude slower. Having written a debugger and having been fascinated by emulators for a long time (ZX Spectrum emulator on the Atari ST, UAE’s 68000 emulator, Bochs, and Awesim), I know why it’s slow: when debugging you have to keep track of state, such as the instruction pointer, that you can simplify away when running at full speed. So the first task was to find a webpage that would crash regularly on PPC, where I could debug at native speed.

As luck would have it, displaying a big database with my new theme would cause the crash every second time you tried to print it. My first guess was memory being freed that shouldn’t be…

WebLibrarian(216,0x1b37e00) malloc: *** Deallocation of a pointer not malloced: 0x3927e70; This could be a double free(), or free() called with the middle of an allocated block; Try setting environment variable MallocHelp to see tools to help debug

Look at that! Clearly memory allocation failing!

Well, no. I had noticed that a separate thread was being started by the printing process, but the pieces only fell into place when I saw:

(gdb) info threads
* 12 process 216 thread 0x802f 0x95eb4e88 in khtml::main_thread_malloc ()
10 process 216 thread 0x6807 0x9001f08c in select ()
8 process 216 thread 0x7103 0x9000ab48 in mach_msg_trap ()
7 process 216 thread 0x6903 0x9000ab48 in mach_msg_trap ()
6 process 216 thread 0x450f 0x90049ea8 in syscall_thread_switch ()
4 process 216 thread 0x3603 0x9000ab48 in mach_msg_trap ()
2 process 216 thread 0x2303 0x9000ab48 in mach_msg_trap ()
1 process 216 local thread 0xf03 0x95eb4e88 in khtml::main_thread_malloc ()

khtml::main_thread_malloc probably should not be called from two threads at once. Indeed, googling it turned up this note: “Added assertions to ensure that main_thread_malloc and friends are only called on the main thread.”
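To see why an allocator meant for one thread produces errors like the one above when two threads share it, here is a toy sketch. It is not WebKit's allocator: ToyAllocator, malloc_steps and the two run_* helpers are names invented for this illustration. The allocation is split into two steps with a generator so that the "thread switch" can be placed by hand and the result is deterministic.

```python
class ToyAllocator:
    """Hypothetical free-list allocator with no locking (illustration only)."""

    def __init__(self, n_blocks):
        # next_free[i] is the block that follows block i on the free list.
        self.next_free = {i: i + 1 for i in range(n_blocks)}
        self.next_free[n_blocks - 1] = None
        self.head = 0  # first free block

    def malloc_steps(self, result):
        """Allocate one block in two steps, pausing in between."""
        block = self.head                  # step 1: read the head of the list
        yield                              # a thread switch could happen here
        self.head = self.next_free[block]  # step 2: unlink that block
        result.append(block)


def run_sequential(alloc):
    """Two allocations, one after the other, as on a single thread."""
    got = []
    for _ in range(2):
        for _ in alloc.malloc_steps(got):
            pass
    return tuple(got)


def run_interleaved(alloc):
    """Two allocations whose steps overlap, as on two threads."""
    got_a, got_b = [], []
    a = alloc.malloc_steps(got_a)
    b = alloc.malloc_steps(got_b)
    next(a)  # "thread" A reads the head
    next(b)  # "thread" B reads the same head before A unlinks it
    for gen in (a, b):
        try:
            next(gen)  # each finishes its unlink step
        except StopIteration:
            pass
    return got_a[0], got_b[0]


print(run_sequential(ToyAllocator(4)))   # two different blocks
print(run_interleaved(ToyAllocator(4)))  # the same block handed out twice
```

In the interleaved run both callers receive the same block. When each of them later frees it, the second free is a double free, which is the kind of situation the malloc warning describes.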

Let’s look at the two threads. The printing thread looks like this:

#0 0x95eb4e88 in khtml::main_thread_malloc ()
#1 0x95c8000c in KWQListImpl::insert ()
#2 0x95cfbb5c in khtml::RenderBlock::insertFloatingObject ()
#3 0x95cfb520 in khtml::RenderBlock::skipWhitespace ()
#4 0x95cf9d4c in khtml::RenderBlock::findNextLineBreak ()
#5 0x95cf8cec in khtml::RenderBlock::layoutInlineChildren ()
#6 0x95cf54d4 in khtml::RenderBlock::layoutBlock ()
#7 0x95cf886c in khtml::RenderBlock::layoutInlineChildren ()
#8 0x95cf54d4 in khtml::RenderBlock::layoutBlock ()
#9 0x95cf886c in khtml::RenderBlock::layoutInlineChildren ()
#10 0x95cf54d4 in khtml::RenderBlock::layoutBlock ()
#11 0x95cf886c in khtml::RenderBlock::layoutInlineChildren ()
#12 0x95cf54d4 in khtml::RenderBlock::layoutBlock ()
#13 0x95cf886c in khtml::RenderBlock::layoutInlineChildren ()
#14 0x95cf54d4 in khtml::RenderBlock::layoutBlock ()
#15 0x95cf886c in khtml::RenderBlock::layoutInlineChildren ()
#16 0x95cf54d4 in khtml::RenderBlock::layoutBlock ()
#17 0x95cf5fac in khtml::RenderBlock::layoutBlockChildren ()
#18 0x95cf54ec in khtml::RenderBlock::layoutBlock ()
#19 0x95cf5fac in khtml::RenderBlock::layoutBlockChildren ()
#20 0x95cf54ec in khtml::RenderBlock::layoutBlock ()
#21 0x95cf224c in khtml::RenderCanvas::layout ()
#22 0x95cf1b04 in KHTMLView::layout ()
#23 0x95dd3750 in KWQKHTMLPart::forceLayoutWithPageWidthRange ()
#24 0x95de84a4 in -[WebCoreBridge forceLayoutWithMinimumPageWidth:maximumPageWidth:adjustingViewSize:] ()
#25 0x95adc92c in -[WebHTMLView layoutToMinimumPageWidth:maximumPageWidth:adjustingViewSize:] ()
#26 0x95b103dc in -[WebHTMLView _setPrinting:minimumPageWidth:maximumPageWidth:adjustViewSize:] ()
#27 0x95b107ec in -[WebHTMLView knowsPageRange:] ()
#28 0x9392df4c in -[NSView(NSPrinting) _knowsPagesFirst:last:] ()
#29 0x9392dc68 in -[NSView(NSPrinting) _setUpOperation:helpedBy:] ()
#30 0x9392d7b8 in -[NSView(NSPrinting) _realPrintPSCode:helpedBy:] ()
#31 0x9392d6f4 in -[NSConcretePrintOperation _doActualViewPrinting] ()
#32 0x9392d520 in -[NSConcretePrintOperation _continueModalOperationToTheEnd:] ()
#33 0x92961194 in forkThreadForFunction ()
#34 0x9002b508 in _pthread_body ()

There are other variations, but basically the printing thread lays the page out again in order to print it. Clearly you have to do that if CSS has different settings for display and print media. In fact WebKit always does it.

So what’s happening in the main thread?

#0 0x95eb4e88 in khtml::main_thread_malloc ()
#1 0x95c8000c in KWQListImpl::insert ()
#2 0x95d21f1c in DOM::NodeImpl::dispatchGenericEvent ()
#3 0x95d21d0c in DOM::NodeImpl::dispatchEvent ()
#4 0x95d2682c in KHTMLView::dispatchMouseEvent ()
#5 0x95d24294 in KHTMLView::viewportMouseMoveEvent ()
#6 0x95d23994 in KWQKHTMLPart::mouseMoved ()
#7 0x95adb2dc in -[WebHTMLView(WebPrivate) _updateMouseoverWithEvent:] ()
#8 0x95adb040 in -[WebHTMLView(WebPrivate) _updateMouseoverWithFakeEvent] ()
#9 0x9296bbf8 in __NSFireDelayedPerform ()
#10 0x907f0550 in __CFRunLoopDoTimer ()
#11 0x907dcec8 in __CFRunLoopRun ()
#12 0x907dc47c in CFRunLoopRunSpecific ()
#13 0x93208740 in RunCurrentEventLoopInMode ()
#14 0x93207d4c in ReceiveNextEventCommon ()
#15 0x93207c40 in BlockUntilNextEventMatchingListInMode ()
#16 0x93730ae4 in _DPSNextEvent ()
#17 0x937307a8 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#18 0x9372ccec in -[NSApplication run] ()
#19 0x9381d87c in NSApplicationMain ()
#20 0x0060910c in init_AppKit ()
...
#45 0x00006e00 in py2app_main (argc=-1870946304, argv=0x7204, envp=0x3986548) at src/main.c:952
#46 0x00007964 in main (argc=1, argv=0xbffffaec, envp=0xbffffaf4) at src/main.c:1007

Each time the mouse moves over the WebView, WebKit tells JavaScript about it (to enable popups, etc.). So my main thread was pottering along as normal, oblivious to the other thread.

Telling the printOperation NOT to be threaded should fix the bug, by keeping code that is not thread-safe from running on two threads at once. So I changed

printOperation.setCanSpawnSeparateThread_(YES)

to

printOperation.setCanSpawnSeparateThread_(NO)

and suddenly printing no longer crashed. Not only that, but some other bugs went away too: sometimes printing would make the browser redisplay the website incorrectly, and sometimes the printout would have massive gaps in it. All of these were suddenly fixed.
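As a rough sketch of how that one-line fix might be wrapped, the helper below refuses to start printing from anywhere but the main thread and turns off the separate print thread before running. run_print_unthreaded and FakePrintOperation are names made up for this example. The fake class only stands in for a PyObjC NSPrintOperation so the snippet runs without Cocoa; the two methods it mimics are setCanSpawnSeparateThread_ and runOperation.

```python
import threading


def run_print_unthreaded(print_operation):
    """Run a print operation on the calling (main) thread only."""
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError("printing must be started from the main thread")
    # Keep layout for printing on the same thread as the rest of WebKit.
    print_operation.setCanSpawnSeparateThread_(False)
    return print_operation.runOperation()


class FakePrintOperation:
    """Hypothetical stand-in for NSPrintOperation, for illustration only."""

    def __init__(self):
        self.can_spawn = True
        self.ran = False

    def setCanSpawnSeparateThread_(self, flag):
        self.can_spawn = bool(flag)

    def runOperation(self):
        self.ran = True
        return True


op = FakePrintOperation()
run_print_unthreaded(op)
print(op.can_spawn, op.ran)
```

In the real application the argument would be the print operation created for the WebView's document view. The guard is in the same spirit as the assertions mentioned earlier: fail loudly rather than corrupt memory quietly.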

Now another piece fell into place. I had noticed that the crashes happened more often on large complex pages than on simple ones. The bigger a page, the longer it takes the printing thread to render it, and the more likely the two threads are to interfere with each other. Furthermore, it happened more often on the Intel Mac… which has a dual-core processor. If the threads were running concurrently on different cores, they would interfere much more often.
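A back-of-the-envelope model of the page-size effect, under a loud assumption: main-thread events (mouse moves and the like) arrive at random at a steady average rate, i.e. as a Poisson process. The function name and the numbers in the usage lines are made up for illustration, not measurements. The result is only the chance that some main-thread event lands while the print layout is running, which is an upper bound on the chance of an actual collision inside the allocator.

```python
import math


def chance_of_overlap(events_per_second, layout_seconds):
    """Probability that at least one event arrives during the layout,
    assuming Poisson arrivals at the given average rate."""
    return 1.0 - math.exp(-events_per_second * layout_seconds)


# Hypothetical numbers: a small page versus a big database view.
small_page = chance_of_overlap(events_per_second=10, layout_seconds=0.05)
large_page = chance_of_overlap(events_per_second=10, layout_seconds=2.0)
print(small_page < large_page)
```

The model says nothing about core count. On a single core the two threads only collide if the scheduler switches threads in the middle of an allocation, while on two cores they can genuinely run at the same time, which fits the observation that the Intel machine crashed more often.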

So where did I get that crazy idea of threading the print? From the Apple developer mailing lists. It turns out that the same code is also in Shiira and Adium.

For kicks, I tried Shiira on a large complex document which crashed my program regularly… and BOOM! This is apparently a known problem which is worse on dual core systems… The bug’s still in Shiira 2.0, so I’m trying to email Shiira’s author to tell him.

Sunday’s Beta feedback

Monday, November 27th, 2006

The beta has been out two days.

Spam Filters

Again it seems some spam filters are preventing people from receiving the download information… No one has been turned away from this beta, so if you requested it, please tell your spam filter that mail from ansemond.com is not spam. If you email me again, I’ll be happy to send you the information a second time.

Bugs

  • Input Managers, aka Plugins: Two crashes on launch were due to third-party plugins.
  • Cocoa: One bug seems to be that Cocoa gives me the wrong information (weird!)
  • Flash plugins disabled: I believe this is due to a misconfigured computer, but I need to dig into it further.
  • Rosetta: One report of Rosetta crashing on an Intel Mac, and thereafter Find It! Keep It! would not start again. To deal with this case I made a small program for Intel Mac owners that restarts Rosetta before starting Find It! Keep It!

Overall observations

Because people who are willing to run betas are people who try new things, they run plugins I’ve never even heard of, and have interestingly configured computers. Beta testing is trial by fire for the software being tested! :-)

People downloading the tool use a wide range of browsers: 60% use Safari, 21% use Firefox (1.5 & 2.0), followed by OmniWeb, Camino, Opera and something called iGetter.

A few people seem to be hoping that it will work on 10.3.x… I’m afraid it won’t.