Log for on 23rd February 2014:
Times are UTC Toggle Colours
02:27:16  *** LuHa has joined
02:27:16  *** ChanServ sets mode: +v LuHa
03:16:05  *** Supercheese has joined
03:16:05  *** ChanServ sets mode: +v Supercheese
07:34:51  *** Supercheese has quit IRC
09:37:53  *** Alberth has joined
09:37:53  *** ChanServ sets mode: +v Alberth
10:11:39  *** LuHa has quit IRC
11:21:48  *** frosch123 has joined
11:21:48  *** ChanServ sets mode: +v frosch123
11:46:40  * fonsinchen is just trying to reproduce FS#5867 on a different computer with a VM and a larger screen.
11:51:10  <frosch123> btw. have you tried starting with -v:win32:no_trheads ?
11:51:28  <frosch123> i am quite sure it is not causes by the threading, i.e. not by missing mutex calls
11:52:04  <frosch123> "-v win32:no_threads"
11:53:43  <fonsinchen> I'll try. But I'm also pretty convinced it's not some kind of data race.
11:54:48  * fonsinchen downloads zBase first and goes to get a coffee in the mean time
11:55:10  <frosch123> :p
12:04:38  <fonsinchen> I can't reproduce it on the VM :(
12:05:11  <frosch123> different win version?
12:05:17  <frosch123> 32bit/64bit?
12:06:50  <fonsinchen> Yes, the physical machine is 32bit XP, the virtual one is 64bit Windows 7.
12:09:16  <fonsinchen> Also it seems it runs into a deadlock on startup sometimes, but only when I'm trying to debug it. And the stupid debugger cannot properly interrupt it then.
12:09:27  <fonsinchen> Maybe the debugger is broken, after all.
12:09:36  <frosch123> no, i know that thing
12:09:44  <frosch123> i encountered that last week
12:10:03  <frosch123> PaintWindowThread does a very silly sleep(0) thingie
12:10:10  <frosch123> in the hope that the other thread runs in that time
12:10:41  <frosch123> that is very likely to fail with a debugger or on high cpu load
12:10:58  <frosch123> i.e. everything that messes with thread scheduling :)
12:12:25  <fonsinchen> oh my
12:13:50  <frosch123> so maybe if you increase that sleep time, it also works in the debugger :p
12:14:20  <fonsinchen> And it fails to reserve 512MiB of spritecache if I switch a couple of times.
12:14:38  <frosch123> memory fragmentation?
12:15:04  <frosch123> but, actually... it shouldn't free it iirc
12:15:39  <frosch123> oh, it does
12:15:57  <fonsinchen> It still has another GiB of free RAM. But it could be memory fragmentation
12:16:24  <fonsinchen> It automatically downgrades to 256M spritecache and becomes even slower than it was before.
12:16:31  <frosch123> with a 32bit build fragmentation is easy with 512 MiB :p
12:21:16  <fonsinchen> So where does it delete the sprite cache?
12:21:52  <frosch123> GfxInitSpriteCache
12:21:53  <fonsinchen> If it goes on to draw the mouse cursor afterwards we know what's going on.
12:22:26  <frosch123> resp. GfxInitSpriteMem
12:22:48  <frosch123> afterwards?
12:23:08  <frosch123> the problem should only exist if it tries to draw it _before_ the reset of the spritecache
12:23:31  <frosch123> if it reloads the sprite from disk and encodes it with the new blitter, it should be fine
12:23:44  <frosch123> unless the problem is with fetching the screen-backup
12:30:25  <fonsinchen> It does that from GameLoop()
12:30:34  <fonsinchen> It also does UpdateCursorSize() there
12:31:22  <fonsinchen> Can we be sure we have run the game loop at least once before trying to draw the cursor and can we be sure that it has actually updated the sprites?
12:32:09  <frosch123> "run the game loop at least once" is definitely not the case
12:33:40  <frosch123> it processes all messages from the queue, before entering the gameloop
12:34:02  <frosch123> so, maybe "toggle fullscreen" triggers two messages
12:34:18  <frosch123> something like create window -> resize window -> mouse enter window
12:41:46  <fonsinchen> ClientSizeChanged is called long before UpdateCursorSize
12:41:56  <fonsinchen> multiple times actually
12:45:25  <fonsinchen> Actually the debugger reports some access violation just at the place where I suspect the invalid sprite read. It doesn't break or crash, though.
12:45:47  <fonsinchen> So, at AfterBlitterChange we are sure the blitter expects 32bit sprites.
12:46:12  <fonsinchen> Then ClientSizeChanged is called twice, resulting in invalid reads
12:46:34  <fonsinchen> Then UpdateCursorSize is called, providing the updated cursor sprite
12:46:44  * fonsinchen goes on to validate that story
12:48:39  <michi_cc> Hmm, I remember having a patch for the Sleep(0) somewhere... :)
12:50:17  <michi_cc> Ah, here it is, if it works better I might commit it:
12:56:20  <fonsinchen> also, std::bad_alloc somewhere
12:56:31  <fonsinchen> I have to find out how to make the debugger break on that
12:56:56  <fonsinchen> That may be the 512MiB thing, though
13:01:02  <fonsinchen> Actually on windows 7, that invalid read triggers an exception which travels up the stack and *gasp* is caught somewhere. WTF??
13:31:11  <frosch123> <- i read that yesterday
13:31:21  <frosch123> win64 catches exceptions from the windowproc
13:31:25  <frosch123> win32 does not
13:32:47  <fonsinchen> The original suggestion of just leaving out that UpdateWindows seems valid now
13:33:08  <frosch123> does that match sdl?
13:33:22  <frosch123> i think the goal should be to make sdl and win32 make more similar
13:34:17  <frosch123> sdl_v seems to call that function only in one place
13:34:49  <fonsinchen> SDL doesn't call UpdateWindows there.
13:35:03  <fonsinchen> I don't know if it does other silly things, though.
13:37:03  <frosch123> sdl calls Blitter::PostResize, InitPalette, Blitter::PaletteAnimate, GameSizeChanged
13:37:21  <frosch123> all at the end after the whole sdl drawing surface thing has been set up
13:37:47  <frosch123> win32 has this weird split between MakeWindow and AllocateDibSection, which sdl does not have
13:46:08  <frosch123> hmm, so in total we have now:
13:46:18  <frosch123> 1) remove the call to UpdateWindows (matches sdl)
13:46:29  <fonsinchen>
13:46:30  <frosch123> 2) remove the call to DrawMouseCursor (matches sdl)
13:46:46  <frosch123> 3) always set _screen.dst_ptr immediately (matches sdl)
13:46:54  <fonsinchen> which call to DrawMouseCursor?
13:47:00  <frosch123> 4) add more mutexes in ChangeResolution and ToggleFullscreen (matches sdl)
13:47:41  <frosch123> my frist patch removes DrawMouseCursor from WM_MOUSEMOVE in WndProcGdi
13:47:51  <frosch123> which is the original position where it crashed
13:48:08  <frosch123> given your explanation that cursors should only be drawn by the gameloop, that removal is still valid
13:48:27  <fonsinchen> I have to test that again.
13:48:56  <fonsinchen> Why are you removing all the _screen.dst_pointer stuff?
13:49:06  <frosch123> i am moving it to a diferent place
13:49:27  <frosch123> it is assigned directly after freeing the old one, and allocating the new one
13:49:35  <fonsinchen> OK, let's see how that behaves on Windows 7
13:49:40  <frosch123> there is no point in delaying assigning an pointer, which is no longer valid
13:50:38  <frosch123> <- it is moved to AllocateDibSection
13:50:44  <frosch123> where also the other _screen members are set
13:52:01  <frosch123> calling UndrawMouseCursor otoh is fine, since ScreenSizeChanged resets _cursor.visible
13:57:29  <fonsinchen> BTW, that "return ret" in EditBoxLostFocus is wrong.
13:57:48  <frosch123> :)
13:57:54  <frosch123> trivial syntax errors
14:10:41  <fonsinchen> So, we end up with those two:
14:11:14  <fonsinchen> I've fixed the syntax error and also removed the UpdateWindows from ClientSizeChanged().
14:11:49  <frosch123> yup, i agree with those
14:13:05  <frosch123> no need to credit me, you did more work :)
14:15:56  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26365 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
14:16:36  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26366 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
14:18:42  <frosch123> CmdCompanyCtrl causes the desync
14:19:05  <frosch123> it is called from the gameloop and resets the persistent storage in the cmd-testrun, before the storage is saved
14:20:35  <fonsinchen> Congratulations
14:20:47  <frosch123> bankruptcy could be a reason
14:21:45  <frosch123> i guess we have to rework the ClearPersistentStorageChanges thingie, it's way to intransparent when stuff is reset, and whether it shall be reset or not
14:23:13  <frosch123> is clientid 1 the server?
14:24:31  <frosch123> could also be autoclean
14:25:00  <frosch123> both CompaniesMonthlyLoop and NetworkServerMonthlyLoop are capable of calling DoCommandP with delete-company
14:31:30  <frosch123> ai start is another case
14:41:39  *** Hirundo has quit IRC
14:42:45  *** akuryou has quit IRC
14:43:00  *** Hirundo has joined
14:43:53  *** Rubidium has quit IRC
14:43:55  *** Rubidium has joined
14:56:58  <frosch123> <- some desync debug instructions
15:01:21  <frosch123> <- improved wording and spelling
15:05:01  <fonsinchen> This is a fairly obscure place for that. I'd make that a file in doc/
15:05:21  <fonsinchen> also, you should probably mention how to get to the commands-out.log on the server
15:05:45  <fonsinchen> it's some -d desync=x IIRC
15:05:50  <frosch123> hmm, true
15:06:54  <fonsinchen> Otherwise: very informative. I've been searching for that information multiple times before
15:07:17  <frosch123> well, it's only what i gathered in the last week :p
16:08:52  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26367 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
16:54:09  <frosch123> <- phew... everything i have to tell on the subject
18:45:14  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26368 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
19:10:55  <planetmaker> hm, nice summary for desync debugging, frosch123 :)
19:14:26  <Alberth> line 28 s/deterministically/deterministic/
19:15:06  <Alberth> 29 s/interfering a vehicle/interfering, a vehicle/    add comma
19:17:01  <Alberth> 64 s/is send/is send
19:17:20  <Alberth> 64 s/is send/is sent/      change d -> t
19:19:26  <Alberth> 75 s/movement happens the same/    either /movement is the same/  or /movement happens at the same time/
19:20:02  <Alberth> 77 s/result in the same/have the same result/
19:20:45  <Alberth> 81 s/the right one/the right line/
19:22:53  <Alberth> 127 s/in an undefined order or result/in an undefined order or with an undefined result/
19:26:30  <Alberth> 205 s/appropiately/appropriately/
19:29:10  <Alberth> 217 s/ section 3.3/Section 3.2/   uppercase, point to existing section
19:29:15  <Alberth> nice work !
19:30:53  <frosch123> thanks for review :)
19:32:01  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26369 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
19:32:33  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26370 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
20:13:18  *** Supercheese has joined
20:13:18  *** ChanServ sets mode: +v Supercheese
21:09:28  *** ChanServ sets mode: +v Rubidium
21:09:40  <Rubidium> "the checksum may have matches 10 times or more since..." <- well, we had a desync that happened after about 20 game years once ;)
21:10:50  <frosch123> :p
21:12:54  <Rubidium> nice bit of desync docs though ;)
21:23:47  <frosch123> <- i would hope that fixes the desync
21:24:54  <frosch123> it replaces the ClearPersistentStorageChanges-affects-everything-previously with more defined enter-leave-thingies
21:25:27  <frosch123> it also allows nesting the command-scope inside gameloop- and gui-scope (which is the actual fix)
21:36:32  <Rubidium> would it make sense to test whether any of the three states is 'enabled' in the AreChangesPersistent using an assert?
21:36:51  <Rubidium> or is there a valid case where it's not game loop, command or test run?
21:37:45  <frosch123> yes, in the gui
21:38:03  <frosch123> firs even triggers that case
21:38:50  <frosch123> some sea industry does weird things with persistent storage during drawing
21:39:04  <frosch123> i haven't quite checked what it actually tries to do :p
21:39:41  <frosch123> likely andy uses it for debug purposes because the storage is displayed in the debug window
21:39:43  <frosch123> or something silly :p
21:40:39  *** Alberth has left
21:41:04  <Rubidium> hmm, each state change => clear storages. Is that going to be nice with nesting stuff?
21:41:27  <frosch123> the changes that are cleared are always temporary now
21:41:44  <frosch123> StoreValue no longer does a copy if stuff shall be persistent
21:41:50  <Rubidium> true
21:41:54  <frosch123> so, it does not really matter if you call clearchanges more often
21:43:07  <Rubidium> this looks like a good step forward; can't think of any conceptual issues with it
22:03:10  *** DorpsGek changes topic to "OpenTTD Dev Channel || Latest SVN: r26371 || Logs: || Voice (talk-right) upon request via #openttd; make sure you are registered to NickServ before asking"
22:43:58  *** frosch123 has quit IRC

Powered by YARRSTE version: svn-trunk